HYPE-C: Evaluating Image Completion Models Through Standardized Crowdsourcing
A significant obstacle to the development of new image completion models is the lack of a standardized evaluation metric that reflects human judgement. Recent work has proposed human evaluation for image synthesis models, providing a reliable way to assess the visual quality of generated images. However, no standardized human evaluation protocol yet exists for image completion. In this work, we propose such a protocol. We also report experimental results from applying our evaluation method to many current state-of-the-art generative image models and compare these results with various automated metrics. Our evaluation yields several findings; notably, GAN-based image completion models are outperformed by autoregressive approaches.