An Energy-Efficient FPGA-based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution

18 Dec 2018  ·  Chang Jung-Woo, Kang Keon-Woo, Kang Suk-Ju ·

Convolutional neural networks (CNNs) demonstrate excellent performance in various computer vision applications. In recent years, FPGA-based CNN accelerators have been proposed for optimizing performance and power efficiency... Most accelerators are designed for object detection and recognition algorithms that are performed on low-resolution (LR) images. However, real-time image super-resolution (SR) cannot be implemented on a typical accelerator because of the long execution cycles required to generate high-resolution (HR) images, such as those used in ultra-high-definition (UHD) systems. In this paper, we propose a novel CNN accelerator with efficient parallelization methods for SR applications. First, we propose a new methodology for optimizing the deconvolutional neural networks (DCNNs) used for increasing feature maps. Secondly, we propose a novel method to optimize CNN dataflow so that the SR algorithm can be driven at low power in display applications. Finally, we quantize and compress a DCNN-based SR algorithm into an optimal model for efficient inference using on-chip memory. We present an energy-efficient architecture for SR and validate our architecture on a mobile panel with quad-high-definition (QHD) resolution. Our experimental results show that, with the same hardware resources, the proposed DCNN accelerator achieves a throughput up to 108 times greater than that of a conventional DCNN accelerator. In addition, our SR system achieves an energy efficiency of 144.9 GOPS/W, 293.0 GOPS/W, and 500.2 GOPS/W at SR scale factors of 2, 3, and 4, respectively. Furthermore, we demonstrate that our system can restore HR images to a high quality while greatly reducing the data bit-width and the number of parameters compared to conventional SR algorithms. read more

PDF Abstract
No code implementations yet. Submit your code now

Categories


Distributed, Parallel, and Cluster Computing Hardware Architecture

Datasets


  Add Datasets introduced or used in this paper