DeCoILFNet: Depth Concatenation and Inter-Layer Fusion based ConvNet Accelerator

1 Dec 2018 · Akanksha Baranwal, Ishan Bansal, Roopal Nahar, K. Madhava Krishna

Convolutional Neural Networks (CNNs) are rapidly gaining popularity in varied fields. Due to their increasingly deep and computationally heavy structures, it is difficult to deploy them in energy-constrained mobile applications. Hardware accelerators such as FPGAs have emerged as an attractive alternative. However, with the limited on-chip memory and computation resources of FPGAs, meeting the high memory throughput requirement and exploiting the parallelism of CNNs is a major challenge. We propose a high-performance FPGA-based architecture, the Depth Concatenation and Inter-Layer Fusion based ConvNet Accelerator (DeCoILFNet), which exploits the intra-layer parallelism of CNNs by flattening across depth and combines it with a highly pipelined data flow across the layers, enabling inter-layer fusion. This architecture significantly reduces off-chip memory accesses and maximizes throughput. Compared to a Caffe implementation on a 3.5 GHz hexa-core Intel Xeon E7, our 120 MHz FPGA accelerator is 30X faster. In addition, our design reduces external memory access by 11.5X and achieves a speedup of more than 2X in the number of clock cycles compared to state-of-the-art FPGA accelerators.
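To make the idea of inter-layer fusion concrete, here is a toy software sketch. It is not the authors' hardware design: it assumes two back-to-back 1-D "valid" convolutions in place of real CNN layers, and all function names are hypothetical. The baseline stores the whole intermediate feature map between layers, while the fused version streams the input once and keeps only a small sliding window of layer-1 outputs, which is the general reason fusion can cut traffic to external memory.

```python
from collections import deque


def conv1d_valid(x, k):
    """Plain 'valid' 1-D correlation: one output per full kernel window."""
    n, m = len(x), len(k)
    return [sum(x[i + j] * k[j] for j in range(m)) for i in range(n - m + 1)]


def layer_by_layer(x, k1, k2):
    """Baseline: materialise the full intermediate map, then run layer 2."""
    mid = conv1d_valid(x, k1)
    return conv1d_valid(mid, k2), len(mid)


def fused(x, k1, k2):
    """Toy fusion: stream the input once, holding only two small windows."""
    win1 = deque(maxlen=len(k1))  # sliding window over the input stream
    win2 = deque(maxlen=len(k2))  # sliding window over layer-1 outputs
    out = []
    for sample in x:
        win1.append(sample)
        if len(win1) < len(k1):
            continue  # layer 1 cannot produce an output yet
        win2.append(sum(a * b for a, b in zip(win1, k1)))
        if len(win2) == len(k2):
            out.append(sum(a * b for a, b in zip(win2, k2)))
    # Second value: peak number of intermediate values held at once.
    return out, len(k2)


x = [1, 2, 3, 4, 5, 6, 7, 8]
k1 = [1, 0, -1]
k2 = [1, 2, 1]

ref, mid_full = layer_by_layer(x, k1, k2)
out, mid_fused = fused(x, k1, k2)
print(out == ref, mid_full, mid_fused)
```

Both paths produce the same output, but the fused path holds only as many intermediate values as the second kernel is wide, independent of the input length. A real accelerator would apply the same principle with 2-D line buffers and multiple channels.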
