Efficient Computation of Quantized Neural Networks by {−1, +1} Encoding Decomposition
Deep neural networks require extensive computing resources, and can not be efficiently applied to embedded devices such as mobile phones, which seriously limits their applicability. To address this problem, we propose a novel encoding scheme by using {-1,+1} to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (xnor and bitcount) to achieve model compression, computational acceleration and resource saving. Our method can achieve at most ~59 speedup and ~32 memory saving over its full-precision counterparts. Therefore, users can easily achieve different encoding precisions arbitrarily according to their requirements and hardware resources. Our mechanism is very suitable for the use of FPGA and ASIC in terms of data storage and computation, which provides a feasible idea for smart chips. We validate the effectiveness of our method on both large-scale image classification (e.g., ImageNet) and object detection tasks.
PDF Abstract