TY - GEN
T1 - Energy-Efficient Architecture for FPGA-based Deep Convolutional Neural Networks with Binary Weights
AU - Duan, Yunzhi
AU - Li, Shuai
AU - Zhang, Ruipeng
AU - Wang, Qi
AU - Chen, Jienan
AU - Sobelman, Gerald E.
PY - 2019/1/31
Y1 - 2019/1/31
N2 - This paper presents an energy-efficient, deep parallel Convolutional Neural Network (CNN) accelerator. By adopting a recently proposed binary weight method, the CNN computations are converted into multiplication-free processing. To allow parallel accessing and storing of data, we use two RAM banks, where each bank is composed of N RAM blocks corresponding to N-parallel processing. We also design a reconfigurable CNN computing unit in a divide-and-reuse manner to support variable-size convolutional filters. Compared with full-precision computing on the MNIST and CIFAR-10 classification tasks, the inference Top-1 accuracy of the binary weight CNN drops by 1.21% and 1.34%, respectively. The hardware implementation results show that the proposed design achieves 2100 GOP/s with a 4.6 millisecond processing latency. The deep parallel accelerator exhibits 3X higher energy efficiency than a GPU-based design.
AB - This paper presents an energy-efficient, deep parallel Convolutional Neural Network (CNN) accelerator. By adopting a recently proposed binary weight method, the CNN computations are converted into multiplication-free processing. To allow parallel accessing and storing of data, we use two RAM banks, where each bank is composed of N RAM blocks corresponding to N-parallel processing. We also design a reconfigurable CNN computing unit in a divide-and-reuse manner to support variable-size convolutional filters. Compared with full-precision computing on the MNIST and CIFAR-10 classification tasks, the inference Top-1 accuracy of the binary weight CNN drops by 1.21% and 1.34%, respectively. The hardware implementation results show that the proposed design achieves 2100 GOP/s with a 4.6 millisecond processing latency. The deep parallel accelerator exhibits 3X higher energy efficiency than a GPU-based design.
KW - Convolutional Neural Network (CNN)
KW - deep neural network
KW - energy efficiency
KW - parallel implementation
UR - http://www.scopus.com/inward/record.url?scp=85062797447&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062797447&partnerID=8YFLogxK
U2 - 10.1109/ICDSP.2018.8631596
DO - 10.1109/ICDSP.2018.8631596
M3 - Conference contribution
AN - SCOPUS:85062797447
T3 - International Conference on Digital Signal Processing, DSP
BT - 2018 IEEE 23rd International Conference on Digital Signal Processing, DSP 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Digital Signal Processing, DSP 2018
Y2 - 19 November 2018 through 21 November 2018
ER -