TY - JOUR
T1 - On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices
AU - Seo, Jae Sun
AU - Lin, Binbin
AU - Kim, Minkyu
AU - Chen, Pai Yu
AU - Kadetotad, Deepak
AU - Xu, Zihan
AU - Mohanty, Abinash
AU - Vrudhula, Sarma
AU - Yu, Shimeng
AU - Ye, Jieping
AU - Cao, Yu
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2015/11
Y1 - 2015/11
N2 - Many recent advances in sparse coding have led to its wide adoption in signal processing, pattern classification, and object recognition applications. Even with improved state-of-the-art algorithms and CPU/GPU hardware platforms, solving a sparse coding problem still requires expensive computations, making real-time large-scale learning a very challenging problem. In this paper, we co-optimize algorithm, architecture, circuit, and device for real-time, energy-efficient on-chip hardware acceleration of sparse coding. The principle of hardware acceleration is to recognize the properties of learning algorithms, which involve many parallel operations of data fetch and matrix/vector multiplication/addition. Today's von Neumann architecture, however, is not suitable for such parallelization, due to the separation of memory and the computing unit, which makes sequential operations inevitable. This principle drives both the selection of algorithms and the design evolution from CPU to CMOS application-specific integrated circuits (ASIC) to the parallel architecture with resistive crosspoint array (PARCA) that we propose. The CMOS ASIC scheme implements sparse coding with SRAM dictionaries and all-digital circuits, and PARCA employs resistive-RAM dictionaries with special read and write circuits. We show that 65 nm implementations of the CMOS ASIC and PARCA schemes accelerate sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU. Simulated power for both hardware schemes lies in the milliwatt range, making them viable for portable single-chip learning applications.
AB - Many recent advances in sparse coding have led to its wide adoption in signal processing, pattern classification, and object recognition applications. Even with improved state-of-the-art algorithms and CPU/GPU hardware platforms, solving a sparse coding problem still requires expensive computations, making real-time large-scale learning a very challenging problem. In this paper, we co-optimize algorithm, architecture, circuit, and device for real-time, energy-efficient on-chip hardware acceleration of sparse coding. The principle of hardware acceleration is to recognize the properties of learning algorithms, which involve many parallel operations of data fetch and matrix/vector multiplication/addition. Today's von Neumann architecture, however, is not suitable for such parallelization, due to the separation of memory and the computing unit, which makes sequential operations inevitable. This principle drives both the selection of algorithms and the design evolution from CPU to CMOS application-specific integrated circuits (ASIC) to the parallel architecture with resistive crosspoint array (PARCA) that we propose. The CMOS ASIC scheme implements sparse coding with SRAM dictionaries and all-digital circuits, and PARCA employs resistive-RAM dictionaries with special read and write circuits. We show that 65 nm implementations of the CMOS ASIC and PARCA schemes accelerate sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU. Simulated power for both hardware schemes lies in the milliwatt range, making them viable for portable single-chip learning applications.
KW - Application specific integrated circuits
KW - CMOS integrated circuits
KW - Dictionaries
KW - Hardware
KW - Unsupervised learning
KW - Very large scale integration
UR - http://www.scopus.com/inward/record.url?scp=84947777810&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947777810&partnerID=8YFLogxK
U2 - 10.1109/TNANO.2015.2478861
DO - 10.1109/TNANO.2015.2478861
M3 - Article
AN - SCOPUS:84947777810
SN - 1536-125X
VL - 14
SP - 969
EP - 979
JO - IEEE Transactions on Nanotechnology
JF - IEEE Transactions on Nanotechnology
IS - 6
M1 - 7268884
ER -