TY - JOUR
T1 - High-Throughput Training of Deep CNNs on ReRAM-Based Heterogeneous Architectures via Optimized Normalization Layers
AU - Joardar, Biresh Kumar
AU - Deshwal, Aryan
AU - Doppa, Janardhan Rao
AU - Pande, Partha Pratim
AU - Chakrabarty, Krishnendu
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/5/1
Y1 - 2022/5/1
AB - Resistive random-access memory (ReRAM)-based architectures can be used to accelerate convolutional neural network (CNN) training. However, existing architectures either do not support normalization at all or support only a limited version of it. Moreover, it is common practice to add a normalization layer after every convolution layer in a CNN. In this work, we show that while normalization layers are necessary to train deep CNNs, only a few such layers are sufficient for effective training. Using a large number of normalization layers does not improve prediction accuracy; it only necessitates additional hardware and gives rise to performance bottlenecks. To address this problem, we propose DeepTrain, a heterogeneous architecture enabled by a Bayesian optimization (BO) methodology; together, they provide adequate hardware and software support for normalization operations. The proposed BO methodology determines the minimum number of normalization operations necessary for a given CNN. Experimental evaluation indicates that the BO-enabled DeepTrain architecture achieves up to a 15× speedup over a conventional GPU for CNN training, with no loss of accuracy, while using only a few normalization layers.
KW - 3-D
KW - convolutional neural networks (CNNs)
KW - GPU
KW - normalization
KW - resistive random-access memory (ReRAM)
UR - http://www.scopus.com/inward/record.url?scp=85107231564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107231564&partnerID=8YFLogxK
DO - 10.1109/TCAD.2021.3083684
M3 - Article
AN - SCOPUS:85107231564
SN - 0278-0070
VL - 41
SP - 1537
EP - 1549
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 5
ER -