InterGrad: Energy-Efficient Training of Convolutional Neural Networks via Interleaved Gradient Scheduling

Nanda K. Unnikrishnan, Keshab K. Parhi

Research output: Contribution to journal › Article › peer-review


This paper addresses the design of accelerators using systolic architectures to train convolutional neural networks using a novel gradient interleaving approach. Training a neural network involves computation and backpropagation of the gradients of the error with respect to both the activations and the weights. It is shown that the gradient with respect to the activations can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This allows variables to be reused from one computation to the other, eliminating unnecessary memory accesses and the energy consumption associated with them. The proposed approach achieves 1.4-2.2× savings in the number of cycles and 1.9× savings in memory accesses in the fully-connected layer. Furthermore, the proposed method uses up to 25% fewer cycles and memory accesses, and 16% less energy, than baseline implementations for state-of-the-art CNNs. Under iso-area comparisons, for Inception-v4, compared to a weight-stationary (WS) baseline, InterGrad achieves 12% savings in energy, 17% savings in memory accesses, and 4% savings in cycles. Savings for DenseNet-264 are 18%, 26%, and 27% with respect to energy, memory accesses, and cycles, respectively. Thus, the proposed novel accelerator architecture reduces the latency and energy consumption of training deep neural networks.
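The two gradients the abstract refers to can be seen in the backward pass of a fully-connected layer: both are matrix products that consume the same upstream error gradient, which is why interleaving them lets operands be reused. The sketch below is a minimal software illustration of that data sharing, not the paper's systolic-array hardware schedule.

```python
import numpy as np

# Backward pass of a fully-connected layer Y = X @ W.
# Both gradients consume the same upstream gradient dY = dL/dY,
# which motivates interleaving the two computations to reuse
# operands (software illustration only; the paper maps these to
# weight-stationary and output-stationary systolic-array passes).

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))    # layer input  (batch x in_features)
W = rng.standard_normal((8, 3))    # weights      (in_features x out_features)
dY = rng.standard_normal((4, 3))   # upstream gradient dL/dY

# Gradient w.r.t. the activations (weight-stationary computation)
dX = dY @ W.T                       # shape (batch, in_features)

# Gradient w.r.t. the weights (output-stationary computation)
dW = X.T @ dY                       # shape (in_features, out_features)

print(dX.shape, dW.shape)
```

Note that `dY` appears in both products; computing `dX` and `dW` back-to-back with separate memory traffic for `dY` is exactly the redundancy the proposed interleaved schedule removes.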

Original language: English (US)
Pages (from-to): 1949-1962
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
Issue number: 5
State: Published - May 1 2023


Keywords

  • Neural network training
  • accelerator architectures
  • backpropagation
  • convolutional neural networks
  • gradient interleaving
  • interleaved scheduling
  • systolic array


