Abstract
Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications that require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. Achieving real-time, efficient, and accurate hardware RNN implementations is challenging because of the high sensitivity to accumulated imprecision and the need for special activation function implementations. Recently, two works have focused on FPGA implementations of the inference phase of LSTM RNNs with model compression. The first, ESE, uses a weight-pruning-based compressed RNN model but suffers from an irregular network structure after pruning. The second, C-LSTM, mitigates the irregularity by adopting block-circulant matrices for weight matrix representation in RNNs, thereby achieving simultaneous model compression and acceleration. A key limitation of the prior works is the lack of a systematic design optimization framework spanning both the RNN model and the hardware implementation, especially when the block size (or compression ratio) should be jointly optimized with the RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance/energy efficiency under an accuracy requirement. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations that provide guidance on block size and reduce the number of RNN training trials. Based on these two observations, we decompose E-RNN into two phases: Phase I determines the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II covers the hardware implementation given the RNN model, including processing element design/optimization, quantization, and activation implementation. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy.
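The block-circulant representation referenced in the abstract compresses each weight matrix into circulant blocks, so every block-vector product reduces to a circular convolution that can be evaluated with FFTs. The sketch below is only an illustrative NumPy reference of that matrix-vector product: the function name `block_circulant_matvec` and the block parameters `p`, `q`, `k` are our own labels, and the floating-point FFT formulation is a stand-in for the fixed-point FPGA datapath the paper actually describes.

```python
import numpy as np

def block_circulant_matvec(w_blocks, x, k):
    """Block-circulant matrix-vector product computed with FFTs.

    w_blocks : array of shape (p, q, k); entry (i, j) holds the defining
               (first-column) vector of the k x k circulant block W_ij.
    x        : input vector of length q * k.
    Returns the length-(p * k) output vector W @ x.
    """
    p, q, _ = w_blocks.shape
    x_f = np.fft.fft(x.reshape(q, k), axis=1)   # FFT of each input block
    w_f = np.fft.fft(w_blocks, axis=2)          # FFT of each defining vector
    # Each circulant block product is a circular convolution, i.e. an
    # element-wise product in the frequency domain; sum over the q blocks.
    y_f = (w_f * x_f[np.newaxis, :, :]).sum(axis=1)
    return np.fft.ifft(y_f, axis=1).real.reshape(p * k)

def expand_circulant(c):
    """Dense k x k circulant matrix whose first column is c (for checking)."""
    k = len(c)
    return np.array([[c[(i - j) % k] for j in range(k)] for i in range(k)])

# Sanity check against the equivalent dense matrix.
rng = np.random.default_rng(0)
p, q, k = 2, 3, 4
w_blocks = rng.standard_normal((p, q, k))
x = rng.standard_normal(q * k)
dense = np.block([[expand_circulant(w_blocks[i, j]) for j in range(q)]
                  for i in range(p)])
assert np.allclose(dense @ x, block_circulant_matvec(w_blocks, x, k))
```

With block size k, each k × k block is stored as k values instead of k², which is the source of the compression ratio that E-RNN jointly optimizes with the RNN type and layer size.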
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 69-80 |
Number of pages | 12 |
ISBN (Electronic) | 9781728114446 |
DOIs | |
State | Published - Mar 26 2019 |
Externally published | Yes |
Event | 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 - Washington, United States; Duration: Feb 16 2019 → Feb 20 2019 |
Publication series
Name | Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 |
---|---|
Conference
Conference | 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 |
---|---|
Country/Territory | United States |
City | Washington |
Period | 2/16/19 → 2/20/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- Block-circulant matrix
- Design optimization
- FPGAs
- RNN