Abstract
We present SmartExchange, an algorithm-hardware co-design framework that trades higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, in which each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all powers of 2. To the best of our knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas (sparsification or pruning, decomposition, and quantization) into one unified framework. The resulting sparse and readily quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we design a dedicated accelerator that fully utilizes the SmartExchange-enforced weights to improve both energy efficiency and latency. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including sparsification or pruning, decomposition, and quantization applied alone, in various ablation studies based on nine models and four datasets; and 2) on the hardware level, SmartExchange boosts energy efficiency by up to 6.7× and reduces latency by up to 19.2× over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.
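The abstract does not spell out the algorithm, but the storage format it describes can be illustrated. The following is a minimal sketch, assuming the weight matrix is rebuilt as W = Ce · B (coefficient matrix Ce times basis B), that pruning uses a simple magnitude threshold, and that each surviving coefficient is rounded to the nearest power of 2 in the log2 domain. The function names (`encode_pow2`, `encode_coefficients`, `rebuild_weights`) and the threshold are hypothetical illustrations, not the SmartExchange training procedure or accelerator dataflow. The point it shows: because every non-zero coefficient is ±2^e, rebuilding weights needs only exponent shifts (`math.ldexp`) and additions, and pruned entries cost no work.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]
# A power-of-2 coefficient stored as (sign, exponent); None marks a pruned (zero) entry.
Pow2 = Optional[Tuple[int, int]]


def encode_pow2(x: float, threshold: float) -> Pow2:
    """Prune small values; otherwise round |x| to the nearest power of 2 in log2 space."""
    if x == 0 or abs(x) < threshold:
        return None
    sign = 1 if x > 0 else -1
    return sign, round(math.log2(abs(x)))


def encode_coefficients(ce: Matrix, threshold: float) -> List[List[Pow2]]:
    """Encode a dense coefficient matrix into sparse (sign, exponent) codes."""
    return [[encode_pow2(v, threshold) for v in row] for row in ce]


def rebuild_weights(codes: List[List[Pow2]], basis: Matrix) -> Matrix:
    """Compute W = Ce @ B where every non-zero Ce entry is +/- 2**e.

    Scaling a basis entry by 2**e uses math.ldexp (an exponent shift)
    instead of a general multiplication; pruned entries are skipped.
    """
    n = len(basis[0])
    weights = []
    for row in codes:
        out = [0.0] * n
        for k, code in enumerate(row):
            if code is None:
                continue  # pruned coefficient: no computation needed
            sign, exp = code
            for j in range(n):
                out[j] += sign * math.ldexp(basis[k][j], exp)
        weights.append(out)
    return weights


if __name__ == "__main__":
    ce = [[0.9, -0.02, 0.26], [0.0, -1.7, 0.5]]  # hypothetical 2x3 coefficients
    basis = [[1.0, 2.0], [3.0, -1.0], [4.0, 0.5]]  # hypothetical 3x2 basis
    codes = encode_coefficients(ce, threshold=0.1)
    print(codes)  # [[(1, 0), None, (1, -2)], [None, (-1, 1), (1, -1)]]
    print(rebuild_weights(codes, basis))  # [[2.0, 2.125], [-4.0, 2.25]]
```

In this toy example, 0.9 becomes 2^0, 0.26 becomes 2^-2, -1.7 becomes -2^1, and the two small entries are pruned, so only four shift-and-add terms remain. How Ce and B are actually obtained (the decomposition and retraining steps) is outside the scope of this sketch.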
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings - 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, ISCA 2020 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 954-967 |
| Number of pages | 14 |
| ISBN (Electronic) | 9781728146614 |
| DOIs | |
| State | Published - May 2020 |
| Externally published | Yes |
| Event | 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020 - Virtual, Online, Spain. Duration: May 30, 2020 → Jun 3, 2020 |
Publication series
| Name | Proceedings - International Symposium on Computer Architecture |
|---|---|
| Volume | 2020-May |
| ISSN (Print) | 1063-6897 |
Conference
| Conference | 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020 |
|---|---|
| Country/Territory | Spain |
| City | Virtual, Online |
| Period | 5/30/20 → 6/3/20 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- Neural network compression
- neural network inference accelerator
- pruning
- quantization
- weight decomposition