SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, Yingyan Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

35 Scopus citations

Abstract

We present SmartExchange, an algorithm-hardware co-design framework that trades higher-cost memory storage/access for lower-cost computation, enabling energy-efficient inference of deep neural networks (DNNs). We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all powers of 2. To the best of our knowledge, this algorithm is the first formulation that integrates three mainstream model compression ideas: sparsification or pruning, decomposition, and quantization, into one unified framework. The resulting sparse and readily-quantized DNN thus enjoys greatly reduced energy consumption in data movement as well as weight storage. On top of that, we further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance. Extensive experiments show that 1) on the algorithm level, SmartExchange outperforms state-of-the-art compression techniques, including standalone sparsification or pruning, decomposition, and quantization, in various ablation studies based on nine models and four datasets; and 2) on the hardware level, SmartExchange can boost the energy efficiency by up to 6.7× and reduce the latency by up to 19.2× over four state-of-the-art DNN accelerators, when benchmarked on seven DNN models (including four standard DNNs, two compact DNN models, and one segmentation model) and three datasets.
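The core weight structure from the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation; the shapes, density, and exponent range below are illustrative assumptions. It shows a layer weight matrix reconstructed as the product of a small dense basis matrix and a large sparse coefficient matrix whose non-zero entries are signed powers of 2, and compares the parameter counts that motivate the storage/computation trade.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 64x64 layer weight rebuilt from a rank-4 basis.
m, n, r = 64, 64, 4  # weight rows, weight cols, basis size (assumptions)

B = rng.standard_normal((r, n))  # small dense basis matrix

# Large sparse coefficient matrix: each entry is 0 with high probability,
# otherwise a signed power of two (so multiplication reduces to a shift).
mask = rng.random((m, r)) < 0.3                 # ~30% density (assumption)
exponents = rng.integers(-3, 2, size=(m, r))    # exponent range is illustrative
signs = rng.choice([-1.0, 1.0], size=(m, r))
C = np.where(mask, signs * np.exp2(exponents), 0.0)

# Reconstruct the layer weights on the fly: higher-cost storage/access of W
# is traded for the lower-cost computation C @ B.
W = C @ B

# Storage comparison: full floats for W vs. a small float basis plus a few
# bits (sign + exponent) per non-zero coefficient.
dense_params = m * n
smartexchange_params = r * n + np.count_nonzero(C)
print(dense_params, smartexchange_params)
```

Because every non-zero in `C` is a power of 2, the reconstruction multiply-accumulates can be implemented as shifts and adds in hardware, which is what the dedicated accelerator exploits.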

Original language: English (US)
Title of host publication: Proceedings - 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, ISCA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 954-967
Number of pages: 14
ISBN (Electronic): 9781728146614
DOIs
State: Published - May 2020
Externally published: Yes
Event: 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020 - Virtual, Online, Spain
Duration: May 30, 2020 - Jun 3, 2020

Publication series

Name: Proceedings - International Symposium on Computer Architecture
Volume: 2020-May
ISSN (Print): 1063-6897

Conference

Conference: 47th ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2020
Country/Territory: Spain
City: Virtual, Online
Period: 5/30/20 - 6/3/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Neural network compression
  • neural network inference accelerator
  • pruning
  • quantization
  • weight decomposition
