Reusing GEMM Hardware for Efficient Execution of Depthwise Separable Convolution on ASIC-Based DNN Accelerators

Susmita Dey Manasi, Suvadeep Banerjee, Abhijit Davare, Anton A. Sorokin, Steven M. Burns, Desmond A. Kirkpatrick, Sachin S. Sapatnekar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Deep learning (DL) accelerators are optimized for standard convolution. However, lightweight convolutional neural networks (CNNs) use depthwise convolution (DwC) in key layers, and the structural difference between DwC and standard convolution leads to a significant performance bottleneck when executing lightweight CNNs on such platforms. This work reuses the fast general matrix multiplication (GEMM) core of DL accelerators by mapping DwC to channel-wise parallel matrix-vector multiplications. An analytical framework is developed to guide pre-RTL hardware choices, and new hardware modules and software support are developed for end-to-end evaluation of the solution. This GEMM-based DwC execution strategy offers substantial performance gains for lightweight CNNs: for MobileNet-v1, a 7× speedup and 1.8× lower off-chip communication over a conventional DL accelerator, a 74× speedup over a CPU, and even a 1.4× speedup over a power-hungry GPU.
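To make the core idea of the abstract concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the mapping it describes: each channel of a depthwise convolution is lowered, im2col-style, into a patch matrix and multiplied by that channel's flattened filter, yielding one independent matrix-vector product per channel. All names, the stride-1/no-padding setting, and the reference loop are illustrative assumptions.

```python
import numpy as np

def depthwise_conv_direct(x, w):
    """Reference DwC: x is (C, H, W), w is (C, K, K) per-channel filters.
    Stride 1, no padding (illustrative assumptions)."""
    C, H, W = x.shape
    _, K, _ = w.shape
    Ho, Wo = H - K + 1, W - K + 1
    out = np.zeros((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                out[c, i, j] = np.sum(x[c, i:i + K, j:j + K] * w[c])
    return out

def depthwise_conv_gemm(x, w):
    """DwC as channel-wise matrix-vector products: per channel, build an
    (Ho*Wo, K*K) im2col matrix and multiply it by the flattened K*K
    filter vector -- one independent GEMV per channel, which is the kind
    of work a GEMM core can execute in parallel across channels."""
    C, H, W = x.shape
    _, K, _ = w.shape
    Ho, Wo = H - K + 1, W - K + 1
    out = np.empty((C, Ho, Wo))
    for c in range(C):
        patches = np.stack([
            x[c, i:i + K, j:j + K].ravel()
            for i in range(Ho) for j in range(Wo)
        ])  # im2col matrix for this channel
        out[c] = (patches @ w[c].ravel()).reshape(Ho, Wo)
    return out
```

Unlike standard convolution, the per-channel matrices here never mix channels, which is why DwC maps to many small matrix-vector products rather than one large matrix-matrix product.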

Original language: English (US)
Title of host publication: ASP-DAC 2023 - 28th Asia and South Pacific Design Automation Conference, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 475-482
Number of pages: 8
ISBN (Electronic): 9781450397834
DOIs
State: Published - Jan 16 2023
Event: 28th Asia and South Pacific Design Automation Conference, ASP-DAC 2023 - Tokyo, Japan
Duration: Jan 16 2023 - Jan 19 2023

Publication series

Name: Proceedings of the 28th Asia and South Pacific Design Automation Conference

Conference

Conference: 28th Asia and South Pacific Design Automation Conference, ASP-DAC 2023
Country/Territory: Japan
City: Tokyo
Period: 1/16/23 - 1/19/23

Bibliographical note

Funding Information:
This work is supported in part by AFRL under the DARPA RTML program under award FA8650-20-2-7009 and internship at Intel Strategic CAD Labs. The U. S. government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL, DARPA, or the U. S. government. The authors would like to acknowledge the contribution of Zhiang Wang from UCSD.

Publisher Copyright:
© 2023 Copyright held by the owner/author(s).

Keywords

  • deep learning accelerator
  • depthwise convolution
  • lightweight CNN
