Abstract
Scheduling the computations of each layer of a convolutional neural network (CNN) on a deep learning (DL) accelerator involves a large number of choices, each of which implies a different set of memory reuse and memory access patterns. Since memory transactions are the primary bottleneck in DL acceleration, these choices can strongly impact the energy and throughput of the accelerator. This work proposes an optimization framework, DeepOpt, for general ASIC-based systolic hardware accelerators that determines a layer-specific and hardware-specific scheduling strategy for each layer of a CNN to optimize energy and latency. Optimal hardware allocation significantly reduces execution cost as compared to generic static hardware resource allocation, e.g., improvements of up to 50× in the energy-delay product for VGG-16 and 41× for GoogleNet-v1.
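To make the idea of layer-specific schedule selection concrete, the sketch below is a rough illustration only (not DeepOpt's actual algorithm or cost model): it enumerates candidate output-tile sizes for a single convolutional layer and picks the schedule with the lowest estimated energy-delay product under a toy two-level memory model. All parameter names, constants, and cost formulas are assumptions made here for illustration.

```python
# Illustrative sketch (assumed cost model, not the DeepOpt framework):
# pick the tiling of one conv layer that minimizes an estimated
# energy-delay product (EDP) under a simple SRAM/DRAM cost model.
import math
from itertools import product

# Assumed per-byte energy (pJ) and latency (cycles) for two memory levels.
E_SRAM, E_DRAM = 1.0, 100.0
T_SRAM, T_DRAM = 1, 20
SRAM_BYTES = 256 * 1024  # assumed on-chip buffer capacity

def conv_layer_edp(H, W, C, K, R, tile_h, tile_w):
    """Toy EDP estimate for one output tiling of a conv layer.

    H, W : output feature-map height/width
    C, K : input/output channels
    R    : square kernel size (stride 1 assumed, 2-byte words)
    """
    in_tile = (tile_h + R - 1) * (tile_w + R - 1) * C * 2
    w_tile = R * R * C * K * 2
    out_tile = tile_h * tile_w * K * 2
    if in_tile + w_tile + out_tile > SRAM_BYTES:
        return None  # this tile does not fit in the on-chip buffer

    n_tiles = math.ceil(H / tile_h) * math.ceil(W / tile_w)
    # Each tile refetches its inputs and weights; outputs are written once.
    dram_bytes = n_tiles * (in_tile + w_tile) + H * W * K * 2
    # Toy proxy for on-chip traffic: 2 bytes of SRAM access per MAC.
    sram_bytes = H * W * K * R * R * C * 2

    energy = dram_bytes * E_DRAM + sram_bytes * E_SRAM
    delay = dram_bytes * T_DRAM + sram_bytes * T_SRAM
    return energy * delay

def best_schedule(H, W, C, K, R, tile_options=(4, 8, 16, 32)):
    """Exhaustively search tile sizes and return the lowest-EDP schedule."""
    best = None
    for th, tw in product(tile_options, repeat=2):
        edp = conv_layer_edp(H, W, C, K, R, th, tw)
        if edp is not None and (best is None or edp < best[0]):
            best = (edp, th, tw)
    return best

if __name__ == "__main__":
    # Hypothetical 56x56x64 layer with 3x3 kernels and 64 filters.
    print(best_schedule(H=56, W=56, C=64, K=64, R=3))
```

Because the best tile sizes depend on the layer's dimensions and the accelerator's buffer sizes, the search above would generally return a different schedule for each layer, which is the intuition behind layer-specific and hardware-specific scheduling.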
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 235-241 |
Number of pages | 7 |
ISBN (Electronic) | 9781450379991 |
DOIs | |
State | Published - Jan 18 2021 |
Event | 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021 - Virtual, Online, Japan; Duration: Jan 18 2021 → Jan 21 2021 |
Publication series
Name | Proceedings of the 26th Asia and South Pacific Design Automation Conference |
---|---|
Conference
Conference | 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021 |
---|---|
Country/Territory | Japan |
City | Virtual, Online |
Period | 1/18/21 → 1/21/21 |
Bibliographical note
Funding Information: We thank Z. Wang and A. B. Kahng (UCSD) for their help in modeling SRAM area. This work is supported in part by NSF (CCF-1763761).
Publisher Copyright:
© 2021 Association for Computing Machinery.
Keywords
- CNN
- hardware accelerator
- scheduling