Scheduling the computations in each layer of a convolutional neural network on a deep learning (DL) accelerator involves a large number of choices, each of which implies a different set of memory reuse and memory access patterns. Since memory transactions are the primary bottleneck in DL acceleration, these choices can strongly impact the energy and throughput of the accelerator. This work proposes an optimization framework, DeepOpt, for general ASIC-based systolic hardware accelerators that determines a layer-specific and hardware-specific scheduling strategy for each layer of a CNN to optimize energy and latency. Optimal hardware allocation significantly reduces execution cost as compared to generic static hardware resource allocation, e.g., improvements of up to 50x in the energy-delay product for VGG-16 and 41x for GoogleNet-v1.
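The schedule search the abstract describes can be illustrated with a toy sketch: enumerate candidate loop-tiling factors for one convolutional layer and keep the schedule with the lowest estimated energy-delay product. The cost model below is a crude stand-in (DRAM fetch counts as an energy proxy, tile count as a delay proxy), not DeepOpt's actual model; all function names and parameters are illustrative assumptions.

```python
from itertools import product

def schedule_search(c_in, c_out, h, w, k, pe_rows, pe_cols):
    """Toy schedule search for one conv layer on a systolic array.

    Enumerates output-channel and spatial tile sizes and returns the
    (edp, t_co, t_h) tuple with the lowest estimated energy-delay
    product. Cost proxies are illustrative, not DeepOpt's real model.
    """
    divisors = lambda n: [d for d in range(1, n + 1) if n % d == 0]
    best = None
    for t_co, t_h in product(divisors(c_out), divisors(h)):
        # Energy proxy: off-chip traffic. Weights are refetched once per
        # spatial tile; activations once per output-channel tile.
        weight_fetches = (h // t_h) * c_out * c_in * k * k
        act_fetches = (c_out // t_co) * c_in * h * w
        energy = weight_fetches + act_fetches
        # Delay proxy: number of tiles times passes over the PE array.
        passes = max(1, (t_co * t_h) // (pe_rows * pe_cols))
        delay = (c_out // t_co) * (h // t_h) * passes
        edp = energy * delay
        if best is None or edp < best[0]:
            best = (edp, t_co, t_h)
    return best
```

Even this toy search makes the trade-off visible: larger spatial tiles cut weight refetches but force more activation traffic, so the best tiling depends on both the layer shape and the array size, which is why a single static schedule is suboptimal across layers.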
|Original language||English (US)|
|Title of host publication||Proceedings of the 26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||7|
|State||Published - Jan 18 2021|
|Event||26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021 - Virtual, Online, Japan|
Duration: Jan 18 2021 → Jan 21 2021
|Name||Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC|
|Conference||26th Asia and South Pacific Design Automation Conference, ASP-DAC 2021|
|Period||1/18/21 → 1/21/21|
Bibliographical note (Funding Information):
We thank Z. Wang and A. B. Kahng (UCSD) for helping in modeling SRAM area. This work is supported in part by NSF (CCF-1763761).
© 2021 Association for Computing Machinery.
- hardware accelerator