TY - JOUR
T1 - AutoAI2C
T2 - An Automated Hardware Generator for DNN Acceleration on Both FPGA and ASIC
AU - Zhang, Yongan
AU - Zhang, Xiaofan
AU - Xu, Pengfei
AU - Zhao, Yang
AU - Hao, Cong
AU - Chen, Deming
AU - Lin, Yingyan
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Recent advancements in deep neural networks (DNNs) and the slowing of Moore's law have made domain-specific hardware accelerators for DNNs (i.e., DNN chips) a promising means for enabling more extensive DNN applications. However, designing DNN chips is challenging due to: 1) the vast and nonstandardized design space and 2) different DNN models' varying performance preferences regarding hardware micro-architecture and dataflows. Therefore, designing a DNN chip often takes a large team of interdisciplinary experts months to years. To enable flexible and efficient DNN chip design, we propose AutoAI2C: a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN accelerator implementations (i.e., synthesizable hardware and deployment code) with optimized algorithm-to-hardware mapping, given the DNN model specification from mainstream machine learning frameworks (e.g., PyTorch). Specifically, AutoAI2C consists of two major components: 1) a Chip Predictor, which can efficiently and reliably predict a DNN accelerator's energy, latency, and resource consumption using the proposed graph-based intermediate accelerator representation, and 2) a Chip Builder, which can generate and optimize DNN accelerator designs by automatically exploring the design space based on target metrics and the Chip Predictor's performance feedback. Extensive experiments show that our Chip Predictor's predictions differ by <10% from real-measured ones. Furthermore, AutoAI2C-generated accelerators can achieve performance comparable to or better than state-of-the-art accelerators when running the same DNN workloads, achieving up to 2.12× higher throughput or 2.4× lower latency with the same level of hardware resource usage, or up to 1.6× lower energy consumption.
AB - Recent advancements in deep neural networks (DNNs) and the slowing of Moore's law have made domain-specific hardware accelerators for DNNs (i.e., DNN chips) a promising means for enabling more extensive DNN applications. However, designing DNN chips is challenging due to: 1) the vast and nonstandardized design space and 2) different DNN models' varying performance preferences regarding hardware micro-architecture and dataflows. Therefore, designing a DNN chip often takes a large team of interdisciplinary experts months to years. To enable flexible and efficient DNN chip design, we propose AutoAI2C: a DNN chip generator that can automatically generate both FPGA- and ASIC-based DNN accelerator implementations (i.e., synthesizable hardware and deployment code) with optimized algorithm-to-hardware mapping, given the DNN model specification from mainstream machine learning frameworks (e.g., PyTorch). Specifically, AutoAI2C consists of two major components: 1) a Chip Predictor, which can efficiently and reliably predict a DNN accelerator's energy, latency, and resource consumption using the proposed graph-based intermediate accelerator representation, and 2) a Chip Builder, which can generate and optimize DNN accelerator designs by automatically exploring the design space based on target metrics and the Chip Predictor's performance feedback. Extensive experiments show that our Chip Predictor's predictions differ by <10% from real-measured ones. Furthermore, AutoAI2C-generated accelerators can achieve performance comparable to or better than state-of-the-art accelerators when running the same DNN workloads, achieving up to 2.12× higher throughput or 2.4× lower latency with the same level of hardware resource usage, or up to 1.6× lower energy consumption.
KW - AI chips
KW - design automation
KW - genetic algorithms
KW - neural network hardware
UR - http://www.scopus.com/inward/record.url?scp=85191288984&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85191288984&partnerID=8YFLogxK
U2 - 10.1109/tcad.2024.3393428
DO - 10.1109/tcad.2024.3393428
M3 - Article
AN - SCOPUS:85191288984
SN - 0278-0070
VL - 43
SP - 3143
EP - 3156
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 10
ER -