Abstract
Despite their popularity, deploying Convolutional Neural Networks (CNNs) on a portable system is still challenging due to large data volume, intensive computation and frequent memory access. Although previous FPGA acceleration schemes generated by high-level synthesis tools (e.g., HLS, OpenCL) have allowed for fast design optimization, hardware inefficiency still exists when allocating FPGA resources to maximize parallelism and throughput. A direct hardware-level design (i.e., RTL) can improve the efficiency and achieve greater acceleration. However, this requires an in-depth understanding of both the algorithm structure and the FPGA system architecture. In this work, we present a scalable solution that integrates the flexibility of high-level synthesis and the finer-level optimization of an RTL implementation. The cornerstone is a compiler that analyzes the CNN structure and parameters, and automatically generates a set of modular and scalable computing primitives that can accelerate various deep learning algorithms. Integrating these modules for end-to-end CNN implementations, this work quantitatively analyzes the compiler's design strategy to optimize the throughput of a given CNN model under the FPGA resource constraints. The proposed methodology is demonstrated on an Altera Stratix-V GXA7 FPGA for AlexNet and NIN CNN models, achieving 114.5 GOPS and 117.3 GOPS, respectively. This represents a 1.9× improvement in throughput when compared to the OpenCL-based design. The results illustrate the promise of the automatic compiler solution for modularized and scalable hardware acceleration of deep learning.
Original language | English (US) |
---|---|
Title of host publication | FPL 2016 - 26th International Conference on Field-Programmable Logic and Applications |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9782839918442 |
DOIs | |
State | Published - Sep 26 2016 |
Externally published | Yes |
Event | 26th International Conference on Field-Programmable Logic and Applications, FPL 2016 - Lausanne, Switzerland; Duration: Aug 29 2016 → Sep 2 2016 |
Publication series
Name | FPL 2016 - 26th International Conference on Field-Programmable Logic and Applications |
---|---|
Conference
Conference | 26th International Conference on Field-Programmable Logic and Applications, FPL 2016 |
---|---|
Country/Territory | Switzerland |
City | Lausanne |
Period | 8/29/16 → 9/2/16 |
Bibliographical note
Publisher Copyright: © 2016 EPFL.
Keywords
- Convolutional neural networks
- FPGA
- hardware acceleration