Abstract
As deep learning models grow in size to achieve state-of-the-art accuracy, there is a pressing need for compact models. To address this challenge, we introduce a novel operation called Personal Self-Attention (PSA). It is specifically designed to learn non-linear 1D functions, improving on existing spline-based methods while remaining compatible with gradient backpropagation. By combining these non-linear functions with linear transformations, we can match the accuracy of larger models with significantly smaller hidden dimensions, which is crucial for FPGA implementations. We evaluate PSA by integrating it into ResMLP, a Multi-Layer Perceptron (MLP)-based vision model, and testing it on the CIFAR-10 classification task; MLP-based architectures are gaining popularity due to their widespread use in large language models. Our results confirm that PSA achieves equivalent accuracy with a 2× smaller hidden size than conventional MLPs. Furthermore, by quantizing our non-linear function into a simple Lookup Table (LUT), we reduce the number of required operations by 28–45%, which offers significant benefits for hardware accelerators. To showcase this, we design an end-to-end unrolled streaming accelerator for ResMLP and demonstrate that our compressed model maintains 88% accuracy while reducing LUT + DSP resource requirements by 25% and doubling throughput to 32 kFPS. Additionally, we implement a fixed-size SIMD accelerator for the same compressed model that achieves a 62.1% improvement in throughput while consuming only 3.5% extra LUTs.
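The abstract does not reproduce the PSA formulation itself, so the following is only a minimal sketch of the general idea it describes: a 1-D non-linearity parameterized by learnable knot values (a simple piecewise-linear stand-in for the spline-based construction), trained with standard backpropagation and then frozen into a lookup table for inference. The names `PiecewiseLinearActivation` and `quantize_to_lut`, along with the knot count, input range, and LUT size, are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PiecewiseLinearActivation(nn.Module):
    """Learnable 1-D function defined by values at evenly spaced knots.

    Between knots the output is linearly interpolated, so gradients flow to
    the knot values through ordinary backpropagation.
    """

    def __init__(self, num_knots: int = 16, x_min: float = -4.0, x_max: float = 4.0):
        super().__init__()
        self.x_min, self.x_max = x_min, x_max
        self.register_buffer("knots", torch.linspace(x_min, x_max, num_knots))
        # Initialize near the identity so training starts close to a linear layer.
        self.values = nn.Parameter(self.knots.clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.clamp(self.x_min, self.x_max)
        step = (self.x_max - self.x_min) / (self.knots.numel() - 1)
        idx = ((x - self.x_min) / step).floor().long().clamp(0, self.knots.numel() - 2)
        x0 = self.knots[idx]
        y0, y1 = self.values[idx], self.values[idx + 1]
        t = (x - x0) / step                    # interpolation weight in [0, 1]
        return y0 + t * (y1 - y0)


def quantize_to_lut(act: PiecewiseLinearActivation, lut_size: int = 256) -> torch.Tensor:
    """Sample the trained function on a dense grid to obtain a fixed LUT.

    At inference time the activation then reduces to a single table read,
    which is the property a hardware accelerator can exploit.
    """
    grid = torch.linspace(act.x_min, act.x_max, lut_size)
    with torch.no_grad():
        return act(grid)


if __name__ == "__main__":
    act = PiecewiseLinearActivation()
    x = torch.randn(8, 32, requires_grad=True)
    y = act(x)
    y.sum().backward()                         # gradients reach act.values
    lut = quantize_to_lut(act)
    print(y.shape, lut.shape)                  # (8, 32) and (256,)
```

Freezing the trained curve into a small table after training is one way to realize the abstract's claim that the non-linearity becomes a cheap table read in hardware; the exact quantization scheme in the paper may differ.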
| Original language | English (US) |
|---|---|
| Article number | 21 |
| Journal | ACM Transactions on Reconfigurable Technology and Systems |
| Volume | 18 |
| Issue number | 2 |
| DOIs | |
| State | Published - Mar 22 2025 |
Bibliographical note
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
Keywords
- FPGA accelerator
- lookup table
- model compression
- neural networks
- non-linear functions
- parameterized activation function