Compressing Neural Networks using Learnable 1D Non-Linear Functions

Gaurav Singh, Kia Bazargan

Research output: Contribution to journal › Article › peer-review

Abstract

As deep learning models grow in size to achieve state-of-the-art accuracy, there is a pressing need for compact models. To address this challenge, we introduce a novel operation called Personal Self-Attention (PSA). It is specifically designed to learn non-linear 1D functions, improving on existing spline-based methods while remaining compatible with gradient backpropagation. By integrating these non-linear functions with linear transformations, we achieve the accuracy of larger models with significantly smaller hidden dimensions, which is crucial for FPGA implementations. We evaluate PSA by integrating it into ResMLP, a Multi-Layer Perceptron (MLP)-based vision model, and testing it on the CIFAR-10 classification task; MLP blocks are increasingly relevant given their widespread use in large language models. Our results confirm that PSA matches the accuracy of conventional MLPs with a 2× smaller hidden size. Furthermore, by quantizing our non-linear function into a simple Lookup Table (LUT), we reduce the number of operations required by 28–45%, which offers significant benefits for hardware accelerators. To showcase this, we design an end-to-end unrolled streaming accelerator for ResMLP, demonstrating that our compressed model maintains 88% accuracy while reducing LUT + DSP resource requirements by 25% and doubling throughput to 32 kFPS. Additionally, we implement a fixed-size SIMD accelerator for the same compressed model that achieves a 62.1% improvement in throughput while consuming only 3.5% extra LUTs.
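The abstract does not define the PSA operation itself, but the general recipe it describes — a 1D non-linear function trained by backpropagation and later baked into a lookup table for hardware — can be illustrated with a generic piecewise-linear activation. The sketch below is an assumption-laden illustration, not the paper's implementation: the class name, knot count, input range, and `bake_lut` helper are all hypothetical choices made here for clarity.

```python
import torch
import torch.nn as nn

class LearnablePiecewise1D(nn.Module):
    """Generic learnable 1D non-linear function (illustrative only, not the
    paper's PSA): linear interpolation between trainable values on a fixed
    uniform grid, so gradients reach the knot values through backprop."""

    def __init__(self, num_knots: int = 17, bound: float = 4.0):
        super().__init__()
        self.bound = bound
        self.num_knots = num_knots
        # Initialize the knot values to the identity function y = x.
        self.values = nn.Parameter(torch.linspace(-bound, bound, num_knots))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.clamp(-self.bound, self.bound)
        step = 2 * self.bound / (self.num_knots - 1)
        # Index of the left knot and fractional position within the segment.
        t = (x + self.bound) / step
        idx = t.floor().long().clamp(max=self.num_knots - 2)
        frac = t - idx.to(x.dtype)
        left, right = self.values[idx], self.values[idx + 1]
        return left + frac * (right - left)


def bake_lut(fn: nn.Module, in_bits: int = 8, bound: float = 4.0) -> torch.Tensor:
    """Quantize a trained 1D function into a fixed lookup table: at inference,
    a quantized input code indexes this table instead of doing arithmetic."""
    xs = torch.linspace(-bound, bound, 2 ** in_bits)
    with torch.no_grad():
        return fn(xs)
```

Once trained, the function reduces to a single small table lookup per activation at inference time, which is consistent with the operation-count and resource savings the abstract reports for the FPGA accelerators.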

Original language: English (US)
Article number: 21
Journal: ACM Transactions on Reconfigurable Technology and Systems
Volume: 18
Issue number: 2
DOIs
State: Published - Mar 22, 2025

Bibliographical note

Publisher Copyright:
© 2025 Copyright held by the owner/author(s).

Keywords

  • FPGA accelerator
  • lookup table
  • model compression
  • neural networks
  • non-linear functions
  • parameterized activation function
