
MAD-HiSpMV: Matrix Adaptive Design with Hybrid Row Distribution for Imbalanced SpMV Acceleration on FPGAs

  • Manoj Bheemasandra Rajashekar
  • Akhil Raj Baranwal
  • Xingyu Tian
  • Zhenman Fang

Research output: Contribution to journal › Article › peer-review

Abstract

Sparse Matrix–Vector Multiplication (SpMV) is fundamental in numerous applications such as scientific computing, Machine Learning (ML), and graph analytics. While recent studies have made tremendous progress in accelerating SpMV on HBM-equipped FPGAs, multiple challenges remain in accelerating imbalanced SpMV, where the distribution of nonzeros in the sparse matrix is imbalanced across different rows. These include (1) imbalanced workload distribution among the parallel Processing Elements (PEs), (2) long-distance dependency for floating-point accumulation on the output vector, (3) a new bottleneck due to the often-overlooked dense vectors’ off-chip access after the SpMV acceleration, (4) sub-optimal performance of generic accelerators for various types of sparse matrices, and (5) kernel switching inefficiencies in ML workloads, which often consist of both SpMV and General Matrix–Vector Multiplication (GeMV). To address these challenges, we propose MAD-HiSpMV to accelerate imbalanced SpMV on HBM-equipped FPGAs with the following novel solutions: (1) a hybrid row distribution network to enable both inter-row and intra-row distribution for better balance, (2) a fully pipelined floating-point accumulation on the output vector using a combination of an adder chain and a register-based circular buffer, (3) matrix adaptive design configurations generated by our automation framework via Design Space Exploration (DSE) to maximize performance for the given matrix, and (4) a GeMV overlay built into the same kernel for efficient acceleration of mixed workloads. Experimental results demonstrate that the DSE-picked configuration of MAD-HiSpMV achieves a geomean speedup of 1.3× (up to 2.12×) for the SpMV benchmark matrices and a geomean 1.15× (up to 1.54×) better performance per watt, when compared to state-of-the-art generic designs. For the SpMV benchmark matrices, compared to Intel MKL running on a 24-core Xeon Silver 4214 CPU, MAD-HiSpMV achieves a geomean speedup of 8.80×. Compared to cuSPARSE running on an NVIDIA GTX 1080 Ti GPU, MAD-HiSpMV achieves a geomean 2.57× better performance per watt. Additionally, the GeMV overlay built into MAD-HiSpMV achieves a peak throughput of 156.7 GFLOPS, which is 2.64× better than the Vitis L2 GeMV benchmark on U280, and performs 2.7× better for an end-to-end mixed workload, when compared to Intel MKL running on a 24-core Xeon Silver 4214 CPU. MAD-HiSpMV is available at https://github.com/SFU-HiAccel/HiSpMV.
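
The row-imbalance problem named in the abstract can be illustrated with a small software sketch. This is not the MAD-HiSpMV hardware design or its distribution network; it is a plain-Python illustration, assuming a CSR matrix layout and two simplified scheduling policies. The function names (`spmv_csr`, `inter_row_loads`, `intra_row_loads`) and the example matrix are hypothetical and chosen only for this sketch.

```python
def spmv_csr(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy positions indptr[row] .. indptr[row + 1] - 1.
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def inter_row_loads(indptr, num_pes):
    """Nonzeros per PE when whole rows are assigned cyclically (row i -> PE i % num_pes)."""
    loads = [0] * num_pes
    for row in range(len(indptr) - 1):
        loads[row % num_pes] += indptr[row + 1] - indptr[row]
    return loads


def intra_row_loads(indptr, num_pes):
    """Nonzeros per PE when individual nonzeros are dealt out evenly, ignoring row boundaries."""
    nnz = indptr[-1]
    return [nnz // num_pes + (1 if pe < nnz % num_pes else 0) for pe in range(num_pes)]


# Assumed example: a 4x4 matrix whose first row is dense and whose other rows
# each hold a single diagonal entry, so the nonzeros are imbalanced across rows.
indptr = [0, 4, 5, 6, 7]
indices = [0, 1, 2, 3, 1, 2, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 1.0, 1.0]

print(spmv_csr(indptr, indices, data, x))  # [10.0, 5.0, 6.0, 7.0]
print(inter_row_loads(indptr, 2))          # [5, 2]
print(intra_row_loads(indptr, 2))          # [4, 3]
```

With two PEs, assigning whole rows leaves one PE with 5 nonzeros and the other with 2, while splitting within rows evens this out to 4 and 3. The trade-off in this sketch is that splitting a row places its partial sums on different PEs, so they must be reduced before the output element is final.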

Original language: English (US)
Article number: 47
Journal: ACM Transactions on Reconfigurable Technology and Systems
Volume: 18
Issue number: 4
State: Published - Nov 21 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2025 Copyright held by the owner/author(s).

Keywords

  • Design Space Exploration
  • FPGA Accelerator
  • High Level Synthesis
  • Imbalanced Workload
  • Input Specific
  • SpMV
