High Throughput FPGA-Based Object Detection via Algorithm-Hardware Co-Design

Anupreetham Anupreetham, Mohamed Ibrahim, Mathew Hall, Andrew Boutros, Ajay Kuzhively, Abinash Mohanty, Eriko Nurvitadhi, Vaughn Betz, Yu Cao, Jae Sun Seo

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Object detection and classification is a key task in many computer vision applications such as smart surveillance and autonomous vehicles. Recent advances in deep learning have significantly improved the quality of results achieved by these systems, making them more accurate and reliable in complex environments. Modern object detection systems make use of lightweight convolutional neural networks (CNNs) for feature extraction, coupled with single-shot multi-box detectors (SSDs) that generate bounding boxes around the identified objects along with their classification confidence scores. Subsequently, a non-maximum suppression (NMS) module removes any redundant detection boxes from the final output. Typical NMS algorithms must wait for all box predictions to be generated by the SSD-based feature extractor before processing them. This sequential dependency between box predictions and NMS results in a significant latency overhead and degrades the overall system throughput, even if a high-performance CNN accelerator is used for the SSD feature extraction component. In this paper, we present a novel pipelined NMS algorithm that eliminates this sequential dependency and associated NMS latency overhead. We then use our novel NMS algorithm to implement an end-to-end fully pipelined FPGA system for low-latency SSD-MobileNet-V1 object detection. Our system, implemented on an Intel Stratix 10 FPGA, runs at 400 MHz and achieves a throughput of 2,167 frames per second with an end-to-end batch-1 latency of 2.13 ms. Our system achieves 5.3× higher throughput and 5× lower latency compared to the best prior FPGA-based solution with comparable accuracy.
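
For reference, the conventional NMS step that the abstract identifies as the bottleneck can be sketched as follows. This is a minimal sketch of standard greedy NMS, not the paper's pipelined algorithm. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 IoU threshold are assumptions chosen for illustration. The initial sort needs the score of every box, which is why this form cannot start until all predictions have been generated.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    # Sorting requires the full set of predictions up front.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical example: the second box overlaps the first and is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # → [0, 2]
```

In this example the first two boxes have an IoU of 81/119 (about 0.68), so the lower-scoring one is dropped, while the third box does not overlap anything and is kept.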

Original language: English (US)
Article number: 1
Journal: ACM Transactions on Reconfigurable Technology and Systems
Volume: 17
Issue number: 1
State: Published - Jan 15 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Keywords

  • FPGA accelerator
  • algorithm-hardware co-design
  • neural networks
  • object detection
