An Automatic and Efficient BERT Pruning for Edge AI Systems

Shaoyi Huang, Ning Liu, Yueying Liang, Hongwu Peng, Hongjia Li, Dongkuan Xu, Mimi Xie, Caiwen Ding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

With the growing push to democratize deep learning, there is increasing demand to deploy Transformer-based natural language processing (NLP) models on resource-constrained devices with low latency and high accuracy. Existing BERT pruning methods require domain experts to heuristically handcraft hyperparameters to strike a balance among model size, latency, and accuracy. In this work, we propose AE-BERT, an automatic and efficient BERT pruning framework that efficiently evaluates and selects a "good" sub-network candidate (one with high accuracy) under a given overall pruning ratio constraint. Our method requires no hand-tuning by human experts and achieves better accuracy on many NLP tasks. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark show that AE-BERT outperforms state-of-the-art (SOTA) hand-crafted BERT pruning methods. On QNLI and RTE, we obtain 75% and 42.8% higher overall pruning ratios, respectively, while achieving higher accuracy. On MRPC, we obtain a score 4.6 points higher than the SOTA at the same overall pruning ratio of 0.5. On STS-B, we achieve a 40% higher pruning ratio with only a small loss in Spearman correlation compared to SOTA hand-crafted pruning methods. Experimental results also show that, after compression, a single BERT-base encoder runs 1.83× faster on a Xilinx Alveo U200 FPGA board than on an Intel(R) Xeon(R) Gold 5218 (2.30 GHz) CPU, demonstrating the feasibility of deploying the sub-networks of BERT-base generated by the proposed method on computation-restricted devices.
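The abstract does not detail how AE-BERT generates or evaluates sub-network candidates under a pruning ratio constraint. As a minimal, hypothetical sketch of that setting, the snippet below prunes a BERT-base model to a target overall pruning ratio using plain global magnitude pruning, which is a stand-in technique rather than the paper's method; the prune_to_ratio helper name and the use of the Hugging Face transformers library are illustrative assumptions.

    # Hypothetical sketch (not AE-BERT itself): zero out the globally
    # smallest-magnitude encoder weights of BERT-base so that a target
    # fraction of them is removed.
    import torch
    from transformers import BertForSequenceClassification

    def prune_to_ratio(model, overall_ratio):
        """Zero the `overall_ratio` fraction of encoder Linear weights
        with the smallest absolute values (e.g. 0.5 removes half)."""
        weights = [m.weight for n, m in model.named_modules()
                   if isinstance(m, torch.nn.Linear) and "encoder" in n]
        magnitudes = torch.cat([w.detach().abs().flatten() for w in weights])
        k = int(overall_ratio * magnitudes.numel())
        threshold = torch.kthvalue(magnitudes, k).values  # global cutoff
        with torch.no_grad():
            for w in weights:
                w.mul_((w.abs() > threshold).float())  # apply binary mask
        return model

    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    pruned = prune_to_ratio(model, overall_ratio=0.5)  # 0.5 overall pruning ratio

Per the abstract, AE-BERT would additionally evaluate such candidates efficiently so that a high-accuracy sub-network satisfying the overall pruning ratio constraint is selected.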

Original language: English (US)
Title of host publication: Proceedings of the 23rd International Symposium on Quality Electronic Design, ISQED 2022
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665494663
DOIs
State: Published - 2022
Externally published: Yes
Event: 23rd International Symposium on Quality Electronic Design, ISQED 2022 - San Jose, United States
Duration: Apr 6, 2022 - Apr 7, 2022

Publication series

Name: Proceedings - International Symposium on Quality Electronic Design, ISQED
Volume: 2022-April
ISSN (Print): 1948-3287
ISSN (Electronic): 1948-3295

Conference

Conference: 23rd International Symposium on Quality Electronic Design, ISQED 2022
Country/Territory: United States
City: San Jose
Period: 4/6/22 - 4/7/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • Transformer
  • acceleration
  • deep learning
  • pruning
