Abstract
With the growing push for deep learning democratization, there is increasing demand to deploy Transformer-based natural language processing (NLP) models on resource-constrained devices with low latency and high accuracy. Existing BERT pruning methods require domain experts to heuristically handcraft hyperparameters to strike a balance among model size, latency, and accuracy. In this work, we propose AE-BERT, an automatic and efficient BERT pruning framework with efficient evaluation to select a "good" sub-network candidate (with high accuracy) given overall pruning ratio constraints. Our proposed method requires no human expert experience and achieves better accuracy on many NLP tasks. Our experimental results on the General Language Understanding Evaluation (GLUE) benchmark show that AE-BERT outperforms the state-of-the-art (SOTA) hand-crafted pruning methods on BERT. On QNLI and RTE, we obtain 75% and 42.8% higher overall pruning ratios, respectively, while achieving higher accuracy. On MRPC, we obtain a score 4.6 points higher than the SOTA at the same overall pruning ratio of 0.5. On STS-B, we achieve a 40% higher pruning ratio with a very small loss in Spearman correlation compared to SOTA hand-crafted pruning methods. Experimental results also show that, after model compression, the inference of a single BERT_BASE encoder on a Xilinx Alveo U200 FPGA board achieves a 1.83× speedup over an Intel(R) Xeon(R) Gold 5218 (2.30 GHz) CPU, which demonstrates the feasibility of deploying the sub-networks of the BERT_BASE model generated by the proposed method on computation-restricted devices.
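The abstract does not detail AE-BERT's candidate-generation or evaluation procedure, so the following is only a minimal sketch of the general idea it describes: sample pruned sub-network candidates that satisfy an overall pruning-ratio constraint, score each with a quick proxy evaluation, and keep the best one. It uses plain PyTorch; the toy stand-in model, `sample_ratios`, `layer_prune_`, and the proxy-accuracy score are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def layer_prune_(weight: torch.Tensor, ratio: float) -> None:
    """Zero out the `ratio` fraction of smallest-magnitude entries in-place
    (simple magnitude pruning; illustrative, not the published algorithm)."""
    k = int(ratio * weight.numel())
    if k == 0:
        return
    threshold = torch.kthvalue(weight.detach().abs().flatten(), k).values
    with torch.no_grad():
        weight.mul_((weight.abs() > threshold).float())

def sample_ratios(sizes, overall, gen):
    """Draw random per-layer ratios whose size-weighted mean is ~`overall`,
    so each candidate respects the overall pruning-ratio constraint."""
    raw = torch.rand(len(sizes), generator=gen)
    sizes_t = torch.tensor(sizes, dtype=torch.float)
    raw = raw * overall * sizes_t.sum() / (raw * sizes_t).sum()
    return raw.clamp(max=0.95).tolist()

# Toy stand-in for a BERT encoder; a real run would load a fine-tuned model.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))
x, y = torch.randn(64, 128), torch.randint(0, 2, (64,))  # proxy dev batch

linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
sizes = [m.weight.numel() for m in linears]
dense = {k: v.clone() for k, v in model.state_dict().items()}

best, gen = (-1.0, None), torch.Generator().manual_seed(1)
for _ in range(8):                        # candidate sub-networks
    model.load_state_dict(dense)          # restore dense weights
    ratios = sample_ratios(sizes, overall=0.5, gen=gen)
    for m, r in zip(linears, ratios):
        layer_prune_(m.weight, r)
    with torch.no_grad():                 # quick proxy evaluation
        acc = (model(x).argmax(-1) == y).float().mean().item()
    if acc > best[0]:
        best = (acc, ratios)
print(f"best candidate proxy accuracy: {best[0]:.3f}")
```

In a realistic setting, the candidates would be masks over a fine-tuned BERT's attention and feed-forward weight matrices, and the proxy evaluation would run on a small slice of the task's dev set before the selected sub-network is evaluated in full.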
Original language | English (US)
---|---|
Title of host publication | Proceedings of the 23rd International Symposium on Quality Electronic Design, ISQED 2022
Publisher | IEEE Computer Society
ISBN (Electronic) | 9781665494663
State | Published - 2022
Externally published | Yes
Event | 23rd International Symposium on Quality Electronic Design, ISQED 2022 - San Jose, United States
Duration | Apr 6 2022 → Apr 7 2022
Publication series
Name | Proceedings - International Symposium on Quality Electronic Design, ISQED |
---|---|
Volume | 2022-April |
ISSN (Print) | 1948-3287 |
ISSN (Electronic) | 1948-3295 |
Conference
Conference | 23rd International Symposium on Quality Electronic Design, ISQED 2022 |
---|---|
Country/Territory | United States |
City | San Jose
Period | 4/6/22 → 4/7/22 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Keywords
- Transformer
- acceleration
- deep learning
- pruning