Abstract
Pruning has been extensively studied in Transformer-based language models to improve efficiency. Typically, unimportant model weights are zeroed (pruned) and the derived compact model is then trained to improve final accuracy. The pruned weights are treated as useless and discarded, which usually leads to significant degradation in model accuracy. In this paper, we focus on attention head pruning, as attention heads are a key component of Transformer-based language models and provide interpretable knowledge. We reveal the relationship between pruned attention heads and retained heads and propose a solution, named peer distillation, that recycles the discarded knowledge from the pruned heads. We also develop an automatic framework to locate the to-be-pruned attention heads in each layer, eliminating time-consuming manual hyperparameter tuning. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are provided using the BERT model. By recycling discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while reducing the number of heads by over 58% on average, and it outperforms state-of-the-art techniques (e.g., Random, HISP, L0 Norm, SMP).
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 |
Editors | Edith Elkind |
Publisher | International Joint Conferences on Artificial Intelligence |
Pages | 5113-5121 |
Number of pages | 9 |
ISBN (Electronic) | 9781956792034 |
State | Published - 2023 |
Externally published | Yes |
Event | 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China; Duration: Aug 19 2023 → Aug 25 2023 |
Publication series
Name | IJCAI International Joint Conference on Artificial Intelligence |
---|---|
Volume | 2023-August |
ISSN (Print) | 1045-0823 |
Conference
Conference | 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 |
---|---|
Country/Territory | China |
City | Macao |
Period | 8/19/23 → 8/25/23 |
Bibliographical note
Publisher Copyright: © 2023 International Joint Conferences on Artificial Intelligence. All rights reserved.