Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models

Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail Bragin, Ji Li, Caiwen Ding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Pruning has been extensively studied in Transformer-based language models as a way to improve efficiency. Typically, unimportant model weights are zeroed out (pruned) and the resulting compact model is retrained to recover accuracy. The pruned weights are treated as useless and discarded, which usually leads to significant accuracy degradation. In this paper, we focus on attention head pruning, since attention heads are a key component of Transformer-based language models and carry interpretable knowledge. We reveal the relationship between pruned and retained attention heads and propose a method, named peer distillation, that recycles the knowledge that would otherwise be discarded along with the pruned heads. We also develop an automatic framework that locates the to-be-pruned attention heads in each layer, eliminating time-consuming manual hyperparameter tuning. Experimental results on the General Language Understanding Evaluation (GLUE) benchmark are reported using the BERT model. By recycling the discarded knowledge from pruned heads, the proposed method maintains model performance across all nine tasks while removing over 58% of heads on average, and it outperforms state-of-the-art techniques (e.g., Random, HISP, L0 Norm, SMP).
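
The abstract describes peer distillation only at a high level. Below is a minimal, hypothetical PyTorch sketch of the general idea: rank heads by an importance proxy, keep the top fraction, and add a loss that pulls the retained (peer) heads towards the outputs of the pruned heads so their knowledge is recycled rather than discarded. The importance proxy (mean output norm), the MSE loss against the retained heads' mean, the single keep_ratio, and all names (head_importance, peer_distillation_loss) are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def head_importance(head_outputs: torch.Tensor) -> torch.Tensor:
        # Assumed proxy score per head: mean L2 norm of its output over
        # the sequence. head_outputs: (num_heads, seq_len, head_dim).
        return head_outputs.norm(dim=-1).mean(dim=-1)  # -> (num_heads,)

    def peer_distillation_loss(head_outputs: torch.Tensor,
                               keep_ratio: float = 0.42) -> torch.Tensor:
        # Keep the top keep_ratio fraction of heads (0.42 mirrors the
        # reported >58% head reduction) and pull the retained peers' mean
        # output towards each pruned head's output, recycling its knowledge.
        num_heads = head_outputs.size(0)
        n_keep = max(1, int(round(num_heads * keep_ratio)))
        scores = head_importance(head_outputs)
        mask = torch.zeros(num_heads, dtype=torch.bool)
        mask[scores.topk(n_keep).indices] = True
        retained, pruned = head_outputs[mask], head_outputs[~mask]
        if pruned.numel() == 0:
            return head_outputs.new_zeros(())
        peer_mean = retained.mean(dim=0, keepdim=True)  # (1, seq, dim)
        # Gradient flows only into the retained heads; pruned outputs
        # act as fixed distillation targets.
        return F.mse_loss(peer_mean.expand_as(pruned), pruned.detach())

    # Toy usage: one BERT-base layer has 12 heads; seq_len 8, head_dim 64.
    outputs = torch.randn(12, 8, 64, requires_grad=True)
    loss = peer_distillation_loss(outputs)  # would be added to the task loss
    loss.backward()
    print(float(loss))

In actual training, such a term would be combined with the task loss during fine-tuning; the paper's automatic framework further determines per-layer pruning decisions, which this toy sketch collapses into a single fixed keep_ratio.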

Original language: English (US)
Title of host publication: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Editors: Edith Elkind
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 5113-5121
Number of pages: 9
ISBN (Electronic): 9781956792034
State: Published - 2023
Externally published: Yes
Event: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China
Duration: Aug 19, 2023 - Aug 25, 2023

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2023-August
ISSN (Print): 1045-0823

Conference

Conference: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Country/Territory: China
City: Macao
Period: 8/19/23 - 8/25/23

Bibliographical note

Publisher Copyright:
© 2023 International Joint Conferences on Artificial Intelligence. All rights reserved.
