Abstract
Conventional wisdom in pruning Transformer-based language models holds that pruning reduces model expressiveness and is therefore more likely to underfit than to overfit. However, under the prevailing pretrain-and-finetune paradigm, we postulate a counter-traditional hypothesis: pruning increases the risk of overfitting when performed in the fine-tuning phase. In this paper, we address the overfitting problem and improve pruning performance via progressive knowledge distillation with error-bound properties. We show for the first time that reducing the risk of overfitting improves the effectiveness of pruning under the pretrain-and-finetune paradigm. Ablation studies and experiments on the GLUE benchmark show that our method outperforms leading competitors across different tasks.
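The abstract does not spell out the training procedure; as a rough, hypothetical illustration of the general idea of fine-tuning a pruned model under a knowledge-distillation loss from an unpruned teacher, the sketch below uses plain PyTorch with placeholder models, an illustrative sparsity schedule, and made-up hyperparameters. It is not the paper's actual method.

```python
# Minimal sketch (assumptions: placeholder linear classifiers stand in for
# fine-tuned Transformer encoders; the sparsity schedule and loss weights are
# illustrative, not taken from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL distillation with the ordinary supervised loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

teacher = nn.Linear(768, 2)   # unpruned teacher (placeholder)
student = nn.Linear(768, 2)   # model being pruned (placeholder)
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

# Progressively increase sparsity while fine-tuning with the distillation loss.
for sparsity in (0.2, 0.4, 0.6):
    prune.l1_unstructured(student, name="weight", amount=sparsity)
    x = torch.randn(8, 768)               # stand-in for pooled sentence features
    labels = torch.randint(0, 2, (8,))
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    prune.remove(student, "weight")       # bake in the mask before re-pruning
```

In this toy setup the teacher's soft targets act as a regularizer on the pruned student, which is one plausible way to read the abstract's claim that reducing overfitting helps pruning; the actual progressive schedule and error-bound analysis are described in the paper itself.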
| Original language | English (US) |
|---|---|
| Title of host publication | ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) |
| Editors | Smaranda Muresan, Preslav Nakov, Aline Villavicencio |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 190-200 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781955917216 |
| DOIs | |
| State | Published - 2022 |
| Externally published | Yes |
| Event | 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland; Duration: May 22 2022 → May 27 2022 |
Publication series
| Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
|---|---|
| Volume | 1 |
| ISSN (Print) | 0736-587X |
Conference
| Conference | 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 |
|---|---|
| Country/Territory | Ireland |
| City | Dublin |
| Period | 5/22/22 → 5/27/22 |
Bibliographical note
Publisher Copyright: © 2022 Association for Computational Linguistics.