Abstract
Although distributed learning has gained increasing attention for its ability to use local devices while enhancing data privacy, recent studies show that gradients shared publicly during training can reveal the private training data to a third party (gradient leakage). However, there has so far been no systematic study of the gradient leakage mechanism in Transformer-based language models. In this paper, as a first attempt, we formulate the gradient attack problem on Transformer-based language models and propose a gradient attack algorithm, TAG, to recover the local training data. Experimental results on Transformer, TinyBERT4, TinyBERT6, BERTBASE, and BERTLARGE using the GLUE benchmark show that, compared with DLG (Zhu et al., 2019), TAG works across more weight distributions when recovering private training data and achieves 1.5× the Recover Rate and 2.5× the ROUGE-2 score of prior methods, without needing the ground-truth labels. TAG can obtain up to 88.9% of the tokens and up to 0.93 cosine similarity in token embeddings from private training data by attacking gradients on the CoLA dataset. In addition, TAG is stronger than previous approaches on larger models, smaller dictionary sizes, and smaller input lengths.
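To illustrate the gradient-matching idea behind attacks like DLG and TAG, here is a minimal sketch, not the paper's algorithm: a single linear layer with squared loss, where the attacker knows the model weights and the leaked weight gradient, and iteratively adjusts a dummy input so that its gradient matches the leaked one. For simplicity the label is assumed known (TAG itself does not require ground-truth labels) and finite differences stand in for the automatic differentiation a real attack would use.

```python
import numpy as np

# Toy setup: model output W @ x, loss 0.5 * ||W x - y||^2.
# The attacker sees W and the gradient of the loss w.r.t. W.
rng = np.random.default_rng(0)
d_in, d_out = 6, 3
W = rng.normal(size=(d_out, d_in))   # model weights, known to the attacker
x_true = rng.normal(size=d_in)       # private input to be recovered
y = rng.normal(size=d_out)           # label (assumed known in this sketch)

def grad_W(x):
    # Analytic gradient of 0.5 * ||W x - y||^2 w.r.t. W: (W x - y) x^T
    return np.outer(W @ x - y, x)

G_shared = grad_W(x_true)            # the "leaked" gradient

def objective(x):
    # Squared distance between the dummy gradient and the leaked gradient
    return float(np.sum((grad_W(x) - G_shared) ** 2))

x_hat = rng.normal(size=d_in)        # random dummy input
obj_init = objective(x_hat)
eps = 1e-5
for _ in range(500):
    # Finite-difference gradient of the matching objective w.r.t. x_hat
    g = np.zeros_like(x_hat)
    for i in range(d_in):
        e = np.zeros(d_in)
        e[i] = eps
        g[i] = (objective(x_hat + e) - objective(x_hat - e)) / (2 * eps)
    # Backtracking step so the matching objective never increases
    step, cur = 0.1, objective(x_hat)
    while step > 1e-12 and objective(x_hat - step * g) > cur:
        step *= 0.5
    x_hat = x_hat - step * g
obj_final = objective(x_hat)
```

As the matching objective falls, `x_hat` moves toward the private input `x_true`; TAG applies this principle to the much larger, non-linear gradient structure of Transformer language models.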
| Original language | English (US) |
|---|---|
| Title of host publication | Findings of the Association for Computational Linguistics, Findings of ACL |
| Subtitle of host publication | EMNLP 2021 |
| Editors | Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 3600-3610 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781955917100 |
| State | Published - 2021 |
| Externally published | Yes |
| Event | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic |
| Duration | Nov 7, 2021 → Nov 11, 2021 |
Publication series

| Name | Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
|---|---|
Conference

| Conference | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
|---|---|
| Country/Territory | Dominican Republic |
| City | Punta Cana |
| Period | 11/7/21 → 11/11/21 |
Bibliographical note

Publisher Copyright: © 2021 Association for Computational Linguistics.