Abstract
Transformer-based deep learning models have become a ubiquitous vehicle to drive a variety of Natural Language Processing (NLP) related tasks beyond their accuracy ceiling. However, these models also suffer from two pronounced challenges: gigantic model size and prolonged turnaround time. To this end, we introduce E.T. that rE-Thinks self-attention computation for Transformer models on GPUs with the following contributions. First, we introduce a novel self-attention architecture, which encompasses two tailored self-attention operators with corresponding sequence length-aware optimizations, and operation reordering optimizations. Second, we present an attention-aware pruning design which judiciously uses various pruning algorithms to reduce more computation and hence achieve significantly shorter turnaround time. For the pruning algorithms, we not only revamp the existing pruning algorithms, but also tailor new ones for Transformer models. Taken together, we evaluate E.T. across a variety of benchmarks for Transformer, BERT-Base and DistilBERT, where E.T. presents superior performance over the mainstream projects, including the popular Nvidia Enterprise solutions, i.e., TensorRT and FasterTransformer.
Original language | English (US) |
---|---|
Title of host publication | Proceedings of SC 2021 |
Subtitle of host publication | The International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond |
Publisher | IEEE Computer Society |
ISBN (Electronic) | 9781450384421 |
DOIs | |
State | Published - Nov 14 2021 |
Externally published | Yes |
Event | 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 - Virtual, Online, United States. Duration: Nov 14 2021 → Nov 19 2021 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Conference
Conference | 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 11/14/21 → 11/19/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE Computer Society. All rights reserved.