Abstract
Natural language models often fall short in understanding and generating mathematical notation. It is unclear whether these shortcomings reflect fundamental limitations of the models or simply the absence of appropriate tasks. In this paper, we explore the extent to which natural language models can learn the semantic relationships between mathematical notation and its surrounding text. We propose two notation prediction tasks and train a model that selectively masks notation tokens and encodes the left and/or right sentences as context. Compared with baseline models trained by masked language modeling, our method achieves significantly better performance on both tasks, showing that this approach is a good first step toward modeling mathematical texts. However, the current models rarely predict unseen symbols correctly, and token-level predictions are more accurate than symbol-level predictions, indicating that more work is needed to represent structural patterns. Based on these results, we suggest directions for future work on modeling mathematical texts.
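As a rough illustration of the selective-masking idea described in the abstract, the sketch below masks a notation token and asks an off-the-shelf masked language model to predict it from the surrounding sentence context. This is a minimal sketch, not the authors' released code: the `bert-base-uncased` checkpoint, the naive `$...$` notation spotter, and the example sentence are all assumptions made for illustration.

```python
# Minimal sketch of notation-masked prediction (assumptions: a generic
# "bert-base-uncased" checkpoint and a naive $...$ notation spotter;
# the paper's actual model, vocabulary, and training setup differ).
import re

from transformers import pipeline


def mask_notation(text: str, mask_token: str = "[MASK]") -> str:
    """Replace each inline $...$ notation span with the mask token."""
    return re.sub(r"\$[^$]+\$", mask_token, text)


fill = pipeline("fill-mask", model="bert-base-uncased")

# "Let $\eta$ denote ..." becomes "Let [MASK] denote ...".
sentence = mask_notation("Let $\\eta$ denote the learning rate of the optimizer.")
for pred in fill(sentence, top_k=5):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```

Each mask here covers a single subword token, which loosely mirrors the token-level variant of the task; predicting a full multi-token symbol (the symbol-level variant, where the abstract reports lower accuracy) would require multiple masks or a decoder.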
Original language | English (US) |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics, Findings of ACL |
Subtitle of host publication | EMNLP 2021 |
Editors | Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 3102-3115 |
Number of pages | 14 |
ISBN (Electronic) | 9781955917100 |
State | Published - 2021 |
Event | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic (duration: Nov 7, 2021 → Nov 11, 2021) |
Publication series
Name | Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
---|---|
Conference
Conference | 2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 |
---|---|
Country/Territory | Dominican Republic |
City | Punta Cana |
Period | 11/7/21 → 11/11/21 |
Bibliographical note
Funding Information: This research was supported in part by funding from the Alfred P. Sloan Foundation.
Publisher Copyright:
© 2021 Association for Computational Linguistics.