Modeling Mathematical Notation Semantics in Academic Papers

Hwiyeol Jo, Dongyeop Kang, Andrew Head, Marti A. Hearst

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Scopus citations

Abstract

Natural language models often fall short when understanding and generating mathematical notation. What is not clear is whether these shortcomings are due to fundamental limitations of the models, or the absence of appropriate tasks. In this paper, we explore the extent to which natural language models can learn semantics between mathematical notation and their surrounding text. We propose two notation prediction tasks, and train a model that selectively masks notation tokens and encodes left and/or right sentences as context. Compared to baseline models trained by masked language modeling, our method achieved significantly better performance at the two tasks, showing that this approach is a good first step towards modeling mathematical texts. However, the current models rarely predict unseen symbols correctly, and token-level predictions are more accurate than symbol-level predictions, indicating more work is needed to represent structural patterns. Based on the results, we suggest future works toward modeling mathematical texts.

Original languageEnglish (US)
Title of host publicationFindings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publicationEMNLP 2021
EditorsMarie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
PublisherAssociation for Computational Linguistics (ACL)
Pages3102-3115
Number of pages14
ISBN (Electronic)9781955917100
StatePublished - 2021
Event2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: Nov 7 2021Nov 11 2021

Publication series

NameFindings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Conference

Conference2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
Country/TerritoryDominican Republic
CityPunta Cana
Period11/7/2111/11/21

Bibliographical note

Funding Information:
usage across papers. This research was supported in part by funding Guidance for future work: One way to enhance from the Alfred P. Sloan Foundation.

Publisher Copyright:
© 2021 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'Modeling Mathematical Notation Semantics in Academic Papers'. Together they form a unique fingerprint.

Cite this