Shallow Parsing for Nepal Bhasa Complement Clauses

Borui Zhang, Abe Kazemzadeh, Brian Reese

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Accelerating data collection, annotation, and analysis is an urgent need in linguistic fieldwork and the documentation of endangered languages (Bird, 2009). We describe experiments aimed at maximizing the quality of a chunking model for syntactic complement structures in Nepal Bhasa. Native-speaker language consultants were trained to annotate a minimally selected raw data set (Suarez et al., 2019), marking embedded clauses, matrix verbs, and embedded verbs. We apply both statistical training algorithms and transfer learning, including Naive Bayes, MaxEnt, and fine-tuning of the pre-trained mBERT model (Devlin et al., 2018). We show that, even with limited annotated data, the model is already sufficient for the task. The modeling resources we used are largely available for many other endangered languages, so the practice is easy to replicate when training a shallow parser for them.
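As a rough illustration of the statistical side of such a setup, the sketch below trains a token-level Naive Bayes tagger in plain Python. It is not the authors' implementation: the tag names (MV for matrix verb, EV for embedded verb, EC for other embedded-clause tokens, O for everything else), the feature set, and the two English toy sentences are all assumptions made for this example, and no Nepal Bhasa data is reproduced here.

```python
import math
from collections import Counter, defaultdict


def features(tokens, i):
    """Hypothetical feature set: the token itself plus its two neighbours."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i < len(tokens) - 1 else "</s>"
    return [f"w={tokens[i]}", f"prev={prev_tok}", f"next={next_tok}"]


class NaiveBayesTagger:
    """Multinomial Naive Bayes over per-token features, add-alpha smoothed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.label_counts = Counter()           # how often each tag occurs
        self.feat_counts = defaultdict(Counter)  # feature counts per tag
        self.vocab = set()                       # all features seen in training

    def fit(self, sentences):
        for tokens, labels in sentences:
            for i, label in enumerate(labels):
                self.label_counts[label] += 1
                for feat in features(tokens, i):
                    self.feat_counts[label][feat] += 1
                    self.vocab.add(feat)
        return self

    def _log_score(self, label, feats):
        # log P(label) + sum of log P(feature | label), with smoothing
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        denom = sum(self.feat_counts[label].values()) + self.alpha * len(self.vocab)
        for feat in feats:
            score += math.log((self.feat_counts[label][feat] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        tags = []
        for i in range(len(tokens)):
            feats = features(tokens, i)
            tags.append(max(self.label_counts,
                            key=lambda lab: self._log_score(lab, feats)))
        return tags


# Toy English stand-in data (assumed for illustration only).
toy_data = [
    (["she", "said", "that", "he", "left"], ["O", "MV", "EC", "EC", "EV"]),
    (["they", "think", "that", "we", "won"], ["O", "MV", "EC", "EC", "EV"]),
]

tagger = NaiveBayesTagger().fit(toy_data)
predicted = tagger.predict(["she", "said", "that", "he", "left"])
print(predicted)
```

Tagging each token independently keeps the example short; a real chunker would more likely use richer features or a sequence model, and the mBERT fine-tuning route mentioned in the abstract replaces the hand-written features with contextual embeddings.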

Original language: English (US)
Title of host publication: COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
Editors: Sarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
Publisher: Association for Computational Linguistics (ACL)
Pages: 61-67
Number of pages: 7
ISBN (Electronic): 9781955917308
State: Published - 2022
Event: 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, COMPUTEL 2022 - Dublin, Ireland
Duration: May 26 2022 to May 27 2022

Publication series

Name: COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop

Conference

Conference: 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, COMPUTEL 2022
Country/Territory: Ireland
City: Dublin
Period: 5/26/22 to 5/27/22

Bibliographical note

Funding Information:
We thank our Nepal Bhasa native speaker consultants for their time and effort in helping us with the annotation.

Publisher Copyright:
© 2022 Association for Computational Linguistics.
