Abstract
Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult because Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms the other models: for sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score (66.1%), and semi-supervised XLM-R achieves 67.2% accuracy and a 64.1% F1 score. In emotion analysis, supervised DistilBERT leads in accuracy (59.8%) and F1 score (31%), and semi-supervised mBERT achieves 59% accuracy and a 26.5% F1 score. AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with AfriBERTa showing the highest bias and a unique sensitivity to the empathy emotion.
| Original language | English (US) |
|---|---|
| Title of host publication | WASSA 2024 - 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, Proceedings of the Workshop |
| Editors | Orphee De Clercq, Valentin Barriere, Jeremy Barnes, Roman Klinger, Joao Sedoc, Shabnam Tafreshi |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 234-249 |
| Number of pages | 16 |
| ISBN (Electronic) | 9798891761568 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, WASSA 2024 - Bangkok, Thailand Duration: Aug 15 2024 → … |
Publication series
| Name | WASSA 2024 - 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, Proceedings of the Workshop |
|---|---|
Conference
| Conference | 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, WASSA 2024 |
|---|---|
| Country/Territory | Thailand |
| City | Bangkok |
| Period | 8/15/24 → … |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.
Fingerprint
Dive into the research topics of 'RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset'.