
RideKE: Leveraging Low-Resource, User-Generated Twitter Content for Sentiment and Emotion Detection in Kenyan Code-Switched Dataset

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Social media has become a crucial open-access platform for individuals to express opinions and share experiences. However, leveraging low-resource language data from Twitter is challenging due to scarce, poor-quality content and major variations in language use, such as slang and code-switching. Identifying tweets in these languages can be difficult because Twitter primarily supports high-resource languages. We analyze Kenyan code-switched data and evaluate four state-of-the-art (SOTA) transformer-based pretrained models for sentiment and emotion classification, using supervised and semi-supervised methods. We detail the methodology behind data collection and annotation, and the challenges encountered during the data curation phase. Our results show that XLM-R outperforms the other models. For sentiment analysis, the supervised XLM-R model achieves the highest accuracy (69.2%) and F1 score (66.1%), while the semi-supervised XLM-R model reaches 67.2% accuracy and a 64.1% F1 score. For emotion analysis, the supervised DistilBERT model leads in accuracy (59.8%) and F1 score (31%), while the semi-supervised mBERT model reaches 59% accuracy and a 26.5% F1 score. The AfriBERTa models show the lowest accuracy and F1 scores. All models tend to predict neutral sentiment, with AfriBERTa showing the highest bias and a unique sensitivity to the empathy emotion.
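The gap between accuracy and F1 score reported for emotion analysis is what one would expect when a classifier leans toward a majority class such as neutral. The following is a minimal sketch of that effect, not the authors' evaluation code: the labels are hypothetical toy data, and macro-averaged F1 is an assumption, since the abstract does not state which F1 averaging was used.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption here)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 6 neutral, 2 positive, 2 negative gold labels.
gold = ["neutral"] * 6 + ["positive"] * 2 + ["negative"] * 2
# A classifier that always predicts the majority class.
pred = ["neutral"] * 10

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.2f} macro_f1={f1:.2f}")
```

In this toy case the always-neutral classifier is right on 6 of 10 examples but earns an F1 of zero on the two minority classes, so the macro average falls well below the accuracy.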

Original language: English (US)
Title of host publication: WASSA 2024 - 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, Proceedings of the Workshop
Editors: Orphee De Clercq, Valentin Barriere, Jeremy Barnes, Roman Klinger, Joao Sedoc, Shabnam Tafreshi
Publisher: Association for Computational Linguistics (ACL)
Pages: 234-249
Number of pages: 16
ISBN (Electronic): 9798891761568
State: Published - 2024
Externally published: Yes
Event: 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, WASSA 2024 - Bangkok, Thailand
Duration: Aug 15 2024 → …

Publication series

Name: WASSA 2024 - 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, Proceedings of the Workshop

Conference

Conference: 14th Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis, WASSA 2024
Country/Territory: Thailand
City: Bangkok
Period: 8/15/24 → …

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.
