Abstract
Social media, especially Twitter, is increasingly used for research involving predictive analytics. In social media studies, natural language processing (NLP) techniques are used in conjunction with expert-based, manual, and qualitative analyses. However, social media data are unstructured and must undergo complex manipulation before they can be used for research. Manual annotation, in which multiple expert raters must reach consensus on every item, is the most resource- and time-consuming step, but it is essential for creating gold-standard datasets for training NLP-based machine learning classifiers. To reduce the burden of manual annotation while maintaining its reliability, we devised a crowdsourcing pipeline combined with active learning strategies. We demonstrated its effectiveness through a case study that identifies job loss events from individual tweets. We used the Amazon Mechanical Turk platform to recruit annotators from the Internet and designed a number of quality control measures to assure annotation accuracy. We evaluated four active learning strategies (least confident, entropy, vote entropy, and Kullback-Leibler divergence), which aim to reduce the number of tweets needed to reach a desired level of automated classification performance. The results show that crowdsourcing is useful for creating high-quality annotations and that active learning helps reduce the number of required tweets, although there was no substantial difference among the strategies tested.
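The four query strategies named in the abstract can be illustrated with a minimal sketch. This uses the standard textbook definitions of each score and is not the authors' implementation; the function names, array shapes, and the `select_top_k` helper are assumptions made for illustration. In each case a higher score marks a tweet the model (or committee of models) is less sure about, so it would be sent for annotation first.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to keep log() away from zero


def least_confident(probs):
    """Score = 1 - probability of the most likely class.
    probs: array of shape (n_samples, n_classes)."""
    return 1.0 - probs.max(axis=1)


def entropy(probs):
    """Shannon entropy of each predicted class distribution."""
    return -(probs * np.log(np.clip(probs, EPS, 1.0))).sum(axis=1)


def vote_entropy(votes, n_classes):
    """Entropy of the hard-label votes cast by a committee.
    votes: integer labels of shape (n_samples, n_members)."""
    counts = np.stack(
        [(votes == c).sum(axis=1) for c in range(n_classes)], axis=1
    )
    frac = counts / votes.shape[1]
    return -(frac * np.log(np.clip(frac, EPS, 1.0))).sum(axis=1)


def kl_divergence(committee_probs):
    """Mean KL divergence of each member from the committee consensus.
    committee_probs: shape (n_members, n_samples, n_classes)."""
    consensus = committee_probs.mean(axis=0)
    log_ratio = np.log(
        np.clip(committee_probs, EPS, 1.0) / np.clip(consensus, EPS, 1.0)
    )
    return (committee_probs * log_ratio).sum(axis=2).mean(axis=0)


def select_top_k(scores, k):
    """Hypothetical helper: indices of the k highest-scoring samples."""
    return np.argsort(-scores)[:k]


# Toy data (made up): two tweets, two classes.
probs = np.array([[0.9, 0.1], [0.5, 0.5]])
votes = np.array([[0, 0, 0], [0, 1, 1]])
committee = np.array([
    [[0.9, 0.1], [0.9, 0.1]],   # member 0
    [[0.9, 0.1], [0.1, 0.9]],   # member 1
])

lc_scores = least_confident(probs)
ent_scores = entropy(probs)
ve_scores = vote_entropy(votes, n_classes=2)
kl_scores = kl_divergence(committee)
next_to_annotate = select_top_k(ent_scores, k=1)
```

In this toy input the second tweet scores higher under all four strategies, since the single model is undecided about it and the committee disagrees on it.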
| Original language | English (US) |
|---|---|
| Title of host publication | Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices - 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Proceedings |
| Editors | Hamido Fujita, Jun Sasaki, Philippe Fournier-Viger, Moonis Ali |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 333-344 |
| Number of pages | 12 |
| ISBN (Print) | 9783030557881 |
| DOIs | |
| State | Published - 2020 |
| Event | 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020 - Kitakyushu, Japan; Duration: Sep 22 2020 → Sep 25 2020 |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 12144 LNAI |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020 |
|---|---|
| Country/Territory | Japan |
| City | Kitakyushu |
| Period | 9/22/20 → 9/25/20 |
Bibliographical note
Funding Information: This study was supported by NSF Award #1734134.
Publisher Copyright: © Springer Nature Switzerland AG 2020.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 4 Quality Education
Keywords
- Active learning
- Crowdsourcing
- Social media
Fingerprint
Dive into the research topics of 'Integrating crowdsourcing and active learning for classification of work-life events from tweets'. Together they form a unique fingerprint.