TY - GEN
T1 - Cross-language domain adaptation for classifying crisis-related short messages
AU - Imran, Muhammad
AU - Mitra, Prasenjit
AU - Srivastava, Jaideep
PY - 2016
Y1 - 2016
N2 - Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs such as supplies or infrastructure damage. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in machine training. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), the performance decreased.
AB - Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs such as supplies or infrastructure damage. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in machine training. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), the performance decreased.
KW - Domain adaptation
KW - Social media
KW - Tweets classification
UR - http://www.scopus.com/inward/record.url?scp=85015720266&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015720266&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85015720266
T3 - Proceedings of the International ISCRAM Conference
BT - ISCRAM 2016 Conference Proceedings - 13th International Conference on Information Systems for Crisis Response and Management
A2 - Antunes, Pedro
A2 - Banuls Silvera, Victor Amadeo
A2 - Porto de Albuquerque, Joao
A2 - Moore, Kathleen Ann
A2 - Tapia, Andrea H.
PB - Information Systems for Crisis Response and Management, ISCRAM
T2 - 13th International Conference on Information Systems for Crisis Response and Management, ISCRAM 2016
Y2 - 22 May 2016 through 25 May 2016
ER -