Age and gender prediction on health forum data

Prasha Shrestha, Steven Bethard, Ted Pedersen, Nicolas Rey-Villamizar, Farig Sadeque, Thamar Solorio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. But users might not always disclose their age and gender. It is desirable then to be able to automatically extract this information from users' content. However, to the best of our knowledge, there is no such resource for author profiling of health forum data. Here we present a large corpus, with close to 85,000 users, for profiling and also outline our approach and benchmark results to automatically detect a user's age and gender from their forum posts. We use a mix of features from a user's text as well as forum-specific features to obtain accuracy well above the baseline, thus showing that both our dataset and our method are useful and valid.

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Pages: 3394-3401
Number of pages: 8
ISBN (Electronic): 9782951740891
State: Published - 2016
Externally published: Yes
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: May 23 2016 → May 28 2016

Bibliographical note

Funding Information:
This project was partially supported by NSF award No. 1462141.

Keywords

  • Age
  • Author profiling
  • Gender
  • Medical forums
