Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data, but users do not always disclose their age and gender. It is therefore desirable to extract this information automatically from users' content. However, to the best of our knowledge, there is no such resource for author profiling of health forum data. Here we present a large corpus, with close to 85,000 users, for profiling, and outline our approach and benchmark results for automatically detecting a user's age and gender from their forum posts. We use a mix of features from a user's text as well as forum-specific features to obtain accuracy well above the baseline, thus showing that both our dataset and our method are useful and valid.
|Original language||English (US)|
|Title of host publication||Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016|
|Editors||Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani|
|Publisher||European Language Resources Association (ELRA)|
|Number of pages||8|
|State||Published - 2016|
|Event||10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia; Duration: May 23 2016 → May 28 2016|
Bibliographical note
Funding Information:
This project was partially supported by NSF award No. 1462141.
Keywords
- Author profiling
- Medical forums