Automated Extraction of Substance Use Information from Clinical Texts

Yan Wang, Elizabeth S. Chen, Serguei Pakhomov, Elliot Arsoniadis, Elizabeth W. Carter, Elizabeth Lindemann, Indra Neil Sarkar, Genevieve B. Melton

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use), a key risk factor for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system, based on Stanford Typed Dependencies, for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status). The system leveraged linguistic resources and domain knowledge from a multi-site social history study, PropBank, and the MiPACQ corpus. It attained F-scores of 89.8, 84.6, and 89.4 for alcohol, drug, and nicotine use statement detection, respectively, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 for extraction of attributes. Our results suggest that NLP systems augmented with linguistic resources and domain knowledge can achieve good performance when applied to a wide breadth of free-text clinical notes on substance use.
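The F-scores reported above combine precision and recall into a single number. The abstract does not say which variant was used; the sketch below assumes the balanced F1 measure (the harmonic mean of precision and recall). The function name `f_score` and the counts in the usage line are hypothetical illustrations, not figures from the study.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts (beta=1 gives balanced F1).

    tp: true positives, fp: false positives, fn: false negatives.
    Assumption: the balanced F1 variant; the abstract does not specify one.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing was found correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one statement type, not data from the paper.
example_f1 = f_score(tp=90, fp=10, fn=10)
print(f"F1 = {100 * example_f1:.1f}")
```

Multiplying by 100, as in the print statement, gives the percentage scale used for the scores in the abstract.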

Original language: English (US)
Pages (from-to): 2121-2130
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
State: Published - 2015

