TY - JOUR
T1 - Automated Extraction of Substance Use Information from Clinical Texts
AU - Wang, Yan
AU - Chen, Elizabeth S.
AU - Pakhomov, Serguei
AU - Arsoniadis, Elliot
AU - Carter, Elizabeth W.
AU - Lindemann, Elizabeth
AU - Sarkar, Indra Neil
AU - Melton, Genevieve B.
PY - 2015
Y1 - 2015
AB - Within clinical discourse, social history (SH) includes important information about substance use (alcohol, drug, and nicotine use), a key risk factor for disease, disability, and mortality. In this study, we developed and evaluated a natural language processing (NLP) system, based on Stanford Typed Dependencies, for automated detection of substance use statements and extraction of substance use attributes (e.g., temporal and status). The system leveraged linguistic resources and domain knowledge from a multi-site social history study, PropBank, and the MiPACQ corpus. It attained F-scores of 89.8, 84.6, and 89.4 for alcohol, drug, and nicotine use statement detection, respectively, as well as average F-scores of 82.1, 90.3, 80.8, 88.7, 96.6, and 74.5 respectively for extraction of attributes. Our results suggest that NLP systems augmented with linguistic resources and domain knowledge can achieve good performance when applied to a wide range of free-text clinical notes describing substance use.
UR - http://www.scopus.com/inward/record.url?scp=85021683928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85021683928&partnerID=8YFLogxK
M3 - Article
C2 - 26958312
AN - SCOPUS:85021683928
SN - 1559-4076
VL - 2015
SP - 2121
EP - 2130
JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
ER -