TY - JOUR
T1 - Electronic Health Record Phenotypes for Precision Medicine
T2 - Perspectives and Caveats From Treatment of Breast Cancer at a Single Institution
AU - Breitenstein, Matthew K.
AU - Liu, Hongfang
AU - Maxwell, Kara N.
AU - Pathak, Jyotishman
AU - Zhang, Rui
N1 - Publisher Copyright: © 2017 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
PY - 2018/1
Y1 - 2018/1
AB - Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource-intensive and not readily scalable. Informatics-based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule-based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor-positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart-reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and a thorough understanding of the perspectives offered) is necessary to derive robust EHR-based precision medicine phenotypes.
UR - http://www.scopus.com/inward/record.url?scp=85040248565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040248565&partnerID=8YFLogxK
U2 - 10.1111/cts.12514
DO - 10.1111/cts.12514
M3 - Article
C2 - 29084368
AN - SCOPUS:85040248565
SN - 1752-8054
VL - 11
SP - 85
EP - 92
JO - Clinical and Translational Science
JF - Clinical and Translational Science
IS - 1
ER -