TY - JOUR
T1 - Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data
AU - Finley, Gregory P.
AU - Pakhomov, Serguei V.S.
AU - McEwan, Reed
AU - Melton, Genevieve B.
PY - 2016
Y1 - 2016
AB - Abbreviation disambiguation in clinical texts is a problem handled well by fully supervised machine learning methods. Acquiring training data, however, is expensive and would be impractical for large numbers of abbreviations in specialized corpora. An alternative is a semi-supervised approach, in which training data are automatically generated by substituting long forms in natural text with their corresponding abbreviations. Most prior implementations of this method either focus on very few abbreviations or do not test on real-world data. We present a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated medical record of occurrences of 74 ambiguous abbreviations. Despite notable differences between training and test corpora, classifiers achieve up to 90% accuracy. Our tests demonstrate that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.
UR - http://www.scopus.com/inward/record.url?scp=85026685803&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85026685803&partnerID=8YFLogxK
M3 - Article
C2 - 28269852
AN - SCOPUS:85026685803
SN - 1559-4076
VL - 2016
SP - 560
EP - 569
JO - AMIA ... Annual Symposium proceedings. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings. AMIA Symposium
ER -