Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data

Gregory P Finley, Serguei V.S. Pakhomov, Reed McEwan, Genevieve B. Melton

Research output: Contribution to journal › Article › peer-review

25 Scopus citations


Abbreviation disambiguation in clinical texts is a problem handled well by fully supervised machine learning methods. Acquiring training data, however, is expensive and would be impractical for large numbers of abbreviations in specialized corpora. An alternative is a semi-supervised approach, in which training data are automatically generated by substituting long forms in natural text with their corresponding abbreviations. Most prior implementations of this method either focus on very few abbreviations or do not test on real-world data. We present a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated medical record of occurrences of 74 ambiguous abbreviations. Despite notable differences between training and test corpora, classifiers achieve up to 90% accuracy. Our tests demonstrate that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.
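The training-data generation step described above (replacing long forms in natural text with their abbreviations, and keeping the replaced long form as the label) can be illustrated with a minimal sketch. It is not the authors' implementation: the sense inventory, the function name `make_training_examples`, the context window size, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical sense inventory: abbreviation -> candidate long forms.
# These entries are illustrative assumptions, not taken from the paper.
SENSE_INVENTORY = {
    "RA": ["rheumatoid arthritis", "right atrium", "room air"],
    "PT": ["physical therapy", "prothrombin time"],
}


def make_training_examples(text, inventory=SENSE_INVENTORY, window=3):
    """Find each long form in `text`, substitute its abbreviation, and
    return (abbreviation, label, context_words) tuples. The label is the
    long form that was replaced, so no manual annotation is needed."""
    examples = []
    for abbr, long_forms in inventory.items():
        for long_form in long_forms:
            pattern = re.compile(r"\b" + re.escape(long_form) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                # Up to `window` whitespace-separated tokens on each side.
                left = text[:match.start()].split()[-window:]
                right = text[match.end():].split()[:window]
                examples.append((abbr, long_form, left + [abbr] + right))
    return examples


# Made-up sample sentence, used only to exercise the function.
sample = "The patient with rheumatoid arthritis was referred to physical therapy twice weekly."
examples = make_training_examples(sample)
for abbr, label, context in examples:
    print(abbr, "->", label, "|", " ".join(context))
```

Each resulting tuple pairs an abbreviation in context with a machine-assigned sense label, which could then be turned into features for any standard classifier. A realistic pipeline would also need proper tokenization and a much larger inventory.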

Original language: English (US)
Pages (from-to): 560-569
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
State: Published - 2016

