TY - JOUR
T1 - A comparative study of supervised learning as applied to acronym expansion in clinical reports.
AU - Joshi, Mahesh
AU - Pakhomov, Serguei V
AU - Pedersen, Ted
AU - Chute, Christopher G.
PY - 2006
Y1 - 2006
N2 - Electronic medical records (EMR) constitute a valuable resource of patient specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. The accuracy of all three methods when using all features consistently approaches or exceeds 90%, even when the baseline majority classifier is below 50%.
AB - Electronic medical records (EMR) constitute a valuable resource of patient specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. The accuracy of all three methods when using all features consistently approaches or exceeds 90%, even when the baseline majority classifier is below 50%.
UR - http://www.scopus.com/inward/record.url?scp=34748874639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34748874639&partnerID=8YFLogxK
M3 - Article
C2 - 17238371
AN - SCOPUS:34748874639
SN - 1559-4076
SP - 399
EP - 403
JO - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
ER -