Automated non-alphanumeric symbol resolution in clinical texts.

Sung Rim Moon, Serguei Pakhomov, James Ryan, Genevieve B. Melton

Research output: Contribution to journal, Article, peer-review

1 Scopus citation


Although clinical texts contain many symbols, medical natural language processing (NLP) researchers have given relatively little attention to symbol resolution. Interpreting the meaning of symbols may be viewed as a special case of Word Sense Disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols ('+', '-', '/', and '#') were randomly extracted from a clinical document repository and annotated by experts. Three feature sets were evaluated: the symbols and their surrounding context, bag-of-words (BoW), and heuristic rules. These were tested with three classifiers (Naïve Bayes, Support Vector Machine, and Decision Tree) using 10-fold cross-validation. With Naïve Bayes, accuracies for '+', '-', '/', and '#' were 80.11%, 80.22%, 90.44%, and 95.00%, respectively. While symbol context contributed the most, BoW was also helpful for disambiguating some symbols. Symbol disambiguation with supervised techniques can be implemented with reasonable accuracy as a module for medical NLP systems.
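As an illustration of the general approach described in the abstract, the following is a minimal sketch of a Naïve Bayes classifier that uses positional context features around a symbol. It is not the authors' implementation. The sense labels ("positive", "and"), the example sentences, the window size, and the names `context_features` and `NaiveBayes` are all assumptions made for this sketch. The BoW features, heuristic rules, and 10-fold cross-validation from the study are omitted.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, index, window=2):
    """Return positional features for the tokens around the symbol at `index`."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the symbol itself
        pos = index + offset
        token = tokens[pos].lower() if 0 <= pos < len(tokens) else "<pad>"
        feats.append(f"ctx[{offset:+d}]={token}")
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_lists, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus smoothed log likelihood of each feature
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (tokens, index of '+', assumed sense label).
TRAIN = [
    ("strep test + for group A".split(), 2, "positive"),
    ("culture + for MRSA".split(), 1, "positive"),
    ("nausea + vomiting today".split(), 1, "and"),
    ("fever + chills overnight".split(), 1, "and"),
]

model = NaiveBayes().fit(
    [context_features(tokens, i) for tokens, i, _ in TRAIN],
    [label for _, _, label in TRAIN],
)

print(model.predict(context_features("urine culture + for E coli".split(), 2)))
```

Encoding each neighbor together with its offset (for example `ctx[+1]=for`) keeps word order information that a plain bag-of-words representation would discard, which is one plausible reading of why symbol context was the most useful feature set in the study.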

Original language: English (US)
Pages (from-to): 979-986
Number of pages: 8
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2011

