Abstract
Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT® ontology of medical concepts. The measures include two path-based measures and three measures that augment path-based measures with information content statistics from corpora. We also derive a context vector measure based on medical corpora that can be used as a measure of semantic relatedness. These six measures are evaluated against a newly created test bed of 30 medical concept pairs scored by three physicians and nine medical coders. We find that the medical coders and physicians differ in their ratings, and that the context vector measure correlates most closely with the physicians, while the path-based measures and one of the information content measures correlate most closely with the medical coders. We conclude that there is a role both for more flexible measures of relatedness based on information derived from corpora and for measures that rely on existing ontological structures.
Original language | English (US) |
---|---|
Pages (from-to) | 288-299 |
Number of pages | 12 |
Journal | Journal of Biomedical Informatics |
Volume | 40 |
Issue number | 3 |
State | Published - Jun 2007 |
Bibliographical note
Funding Information: We thank the Mayo Clinic Medical Index staff as well as Dr. Alexander Ruggieri, Dr. Peter Kent and Dr. Auethavekiat Paranee for their contribution to the annotation of the test pairs. Dr. Ted Pedersen’s role in this work has been partially supported by a National Science Foundation Faculty Early CAREER Development Award (#0092784). This work was also supported by the NLM Training Grant in Medical Informatics (T15 LM07041-19) and the NIH Roadmap Multidisciplinary Clinical Research Career Development Award Grant (K12/NICHD)-HD49078.
Keywords
- Context vectors
- Information content
- Path based measures
- SNOMED-CT
- Semantic similarity
Fingerprint
Dive into the research topics of 'Measures of semantic similarity and relatedness in the biomedical domain'. Together they form a unique fingerprint.
Datasets
- Semantic Relatedness and Similarity Reference Standards for Medical Terms
  Pakhomov, S., Data Repository for the University of Minnesota, 2018
  DOI: 10.13020/D6CX04, http://hdl.handle.net/11299/196265
  Dataset