Background: Patient-based similarity metrics are important case-based reasoning tools that may assist with research and patient care applications. Ontology and information content principles may be helpful for developing such metrics.

Methods: Patient cases from 1989 through 2003 in the Columbia University Medical Center data repository were converted to SNOMED CT concepts. Five metrics were implemented: (1) percent disagreement, treating the data as an unstructured "bag of findings," (2) average links between concepts, (3) links weighted by information content based on descendants, (4) links weighted by information content based on term prevalence, and (5) path distance using descendants, weighted by information content based on descendants. Three physicians served as the gold standard for 30 cases.

Results: Expert inter-rater reliability was 0.91, with rank correlations between 0.61 and 0.81, representing upper-bound performance. Correlations between expert ratings and metrics (1) through (5) were 0.27, 0.29, 0.30, 0.30, and 0.30, respectively. Using only the SNOMED CT Clinical Findings axis increased the correlation to 0.37.

Conclusion: Ontology principles and information content provide useful information for similarity metrics but currently fall short of expert performance.
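The Methods name five metrics without giving their formulas, so the following is only a minimal sketch of the ingredients they combine, not the authors' implementation. Everything in it is an assumption: the toy hierarchy (`PARENTS`, `KIDS`) is hypothetical and is not SNOMED CT content. "Percent disagreement" is read as the share of pooled findings that appear in only one case. Descendant-based information content is taken as -log((descendants + 1) / N), and prevalence-based information content as an add-one-smoothed -log p(c). A link's weight is the information content gained from parent to child, and a case-level score averages each concept's distance to its nearest match in both directions. The "rank correlation" is assumed to be Spearman's rho.

```python
import heapq
import math

# ASSUMPTIONS (not from the paper): a hypothetical toy is-a hierarchy (child -> parents),
# not real SNOMED CT content, plus illustrative IC formulas and a symmetric case score.
PARENTS = {
    "finding": [], "cardiac finding": ["finding"], "respiratory finding": ["finding"],
    "chest pain": ["cardiac finding"], "arrhythmia": ["cardiac finding"],
    "cough": ["respiratory finding"],
}

def children_map(parents):
    """Invert the child -> parents map into parent -> children."""
    kids = {c: [] for c in parents}
    for child, ps in parents.items():
        for p in ps:
            kids[p].append(child)
    return kids

KIDS = children_map(PARENTS)

def descendants(concept, kids):
    """All concepts strictly below `concept`."""
    seen, stack = set(), list(kids[concept])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(kids[c])
    return seen

def ic_descendants(concept, kids):
    """Intrinsic IC: fewer descendants means a more specific concept."""
    return -math.log((len(descendants(concept, kids)) + 1) / len(kids))

def ic_prevalence(concept, kids, counts):
    """Corpus IC, -log p(c), counting c and its descendants (add-one smoothed)."""
    total = sum(counts.values())
    hits = sum(counts.get(c, 0) for c in descendants(concept, kids) | {concept})
    return -math.log((hits + 1) / (total + 1))

def percent_disagreement(a, b):
    """Metric (1), as read here: share of pooled findings present in only one case."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0

def ic_weight(child, parent):
    """Link cost = IC gained by stepping from parent to child (always >= 0)."""
    return ic_descendants(child, KIDS) - ic_descendants(parent, KIDS)

def path_distance(x, y, parents, kids, weight):
    """Dijkstra over is-a links, traversed upward or downward."""
    dist, heap = {x: 0.0}, [(0.0, x)]
    while heap:
        d, c = heapq.heappop(heap)
        if c == y:
            return d
        if d > dist[c]:
            continue  # stale heap entry
        steps = [(p, weight(c, p)) for p in parents[c]]
        steps += [(k, weight(k, c)) for k in kids[c]]
        for nxt, w in steps:
            if d + w < dist.get(nxt, math.inf):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return math.inf

def case_distance(a, b, parents, kids, weight):
    """Mean distance from each concept to its nearest match, averaged both ways."""
    def one_way(src, dst):
        return sum(min(path_distance(s, t, parents, kids, weight) for t in dst)
                   for s in src) / len(src)
    return (one_way(a, b) + one_way(b, a)) / 2

def spearman(x, y):
    """Spearman's rho: Pearson correlation of average ranks (ties share a rank)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r, i = [0.0] * len(v), 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            for k in range(i, j + 1):
                r[order[k]] = (i + j) / 2 + 1
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

case_a, case_b = {"chest pain", "cough"}, {"chest pain", "arrhythmia"}
print(percent_disagreement(case_a, case_b))                            # 0.666...
print(case_distance(case_a, case_b, PARENTS, KIDS, lambda c, p: 1.0))  # 1.5
print(case_distance(case_a, case_b, PARENTS, KIDS, ic_weight))         # ≈ 1.445
```

Passing a unit weight gives a plain link count in the spirit of metric (2). Passing `ic_weight` makes links between general concepts near the root cheap and links between specific concepts expensive. Replacing `ic_descendants` with `ic_prevalence` inside the weight would give a prevalence-weighted variant. To compare against experts, `spearman` would take one metric score and one physician rating per case pair. It is undefined when either list is constant.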
Funding information:
This work was funded by the National Library of Medicine (NLM) grant “Discovering and applying knowledge in clinical databases” (R01 LM06910). Drs. Melton, Morrison, and Rothschild were supported by an NLM Training Grant (5T15LM007079-12). We thank Carol Friedman for the use of MedLEE (NLM support R01 LM06274 and R01 LM07659).
Keywords:
- Electronic medical records
- Information content
- Natural language processing
- Similarity metrics