
Accelerating Medical Record Data Abstraction and Analysis in Muscular Dystrophy

  • Huixue Zhou
  • Geeta Rajamani
  • Jiatan Huang
  • Magali Jorand-Fletcher
  • Yara Mohamed
  • Kody A. DeGolier
  • Annette Xenopoulos-Oddsson
  • Erjia Cui
  • Carla D. Zingariello
  • Rui Zhang
  • Peter B Kang

Research output: Contribution to journal › Article › peer-review

Abstract

Background and Objectives – Muscular dystrophies are characterized by progressive muscle weakness and degeneration. Identifying cases and abstracting data from electronic medical records (EMRs) is helpful for surveillance and research. However, manual EMR abstraction is laborious. We studied 2 approaches to accelerate EMR abstraction: large language models (LLMs) and International Classification of Diseases (ICD) code meta-analysis.

Methods – In our cross-sectional study, EMRs from 22 individuals with Duchenne muscular dystrophy (DMD) and 22 individuals with limb-girdle muscular dystrophy (LGMD) were exported into a data shelter and manually annotated using MedTator. Annotations were guided by a schema focused on 4 key features of muscular dystrophy: first symptoms, ambulatory status, serum creatine kinase (CK) levels, and genetic test results. Five LLMs were fed a series of prompts and examples, and then, clinic notes from each of the 44 cases were inputted for model analysis. Inter-rater agreement (IAA) and F1 scores were calculated for manual annotations, and the F1 score for LLMs compared with manual annotations was calculated. We then analyzed a separate set of 77 DMD and 59 LGMD cases to determine whether the number of health care encounters with a muscular dystrophy–related ICD code could predict diagnostic certainty based on MD STARnet criteria.

Results – IAA for manual annotations varied between 80% (for annotation of symptoms) and 100% (for CK values). The highest performing LLM was Llama 3-8b, which yielded the following accuracies: 46.8% for “first symptoms,” 56.9% for “ambulatory status,” 69.2% for “CK values,” and 68.4% for “genetic test results.” Among 77 individuals with DMD, all patients with 20 or more encounters linked to relevant ICD codes had definite or probable diagnoses, whereas among 59 individuals with LGMD, all patients with 25 or more encounters linked to relevant ICD codes had definite or probable diagnoses.

Discussion – LLMs promise to accelerate EMR abstraction for rare diseases such as muscular dystrophy, but F1 scores for LLMs currently lag manual abstractions for unstructured data. Llama 3-8b demonstrated superior performance to the 4 other models tested. Metadata such as ICD code counts may help prioritize high-yield cases for surveillance and research purposes.
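As an illustration of the encounter-count analysis summarized in the Results, the following minimal Python sketch finds the smallest observed number of ICD-coded encounters at or above which every case carries a definite or probable diagnosis. It is not the authors' code. The function name `min_confirming_threshold`, the label strings, and the toy records are all assumptions made for illustration; the toy data are invented and are not study data.

```python
from typing import Iterable, Optional, Tuple

# Certainty labels counted as "confirmed" in this sketch.
# The exact label strings are an assumption, not taken from the study.
CONFIRMED = {"definite", "probable"}


def min_confirming_threshold(cases: Iterable[Tuple[int, str]]) -> Optional[int]:
    """Return the smallest observed encounter count t such that every case
    with at least t encounters has a confirmed label.

    Each case is (number_of_ICD_coded_encounters, certainty_label).
    Returns None if there are no cases, or if no observed count lies
    strictly above the highest-count unconfirmed case.
    """
    cases = list(cases)
    if not cases:
        return None

    # Encounter counts of cases that are NOT definite/probable.
    unconfirmed = [n for n, label in cases if label not in CONFIRMED]
    if not unconfirmed:
        # Every case is confirmed, so the lowest observed count already works.
        return min(n for n, _ in cases)

    # Any valid threshold must sit strictly above the largest unconfirmed count.
    cutoff = max(unconfirmed)
    above = [n for n, _ in cases if n > cutoff]
    return min(above) if above else None


# Invented toy records for illustration only (not study data).
toy_cases = [
    (3, "possible"),
    (8, "probable"),
    (12, "possible"),
    (20, "definite"),
    (31, "probable"),
]
print(min_confirming_threshold(toy_cases))
```

With these invented records, the highest-count unconfirmed case has 12 encounters, so the function returns the next observed count above it. A threshold found this way is descriptive of the sample at hand; whether it generalizes would need validation on separate cases.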

Original language: English (US)
Article number: e200542
Pages (from-to): 1-9
Number of pages: 9
Journal: Neurology: Clinical Practice
Volume: 15
Issue number: 6
State: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 2025 American Academy of Neurology

PubMed: MeSH publication types

  • Journal Article
