AI-driven synthetic data generation for accelerating hepatology research: A study of the United Network for Organ Sharing (UNOS) database

Joseph C. Ahn, Yung Kyun Noh, Mingzhao Hu, Xiaotong Shen, Douglas A. Simonetto, Patrick S. Kamath, Rohit Loomba, Vijay H. Shah

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Background and Aims: Clinical hepatology research often faces limited data availability, underrepresentation of minority groups, and complex data-sharing regulations. Synthetic data (artificially generated patient records designed to mirror real-world distributions) offers a potential solution. We hypothesized that diffusion models, a state-of-the-art generative technique, could produce synthetic liver transplant waitlist data from the United Network for Organ Sharing (UNOS) database that maintains statistical fidelity, replicates clinical correlations and survival patterns, and ensures robust privacy protection.

Methods: Diffusion models were used to generate synthetic patient cohorts mirroring the UNOS liver transplant waitlist database from 2019 to 2023. Statistical fidelity was assessed using Maximum Mean Discrepancy (MMD), Wasserstein distance, correlation analysis, and variable-level metrics. Clinical utility was evaluated by comparing transplant-free survival via Kaplan-Meier curves and by comparing MELD score performance. Privacy was quantified using the Distance to Closest Record (DCR) and attribute disclosure risk assessments.

Results: The synthetic dataset was nearly indistinguishable from the original dataset (MMD=0.002, standardized Wasserstein distance<1.0), preserving clinically relevant correlations and survival patterns as evidenced by similar median survival times (110 vs. 101 days) and 5-year survival rates (22.2% vs. 22.8%). MELD-based 90-day mortality prediction was maintained (original AUC=0.839 vs. synthetic AUC=0.844). Privacy metrics indicated no identifiable patient matches, and mean DCR values ensured that synthetic individuals were not direct replicas of real patients.

Conclusion: AI-generated synthetic data derived from diffusion models can faithfully replicate complex hepatology datasets, maintain key clinical signals, and ensure strong privacy safeguards. This approach can help address data scarcity, enhance model generalizability, foster multi-institutional collaboration, and accelerate progress in hepatology research.
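The abstract names MMD and DCR as its fidelity and privacy metrics without giving implementation details. As a rough illustration only, the sketch below computes both on random stand-in arrays. The RBF kernel, the bandwidth value, the Euclidean distance, and the function names are all assumptions made for this example, not the authors' pipeline, and no UNOS data is involved.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def mmd_squared(real, synth, bandwidth=1.0):
    """Biased estimate of squared MMD using an RBF kernel.

    The kernel and bandwidth are assumptions; the abstract does not state them.
    """
    gamma = 1.0 / (2.0 * bandwidth ** 2)
    k_rr = np.exp(-gamma * pairwise_sq_dists(real, real))
    k_ss = np.exp(-gamma * pairwise_sq_dists(synth, synth))
    k_rs = np.exp(-gamma * pairwise_sq_dists(real, synth))
    return k_rr.mean() + k_ss.mean() - 2.0 * k_rs.mean()

def distance_to_closest_record(real, synth):
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    return np.sqrt(pairwise_sq_dists(synth, real).min(axis=1))

# Random stand-in data (hypothetical): 200 records with 4 standardized features.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
synth = rng.normal(size=(200, 4))   # drawn from the same distribution as real
copied = real.copy()                # worst case for privacy: exact replicas

print("MMD^2, same distribution:", mmd_squared(real, synth))
print("MMD^2, shifted distribution:", mmd_squared(real, real + 3.0))
print("mean DCR, independent sample:", distance_to_closest_record(real, synth).mean())
print("mean DCR, exact copies:", distance_to_closest_record(real, copied).mean())
```

A small MMD suggests the two samples are hard to tell apart, while a DCR of zero flags a synthetic record that duplicates a real one. In practice, mixed categorical and continuous variables would need encoding and scaling before either metric is meaningful.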

Original language: English (US)
Article number: 1299
Journal: Hepatology
State: Accepted/In press - 2025

Bibliographical note

Publisher Copyright:
© 2025 American Association for the Study of Liver Diseases. Published by Wolters Kluwer Health, Inc.

Keywords

  • artificial intelligence
  • diffusion models
  • liver transplantation
  • privacy-preserving healthcare data
  • synthetic data

PubMed: MeSH publication types

  • Journal Article
