Risk-Specific Training Cohorts to Address Class Imbalance in Surgical Risk Prediction

  • Jeremy A. Balch
  • Matthew M. Ruppert
  • Ziyuan Guan
  • Timothy R. Buchanan
  • Kenneth L. Abbott
  • Benjamin Shickel
  • Azra Bihorac
  • Muxuan Liang
  • Gilbert R. Upchurch
  • Christopher J. Tignanelli
  • Tyler J. Loftus

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Importance: Machine learning tools are increasingly deployed for risk prediction and clinical decision support in surgery. Class imbalance adversely impacts predictive performance, especially for low-incidence complications.

Objective: To evaluate risk-prediction model performance when trained on risk-specific cohorts.

Design, Setting, and Participants: This cross-sectional study, performed from February 2024 to July 2024, deployed a deep learning model that generated risk scores for common postoperative complications. A total of 109445 inpatient operations performed at 2 University of Florida Health hospitals from June 1, 2014, to May 5, 2021, were examined.

Exposures: The model was trained de novo on separate cohorts for high-risk, medium-risk, and low-risk Current Procedural Terminology codes defined empirically by incidence of 5 postoperative complications: (1) in-hospital mortality; (2) prolonged intensive care unit (ICU) stay (≥48 hours); (3) prolonged mechanical ventilation (≥48 hours); (4) sepsis; and (5) acute kidney injury (AKI). Low-risk and high-risk cutoffs for complications were defined by the lower-third and upper-third prevalence in the dataset, except for mortality, for which the cutoffs were set at 1% or less and greater than 3%, respectively.

Main Outcomes and Measures: Model performance metrics were assessed for each risk-specific cohort alongside the baseline model. Metrics included area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 scores, and accuracy for each model.

Results: A total of 109445 inpatient operations were examined among patients treated at 2 University of Florida Health hospitals in Gainesville (77921 procedures [71.2%]) and Jacksonville (31524 procedures [28.8%]). Median (IQR) patient age was 58 (43-68) years, and median (IQR) Charlson Comorbidity Index score was 2 (0-4). Among 109445 operations, 55646 patients (50.8%) were male, and 66495 patients (60.8%) underwent a nonemergent, inpatient operation. Training on the high-risk cohort had variable impact on AUROC but significantly improved AUPRC (as assessed by nonoverlapping 95% confidence intervals) for predicting mortality (0.53; 95% CI, 0.43-0.64), AKI (0.61; 95% CI, 0.58-0.65), and prolonged ICU stay (0.91; 95% CI, 0.89-0.92). It also significantly improved F1 score for mortality (0.42; 95% CI, 0.36-0.49), prolonged mechanical ventilation (0.55; 95% CI, 0.52-0.58), sepsis (0.46; 95% CI, 0.43-0.49), and AKI (0.57; 95% CI, 0.54-0.59). After controlling for baseline model performance on high-risk cohorts, AUPRC increased significantly for in-hospital mortality only (0.53; 95% CI, 0.42-0.65 vs 0.29; 95% CI, 0.21-0.40).

Conclusion and Relevance: In this cross-sectional study, training separate models using a priori knowledge of procedure-specific risk classes improved performance on standard evaluation metrics, especially for low-prevalence complications like in-hospital mortality. Used cautiously, this approach may represent an optimal training strategy for surgical risk-prediction models.
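The cohort definition and the significance criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the study's code. It assumes that the incidence of a complication has already been computed per procedure code, and that the lower-third and upper-third cut points are taken across those per-code incidences; the abstract does not spell out that detail. All function and variable names are hypothetical. Only the mortality cutoffs (1% or less, greater than 3%) and the nonoverlapping 95% CI rule come from the abstract.

```python
from statistics import quantiles

# Fixed mortality cutoffs stated in the abstract:
# low risk at 1% or less, high risk above 3%.
MORTALITY_LOW_MAX = 0.01
MORTALITY_HIGH_MIN = 0.03


def tertile_cutoffs(incidences):
    """Return the (lower-third, upper-third) cut points of the incidences."""
    lower, upper = quantiles(incidences, n=3, method="inclusive")
    return lower, upper


def assign_risk(incidence, low_cut, high_cut):
    """Label one procedure code as low, medium, or high risk."""
    if incidence <= low_cut:
        return "low"
    if incidence > high_cut:
        return "high"
    return "medium"


def cohorts_for(complication, incidence_by_code):
    """Map each procedure code to a risk cohort for one complication.

    Assumption: tertiles are computed across per-code incidences;
    mortality uses the fixed cutoffs instead.
    """
    if complication == "mortality":
        low_cut, high_cut = MORTALITY_LOW_MAX, MORTALITY_HIGH_MIN
    else:
        low_cut, high_cut = tertile_cutoffs(list(incidence_by_code.values()))
    return {
        code: assign_risk(p, low_cut, high_cut)
        for code, p in incidence_by_code.items()
    }


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Made-up per-code mortality incidences, for illustration only.
example = cohorts_for("mortality", {"code_a": 0.005, "code_b": 0.02, "code_c": 0.05})

# AUPRC intervals for in-hospital mortality reported in the abstract:
# risk-specific model (0.42-0.65) vs baseline on the high-risk cohort (0.21-0.40).
significant = not intervals_overlap((0.42, 0.65), (0.21, 0.40))
```

Under these assumptions, a procedure code falls in the medium-risk cohort whenever its incidence lies above the low cutoff and at or below the high cutoff, and a difference counts as significant only when the two intervals share no value.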

Original language: English (US)
Journal: JAMA Surgery
Volume: 159
Issue number: 12
State: Published - Dec 11 2024

Bibliographical note

Publisher Copyright:
© 2024 American Medical Association. All rights reserved.
