Purpose: To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs.
Materials and Methods: A total of 95 363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and performance was prospectively evaluated. There were 5335 total real-time predictions and a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed with use of receiver operating characteristic analysis, precision-recall curves, and F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists.
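For readers less familiar with the metrics named above, the following Python sketch shows how prevalence, the area under the receiver operating characteristic curve (AUC), and the F1 score can be computed from labels and model scores. It is an illustration only and not the authors' analysis code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions. Only the counts 258 and 5335 come from the abstract.

```python
# Illustrative sketch only; not the study's analysis code.
# Function names, toy data, and the 0.5 threshold are hypothetical.

def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as one half). Pairwise form, fine for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_f1(labels, scores, threshold):
    """Precision, recall, and F1 when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Prevalence reported in the abstract: 258 positive of 5335 predictions.
prevalence_pct = round(100 * 258 / 5335, 1)  # 4.8

# Hypothetical toy data (1 = COVID-19 positive, 0 = negative).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.1, 0.0, 0.0]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_precision, toy_recall, toy_f1 = precision_recall_f1(
    toy_labels, toy_scores, threshold=0.5
)
```

At a prevalence near 5%, precision and F1 depend strongly on the chosen threshold, which is presumably why precision-recall curves were reported alongside the receiver operating characteristic analysis.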
Results: Participants positive for COVID-19 had higher COVID-19 diagnostic scores than participants negative for COVID-19 (median, 0.1 [IQR, 0.0-0.8] vs 0.0 [IQR, 0.0-0.1], respectively; P < .001). Real-time model performance was unchanged over 19 weeks of implementation (area under the receiver operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model sensitivity was higher in men than women (P = .01), whereas model specificity was higher in women (P = .001). Sensitivity was higher for Asian (P = .002) and Black (P = .046) participants compared with White participants. The COVID-19 AI diagnostic system had worse accuracy (63.5% correct) compared with radiologist predictions (radiologist 1 = 67.8% correct, radiologist 2 = 68.6% correct; McNemar P < .001 for both).
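The McNemar comparison reported above works on paired reads of the same images and uses only the discordant pairs, that is, images on which exactly one of the two readers was correct. The Python sketch below shows the exact (binomial) form of the test as one possible illustration. The abstract does not state which variant of the test was used and does not report the discordant counts, so the function name and the example counts are hypothetical.

```python
# Illustrative sketch only; the discordant counts below are hypothetical
# and are NOT taken from the study.
import math


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.

    b: pairs where reader A was correct and reader B was wrong
    c: pairs where reader A was wrong and reader B was correct
    Under the null hypothesis, each discordant pair falls in either
    cell with probability 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: the model alone is correct on 0 discordant images,
# the radiologist alone is correct on 10.
p_value = mcnemar_exact(0, 10)
```

Because concordant pairs do not enter the calculation, two readers with similar overall accuracy can still differ significantly when their errors fall on different images.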
Conclusion: AI-based tools have not yet reached full diagnostic potential for COVID-19 and underperform compared with radiologist prediction.
Keywords: Diagnosis, Classification, Application Domain, Infection, Lung
Supplemental material is available for this article.
© RSNA, 2022.
Bibliographical note
Funding Information:
Supported by the Agency for Healthcare Research and Quality (AHRQ) and Patient-Centered Outcomes Research Institute (PCORI), grant K12HS026379 (C.J.T.); the National Institutes of Health (NIH) National Center for Advancing Translational Sciences, grants KL2TR002492 (C.J.T.) and UL1TR002494 (E.K.); NIH National Heart, Lung, and Blood Institute, grant T32HL07741 (N.E.I.); NIH National Institute of Biomedical Imaging and Bioengineering, grants 75N92020D00018/75N92020F00001 (J.W.G.); National Institute of Biomedical Imaging and Bioengineering MIDRC grant of the National Institutes of Health under contracts 75N92020C00008 and 75N92020C00021 (Z.Z., J.W.G.); U.S. National Science Foundation #1928481 from the Division of Electrical, Communication and Cyber Systems (J.W.G.); and the University of Minnesota Office of the Vice President of Research (OVPR) COVID-19 Impact Grant (J.S., E.K., C.J.T.).
Disclosures of conflicts of interest: J.S. No relevant relationships. L.P. No relevant relationships. T.L. No relevant relationships. D.A. No relevant relationships. Z.Z. No relevant relationships. G.B.M.M. Payments to institution: Fairview, University of Minnesota, NIH, AHRQ; payment or honoraria for lectures, presentations, or speakers’ bureaus from AMIA, ACMI, Washington University Informatics, and NIH; support for attending meetings and/or travel from AMIA, ACMI, Washington University Informatics, and NIH; patent planned, issued, or pending for application 17/302,373; participation on a data safety monitoring board or advisory board not related to this article; leadership or fiduciary role at AMIA and ACMI. N.E.I. No relevant relationships. E.M. No relevant relationships. D.B. No relevant relationships. S.S. No relevant relationships. J.L.B. No relevant relationships. K.H. Internal funding from Indiana University Precision Health Initiative; author received research grants from NIH; author was principal investigator (PI) for a research contract with Merck (not relevant to this manuscript); author is PI for a research contract with Eli Lilly (not relevant to this manuscript). T.A. Speakers’ bureau for Genentech and Boehringer Ingelheim; gives nonbranded lectures on interstitial lung disease unrelated to this manuscript. S.D.S. No relevant relationships. J.W.G. Medical Imaging and Data Resource Center grant from NIH; Nightingale Open Science grant; NSF Future of Work grant; Kheiron Breast AI Validation grant; BDC Data Anonymization grant; Radiology: Artificial Intelligence trainee editorial board lead. E.K. Grant UL1TR002494; this technology submitted for a U.S. patent. C.J.T. Microsoft AI for Health provided GPU resources via a COVID-19 Research Grant (no payments, other support, or any role in the actual conduct of the research was had by Microsoft); technology was submitted for a U.S. patent.
This study was supported in part by an AI for Health COVID-19 grant (Microsoft). This grant provided graphical processing unit support for this project. No additional support was provided by that grantor, and the authors had control of the data and information submitted for publication. This study was approved by the University of Minnesota institutional review board, and the requirement for written informed consent was waived (STUDY 00011158). External validation at Indiana University (IU) was deemed exempt by the institutional review board because all secondary data were fully de-identified and remained within IU (STUDY 2010169012). External validation of the model at Emory University was approved by the institutional review board (STUDY 00000506).
PubMed: MeSH publication types
- Journal Article