TY - JOUR
T1 - Towards long-tailed, multi-label disease classification from chest X-ray
T2 - Overview of the CXR-LT challenge
AU - Holste, Gregory
AU - Zhou, Yiliang
AU - Wang, Song
AU - Jaiswal, Ajay
AU - Lin, Mingquan
AU - Zhuge, Sherry
AU - Yang, Yuzhe
AU - Kim, Dongkyun
AU - Nguyen-Mau, Trong Hieu
AU - Tran, Minh Triet
AU - Jeong, Jaehyup
AU - Park, Wongi
AU - Ryu, Jongbin
AU - Hong, Feng
AU - Verma, Arsh
AU - Yamagishi, Yosuke
AU - Kim, Changhyun
AU - Seo, Hyeryeong
AU - Kang, Myungjoo
AU - Celi, Leo Anthony
AU - Lu, Zhiyong
AU - Summers, Ronald M.
AU - Shih, George
AU - Wang, Zhangyang
AU - Peng, Yifan
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/10
Y1 - 2024/10
AB - Many real-world image recognition problems, such as diagnostic medical imaging exams, are “long-tailed” – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
KW - Chest X-ray
KW - Computer-aided diagnosis
KW - Long-tailed learning
UR - http://www.scopus.com/inward/record.url?scp=85195287452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195287452&partnerID=8YFLogxK
U2 - 10.1016/j.media.2024.103224
DO - 10.1016/j.media.2024.103224
M3 - Article
C2 - 38850624
AN - SCOPUS:85195287452
SN - 1361-8415
VL - 97
JO - Medical Image Analysis
JF - Medical Image Analysis
M1 - 103224
ER -