Name: CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays
Published: March 19, 2025
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

Challenge Credentialed Access

Gregory Holste, Mingquan Lin, Song Wang, Yiliang Zhou, Yishu Wei, Hao Chen, Atlas Wang, Yifan Peng

Published: March 19, 2025. Version: 2.0.0

When using this resource, please cite:

MLA	Holste, Gregory, et al. "CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays" (version 2.0.0). PhysioNet (2025), https://doi.org/10.13026/ryj9-x506.
APA	Holste, G., Lin, M., Wang, S., Zhou, Y., Wei, Y., Chen, H., Wang, A., & Peng, Y. (2025). CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays (version 2.0.0). PhysioNet. https://doi.org/10.13026/ryj9-x506.
Chicago	Holste, Gregory, Lin, Mingquan, Wang, Song, Zhou, Yiliang, Wei, Yishu, Chen, Hao, Wang, Atlas, and Yifan Peng. "CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays" (version 2.0.0). PhysioNet (2025). https://doi.org/10.13026/ryj9-x506.
Harvard	Holste, G., Lin, M., Wang, S., Zhou, Y., Wei, Y., Chen, H., Wang, A., and Peng, Y. (2025) 'CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays' (version 2.0.0), PhysioNet. Available at: https://doi.org/10.13026/ryj9-x506.
Vancouver	Holste G, Lin M, Wang S, Zhou Y, Wei Y, Chen H, Wang A, Peng Y. CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays (version 2.0.0). PhysioNet. 2025. Available from: https://doi.org/10.13026/ryj9-x506.

Additionally, please cite the original publication:

Holste, Gregory, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang et al. "Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge." Medical Image Analysis (2024): 103224.

Please include the standard citation for PhysioNet:

APA	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Chest radiography presents a "long-tailed" distribution of findings, where a few diseases are common, but most are rare. Diagnosis is further complicated by its multi-label nature, as patients often exhibit multiple co-occurring findings. While recent research has attempted to address the long-tailed medical image classification problem, the interplay between class imbalance and label co-occurrence remains underexplored. The CXR-LT 2024 challenge builds on the success of CXR-LT 2023, expanding the dataset of 377,110 chest X-rays (CXRs) to 45 disease labels, including 19 new rare disease findings. The 2024 challenge introduces three tasks: (i) long-tailed classification on a large, noisy test set, (ii) long-tailed classification on a manually annotated "gold standard" subset, and (iii) zero-shot generalization to five previously unseen disease findings. CXR-LT 2024 addresses critical challenges in long-tailed, multi-label, and zero-shot learning for medical imaging by synthesizing state-of-the-art solutions from the international research community. Further, our dataset contributions, which expand disease coverage to better reflect real-world clinical settings, offer a valuable resource for future research. This project contains labels from the CXR-LT 2024 and CXR-LT 2023 challenges, as well as a related subset used in the MICCAI 2023 paper, "How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?"

Objective

Background

Chest radiography, like many diagnostic medical exams, produces a long-tailed distribution of clinical findings; while a small subset of diseases are routinely observed, the vast majority of diseases are relatively rare [1]. This poses a challenge for standard deep learning methods, which exhibit bias toward the most common classes at the expense of the important, but rare, “tail” classes [2]. Many existing methods [3] have been proposed to tackle this specific type of imbalance, though only recently with attention to long-tailed medical image recognition problems [4-6]. Diagnosis on chest X-ray (CXR) is also a multi-label problem, as patients often present with multiple disease findings simultaneously; however, only a select few studies incorporate knowledge of label co-occurrence into the learning process [7-10].

The CXR-LT series is a community-driven initiative to improve lung disease classification from chest X-rays by addressing the challenges of long-tailed classification and improving how state-of-the-art techniques are measured and compared [2]. These goals were pursued during the first event, CXR-LT 2023 [11], by offering high-quality benchmark CXR data for model development and conducting detailed evaluations to identify persistent issues affecting lung disease classification performance. CXR-LT 2023 attracted significant attention, with 59 teams contributing over 500 unique submissions. Since then, the task setup and data have provided a foundation for numerous studies [12-15].

As the second event in the series, CXR-LT 2024 maintains the general design and goals of its predecessor while introducing a new emphasis on zero-shot learning, which addresses a limitation identified in CXR-LT 2023. The number of unique radiological findings is estimated to exceed 4,500 [16], suggesting that the actual range of clinical findings on CXR is at least two orders of magnitude larger than what current benchmarks cover; even the 45 labels in this dataset amount to roughly 1% of 4,500. Therefore, effectively addressing the long tail of abnormal radiological findings necessitates the development of models that can generalize to new classes in a "zero-shot" manner.

CXR-LT 2024 challenge tasks

CXR-LT 2024 is split into three tasks, leveraging labels for 19 new rare disease findings and a manually annotated "gold standard" set:

Task 1: Long-tailed classification on a large, noisy test set (40 training labels)
Task 2: Long-tailed classification on a small, manually annotated test set (26 training labels)
Task 3: Zero-shot generalization to previously unseen diseases (5 new labels)

Each task adheres to the general framework established by CXR-LT 2023, providing participants with a large, automatically labeled training set consisting of 377,110 CXR images with 40 binary disease labels. The final submissions from participants are evaluated against a separate held-out test set prepared in a similar manner.

MICCAI 2024 challenge event

The top 9 teams across the three tasks were invited to present their challenge solutions at MICCAI 2024 [17]. These top-performing solutions will be described in an upcoming challenge summary paper.

Participation

This MICCAI 2024 shared challenge task was conducted on CodaLab [18-20]. Participants and those wishing to access the challenge data must have credentialed access to MIMIC-CXR-JPG v2.0.0 (or higher) [21,22].

Challenge timeline

05/01/2024: Development Phase begins. Training data released.
08/01/2024: Test Phase begins. Unlabeled test data released and final evaluation begins. The leaderboard was kept private during this phase.
08/04/2024: Challenge ends. Test Phase ends and the challenge is closed.
08/15/2024: Top-performing teams invited to present at MICCAI 2024.
10/10/2024: MICCAI 2024 CXR-LT Challenge event.

Data Description

CXR-LT 2024 challenge data

This challenge used an expanded version of MIMIC-CXR-JPG [21,22]. Following CXR-LT 2023 [11], each CXR study was automatically labeled with 19 new rare disease findings parsed from the associated reports. The dataset contains 377,110 CXRs, each labeled with at least one of 45 clinical findings (including a "Normal" class, indicating no cardiopulmonary disease). This challenge also used a small "gold standard" subset of 406 CXR reports, which were manually annotated with consensus labels as described in the CXR-LT 2023 overview [11]. In addition to the 12 rare diseases added in CXR-LT 2023, the labels for CXR-LT 2024 challenge data now include:

Adenopathy
Azygos Lobe
Clavicle Fracture
Fissure
Hydropneumothorax
Infarction
Kyphosis
Lobar Atelectasis
Pleural Other
Pulmonary Embolism
Pulmonary Hypertension
Rib Fracture
Round Atelectasis
Tuberculosis
Bulla
Cardiomyopathy
Hilum
Osteopenia
Scoliosis

CXR-LT 2024 data access

Within the cxr-lt-2024/ directory, all labels for all images can be found in labels.csv. For the official CXR-LT 2024 labeled training set, see train_labeled.csv (same across all tasks). The task-specific development sets can be found in development_labeled_task1.csv, development_labeled_task2.csv, and development_labeled_task3.csv; similarly, the task-specific test sets can be found in test_labeled_task1.csv, test_labeled_task2.csv, and test_labeled_task3.csv.
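
As a rough illustration of working with these label files, the sketch below counts positive examples per finding and ranks them, which makes the head and tail classes visible. It assumes, without confirmation from this page, that each CSV has one row per image, one or more identifier columns, and one 0/1 column per finding. The identifier column name `dicom_id` and the toy rows are hypothetical placeholders, not the real file contents.

```python
import csv
import io


def label_counts(csv_file, id_columns):
    """Count positive (== 1) entries per label column in a labels CSV.

    Assumption: every column not listed in `id_columns` is a binary label.
    """
    reader = csv.DictReader(csv_file)
    label_columns = [c for c in reader.fieldnames if c not in id_columns]
    counts = {c: 0 for c in label_columns}
    for row in reader:
        for c in label_columns:
            # Values may be stored as "1", "1.0", "0", or left empty.
            if float(row[c] or 0) == 1.0:
                counts[c] += 1
    # Sort from most to least frequent to expose head and tail classes.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in for a file such as cxr-lt-2024/train_labeled.csv.
# The column layout here is hypothetical.
toy_csv = io.StringIO(
    "dicom_id,Atelectasis,Rib Fracture,Bulla\n"
    "a,1,0,0\n"
    "b,1,1,0\n"
    "c,1,0,0\n"
    "d,0,0,1\n"
    "e,1,1,0\n"
)
ranked = label_counts(toy_csv, id_columns={"dicom_id"})
for name, n in ranked:
    print(f"{name}: {n}")
```

To run this on a real file, open it with `open(path, newline="")` and pass the actual identifier and metadata column names in `id_columns`, after inspecting the file header.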

CXR-LT 2023 challenge data

This challenge used an expanded version of MIMIC-CXR-JPG v2.0.0 [21,22], a large benchmark dataset for automated thorax disease classification. Following Holste et al. [2], each CXR study in the dataset was labeled with 12 newly added disease findings extracted from the associated radiology reports. The resulting long-tailed dataset contains 377,110 CXRs, each labeled with at least one of 26 clinical findings (including a "No Finding" class). In addition to the 13 clinical findings in the original MIMIC-CXR-JPG v2.0.0 dataset, the following 12 new findings were added:

Calcification of the Aorta
Emphysema
Fibrosis
Hernia
Infiltration
Mass
Nodule
Pleural Thickening
Pneumomediastinum
Pneumoperitoneum
Subcutaneous Emphysema
Tortuous Aorta

Within the cxr-lt-2023/ directory, training set and validation set image IDs, metadata, and labels can be found in train.csv and development.csv, respectively. During the competition, these files were also available to registered participants on CodaLab under "Participate" -> "Files". Test set image IDs, metadata, and labels can be found in test.csv, which was released after the conclusion of the competition.

MICCAI 2023 PruneCXR data

Additionally, a subset of this dataset used in the MICCAI 2023 paper, "How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?" [23], is provided here to ensure reproducibility; please see the accompanying GitHub repository [24] for full implementation details. This subset consists of 257,018 frontal CXRs, each labeled with at least one of 19 clinical findings (including a "No Finding" class). The following 5 findings are included in the label set in addition to the original 13 found in MIMIC-CXR-JPG v2.0.0:

Calcification of the Aorta
Pneumomediastinum
Pneumoperitoneum
Subcutaneous Emphysema
Tortuous Aorta

Within the miccai-2023_mimic-cxr-lt/ directory, the training, validation, and test set image IDs and labels for this study can be found, respectively, in miccai2023_mimic-cxr-lt_labels_train.csv, miccai2023_mimic-cxr-lt_labels_val.csv, and miccai2023_mimic-cxr-lt_labels_test.csv.

Evaluation

Participants uploaded image-level predictions on the provided test sets for evaluation. Since this is a multi-label classification problem with severe imbalance, the primary evaluation metric was mean Average Precision (mAP) (i.e., "macro-averaged" AP across the 26 classes). While Area Under the Receiver Operating Characteristic Curve (AUC) is a standard metric for related datasets, AUC can be heavily inflated in the presence of strong imbalance. Instead, mAP is more appropriate for the long-tailed, multi-label setting since it both (i) measures performance across decision thresholds and (ii) remains informative under class imbalance. For thoroughness, mean AUC (mAUC) and mean F1 score (mF1), using a threshold of 0.5 for each class, were also calculated and shown on the leaderboard but did not contribute to team rankings. Mean expected calibration error (ECE) [25] was also computed to quantify model calibration and bias.
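
The metrics described above can be sketched in plain Python as follows. This is a simplified illustration, not the organizers' evaluation code: ties in score are broken by input order, the F1 threshold is applied as "score >= 0.5", and ECE uses equal-width bins over the predicted positive-class probability, all of which are assumptions. In practice an established implementation such as scikit-learn's `average_precision_score` would normally be preferred for AP.

```python
def average_precision(y_true, y_score):
    """AP for one class: mean precision at the rank of each positive."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP is undefined for a class with no positives")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos


def f1_at_threshold(y_true, y_score, threshold=0.5):
    """F1 for one class after binarizing scores (assumed rule: score >= threshold)."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_ece(y_true, y_score, n_bins=10):
    """Expected calibration error for one class with equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for t, s in zip(y_true, y_score):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((t, s))
    ece = 0.0
    for members in bins:
        if members:
            frac_pos = sum(t for t, _ in members) / len(members)
            mean_score = sum(s for _, s in members) / len(members)
            ece += len(members) / len(y_true) * abs(frac_pos - mean_score)
    return ece


def macro_average(metric, labels, scores):
    """Apply a per-class metric to each label column and average the results."""
    n_classes = len(labels[0])
    per_class = [
        metric([row[c] for row in labels], [row[c] for row in scores])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Toy data: 4 images, 2 classes (rows are images, columns are classes).
labels = [[1, 0], [0, 0], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.6], [0.1, 0.3]]

m_ap = macro_average(average_precision, labels, scores)
m_f1 = macro_average(f1_at_threshold, labels, scores)
m_ece = macro_average(binary_ece, labels, scores)
print(f"mAP={m_ap:.4f}  mF1={m_f1:.4f}  mean ECE={m_ece:.4f}")
```

Macro-averaging gives every class the same weight regardless of how many positives it has, so performance on rare findings is not swamped by the common ones.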

Submission file structure

All CodaLab submissions were required to be in .zip format. The compressed .zip file had to contain (i) a predictions .csv file and (ii) a "code/" directory with all training and inference code. The required file structure was as follows:

    
        xxx.csv  # predictions .csv file
        code/  # code directory
        ├── yyy.py
        ├── zzz.py
        ├── ...
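
For reference, an archive with this layout could be assembled along the following lines. This is a minimal sketch using Python's standard `zipfile` module; the function name `build_submission` and every file name in the demonstration are hypothetical placeholders rather than names required by the challenge.

```python
import tempfile
import zipfile
from pathlib import Path


def build_submission(predictions_csv, code_dir, out_zip):
    """Write a .zip with the predictions .csv at the top level and all
    files from `code_dir` stored under a top-level "code/" folder."""
    predictions_csv, code_dir = Path(predictions_csv), Path(code_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(predictions_csv, arcname=predictions_csv.name)
        for path in sorted(code_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(code_dir).as_posix()
                zf.write(path, arcname=f"code/{rel}")
    return out_zip


# Demonstration with throwaway files; all file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "code").mkdir()
    (tmp / "code" / "train.py").write_text("# training code\n")
    (tmp / "code" / "predict.py").write_text("# inference code\n")
    (tmp / "predictions.csv").write_text("id,label\n")
    archive = build_submission(
        tmp / "predictions.csv", tmp / "code", tmp / "submission.zip"
    )
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

Listing the archive contents before uploading is a cheap way to confirm that the .csv sits at the top level and that the code files ended up under "code/".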

Release Notes

Version 2.0.0: This release contains data used in the CXR-LT 2024 challenge (45 labels for 377,110 images), including 19 additional rare disease labels compared to CXR-LT 2023, and a manually annotated "gold standard" subset (26 labels for 406 images).

Ethics

This shared task uses image data from MIMIC-CXR-JPG v2.0.0 and generates labels from free-text radiology reports in MIMIC-CXR, a de-identified dataset that we gained access to through a PhysioNet Credentialed Health Data Use Agreement (v1.5.0).

Acknowledgements

We thank our steering committee for their support of this project: Leo Anthony Celi, Zhiyong Lu, George Shih, Adam Flanders, and Ronald Summers.

Conflicts of Interest

The authors have no conflicts of interest to declare.

References

1. Zhou SK, Greenspan H, Davatzikos C, Duncan JS, Van Ginneken B, Madabhushi A, Prince JL, Rueckert D, Summers RM. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE. 2021 Feb 26;109(5):820-38.
2. Holste G, Wang S, Jiang Z, Shen TC, Shih G, Summers RM, Peng Y, Wang Z. Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study. In Data Augmentation, Labelling, and Imperfections: Second MICCAI Workshop, DALI 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings 2022 Sep 16 (pp. 22-32). Cham: Springer Nature Switzerland.
3. Zhang Y, Kang B, Hooi B, Yan S, Feng J. Deep long-tailed learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023 Apr 19.
4. Zhang R, Haihong E, Yuan L, He J, Zhang H, Zhang S, Wang Y, Song M, Wang L. MBNM: multi-branch network based on memory features for long-tailed medical image recognition. Computer Methods and Programs in Biomedicine. 2021 Nov 1;212:106448.
5. Ju L, Wang X, Wang L, Liu T, Zhao X, Drummond T, Mahapatra D, Ge Z. Relational subsets knowledge distillation for long-tailed retinal diseases recognition. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24 2021 (pp. 3-12). Springer International Publishing.
6. Yang Z, Pan J, Yang Y, Shi X, Zhou HY, Zhang Z, Bian C. ProCo: Prototype-Aware Contrastive Learning for Long-Tailed Medical Image Classification. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part VIII 2022 Sep 16 (pp. 173-182). Cham: Springer Nature Switzerland.
7. Chen H, Miao S, Xu D, Hager GD, Harrison AP. Deep hierarchical multi-label classification of chest X-ray images. In International Conference on Medical Imaging with Deep Learning 2019 May 24 (pp. 109-120). PMLR.
8. Wang G, Wang P, Cong J, Liu K, Wei B. BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition. arXiv preprint arXiv:2302.11082. 2023 Feb 22.
9. Chen B, Li J, Lu G, Yu H, Zhang D. Label co-occurrence learning with graph convolutional networks for multi-label chest x-ray image classification. IEEE Journal of Biomedical and Health Informatics. 2020 Jan 16;24(8):2292-302.
10. Moukheiber D, Mahindre S, Moukheiber L, Moukheiber M, Wang S, Ma C, Shih G, Peng Y, Gao M. Few-Shot Learning Geometric Ensemble for Multi-label Classification of Chest X-Rays. In Data Augmentation, Labelling, and Imperfections: Second MICCAI Workshop, DALI 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings 2022 Sep 16 (pp. 112-122). Cham: Springer Nature Switzerland.
11. Holste, Gregory, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang et al. "Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge." Medical Image Analysis (2024): 103224.
12. Hong, Yuxin, Xiao Zhang, Xin Zhang, and Joey Tianyi Zhou. "Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification." In Proceedings of the 32nd ACM International Conference on Multimedia, pp. 301-310. 2024.
13. Huijben, Evi MC, Josien PW Pluim, and Maureen AJM van Eijnatten. "Denoising diffusion probabilistic models for addressing data limitations in chest X-ray classification." Informatics in Medicine Unlocked 50 (2024): 101575.
14. Park, Wongi, and Jongbin Ryu. "Fine-Grained Self-Supervised Learning with Jigsaw puzzles for medical image classification." Computers in Biology and Medicine 174 (2024): 108460.
15. Li, Yuhang, Tong Liu, Wenfeng Shen, Yangguang Cui, and Weijia Lu. "Improving Generalization and Personalization in Long-Tailed Federated Learning via Classifier Retraining." In European Conference on Parallel Processing, pp. 408-423. Cham: Springer Nature Switzerland, 2024.
16. Budovec, Joseph J., Cesar A. Lam, and Charles E. Kahn Jr. "Informatics in radiology: radiology gamuts ontology: differential diagnosis for the Semantic Web." Radiographics 34, no. 1 (2014): 254-264.
17. MICCAI. MICCAI 2024 - 27. International Conference On Medical Image Computing & Computer Assisted Intervention [Internet]. Available from: https://conferences.miccai.org/2024/en/.
18. CodaLab. [Task 1] CXR-LT: Long-tailed, multi-label, and zero-shot classification on chest X-rays [Internet]. Available from: https://codalab.lisn.upsaclay.fr/competitions/18601.
19. CodaLab. [Task 2] CXR-LT: Long-tailed, multi-label, and zero-shot classification on chest X-rays [Internet]. Available from: https://codalab.lisn.upsaclay.fr/competitions/18603.
20. CodaLab. [Task 3] CXR-LT: Long-tailed, multi-label, and zero-shot classification on chest X-rays [Internet]. Available from: https://codalab.lisn.upsaclay.fr/competitions/18604.
21. Johnson AE, Pollard TJ, Greenbaum NR, Lungren MP, Deng CY, Peng Y, Lu Z, Mark RG, Berkowitz SJ, Horng S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042. 2019 Jan 21.
22. PhysioNet. MIMIC-CXR-JPG - chest radiographs with structured labels [Internet]. Available from: https://physionet.org/content/mimic-cxr-jpg/2.0.0/.
23. Holste, Gregory, Ziyu Jiang, Ajay Jaiswal, Maria Hanna, Shlomo Minkowitz, Alan C. Legasto, Joanna G. Escalon et al. "How does pruning impact long-tailed multi-label medical image classifiers?." In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 663-673. Cham: Springer Nature Switzerland, 2023.
24. GitHub. PruneCXR: How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers? [Internet]. Available from: https://github.com/VITA-Group/PruneCXR.
25. Naeini, Mahdi Pakdaman, Gregory Cooper, and Milos Hauskrecht. "Obtaining well calibrated probabilities using bayesian binning." In Proceedings of the AAAI conference on artificial intelligence, vol. 29, no. 1. 2015.