# CheXmask Database v1.0.0
A comprehensive collection of anatomical segmentation masks for chest radiographs derived from five major public databases.
## Overview
The CheXmask Database provides 657,566 anatomical segmentation masks generated from chest radiographs in five public databases:
- ChestX-ray8
- CheXpert
- MIMIC-CXR-JPG
- PadChest
- VinDr-CXR
All segmentation masks were generated using the HybridGNet model and include quality metrics based on Reverse Classification Accuracy (RCA) scores.
## Dataset Structure
The dataset consists of CSV files, organized by source database. Each CSV contains the following columns:
| Column Name | Description |
|------------|-------------|
| Image ID | Reference to original image in source dataset |
| Dice RCA (Max) | Maximum Dice Similarity Coefficient for RCA |
| Dice RCA (Mean) | Mean Dice Similarity Coefficient for RCA |
| Landmarks | Organ contour points from HybridGNet model |
| Left Lung | Left lung segmentation mask in RLE format |
| Right Lung | Right lung segmentation mask in RLE format |
| Heart | Heart segmentation mask in RLE format |
| Height | Height of segmentation mask, in pixels |
| Width | Width of segmentation mask, in pixels |
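To turn a mask column into an image-shaped array, the RLE string has to be decoded using the `Height` and `Width` columns. The sketch below is a minimal example, not the official decoder. It assumes the RLE string is a space-separated sequence of `start length` pairs, with 1-based starts counted over the row-major flattened image; please confirm the encoding against the Data Dictionary before relying on it. The function name `decode_rle` is hypothetical.

```python
import numpy as np


def decode_rle(rle: str, height: int, width: int) -> np.ndarray:
    """Decode a run-length string into a binary (height, width) mask.

    ASSUMPTION: `rle` is "start length start length ..." with 1-based
    starts over the row-major flattened image. Verify this against the
    dataset's Data Dictionary.
    """
    values = np.array([int(v) for v in rle.split()], dtype=np.int64)
    if values.size % 2 != 0:
        raise ValueError("RLE string must contain start/length pairs")
    starts = values[0::2] - 1  # convert 1-based starts to 0-based indices
    lengths = values[1::2]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(height, width)


# Toy 3x4 example (not real CheXmask data).
mask = decode_rle("2 2 6 2", height=3, width=4)
print(mask)
```

With a real row, the same call would take the `Left Lung`, `Right Lung`, or `Heart` value together with that row's `Height` and `Width`.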
## Data Processing
All images went through the same processing pipeline:
1. Images were preprocessed to 1024x1024 resolution.
2. The HybridGNet model was applied for segmentation.
3. Masks were restored to the original image dimensions.
4. RCA scores were calculated for quality assessment.
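The RCA scores are reported as Dice Similarity Coefficients. For two binary masks A and B, Dice is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch below illustrates only this metric, not the RCA procedure used to produce the scores in the CSV files. The function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a).astype(bool)
    b = np.asarray(mask_b).astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Convention chosen for this sketch: two empty masks agree fully.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy masks (not real CheXmask data): one overlapping pixel out of 2 + 2.
a = np.array([[1, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [1, 0, 0]])
score = dice_coefficient(a, b)
print(score)
```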
## Usage Guidelines
1. **Source Images**: Users must obtain source images from original databases and comply with their respective requirements (ethics courses, training, etc.).
2. **Quality Threshold**: For analysis, use only segmentation masks with Dice RCA (Mean) >= 0.7.
3. **Resolution**: Pre-processed (1024x1024) versions of the masks are included to provide a consistent resolution across datasets.
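The quality threshold above can be applied while reading a CSV. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column name `Dice RCA (Mean)` exactly as written in the table above; check the header of the actual file. The helper `load_high_quality_rows` and the sample rows are hypothetical.

```python
import csv
import io

# Mask columns hold long RLE strings, so raise the per-field size limit.
csv.field_size_limit(2**31 - 1)

DICE_MEAN_COLUMN = "Dice RCA (Mean)"  # assumed to match the CSV header
QUALITY_THRESHOLD = 0.7


def load_high_quality_rows(csv_file, threshold=QUALITY_THRESHOLD):
    """Return the rows whose mean RCA Dice is at or above `threshold`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if float(row[DICE_MEAN_COLUMN]) >= threshold]


# Toy in-memory CSV standing in for a real file (values are made up).
sample = io.StringIO(
    "Image ID,Dice RCA (Mean)\n"
    "img_001,0.85\n"
    "img_002,0.64\n"
    "img_003,0.70\n"
)
kept = load_high_quality_rows(sample)
print([row["Image ID"] for row in kept])
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, for example one created with `open(path, newline="")`.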
## Version History
### v1.0.0
- Updated citation
- Added README file
- Added Data Dictionary
## Citation
When using this dataset, please cite:
Gaggion, N., Mosquera, C., Mansilla, L. et al. CheXmask: a large-scale dataset of anatomical segmentation masks for multi-center chest x-ray images. Sci Data 11, 511 (2024). https://doi.org/10.1038/s41597-024-03358-1