# CheXmask Database Data Dictionary

## CSV File Structure

Each dataset in the CheXmask Database is provided as a separate CSV file. The fields present in these files are described below.

### Image Identification

**Field Name**: Image ID
**Description**: Reference identifier linking to the original image in the source dataset
**Type**: String
**Notes**: Format varies by source dataset (ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest, VinDr-CXR)

### Quality Metrics

**Field Name**: Dice RCA (Max)
**Description**: Maximum Dice Similarity Coefficient from Reverse Classification Accuracy
**Type**: Float
**Range**: 0.0 to 1.0
**Units**: Dimensionless
**Notes**: Higher values indicate better segmentation quality

**Field Name**: Dice RCA (Mean)
**Description**: Mean Dice Similarity Coefficient from Reverse Classification Accuracy
**Type**: Float
**Range**: 0.0 to 1.0
**Units**: Dimensionless
**Notes**: Recommended threshold for use is >= 0.7

### Anatomical Features

**Field Name**: Landmarks
**Description**: Set of points representing organ contours generated by the HybridGNet model
**Type**: Array of coordinates
**Format**: JSON array of [x,y] coordinates
**Units**: Pixels

**Field Name**: Left Lung
**Description**: Segmentation mask for the left lung
**Type**: String
**Format**: Run-length encoding (RLE)
**Notes**: Must be decoded using the provided dimensions (Height, Width)

**Field Name**: Right Lung
**Description**: Segmentation mask for the right lung
**Type**: String
**Format**: Run-length encoding (RLE)
**Notes**: Must be decoded using the provided dimensions (Height, Width)

**Field Name**: Heart
**Description**: Segmentation mask for the heart
**Type**: String
**Format**: Run-length encoding (RLE)
**Notes**: Must be decoded using the provided dimensions (Height, Width)

### Image Dimensions

**Field Name**: Height
**Description**: Height of the segmentation mask
**Type**: Integer
**Units**: Pixels
**Notes**: Required for decoding RLE masks

**Field Name**: Width
**Description**: Width of the segmentation mask
**Type**: Integer
**Units**: Pixels
**Notes**: Required for decoding RLE masks

## Additional Notes

1. **RLE Format**: Run-length encoding is used to compress the binary mask data. Each RLE string represents a sequence of (start_position, run_length) pairs for the mask. A decoding script is available in the GitHub repository.
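
For illustration, the sketch below shows one way a row's RLE mask could be decoded and the recommended quality threshold applied. It is not the repository's decoding script. It assumes, beyond what this dictionary states, that the RLE values are whitespace-separated integers, that start positions are 1-based, and that pixels are indexed in row-major order. The helper names `decode_rle` and `passes_quality` are hypothetical.

```python
from typing import Dict, List


def decode_rle(rle: str, height: int, width: int) -> List[List[int]]:
    """Decode an RLE string into a binary mask with `height` rows and `width` columns.

    Assumptions (not confirmed by this dictionary): values are
    whitespace-separated, starts are 1-based, pixel order is row-major.
    """
    flat = [0] * (height * width)
    values = [int(v) for v in rle.split()]
    # Even positions hold start indices, odd positions hold run lengths.
    for start, length in zip(values[0::2], values[1::2]):
        begin = start - 1  # convert the assumed 1-based start to 0-based
        if begin < 0 or begin + length > len(flat):
            raise ValueError("RLE run falls outside the Height x Width grid")
        for i in range(begin, begin + length):
            flat[i] = 1
    # Cut the flat pixel list into rows of `width` pixels each.
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def passes_quality(row: Dict[str, str], threshold: float = 0.7) -> bool:
    """Apply the recommended Dice RCA (Mean) >= 0.7 threshold to one CSV row."""
    return float(row["Dice RCA (Mean)"]) >= threshold


# Toy row with string values, as csv.DictReader would yield them.
# The values are made up for illustration and are not taken from the database.
row = {"Dice RCA (Mean)": "0.85", "Left Lung": "2 2 7 1", "Height": "2", "Width": "4"}

if passes_quality(row):
    mask = decode_rle(row["Left Lung"], int(row["Height"]), int(row["Width"]))
    # mask == [[0, 1, 1, 0], [0, 0, 1, 0]]
```

If the masks are needed as arrays, the nested list can be passed to `numpy.array`. The indexing assumptions should be checked against the decoding script in the GitHub repository before relying on this sketch.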