Database Credentialed Access
Eye Gaze Data for Chest X-rays
Alexandros Karargyris , Satyananda Kashyap , Ismini Lourentzou , Joy Wu , Matthew Tong , Arjun Sharma , Shafiq Abedin , David Beymer , Vandana Mukherjee , Elizabeth Krupinski , Mehdi Moradi
Published: Sept. 12, 2020. Version: 1.0.0
When using this resource, please cite:
Karargyris, A., Kashyap, S., Lourentzou, I., Wu, J., Tong, M., Sharma, A., Abedin, S., Beymer, D., Mukherjee, V., Krupinski, E., & Moradi, M. (2020). Eye Gaze Data for Chest X-rays (version 1.0.0). PhysioNet. https://doi.org/10.13026/qfdz-zr67.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
We created a rich multimodal dataset for the Chest X-Ray (CXR) domain. The data was collected with an eye tracking system while a radiologist interpreted and read 1,083 public CXR images. The dataset contains the following aligned modalities: image, transcribed report text, dictation audio, and eye gaze data. We hope this dataset can contribute to research on machine learning applications such as deep learning explainability, multimodal fusion, disease classification, and automated radiology report generation. The images were selected from the MIMIC-CXR Database and are associated with studies from 1,038 subjects (female: 495, male: 543) aged 20 to 80 years.
Background
CXR is the most common imaging modality in the United States, accounting for up to 74% of all imaging examinations ordered by physicians [1]. In recent years, with the proliferation of deep learning techniques and publicly available CXR datasets ([2], [3], [4]), numerous machine learning approaches have been proposed and deployed in radiology settings for disease detection.
Eye tracking in radiology has been extensively studied for purposes such as education, perception understanding, and fatigue measurement (see the literature reviews [5], [6], [7], [8] for more details). More recently, efforts such as [9], [10], and [11] have shown that eye gaze data can improve segmentation and disease classification in Computed Tomography (CT) and radiography when combined with deep learning techniques.
Currently, there is a lack of public datasets that capture eye gaze data for CXR images. Given the promising use of such data in machine learning, we are releasing the first dataset of its kind to the research community to explore and implement novel applications.
Methods
The dataset was collected using an eye tracking system (GP3 HD Eye Tracker, Gazepoint). An American Board of Radiology (ABR) certified radiologist with 5 years of attending experience performed interpretation/reading of 1,083 CXR images. The analysis software (Gazepoint Analysis UX Edition) allowed for recording and exporting of eye gaze data and dictation audio.
To identify the images for this study we used the MIMIC-CXR Database [2], a large public dataset of CXR images, in conjunction with the MIMIC-IV Clinical Database [12], which contains clinical outcomes. Inclusion and exclusion criteria were applied to the Emergency Department clinical notes from the MIMIC-IV Clinical Database [12], resulting in a subset of 1,083 cases covering three clinical conditions in equal proportion: Normal, Pneumonia, and Congestive Heart Failure (CHF). The corresponding CXR images of these cases were extracted from the MIMIC-CXR Database [2].
The radiologist read these CXR images using Gazepoint's GP3 Eye Tracker, the Gazepoint Analysis UX Edition software, a headset microphone, a PC, and a monitor (Dell S2719DGF) set at 1920x1080 resolution. The readings took place in multiple sessions (roughly 30 cases per session) over a period of 2 months (March - May 2020). The Gazepoint Analysis UX Edition software exported raw and processed eye gaze fixations (.csv format) and the radiologist's voice dictation (audio). The audio files were further processed with speech-to-text software (Google Speech-to-Text API) to extract text transcripts along with word-level timing information (.json format). These transcripts were then manually corrected. The final dataset contains the eye gaze signal information (.csv), audio files (.wav, .mp3), and transcript files (.json).
Data Description
The dataset consists of the following data documents:
- master_sheet.csv: Master spreadsheet containing DICOM_IDs (i.e. the original MIMIC-CXR Database IDs) along with disease labels.
- fixations.csv: Spreadsheet containing fixation eye gaze data, as exported by the Gazepoint Analysis UX Edition software, containing DICOM_IDs.
- eye_gaze.csv: Spreadsheet containing raw eye gaze data, as exported by the Gazepoint Analysis UX Edition software, containing DICOM_IDs.
- bounding_boxes.csv: Spreadsheet containing bounding box coordinates for the anatomical structures, containing DICOM_IDs.
- inclusion_exclusion_criteria_outputs: Folder containing 3 spreadsheet files generated after applying the inclusion/exclusion criteria. These files can be used by the sampling script to regenerate master_sheet.csv; they are optional and shared for reproducibility purposes.
- audio_segmentation_transcripts: Folder containing, for each DICOM_ID, i) dictation audio files (mp3, wav), ii) a transcript file (json), and iii) anatomy segmentation mask files (png).
The user can easily traverse between the data documents, as well as the MIMIC-CXR Database, using the DICOM_ID.
NOTE: The bounding_boxes.csv file and the anatomy segmentation mask files are provided as supplemental sources to help researchers perform in-depth correlation analysis (e.g. eye gaze vs. anatomical structures) and/or anatomical structure segmentation.
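As a minimal sketch of how the data documents can be combined via DICOM_ID, the Python snippet below merges the master spreadsheet with the fixation data using pandas. The file paths are placeholders, and any column names other than DICOM_ID are assumptions to be checked against the actual files.

```python
import pandas as pd

# Load the master spreadsheet and the fixation data (paths are placeholders;
# adjust to wherever the dataset was downloaded).
master = pd.read_csv("master_sheet.csv")
fixations = pd.read_csv("fixations.csv")

# Join the two documents on DICOM_ID so every fixation row carries the
# study-level information (e.g. disease labels) from the master sheet.
merged = fixations.merge(master, on="DICOM_ID", how="left")

# Example: number of fixation rows recorded per image.
fixations_per_image = merged.groupby("DICOM_ID").size()
print(fixations_per_image.describe())
```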
Detailed Description
1) The master_sheet.csv spreadsheet provides the following key information (a detailed description can be found in table_descriptions.pdf):
- The DICOM_ID column, which maps each row to the original MIMIC-CXR image as well as to the rest of the documents in this dataset.
- Granular disease labels provided by the MIMIC-CXR database [2] (i.e. the CheXpert NLP tool [3]).
- The reason-for-exam sentences sectioned out from the Indication section of the original MIMIC-CXR report.
2) The fixations.csv and eye_gaze.csv spreadsheets contain the eye tracking information, as exported by the Gazepoint Analysis UX Edition software. The former is a subset of the latter: eye_gaze.csv contains one (1) row for every data sample collected from the eye tracker, while fixations.csv contains a single entry per fixation. A fixation is defined as the maintaining of the eye gaze on a single location (i.e. an eye gaze cluster). The Gazepoint Analysis UX Edition software generates fixations.csv by post-processing (i.e. 'sweeping') eye_gaze.csv and storing the last entry of each fixation. Both spreadsheets contain the same columns. Key columns found in both spreadsheets are listed below (a detailed description can be found in table_descriptions.pdf); a usage sketch follows the list:
- DICOM_ID: maps each row to the original MIMIC image name.
- TIME (in secs): the time elapsed, in seconds, since the last system initialization or calibration (i.e. when a new CXR image was presented to the radiologist).
- FPOGX: the X coordinate of the fixation point of gaze (POG), as a fraction of the screen size; (0, 0) is the top left, (0.5, 0.5) is the screen center, and (1.0, 1.0) is the bottom right.
- FPOGY: the Y coordinate of the fixation POG, as a fraction of the screen size; (0, 0) is the top left, (0.5, 0.5) is the screen center, and (1.0, 1.0) is the bottom right.
- X_ORIGINAL: the X coordinate of the fixation in original MIMIC DICOM image coordinates.
- Y_ORIGINAL: the Y coordinate of the fixation in original MIMIC DICOM image coordinates.
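Because X_ORIGINAL and Y_ORIGINAL already place fixations in DICOM pixel space, a simple gaze heatmap can be accumulated directly on the image grid. The sketch below assumes the corresponding MIMIC-CXR DICOM file has been downloaded locally; the file path and the choice of smoothing kernel are placeholders, not part of the dataset.

```python
import numpy as np
import pandas as pd
import pydicom
from scipy.ndimage import gaussian_filter

# Pick one study from the fixation data (the DICOM path below is a placeholder;
# substitute the local MIMIC-CXR location of that image).
fixations = pd.read_csv("fixations.csv")
dicom_id = fixations["DICOM_ID"].iloc[0]
image = pydicom.dcmread(f"/path/to/mimic-cxr/{dicom_id}.dcm").pixel_array

# Accumulate fixation counts on the image grid using the original DICOM
# coordinates, then smooth them into a heatmap.
heatmap = np.zeros(image.shape, dtype=np.float32)
points = fixations.loc[fixations["DICOM_ID"] == dicom_id, ["Y_ORIGINAL", "X_ORIGINAL"]]
for y, x in points.round().astype(int).itertuples(index=False):
    if 0 <= y < image.shape[0] and 0 <= x < image.shape[1]:
        heatmap[y, x] += 1.0
heatmap = gaussian_filter(heatmap, sigma=image.shape[0] / 50)
```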
3) The bounding_boxes.csv spreadsheet contains the following columns (a usage sketch follows the list):
- dicom_id: the MIMIC DICOM image name.
- bbox_name: the anatomy name.
- x1: the X coordinate of the top left corner point of the bounding box, in original MIMIC DICOM image coordinates.
- y1: the Y coordinate of the top left corner point of the bounding box, in original MIMIC DICOM image coordinates.
- x2: the X coordinate of the bottom right corner point of the bounding box, in original MIMIC DICOM image coordinates.
- y2: the Y coordinate of the bottom right corner point of the bounding box, in original MIMIC DICOM image coordinates.
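Since the bounding boxes and the X_ORIGINAL/Y_ORIGINAL fixation coordinates share the same DICOM pixel space, fixations can be assigned to anatomical structures with a simple containment test. This is a minimal sketch; the column names follow the descriptions above and may need to be adjusted.

```python
import pandas as pd

fixations = pd.read_csv("fixations.csv")
boxes = pd.read_csv("bounding_boxes.csv")

def fixations_per_anatomy(dicom_id: str) -> pd.Series:
    """Count how many fixations fall inside each anatomical bounding box."""
    fix = fixations[fixations["DICOM_ID"] == dicom_id]
    box = boxes[boxes["dicom_id"] == dicom_id]
    counts = {}
    for row in box.itertuples(index=False):
        inside = (fix["X_ORIGINAL"].between(row.x1, row.x2)
                  & fix["Y_ORIGINAL"].between(row.y1, row.y2))
        counts[row.bbox_name] = int(inside.sum())
    return pd.Series(counts)

# Example: distribution of fixations over anatomies for the first study.
print(fixations_per_anatomy(boxes["dicom_id"].iloc[0]))
```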
4) The audio_segmentation_transcripts folder contains subfolders named using DICOM_IDs. Each subfolder contains the following files (a usage sketch follows the list):
- audio.wav: the dictation audio in wav format.
- audio.mp3: the dictation audio in mp3 format.
- transcript.json: the transcript of the dictation audio with timestamps for each spoken phrase. Specifically, the phrase tag contains the phrase text, the begin_time tag contains the start time (in seconds) of the phrase dictation, and the end_time tag contains the end time (in seconds) of the phrase dictation.
- left_lung.png, right_lung.png, mediastinum.png and aortic_knob.png: the manual segmentation images of four (4) key anatomies: left lung, right lung, mediastinum, and aortic knob, respectively.
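A minimal sketch for reading one study's subfolder follows. The exact nesting of transcript.json is an assumption (the phrase entries may be a top-level list or sit under a key), as is the mask encoding, so both are handled defensively and should be verified against the files.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

# Path to one study's subfolder (the DICOM_ID here is a placeholder).
study_dir = Path("audio_segmentation_transcripts") / "<DICOM_ID>"

# Load the transcript. Each phrase entry carries phrase / begin_time / end_time;
# whether the entries form a top-level list or sit under a key is an assumption.
with open(study_dir / "transcript.json") as f:
    transcript = json.load(f)
phrases = transcript if isinstance(transcript, list) else next(iter(transcript.values()))
for p in phrases:
    print(f"{float(p['begin_time']):7.2f}s - {float(p['end_time']):7.2f}s  {p['phrase']}")

# Load one anatomy segmentation mask as a boolean array (assumes a single-channel mask).
left_lung = np.array(Image.open(study_dir / "left_lung.png")) > 0
print("Left lung mask covers", int(left_lung.sum()), "pixels")
```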
Usage Notes
The dataset requires access to the CXR DICOM images found in the MIMIC-CXR database [2]. In general, users are advised to use the fixations.csv spreadsheet for their experiments because it contains the eye gaze signal as post-processed by the Gazepoint Analysis UX Edition software. Users who want the raw sampled eye gaze signal should use eye_gaze.csv instead.
As mentioned in the Data Description section, the user can combine information from the different data documents by utilizing the DICOM_ID tag found across all of them. Examples of data usage can be found at https://github.com/cxr-eye-gaze/eye-gaze-dataset
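As an additional hedged sketch of multimodal use, the snippet below selects the fixations that occurred while a given transcript phrase was being dictated. It assumes that the TIME (in secs) fixation column and the transcript's begin_time/end_time values share the same per-image clock (both restarting when a new CXR is presented); this alignment should be spot-checked on a few studies before being relied upon.

```python
import pandas as pd

def fixations_during_phrase(fixations: pd.DataFrame, dicom_id: str,
                            begin_time: float, end_time: float) -> pd.DataFrame:
    """Return the fixations recorded while a transcript phrase was dictated.

    Assumes the 'TIME (in secs)' column and the transcript timestamps are on
    the same per-image clock -- an assumption, not a documented guarantee.
    """
    fix = fixations[fixations["DICOM_ID"] == dicom_id]
    t = fix["TIME (in secs)"]
    return fix[(t >= begin_time) & (t <= end_time)]
```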
Release Notes
Version 1.0.0: Initial upload of dataset
Conflicts of Interest
No conflicts of interest to declare
References
- Mettler, F. A., Bhargavan, M., Faulkner, K., Gilley, D. B., Gray, J. E., Ibbott, G. S., Lipoti, J. A., Mahesh, M., McCrohan, J. L., Stabin, M. G., Thomadsen, B. R., and Yoshizumi, T. T., "Radiologic and Nuclear Medicine Studies in the United States and Worldwide: Frequency, Radiation Dose, and Comparison with Other Radiation Sources: 1950-2007," Radiology 253, 520-531 (Nov. 2009)
- Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. https://doi.org/10.13026/C2JT1Q.
- Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al., 2019. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. arXiv preprint arXiv:1901.07031
- Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017.
- Stephen Anthony Waite, Arkadij Grigorian, Robert G Alexander, Stephen Louis Macknik, Marisa Carrasco, David Heeger, and Susana Martinez-Conde. 2019. Analysis of perceptual expertise in radiology–Current knowledge and a new perspective. Frontiers in human neuroscience 13 (2019), 213
- Van der Gijp, A., Ravesloot, C., Jarodzka, H., Van der Schaaf, M., Van der Schaaf, I., Van Schaik, J., & Ten Cate, T. J. (2016). How visual search relates to visual diagnostic performance: a narrative systematic review of eye-tracking research in radiology. Advances in Health Sciences Education, 1-23. doi: 10.1007/s10459-016-9698-1
- Krupinski, E. A. (2010). Current perspectives in medical image perception. Attention, Perception, & Psychophysics, 72(5), 1205–1217.
- Tourassi G, Voisin S, Paquit V, Krupinski E: Investigating the link between radiologists’ gaze, diagnostic decision, and image content. J Am Med Inform Assoc 20(6):1067–1075, 2013
- Khosravan N, Celik H, Turkbey B, Jones EC, Wood B, Bagci U: A collaborative computer aided diagnosis (C-CAD) system with eye-tracking, sparse attentional model, and deep learning. Med Image Anal 51:101–115, 2019
- Stember, J.N., Celik, H., Krupinski, E. et al. Eye Tracking for Deep Learning Segmentation Using Convolutional Neural Networks. J Digit Imaging 32, 597–604 (2019). https://doi.org/10.1007/s10278-019-00220-4
- Aresta, Guilherme, et al. "Automatic lung nodule detection combined with gaze information improves radiologists' screening performance." IEEE Journal of Biomedical and Health Informatics (2020).
- Johnson, Alistair, et al. "MIMIC-IV" (version 0.4). PhysioNet (2020), https://doi.org/10.13026/a3wn-hq05.
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/qfdz-zr67
DOI (latest version):
https://doi.org/10.13026/nrbg-cs59
Topics:
audio
convolutional network
heatmap
eye tracking
explainability
chest
cxr
multimodal
radiology
deep learning
chest x-ray
machine learning
Project Website:
https://github.com/cxr-eye-gaze/
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project