Database Credentialed Access
MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images
Seongsu Bae , Daeun Kyung , Jaehee Ryu , Eunbyeol Cho , Gyubok Lee , Sunjun Kweon , Jungwoo Oh , Lei JI , Eric Chang , Tackeun Kim , Edward Choi
Published: July 19, 2024. Version: 1.0.0
When using this resource, please cite:
Bae, S., Kyung, D., Ryu, J., Cho, E., Lee, G., Kweon, S., Oh, J., JI, L., Chang, E., Kim, T., & Choi, E. (2024). MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images (version 1.0.0). PhysioNet. https://doi.org/10.13026/deqx-d943.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
We introduce MIMIC-Ext-MIMIC-CXR-VQA (i.e., extended from the MIMIC database), a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs. This dataset includes approximately 377K entries derived from the MIMIC-CXR-JPG, MIMIC-IV, and Chest ImaGenome datasets, all sourced from PhysioNet. It features questions generated from 48 unique templates across seven content types: presence, anatomy, attribute, abnormality, size, plane, and gender. Each template, developed under the guidance of a board-certified medical expert to ensure clinical relevance, addresses both standard content from previous medical VQA tasks and more complex scenarios involving set and logical operations. To further enhance linguistic diversity while maintaining a medical context, we implemented a paraphrasing strategy (an average of 16.5 paraphrases per template) using carefully designed prompts based on GPT-4.
The primary aim of MIMIC-Ext-MIMIC-CXR-VQA is to serve as a comprehensive benchmark for evaluating medical VQA methodologies, but its significance extends beyond benchmarking. It not only provides a foundational tool for developing and testing VQA methods but also acts as a valuable resource for instruction tuning of medical Vision-and-Language Models (VLMs), addressing the scarcity of medical instruction datasets. Furthermore, integrating our dataset with structured EHRs (i.e., MIMIC-IV) opens new avenues for multi-modal AI frameworks that leverage both the imaging and tabular modalities of patient records. By making this dataset publicly accessible, we aim to improve the understanding of medical images and stimulate further innovation within the realm of medical AI.
Background
With the success of Visual Question Answering (VQA) in the general domain [1,2], there is growing interest in adapting this AI technology to medical imaging [3-10], especially chest X-ray images [11-13]. Medical VQA is designed to support diagnostic radiologists by providing answers to image-based medical questions, thereby reducing their workload and serving as a complementary tool. The use of natural language responses allows these systems to not only streamline radiologists' workflows [14] but also make medical expertise more accessible, especially in underserved areas [15]. Furthermore, this capability to communicate complex medical information in an easily understandable manner enhances patient engagement [13,14].
Despite several existing medical VQA datasets focused on chest X-ray images, they often fall short in terms of complexity, diversity, and scalability. Many are confined to simplistic questions with a limited range of templates, lack diverse paraphrases, and contain insufficient QA samples for comprehensive evaluation. Moreover, these limitations impede the improvement of medical VQA models in performance and robustness, particularly in addressing detailed diagnostic queries crucial for medical practice. To bridge these gaps, we introduce the MIMIC-Ext-MIMIC-CXR-VQA dataset. Designed with complex set and logical operations, it includes 48 unique templates and approximately 377,000 entries, making it a comprehensive, diverse, and large-scale resource tailored for Visual Question Answering tasks within the medical domain.
Methods
Data Preprocessing
We use MIMIC-CXR [15,16] as our image source and Chest ImaGenome [17] for label information. In MIMIC-CXR, each patient can have multiple studies arranged in chronological order, and each study can contain multiple CXR images. From each study, we select one representative frontal view (i.e., AP or PA) image. We then assign labels to these images derived from the Chest ImaGenome silver/gold datasets. As a result, each CXR image features 563 distinct relations among 36 objects, each linked to several attributes from a pool of 68 attributes (across 5 categories: anatomical findings, diseases, devices, tubes/lines, and technical assessment). Each relation indicates the presence (1) or absence (0) of an attribute (e.g., lung cancer) within a category (e.g., disease), linked to an object (e.g., left lung). For data splitting, we use the machine-generated silver label dataset for training and validation, with a 95:5 split, while the human-labeled gold dataset serves as the test dataset.
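To make the preprocessed label structure concrete, the following is a minimal sketch of one way to hold and query these relations in memory; the dictionary layout, identifiers, and category strings are illustrative assumptions rather than the released format.

```python
# A minimal sketch of per-image relation labels: each image maps a
# (object, category, attribute) triple to a 0/1 presence flag.
# All identifiers and category strings here are illustrative.
labels = {
    "example_image_id": {
        ("left lung", "disease", "lung cancer"): 0,
        ("right lower lung zone", "anatomical finding", "pleural effusion"): 1,
    },
}

def has_attribute(labels, image_id, obj, category, attribute):
    """Return True if the attribute is marked present (1) for the object in this image."""
    return labels.get(image_id, {}).get((obj, category, attribute), 0) == 1

assert has_attribute(labels, "example_image_id",
                     "right lower lung zone", "anatomical finding", "pleural effusion")
```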
Question Template Construction
We started by analyzing existing medical VQA datasets [3-8] and templated their questions to match our preprocessed data schema (i.e., object, attribute, category), thus handcrafting our initial seed templates. We drew inspiration from general VQA datasets [1,2,18,19], enhancing these seed templates using logical and set operations to create a more diverse and complex set of question templates. We further incorporated clinically relevant factors [7] into our templates, such as the patient's gender (i.e., male, female), CXR view position (i.e., PA, AP), and size-related features (i.e., width ratio between two anatomical locations). As a result, we defined a total of 48 templates, all of which were evaluated by a medical expert for clinical importance. These templates fall into 3 semantic types (defining the response required: verify for yes/no, choose for selection from options, and query for information retrieval) and 7 content types (classifying the question's focus: presence, anatomy, attribute, abnormality, size, plane, and gender).
Here are the details about semantic and content types:
Semantic Types:
- Verify: Yes/no questions.
- Choose: Questions that require selecting the answer from given choices.
- Query: Open-ended questions.
Content Types:
- Presence: Yes/No questions inquiring about the presence of attributes or categories.
- Anatomy: Open-ended questions that should be answered with anatomical locations.
- Attribute: Open-ended questions that ask about attributes or categories.
- Abnormality: Questions related to abnormalities, defined as a superset of four categories: anatomical finding, disease, device, and tubes/lines.
- Size: Questions on two clinically significant measurements: the cardiothoracic ratio (CTR) and the mediastinal-thoracic ratio (MTR).
- Plane: Questions involving the determination of the view position in a radiograph (PA or AP).
- Gender: Questions about identifying gender from the images (male or female).
Index | Semantic Type | Content Type | Question Template |
---|---|---|---|
1 | verify | presence | Are there any ${category} in the ${object}? |
2 | verify | presence | Is there ${attribute} in the ${object}? |
3 | verify | abnormality | Is the ${object} abnormal? |
4 | verify | presence | Are there any ${category_1} or ${category_2} in the ${object}? |
5 | verify | presence | Are there both ${attribute_1} and ${attribute_2} in the ${object}? |
6 | verify | presence | Is there either ${attribute_1} or ${attribute_2} in the ${object}? |
7 | query | attribute | List all ${category} in the ${object}. |
8 | query | abnormality | List all abnormalities in the ${object}. |
9 | query | attribute | List all ${category_1} and ${category_2} in the ${object}. |
10 | choose | attribute | Which ${category} is related to the ${object}, ${attribute_1} or ${attribute_2}? |
11 | verify | abnormality | Are there any abnormalities in either the ${object_1} or the ${object_2}? |
12 | verify | abnormality | Are there any abnormalities in both the ${object_1} and the ${object_2}? |
13 | query | attribute | List all ${category} in either the ${object_1} or the ${object_2}. |
14 | query | attribute | List all common ${category} in both the ${object_1} and the ${object_2}. |
15 | query | attribute | List all ${category} only in the ${object_1} but not in the ${object_2}. |
16 | query | abnormality | List all abnormalities in either the ${object_1} or the ${object_2}. |
17 | query | abnormality | List all common abnormalities in both the ${object_1} and the ${object_2}. |
18 | query | abnormality | List all abnormalities only in the ${object_1} but not in the ${object_2}. |
19 | verify | presence | Are there any ${category}? |
20 | verify | abnormality | Are there any abnormalities? |
21 | verify | presence | Are there any ${category_1} or ${category_2}? |
22 | verify | presence | Is there ${attribute}? |
23 | verify | presence | Are there both ${attribute_1} and ${attribute_2}? |
24 | verify | presence | Is there either ${attribute_1} or ${attribute_2}? |
25 | query | attribute | List all ${category}. |
26 | query | attribute | List all ${category_1} and ${category_2}. |
27 | query | abnormality | List all abnormalities. |
28 | choose | attribute | Which ${category} is related, ${attribute_1} or ${attribute_2}? |
29 | verify | presence | Are both the ${object_1} and the ${object_2} related to ${attribute}? |
30 | verify | presence | Is either the ${object_1} or the ${object_2} related to ${attribute}? |
31 | query | anatomy | List all anatomical locations related to ${attribute}. |
32 | choose | anatomy | Which anatomical location is related to ${attribute}, the ${object_1} or the ${object_2}? |
33 | choose | abnormality | Which anatomical location is abnormal, the ${object_1} or the ${object_2}? |
34 | query | anatomy | List all anatomical locations related to either ${attribute_1} or ${attribute_2}. |
35 | query | anatomy | List all common anatomical locations related to both ${attribute_1} and ${attribute_2}. |
36 | query | anatomy | List all anatomical locations related to ${attribute_1} but not ${attribute_2}. |
37 | verify | presence | Are there any ${category} related to the ${object_1} and the ${object_2}? |
38 | verify | presence | Are there any ${category} related to the ${object_1} or the ${object_2}? |
39 | query | anatomy | List all anatomical locations related to any ${category}. |
40 | query | anatomy | List all anatomical locations related to any ${category_1} or ${category_2}. |
41 | verify | plane | Is this an ${viewpos} view? |
42 | choose | plane | Which view is in this image, AP or PA? |
43 | query | plane | What is the view of this image? |
44 | verify | gender | Is this patient ${gender}? |
45 | choose | gender | What is the gender of this patient, male or female? |
46 | query | gender | What is the gender of this patient? |
47 | verify | size | Is the width of the cardiac silhouette wider than 1/2 of the thorax width? |
48 | verify | size | Is the width of the upper mediastinum wider than 1/3 of the thorax width? |
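For the two size templates (indices 47 and 48), the answer reduces to comparing the width of the cardiac silhouette or the upper mediastinum against a fixed fraction of the thorax width (the cardiothoracic and mediastinal-thoracic ratios, respectively). The sketch below illustrates this comparison, assuming pixel bounding boxes are available for the relevant anatomical objects (as in Chest ImaGenome scene graphs); the function and object names are hypothetical.

```python
def box_width(box):
    """Horizontal extent of a bounding box given as (x1, y1, x2, y2) in pixels."""
    x1, _, x2, _ = box
    return abs(x2 - x1)

def verify_width_ratio(boxes, numerator_object, threshold):
    """Answer 'Is the width of the <object> wider than <threshold> of the thorax width?'

    `boxes` maps object names to pixel bounding boxes; the object names used
    here ('cardiac silhouette', 'upper mediastinum', 'thorax') are illustrative.
    """
    ratio = box_width(boxes[numerator_object]) / box_width(boxes["thorax"])
    return ["yes"] if ratio > threshold else ["no"]

# Template 47 (cardiothoracic ratio):       verify_width_ratio(boxes, "cardiac silhouette", 1 / 2)
# Template 48 (mediastinal-thoracic ratio): verify_width_ratio(boxes, "upper mediastinum", 1 / 3)
```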
VQA Dataset Generation
We generated our VQA dataset by sampling (image I, question Q, answer A) triples. For example, consider the template “Is there ${attribute} in the ${object}?”. We filled in this template using sampled arguments (e.g., ${object}=‘left lung’, ${attribute}=‘lung cancer’), which led to the creation of the question Q: “Is there lung cancer in the left lung?”. Next, we sampled an image I and executed a predefined program to generate an answer A. For each template, we defined a program to produce an answer A using the given question Q and relationship information from the preprocessed data of image I. To enrich linguistic diversity while preserving a focus on the medical domain [20], we devised a paraphrasing strategy (an average of 16.5 paraphrases per template) using carefully designed prompts based on GPT-4 [21].
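As an illustration of this pipeline, the sketch below fills the template from the example above and runs a simple verify-presence program over an image's relation table; the relation layout and function names are hypothetical stand-ins for the actual generation code, which is available at [28].

```python
from string import Template

# Hypothetical relation table for one image: (object, attribute) -> 0/1 presence flag.
relations = {
    ("left lung", "lung cancer"): 1,
    ("left lung", "pleural effusion"): 0,
}

def fill_template(template_str, **arguments):
    """Instantiate a question template with sampled arguments."""
    return Template(template_str).substitute(**arguments)

def program_verify_presence(relations, obj, attribute):
    """Answer program for 'Is there ${attribute} in the ${object}?'."""
    return ["yes"] if relations.get((obj, attribute), 0) == 1 else ["no"]

question = fill_template("Is there ${attribute} in the ${object}?",
                         attribute="lung cancer", object="left lung")
answer = program_verify_presence(relations, "left lung", "lung cancer")
print(question)  # Is there lung cancer in the left lung?
print(answer)    # ['yes']
```

The table below shows one paraphrased sample question per content type.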
Content Type | Sample Question |
---|---|
presence | Does the cardiac silhouette show any evidence of diseases or devices? |
anatomy | What are all anatomical locations where both infiltration and interstitial lung diseases can be found? |
attribute | List all detected anatomical findings. |
abnormality | Are there signs of abnormalities in both the left lung and the right lung? |
size | Is the cardiac silhouette's width larger than half of the total thorax width? |
plane | Is this X-ray image in the AP or PA view? |
gender | Please specify the patient’s gender. |
Data Description
The MIMIC-Ext-MIMIC-CXR-VQA dataset consists of 377,391 unique (Image, Question, Answer) triples. It utilizes 48 seed templates, which incorporate elements from set/logical operations. Through paraphrasing, these expand into 794 distinct templates. These templates are categorized into 3 semantic types (verify, choose, and query) and 7 content types (presence, anatomy, attribute, abnormality, size, plane, and gender). Detailed descriptions of each category are provided in the 'Question Template Construction' section.
Dataset statistics
 | Training | Validation | Test |
---|---|---|---|
# of Images | 133,687 | 8,610 | 500 |
# of Questions | 132,387 | 31,148 | 7,565 |
# of Answers | 6,628 | 2,508 | 700 |
# of Samples | 290,031 | 73,567 | 13,793 |
Semantic Type | Training | Validation | Test |
---|---|---|---|
verify | 162,689 (56.1%) | 39,336 (53.5%) | 6,945 (50.4%) |
choose | 28,560 (9.8%) | 7,806 (10.6%) | 1,523 (11.0%) |
query | 98,782 (34.1%) | 26,425 (35.9%) | 5,325 (38.6%) |
Content Type | Training | Validation | Test |
---|---|---|---|
presence | 109,455 (37.7%) | 26,153 (35.5%) | 4,566 (33.1%) |
anatomy | 37,952 (13.1%) | 10,210 (13.9%) | 1,963 (14.2%) |
attribute | 49,948 (17.2%) | 13,111 (17.8%) | 2,578 (18.7%) |
abnormality | 60,692 (20.9%) | 16,109 (21.9%) | 3,199 (23.2%) |
size | 16,000 (5.5%) | 4,000 (5.4%) | 705 (5.1%) |
plane | 7,992 (2.8%) | 1,992 (2.7%) | 386 (2.8%) |
gender | 7,992 (2.8%) | 1,992 (2.7%) | 396 (2.9%) |
File and Structure
The dataset is divided into training, validation, and test sets. For the training and validation sets, we use the machine-generated silver label dataset with a 95:5 split ratio; the test set consists of the human-labeled gold label dataset. In the process of splitting the silver dataset, we ensure an even distribution of abnormal images (studies with at least one attribute present) and normal images (studies without any attributes). Each dataset split is provided in JSON format: train.json, valid.json, and test.json.
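The following is a minimal sketch of such a stratified 95:5 split, assuming a per-image is_abnormal flag derived from the labels; it illustrates the splitting criterion only and is not the authors' exact split code, so the released splits should be used as-is.

```python
import random

def split_silver(image_ids, is_abnormal, valid_ratio=0.05, seed=0):
    """Split silver-labeled images 95:5 while keeping abnormal and normal
    images evenly distributed across the training and validation sets."""
    rng = random.Random(seed)
    train_ids, valid_ids = [], []
    for flag in (True, False):  # abnormal group, then normal group
        group = [i for i in image_ids if is_abnormal[i] == flag]
        rng.shuffle(group)
        n_valid = int(len(group) * valid_ratio)
        valid_ids += group[:n_valid]
        train_ids += group[n_valid:]
    return train_ids, valid_ids
```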
Directory Structure
```
mimiccxrvqa
└── dataset
    ├── train.json
    ├── valid.json
    └── test.json
```
File Format and Contents
The QA samples in the MIMIC-Ext-MIMIC-CXR-VQA dataset are stored in individual .json files. Each file contains a list of dictionaries (one per QA sample), with keys and corresponding data types as follows:
- split: A string indicating the dataset split (train, valid, test).
- idx: An integer indicating the instance index.
- subject_id: A string indicating the subject's unique ID (patient ID).
- study_id: A string indicating the corresponding study ID.
- image_id: A string indicating the associated image ID.
- image_path: A string indicating the corresponding image path.
- question: A question string.
- semantic_type: A string indicating its semantic type, which can be one of the following: 'verify', 'choose', 'query'.
- content_type: A string indicating its content type, which can be one of the following: 'presence', 'anatomy', 'attribute', 'abnormality', 'plane', 'gender', 'size'.
- template: A template string.
- template_program: A string indicating its template program. Each template has a unique program to derive its answer from the database.
- template_arguments: A dictionary specifying the template arguments. It consists of five sub-dictionaries ('object', 'attribute', 'category', 'viewpos', 'gender') that hold the sampled values for the arguments in the template; when an argument appears multiple times in a question template, an index is appended to the dictionary key.
- answer: A list of strings containing the answers. If a 'choose' or 'query' question yields no answer, this list is empty.
To be specific, here is an example instance:
```json
{
"split":"train",
"idx":13280,
"subject_id":"15569663",
"study_id":"51014827",
"image_id":"2b8cc8f3-d8c51958-3023a0e5-10571272-6d3d0f04",
"image_path":"p15/p15569663/s51014827/2b8cc8f3-d8c51958-3023a0e5-10571272-6d3d0f04.jpg",
"question":"Is there any sign of both pleural effusion and copd/emphysema present in the right lower lung zone?",
"semantic_type":"verify",
"content_type":"presence",
"template":"Is there any sign of both ${attribute_1} and ${attribute_2} present in the ${object}?",
"template_program":"program_5",
"template_arguments":{
"object":{
"0":"right lower lung zone"
},
"attribute":{
"0":"pleural effusion",
"1":"copd/emphysema"
},
"category":{},
"viewpos":{},
"gender":{}
},
"answer": ...
}
```
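For reference, here is a minimal sketch of loading a split and resolving an image path; the MIMIC-CXR-JPG root directory is a placeholder, and joining image_path onto the JPG files/ directory is an assumption about the path convention.

```python
import json
from collections import Counter
from pathlib import Path

DATA_DIR = Path("mimiccxrvqa/dataset")                      # per the directory structure above
MIMIC_CXR_JPG_FILES = Path("/path/to/mimic-cxr-jpg/files")  # placeholder local path

with open(DATA_DIR / "train.json") as f:
    train = json.load(f)                                    # list of QA dictionaries

sample = train[0]
image_file = MIMIC_CXR_JPG_FILES / sample["image_path"]     # assumed path convention
print(sample["question"], sample["answer"], image_file)

# Distribution of content types in this split
print(Counter(s["content_type"] for s in train))
```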
Usage Notes
Dataset Utility
The MIMIC-Ext-MIMIC-CXR-VQA dataset is recognized for its complexity, diversity, and scale, making it suitable for a variety of applications within healthcare and artificial intelligence, such as:
- Medical Visual Question Answering (VQA) Benchmark [22]: It serves as an extensive benchmark for evaluating VQA methods, providing a robust foundation for the development and testing of VQA approaches in the medical field.
- Instruction Dataset for Medical Vision-and-Language Models (VLMs) [23]: This dataset serves as a resource for instruction tuning of medical VLMs, aiding in the development of advanced AI systems that pave the way for automated CXR interpretation.
- Components for Image-based and Multi-modal EHR QA [22,24]: MIMIC-Ext-MIMIC-CXR-VQA provides diverse question templates associated with patients' medical images for electronic health records (EHR) question answering. By connecting these medical images with the corresponding patients' structured EHRs [25], the dataset becomes invaluable for the development of multi-modal EHR AI frameworks. These frameworks improve the integration of imaging modalities and structured EHRs, enhancing the interpretation of complex medical records in hospital databases and supporting a broad spectrum of EHR question answering applications.
Known Limitations
Although Chest ImaGenome provides the most detailed labeling currently available for chest X-rays, its machine-generated labels contain errors, and these errors can propagate into the dataset's (Image, Question, Answer) triples. To mitigate this, the test set of MIMIC-Ext-MIMIC-CXR-VQA uses the Chest ImaGenome gold labels, which were manually refined by four radiologists. However, label uncertainties or ambiguities [26,27] may still arise, especially for subjective findings in MIMIC-CXR reports.
GitHub Repository for this Project
The dataset's creation code is accessible on GitHub at [28].
Release Notes
This is version 1.0.0 of the MIMIC-Ext-MIMIC-CXR-VQA dataset. For any questions or concerns regarding this dataset, please feel free to reach out to us (seongsu@kaist.ac.kr or kyungdaeun@kaist.ac.kr). We appreciate your interest and are eager to assist.
Ethics
The authors have no ethical concerns to declare.
Acknowledgements
This work was (partially) supported by Microsoft Research Asia, Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No.2019-0-00075, RS-2022-00155958), National Research Foundation of Korea (NRF) grant (NRF-2020H1D3A2A03100945), and the Korea Health Industry Development Institute (KHIDI) grant (No.HR21C0198), funded by the Korea government (MSIT, MOHW).
Conflicts of Interest
The authors have no conflicts of interest to declare.
References
- Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). Vqa: Visual question answering. In Proceedings of the IEEE international conference on computer vision (pp. 2425-2433).
- Hudson, D. A., & Manning, C. D. (2019). Gqa: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6700-6709).
- Abacha, A. B., Hasan, S. A., Datla, V. V., Liu, J., Demner-Fushman, D., & Müller, H. (2019). VQA-Med: Overview of the medical visual question answering task at ImageCLEF 2019. CLEF (working notes), 2(6).
- Ben Abacha, A., Sarrouti, M., Demner-Fushman, D., Hasan, S. A., & Müller, H. (2021). Overview of the vqa-med task at imageclef 2021: Visual question answering and generation in the medical domain. In Proceedings of the CLEF 2021 Conference and Labs of the Evaluation Forum-working notes. 21-24 September 2021.
- Hasan, S. A., Ling, Y., Farri, O., Liu, J., Müller, H., & Lungren, M. P. (2018, September). Overview of ImageCLEF 2018 Medical Domain Visual Question Answering Task. In CLEF (Working Notes).
- He, X., Cai, Z., Wei, W., Zhang, Y., Mou, L., Xing, E., & Xie, P. (2020). Pathological visual question answering. arXiv preprint arXiv:2010.12435.
- Lau, J. J., Gayen, S., Ben Abacha, A., & Demner-Fushman, D. (2018). A dataset of clinically generated visual questions and answers about radiology images. Scientific data, 5(1), 1-10.
- Liu, B., Zhan, L. M., Xu, L., Ma, L., Yang, Y., & Wu, X. M. (2021, April). Slake: A semantically-labeled knowledge-enhanced dataset for medical visual question answering. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI) (pp. 1650-1654). IEEE.
- Huang, J., Chen, Y., Li, Y., Yang, Z., Gong, X., Wang, F. L., ... & Liu, W. (2023). Medical knowledge-based network for patient-oriented visual question answering. Information Processing & Management, 60(2), 103241.
- Huang, Y., Wang, X., Liu, F., & Huang, G. (2022, July). OVQA: A clinically generated visual question answering dataset. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2924-2938).
- Moon, J. H., Lee, H., Shin, W., Kim, Y. H., & Choi, E. (2022). Multi-modal understanding and generation for medical images and text via vision-language pre-training. IEEE Journal of Biomedical and Health Informatics, 26(12), 6070-6080.
- Hu, X., Gu, L., Kobayashi, K., An, Q., Chen, Q., Lu, Z., ... & Zhu, Y. (2023). Interpretable medical image visual question answering via multi-modal relationship graph learning. arXiv preprint arXiv:2302.09636.
- Kovaleva, O., Shivade, C., Kashyap, S., Kanjaria, K., Wu, J., Ballah, D., ... & Mukherjee, V. M. (2020, July). Towards visual dialog for radiology. In Proceedings of the 19th SIGBioMed workshop on biomedical language processing (pp. 60-69).
- Lin, Z., Zhang, D., Tao, Q., Shi, D., Haffari, G., Wu, Q., ... & Ge, Z. (2023). Medical visual question answering: A survey. Artificial Intelligence in Medicine, 102611.
- Johnson, A. E., Pollard, T. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., Peng, Y., ... & Horng, S. (2019). MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042.
- Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C. Y., ... & Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific data, 6(1), 317.
- Wu, J. T., Agu, N. N., Lourentzou, I., Sharma, A., Paguio, J. A., Yao, J. S., ... & Moradi, M. (2021). Chest imagenome dataset for clinical reasoning. arXiv preprint arXiv:2108.00316.
- Gokhale, T., Banerjee, P., Baral, C., & Yang, Y. (2020, August). Vqa-lol: Visual question answering under the lens of logic. In European conference on computer vision (pp. 379-396). Cham: Springer International Publishing.
- Johnson, J., Hariharan, B., Van Der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., & Girshick, R. (2017). Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2901-2910).
- Nori, H., King, N., McKinney, S. M., Carignan, D., & Horvitz, E. (2023). Capabilities of gpt-4 on medical challenge problems. arXiv preprint arXiv:2303.13375.
- Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... & McGrew, B. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
- Bae, S., Kyung, D., Ryu, J., Cho, E., Lee, G., Kweon, S., ... & Choi, E. (2024). EHRXQA: A multi-modal question answering dataset for electronic health records with chest x-ray images. Advances in Neural Information Processing Systems, 36.
- Chen, Z., Varma, M., Delbrouck, J. B., Paschali, M., Blankemeier, L., Van Veen, D., ... & Langlotz, C. (2024). CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. arXiv preprint arXiv:2401.12208.
- Kang, S., Kim, D., Kim, J., Lee, H. K., & Hwang, S. J. (2024). WoLF: Large Language Model Framework for CXR Understanding. arXiv preprint arXiv:2403.15456.
- Johnson, A. E., Bulgarelli, L., Shen, L., Gayles, A., Shammout, A., Horng, S., ... & Mark, R. G. (2023). MIMIC-IV, a freely accessible electronic health record dataset. Scientific data, 10(1), 1.
- Brady, A., Laoide, R. Ó., McCarthy, P., & McDermott, R. (2012). Discrepancy and error in radiology: concepts, causes and consequences. The Ulster medical journal, 81(1), 3.
- Brady, A. P. (2017). Error and discrepancy in radiology: inevitable or avoidable?. Insights into imaging, 8, 171-182.
- MIMIC-Ext-MIMIC-CXR-VQA. Available from: https://github.com/baeseongsu/mimic-cxr-vqa
Access
Access Policy:
Only credentialed users who sign the DUA can access the files.
License (for files):
PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/deqx-d943
DOI (latest version):
https://doi.org/10.13026/my7r-v463
Topics:
question answering
multimodal
benchmark
radiology
evaluation
visual question answering
electronic health records
deep learning
machine learning
chest x-ray
Project Website:
https://github.com/baeseongsu/mimic-cxr-vqa
Files
To access the files in this project, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project