Database Credentialed Access

RadGraph2: Tracking Findings Over Time in Radiology Reports

Adam Dejl, Sameer Khanna, Patricia Therese Pile, Kibo Yoon, Steven QH Truong, Hanh Duong, Agustina Saenz, Pranav Rajpurkar

Published: Aug. 8, 2024. Version: 1.0.0


When using this resource, please cite:
Dejl, A., Khanna, S., Pile, P. T., Yoon, K., Truong, S. Q., Duong, H., Saenz, A., & Rajpurkar, P. (2024). RadGraph2: Tracking Findings Over Time in Radiology Reports (version 1.0.0). PhysioNet. https://doi.org/10.13026/q65y-9688.

Additionally, please cite the original publication:

Khanna, S., Dejl, A., Yoon, K., Truong, S. Q., Duong, H., Saenz, A., & Rajpurkar, P. (2023). RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction. In Machine Learning for Healthcare Conference (pp. 381-402). PMLR.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

RadGraph2 is a dataset of 800 chest radiology reports annotated using a fine-grained entity-and-relation schema; it is an expanded version of the previously introduced RadGraph dataset. In contrast with previous approaches, including the original RadGraph, the new information extraction schema is designed to capture not only the key findings and their context but also mentions of changes between prior radiology examinations and the most recent study. Such changes include the appearance of new conditions affecting the patient, the progression of existing conditions, and differences in the placement of supporting devices. The information extracted from each report is represented as a knowledge graph composed of clinically relevant entities and relations, which makes it readily amenable to automated processing. In addition to the dataset of manually labeled reports, we release more than 220,000 reports automatically annotated by our benchmark model. The model achieved F1 micro scores of 0.88 and 0.74 on two differently sourced withheld test sets (from MIMIC-CXR-JPG and CheXpert, respectively). We believe that RadGraph2 can facilitate the development of clinically useful systems for the automated processing of radiology reports, particularly systems reasoning about the evolution of a patient’s state over time.


Background

Automated information extraction from radiology notes has a wide range of useful applications, including the training of medical imaging models, the automated analysis of trends in patient care, and the development of assistive tools that highlight the key findings of each report for referring clinicians. A key prerequisite for developing systems that perform such extraction is the availability of suitably labeled data.

Among the largest datasets relevant to this task are MIMIC-CXR-JPG [1] and CheXpert [2], collections of chest radiographs paired with corresponding labels. These labels were derived using an automated rule-based classifier and are rather coarse-grained, merely indicating positive, negative or uncertain findings for a predetermined set of common diseases and medical conditions. Other approaches employ more complex information extraction schemas, enabling them to model the significant entities [3], human-understandable facts [4] or spatial relationships [5] described in a radiology report. Our work, in particular, builds on the RadGraph dataset [6], which models information from radiology reports in the form of graphs, allowing it to represent both the key entities mentioned in a report and the relationships between them.

However, despite the granularity offered by some of the above approaches, they all focus on detecting conditions in a single study rather than on tracking disease progression over time. As such, they cannot capture information from comparisons to priors, one of the key components of radiology reports in which radiologists contrast the findings of the most recent study with those of previous examinations. These comparisons provide highly useful information on the progression of a patient’s medical conditions and their overall healthcare trajectory. There is, therefore, a need for AI tools that can not only interpret findings in a single study but also understand how a patient’s condition has changed since prior examinations. Currently, a major obstacle to developing such systems is the lack of labeled data.

We release the first dataset with dense annotations of both entities and relations in radiology reports that includes fine-grained characterisation of changes from priors and their context.


Methods

Base Dataset

To capture detailed information about the findings and changes described in radiology reports, we developed a novel hierarchical schema for entities and relations, building upon the original RadGraph schema [6]. In its original formulation, the schema was designed to maximize the coverage and retention of clinically relevant information in the reports while remaining simple enough for quick and reliable labeling. Our extended schema adheres to these design principles while introducing additional entity types to represent various kinds of changes.

We developed our schema iteratively. In each iteration, we devised a set of entities for modeling different change types, labeled several reports using the new version of the schema and gathered feedback from medical practitioners. During this process, we paid special attention to cases in which the schema led to ambiguity or failed to reliably capture information from the annotated notes. Based on the feedback received, we revised the schema and continued with further iterations until we were satisfied with its coverage, faithfulness and reliability. The full description of the final data model is given in the Data Description section.

To re-annotate the reports from the original RadGraph dataset according to our new schema, we worked with a team of four board-certified radiologists and one academic hospitalist. The 600 reports from the RadGraph development and test sets were imported into the specialized text labeling platform Datasaur and split among the annotators. The labelers were instructed to focus on accurately identifying mentions of change and the associated relations while making minimal modifications to entities and relations not associated with any change, as we considered those to have been labeled with sufficient quality. Nevertheless, the annotators were free to correct blatant mistakes in any aspect of the annotations.

Apart from re-annotating the 600 reports from the original RadGraph development and test sets, we also extended RadGraph2 with 200 additional reports randomly sampled from the MIMIC-CXR portion of the RadGraph inference dataset. To simplify and expedite the task, we based the initial labels for these reports on the output of the RadGraph Benchmark model included in the inference set. The labelers were instructed to label the entities and relations associated with changes, as well as to correct any possible mistakes or deficiencies in the entities and relations identified by the benchmark model.

Inference Dataset

In addition to the base dataset, we also release more than 220,000 reports automatically annotated by our best-performing benchmark model, HGIE. Unlike traditional text-to-graph models such as DyGIE++ [7], in which entities are not assumed to have a type hierarchy, HGIE takes advantage of the inherently structured organization of our labels to improve information extraction performance. We found that approaching the task in this way boosts performance on both the entity and relation extraction tasks.

Our hierarchical recognition (HR) system, used for annotating the inference reports, relies on an entity taxonomy. There are inherent relationships between the entity types used to label our graphs: for example, CHAN-CON-WOR and CHAN-CON-AP both refer to changes in a patient's condition. To take advantage of these relationships, we organize the entity types into a taxonomy tree.
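For illustration, the taxonomy implied by the schema listed in the Data Description section below can be sketched as a nested mapping from parent types to child types. The sketch is ours and the name ENTITY_TAXONOMY is illustrative, not part of the released code:

# Entity taxonomy tree implied by the RadGraph2 schema; only the leaf types
# are used as token-level labels in the released files.
ENTITY_TAXONOMY = {
    "CHAN": {
        "CHAN-NC": {},
        "CHAN-CON": {"CHAN-CON-AP": {}, "CHAN-CON-WOR": {},
                     "CHAN-CON-IMP": {}, "CHAN-CON-RES": {}},
        "CHAN-DEV": {"CHAN-DEV-AP": {}, "CHAN-DEV-PLACE": {}, "CHAN-DEV-DISA": {}},
    },
    "ANAT": {"ANAT-DP": {}},
    "OBS": {"OBS-DP": {}, "OBS-U": {}, "OBS-DA": {}},
}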

We use a BERT-based model as the backbone for the HR task, with the objective of producing 12 scalar outputs (one for each entity category). Each output is taken to represent the conditional probability that an entity is present given that its parent in the entity hierarchy is present. During inference, however, all entities need to be predicted unconditionally. Under this training regimen, the unconditional probabilities can be recovered from the model outputs via the probability chain rule, which our two-phase training regime takes advantage of.
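As a worked illustration of the chain rule (the probability values below are made up and are not model outputs), an unconditional leaf probability is the product of the conditional probabilities along the path from the root of the taxonomy:

# Recovering P(CHAN-CON-WOR) from conditional outputs; numbers are illustrative.
p_chan = 0.9                 # P(CHAN)
p_con_given_chan = 0.8       # P(CHAN-CON | CHAN)
p_wor_given_con = 0.6        # P(CHAN-CON-WOR | CHAN-CON)

# P(CHAN-CON-WOR) = P(CHAN) * P(CHAN-CON | CHAN) * P(CHAN-CON-WOR | CHAN-CON)
p_wor = p_chan * p_con_given_chan * p_wor_given_con
print(round(p_wor, 3))       # 0.432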

In the first phase of training, the HR system is trained only on examples where the parent class is positive, following prior work on hierarchical classification. The intention behind this regime is to directly model the conditional probabilities of the entities: the network learns the dependencies between parent and child entities and concentrates on distinguishing the lower-level labels, in particular the leaf entities.

The second phase aims at improving the accuracy of the unconditional probability predictions, which are used during inference and are thus critical to classification performance. To achieve this, we fine-tune the hierarchically trained network on the full dataset using a standard categorical cross-entropy loss function and smaller learning rates. This stage improves the network's ability to predict parent-level labels, which can be either positive or negative. Our final model achieved F1 micro scores of 0.88 and 0.74 on the MIMIC-CXR-JPG and CheXpert portions of our test set, respectively, suggesting respectable performance on labeling unseen reports.
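Returning to the two training phases described above, the sketch below shows one plausible way to realize them. It is not the released training code: the per-type logits, binary targets and parent map are our own assumptions, and the second phase is simplified to a binary objective where the text describes a categorical cross-entropy loss.

import torch
import torch.nn.functional as F

# Illustrative subset of the taxonomy: each type mapped to its parent (None for roots).
TYPES = ["CHAN", "CHAN-CON", "CHAN-CON-WOR", "CHAN-CON-AP"]
PARENT_OF = {"CHAN": None, "CHAN-CON": "CHAN",
             "CHAN-CON-WOR": "CHAN-CON", "CHAN-CON-AP": "CHAN-CON"}

def phase1_loss(logits, targets):
    """Conditional phase: a type contributes to the loss only on examples where its parent is positive."""
    loss = logits.new_zeros(())
    for i, t in enumerate(TYPES):
        parent = PARENT_OF[t]
        mask = torch.ones_like(targets[:, i]) if parent is None else targets[:, TYPES.index(parent)]
        per_example = F.binary_cross_entropy_with_logits(logits[:, i], targets[:, i], reduction="none")
        loss = loss + (per_example * mask).sum() / mask.sum().clamp(min=1)
    return loss

def phase2_loss(logits, targets):
    """Unconditional phase: a standard loss over all types, run with a smaller learning rate."""
    return F.binary_cross_entropy_with_logits(logits, targets)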


Data Description

Information Schema

The information schema of RadGraph2 specifies the different types of entities and relations used for representing the content of each report in a structured form. While entities refer to the different concepts mentioned in the note (e.g. anatomical regions or medical conditions), relations indicate different kinds of relationships between these concepts. We define three general entity types ("anatomy", "observation" and "change") as well as three relation types ("modify", "located at" and "suggestive of"). The entity types are subdivided into further subtypes forming a hierarchy.

Entities

Entities associate contiguous spans of tokens with entity types. Our schema retains all of the entity types used by the original RadGraph [6] while adding new entity types for describing changes. The full hierarchy of entities employed in our schema, along with examples, is given below (note that only the leaf entity types are used as labels):

  • Change entities (CHAN) mark various types of change.
    • No change entities (CHAN-NC) indicate a lack of change since the prior study.
      • Example: Moderately severe bibasilar atelectasis persists.
    • Change in medical condition entities (CHAN-CON) mark various changes in the state of patient's medical conditions.
      • Condition appearance entities (CHAN-CON-AP) indicate that, compared to the previous report(s), a new adverse medical condition has been observed in the given patient.
        • Example: There is also a new left basilar opacity blunting the lateral costophrenic angle (...)
      • Condition worsening entities (CHAN-CON-WOR) indicate a worsening in a certain aspect of the patient’s clinical state compared to the prior.
        • Example: Mild - to - moderate diffuse pulmonary edema is slightly worse.
      • Condition improvement entities (CHAN-CON-IMP) indicate an improvement in a certain aspect of the patient’s clinical state compared to the prior.
        • Example: Compared to the most recent study, there is improvement in the mild pulmonary edema and decrease in the small left pleural effusion.
      • Condition resolution entities (CHAN-CON-RES) indicate that, compared to the previous report(s), a certain medical condition previously observed in the patient has completely resolved.
        • Example: Indistinct superior segment left lower lobe opacities have resolved.
    • Change in medical devices entities (CHAN-DEV) mark changes related to supporting devices and tubes used by the patient.
      • Device appearance entities (CHAN-DEV-AP) indicate that, compared to the previous report(s), the patient has been fitted with a new medical device or tool.
        • Example: The patient has received the new feeding tube.
      • Change in device placement entities (CHAN-DEV-PLACE) indicate that the position of a medical device in the body of a patient changed compared to prior studies.
        • Example: Left pleural drain has been advanced to the left apex.
      • Device disappearance entities (CHAN-DEV-DISA) indicate that, compared to the previous report(s), a medical device or tool was detached or removed from the patient.
        • Example: In the interval, the patient has been extubated (...)
  • Anatomy entities (ANAT) mark various anatomical body parts.
    • Definitely present anatomy entities (ANAT-DP) mark all mentions of body parts or anatomical locations.
      • Example: The left lung is essentially clear. ("left" and "lung" are two separate ANAT-DP entities)
  • Observation entities (OBS) mark observations from the radiology images mentioned in the corresponding reports.
    • Definitely present observation entities (OBS-DP) indicate observations recorded with high certainty.
      • Example: There is moderate cardiomegaly. ("moderate" and "cardiomegaly" are two separate OBS-DP entities)
    • Uncertain observation entities (OBS-U) indicate observations which are uncertain.
      • Example: Infection cannot be excluded.
    • Definitely absent observation entities (OBS-DA) indicate observations which could be excluded based on the report.
      • Example: There is no pneumothorax.

Relations

Relations in our schema are defined as directed edges between entities. Like entities, each relation is associated with a label. We utilize the same three relation labels as in [6], with slight modifications to their definitions. These labels are defined as follows:

  • Modify relations (modify) indicate that the first entity modifies the second entity. The possible entity types connected by this relation are (OBS-*, OBS-*), (ANAT-DP, ANAT-DP), (CHAN-*, *) and (OBS-*, CHAN-*).
    • Example: right lung ("right" → "lung")
  • Located at relations (located_at) connect observation and anatomy entities and indicate that an observation is related to an anatomical location. While the relation most often expresses location, it can also describe other relationships between an observation and an anatomy. This relation connects entities of type (OBS-*, ANAT-DP).
    • Example: lungs are clear ("clear" → "lungs")
  • Suggestive of relations (suggestive_of) indicate that the status of the second entity is derived from the first entity. The possible entity types connected by this relation are (OBS-*, OBS-*), (CHAN-*, OBS-*) and (OBS-*, CHAN-*).
    • Example: The opacity may indicate pneumonia ("opacity" → "pneumonia")

Dataset Overview

The manually labeled reports in the dataset are split into three partitions: a training set containing 575 reports, a development set composed of 75 reports and a test set consisting of 150 reports. The sets of patients associated with the reports in each of these partitions are disjoint. Additionally, our partitioning retains the split assignment of reports from the original RadGraph. All protected health information in the reports is deidentified in the same way as in the original RadGraph. Apart from the manually annotated reports, we also release more than 220,000 reports labeled by our best-performing model.

Files Description

We release the following files:

  • README.md: Contains a brief description of the dataset package
  • train.json: File containing 575 labeled training reports, all sourced from MIMIC-CXR-JPG
  • dev.json: File containing 75 labeled development reports, all sourced from MIMIC-CXR-JPG
  • test.json: File containing 150 labeled test reports, 100 from MIMIC-CXR-JPG and 50 from CheXpert
  • inference-chexpert.json: File containing 500 reports from CheXpert with labels generated by our benchmark model
  • inference-mimic.json: File containing 227,068 reports from MIMIC-CXR-JPG with labels generated by our benchmark model

File Format

We utilize a file format analogous to that of the original RadGraph dataset. Each JSON file contains a dictionary with the following structure:

  • The keys of the dictionary are the identifiers of the individual reports. For MIMIC-CXR-JPG reports, these identifiers take the form <folder ID>/<patient ID>/<study ID>.txt. For CheXpert reports, the keys are simple numerical identifiers. Each key maps to a nested dictionary containing the report metadata and the associated entity and relation labels.
  • The nested dictionary has the following keys:
    • "text": The full text of the report.
    • "data_split": Indicates the split of the data. Either "train", "dev", "test" or "inference".
    • "data_source": Indicates the original source of the report. Either "MIMIC-CXR" or "CheXpert".
    • "is_original": A boolean flag indicating reports included in the original RadGraph dataset. False for the new reports added in RadGraph2. This key is not included in the additional inference data.
    • "entities": A dictionary containing data about the entity and relation labels. The keys of the dictionary indicate the numerical identifiers of the individual entities, while nested dictionaries encapsulate further data about each entity. These dictionaries have the following keys:
      • "tokens": Contains the tokens the entity is composed of
      • "label": Indicates the entity label, one of the options described above.
      • "start_ix": Indicates the index of the first token of the entity (indexes start at zero).
      • "end_ix": Indicates the index of the last token of the entity (indexes start at zero). If the entity has only one token, "start_ix" and "end_ix" values are identical.
      • "relations": Contain a list of relations emanating from the given entity. Each list element represents one such relation in the form of a tuple, with the first element of the tuple indicating the relation label and the second element of the tuple indicating the ID of the target entity.

For illustration, we provide an outline of the JSON file structure:

{
	"<report ID>": {
		"text": "<text of the report>",
		"data_split": "<(train|dev|test)>",
		"data_source": "<(MIMIC-CXR|CheXpert)>"
		"is_original": <(True|False)>,
		"entities": {
			"<entity ID>": {
				"tokens": "<entity tokens string>",
				"label": "<entity label, one of the options described above>",
				"start_ix": <start index of the entity in the text, as a number>,
				"end_ix": <end index of the entity in the text, as a number>,
				"relations": [
					[
						"<relation label, one of the options described above>",
						"<ID of the target entity>"
					]
				]
			}
		}
	}
}
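The following minimal sketch reads one of the released files and walks the structure documented above. It uses only the documented keys and file names; the assumption that splitting the report text on whitespace reproduces the token indexing used by "start_ix" and "end_ix" is ours:

import json

with open("dev.json") as f:
    reports = json.load(f)

for report_id, report in reports.items():
    tokens = report["text"].split()  # assumed to match the token indexing of start_ix/end_ix
    for entity_id, entity in report["entities"].items():
        span = " ".join(tokens[entity["start_ix"]: entity["end_ix"] + 1])
        for relation_label, target_id in entity["relations"]:
            target = report["entities"][target_id]
            print(f'{span} ({entity["label"]}) --{relation_label}--> {target["tokens"]} ({target["label"]})')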

 


Usage Notes

The released data can be used for the same purposes as the original RadGraph data (e.g., for training machine learning models for information extraction from radiology reports or for automated labeling of radiology notes using our benchmark model), with the additional benefit of containing a larger number of manually labeled reports. Additionally, thanks to the newly introduced entity types, the data can also be useful for developing systems capable of automatically characterizing disease progression in patients over time.
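For example, the change mentions introduced by RadGraph2 can be pulled out directly from the released files. The short sketch below is ours and relies only on the label prefixes defined in the schema above:

import json

with open("train.json") as f:
    reports = json.load(f)

for report_id, report in reports.items():
    changes = [e for e in report["entities"].values() if e["label"].startswith("CHAN-")]
    if changes:
        print(report_id, [(e["tokens"], e["label"]) for e in changes])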

The data has the following limitations. (1) While care has been taken to make our change-capturing schema as clear as possible, there are still certain cases in which the labeling can be ambiguous. For example, a radiologist may describe a change with a certain degree of uncertainty, which cannot be directly modeled by our schema. (2) The radiology reports in the dataset are limited to notes from the MIMIC-CXR-JPG and CheXpert datasets, which may not be fully representative of the variety in radiology findings reporting across different healthcare facilities.


Release Notes

Version 1.0.0: Initial release


Ethics

Our dataset is constructed from previously released, publicly available deidentified datasets, and our work therefore did not require IRB approval. During the development of the dataset, we treated all radiology notes as sensitive data and abided by the usage conditions of their source datasets.


Acknowledgements

We thank Eléa Bach, Valentina Carducci, Elaine Ye, Mengyao Zheng, Madhur Nayan and David Sontag for their help at the initial stage of the project. We would also like to acknowledge the Datasaur team for providing us with free access to their platform and support with importing our data.

* Adam Dejl and Sameer Khanna contributed equally to this resource

** Agustina Saenz and Pranav Rajpurkar contributed equally to this resource


Conflicts of Interest

The authors declare no conflicts of interest.


References

  1. Johnson, A., Lungren, M., Peng, Y., Lu, Z., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR-JPG - chest radiographs with structured labels (version 2.0.0). PhysioNet. https://doi.org/10.13026/8360-t248.
  2. Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., ... & Ng, A. Y. (2019, July). CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 590-597).
  3. Sugimoto, K., Takeda, T., Oh, J. H., Wada, S., Konishi, S., Yamahata, A., ... & Matsumura, Y. (2021). Extracting clinical terms from radiology reports with deep learning. Journal of Biomedical Informatics, 116, 103729.
  4. Steinkamp, J. M., Chambers, C., Lalevic, D., Zafar, H. M., & Cook, T. S. (2019). Toward complete structured information extraction from radiology reports using machine learning. Journal of digital imaging, 32, 554-564.
  5. Datta, S., Si, Y., Rodriguez, L., Shooshan, S. E., Demner-Fushman, D., & Roberts, K. (2020). Understanding spatial language in radiology: Representation framework, annotation, and spatial relation extraction from chest X-ray reports using deep learning. Journal of biomedical informatics, 108, 103473.
  6. Jain, S., Agrawal, A., Saporta, A., Truong, S. Q., Nguyen Duong, D., Bui, T., Chambon, P., Lungren, M., Ng, A., Langlotz, C., & Rajpurkar, P. (2021). RadGraph: Extracting Clinical Entities and Relations from Radiology Reports (version 1.0.0). PhysioNet. https://doi.org/10.13026/hm87-5p47.
  7. Wadden, D., Wennberg, U., Luan, Y., & Hajishirzi, H. (2019, November). Entity, Relation, and Event Extraction with Contextualized Span Representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 5784-5789).

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
