Resources


Model Credentialed Access

Shareable Artificial Intelligence to Extract Cancer Outcomes from Electronic Health Records for Precision Oncology Research

Kenneth Kehl, Pavel Trukhanov, Christopher Fong, Justin Jee, Karl Pichotta, Morgan Paul, Chelsea Nichols, Michele Waters, Nikolaus Schultz, Deborah Schrag

The DFCI-imaging-student and DFCI-medonc-student AI models for extracting cancer outcomes from imaging reports and medical oncologist notes from electronic health records.

Published: Oct. 24, 2024. Version: 1.0.0


Database Credentialed Access

EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

Published: Aug. 19, 2024. Version: 1.0.0


Database Credentialed Access

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei JI, Eric Chang, Tackeun Kim, Edward Choi

We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.

question answering benchmark evaluation visual question answering electronic health records multi-modal question answering deep learning ehr question answering semantic parsing machine learning chest x-ray

Published: July 23, 2024. Version: 1.0.0


Challenge Credentialed Access

Discharge Me: BioNLP ACL'24 Shared Task on Streamlining Discharge Documentation

Justin Xu

Data for the "Discharge Me!" Shared Task on Streamlining Discharge Documentation for BioNLP ACL'24

generation bionlp acl discharge summary

Published: April 12, 2024. Version: 1.3


Database Credentialed Access

RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives

Yuxiang Liao, Hantao Liu, Irena Spasic

RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. Based on the annotated data, we fine-tuned a deep neural model and used it to annotate the whole MIMIC-CXR dataset. Both data are available.

natural language processing coreference resolution radiology

Published: Jan. 30, 2024. Version: 1.0.0


Database Credentialed Access

BOLD, a blood-gas and oximetry linked dataset

João Matos, Tristan Struja, Jack Gallifant, Luis Filipe Nakayama, Marie Charpignon, Xiaoli Liu, Jaime dos Santos Cardoso, Leo Anthony Celi, An Kwok Wong

An open-source pulse oximetry and arterial blood gas dataset, derived from MIMIC-III, MIMIC-IV, and eICU-CRD

health equity pulse oximetry intensive care unit electronic health records

Published: Nov. 8, 2023. Version: 1.0


Database Credentialed Access

CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes

James Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer Seale, Jordan Swartz, T Greg McKelvey, Yi Yang, David Sontag

Clinical action items annotated over MIMIC-III. 718 discharge summaries are labeled at a sentence- and character-level with multiple action labels including Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other.

Published: June 21, 2021. Version: 1.0.0


Database Credentialed Access

EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems

Konstantin Kotschenreuther

Dataset consisting of question and answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models specifically tailored for healthcare applications

mimic-iv clinical question-answering medical discharge summaries large language models

Published: Jan. 11, 2024. Version: 1.0.0


Model Credentialed Access

Characterization of Stigmatizing Language in Medical Records

Keith Harrigian, Ayah Zirikly, Brant Chee, Alya Ahmad, Anne Links, Somnath Saha, Mary Catherine Beach, Mark Dredze

A suite of classifiers for detecting three types of stigmatizing language in electronic medical records. Trained on MIMIC-IV discharge notes.

clinical natural language processing domain transfer bias stigmatizing language large language models mimic

Published: Nov. 6, 2023. Version: 1.0.0


Database Credentialed Access

MedNLI for Shared Task at ACL BioNLP 2019

Chaitanya Shivade

Data for the MedNLI Shared Task at the 2019 ACL BioNLP 2019 Workshop on Biomedical Language Processing

natural language inference recognizing textual entailment mimic

Published: Nov. 28, 2019. Version: 1.0.1