Resources


Database Restricted Access

OpenOximetry Repository

Nicholas Fong, Michael Lipnick, Philip Bickler, John Feiner, Tyler Law

A repository of matched arterial oxygen and pulse oximeter readings obtained under controlled conditions, with high-frequency physiologic waveforms and skin color measurements.

Published: Feb. 19, 2025. Version: 1.1.0


Database Restricted Access

DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

Ke Wang, Jiamu Yang, Ayush Shetty, Jessilyn Dunn

Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

biomedical time series classification wearable sleep disorders

Published: Feb. 5, 2025. Version: 2.0.0


Database Restricted Access

LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays

Elham Ghelichkhan, Tolga Tasdizen

This dataset includes bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for local visual-language models.

eye-tracking chest x-ray dataset automatically generated dataset caption-guided object detection localization image captioning with region-level description grounded radiology report generation phrase grounding xai multi-modal learning local visual-language models

Published: Feb. 4, 2025. Version: 1.0.0


Database Restricted Access

Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels

Kendall Park, Rory Sayres, Andrew Sellergren, Tom Pollard, Fayaz Jamil, Timo Kohlberger, Charles Lau, Atilla Kiraly

This work further refines the labels associated with CheXpert in MIMIC-CXR-JPG 2.0.0 by filtering them with Med-PaLM 2, followed by manual verification by three US board-certified radiologists.

mimic-cxr labels

Published: Feb. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Asad Aali, Dave Van Veen, Yamin Arefeen, Jason Hom, Christian Bluethgen, Eduardo Pontes Reis, Sergios Gatidis, Namuun Clifford, Joseph Daws, Arash Tehrani, Jangwon Kim, Akshay Chaudhari

This dataset presents a collection of preprocessed and labeled clinical notes derived from "MIMIC-IV-Note", and aims to facilitate the development of ML models focused on summarizing brief hospital courses (BHC) from clinical notes.

natural language processing clinical notes brief hospital course text summarization machine learning

Published: Feb. 3, 2025. Version: 1.2.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 LLM-generated disease mentions from the UMLS disorder group. The corpus aims to improve disease entity recognition and normalization in biomedical and clinical texts.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0


Database Credentialed Access

Medical-Diff-VQA: A Large-Scale Medical Dataset for Difference Visual Question Answering on Chest X-Ray Images

Xinyue Hu, Lin Gu, Qiyuan An, Mengliang Zhang, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald Summers, Yingying Zhu

MIMIC-Diff-VQA provides a large-scale dataset for difference visual question answering on medical chest X-ray images.

vqa difference visual question answering difference vqa chest x-ray visual question answering

Published: Feb. 3, 2025. Version: 1.0.1


Database Restricted Access

CXRGraph: Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation

Yuxiang Liao, Hoisang Heung, Hantao Liu, Irena Spasic

CXRGraph is a structured radiology report dataset built upon RadGraph and tailored for the Automatic Radiology Report Generation task. It can identify more task-relevant information such as abnormalities and hallucinated prior references.

relation extraction information extraction natural language processing named entity recognition structured radiology report

Published: Feb. 3, 2025. Version: 1.0.0


Database Credentialed Access

Symile-MIMIC: a multimodal clinical dataset of chest X-rays, electrocardiograms, and blood labs from MIMIC-IV

Adriel Saporta, Aahlad Manas Puli, Mark Goldstein, Rajesh Ranganath

A multimodal clinical dataset consisting of CXRs, ECGs, and blood labs, designed to evaluate Symile, a simple contrastive loss that accommodates any number of modalities and allows any model to produce representations for each modality.

database cxr ecg chest x-ray contrastive learning model multimodal mimic electrocardiogram

Published: Jan. 28, 2025. Version: 1.0.0


Database Restricted Access

Visual Question Answering evaluation dataset for MIMIC CXR

Timo Kohlberger, Charles Lau, Tom Pollard, Andrew Sellergren, Atilla Kiraly, Fayaz Jamil

This dataset provides 224 VQAs for 40 test-set cases and 111 VQAs for 23 validation-set cases of the MIMIC CXR dataset.

Published: Jan. 28, 2025. Version: 1.0.0