Featured Resources


Database Restricted Access

Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information

Alistair Johnson, Jean-Christophe Bélisle-Pipon, David Dorr, Satrajit Ghosh, Philip Payne, Maria Powell, Anais Rameau, Vardit Ravitsky, Alexandros Sigaras, Olivier Elemento, Yael Bensoussan

A dataset of voice recordings and metadata to enable the development, benchmarking, and validation of clinically applicable machine-learning models for diagnosing a wide range of health conditions.

voice bridge2ai

Published: Jan. 17, 2025. Version: 1.1


Database Open Access

VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients

Hyung-Chul Lee, Chul-Woo Jung

waveform anesthesia vitaldb intraoperative biosignal ecg

Published: Sept. 21, 2022. Version: 1.0.0


Database Credentialed Access

Northwestern ICU (NWICU) database

Dana Moukheiber, William Temps, Bhadrappa Molgi, Yikuan Li, Alice Lu, Prasanth Nannapaneni, Abdulrahman Chahin, Sicheng Hao, Felipe Torres Fabregas, Leo Anthony Celi, Adrian Wong, Maxwell Lloyd, Xavier Borrat Frigola, Hyung-Chul Lee, Daniel Schneider, Tom Pollard, Yuan Luo, Abel Kho, Roger Mark

A freely available COVID-rich ICU database comprising de-identified health-related data from Northwestern Memorial Health Center (NMHC).

Published: Nov. 19, 2024. Version: 0.1.0


Database Credentialed Access

MIMIC-IV

Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Steven Horng, Leo Anthony Celi, Roger Mark

Large database of de-identified health information from patients admitted to Beth Israel Deaconess Medical Center.

critical care intensive care unit machine learning mimic

Published: Jan. 6, 2023. Version: 2.2


Database Credentialed Access

MIMIC-CXR Database

Alistair Johnson, Tom Pollard, Roger Mark, Seth Berkowitz, Steven Horng

Chest radiographs in DICOM format with associated free-text reports.

computer vision chest x-rays natural language processing radiology machine learning mimic

Published: Sept. 19, 2019. Version: 2.0.0


Database Credentialed Access

BRAX, a Brazilian labeled chest X-ray dataset

Eduardo Pontes Reis, Joselisa Paiva, Maria Carolina Bueno da Silva, Guilherme Alberto Sousa Ribeiro, Victor Fornasiero Paiva, Lucas Bulgarelli, Henrique Lee, Paulo Victor dos Santos, Vanessa Brito, Lucas Amaral, Gabriel Beraldo, Jorge Nebhan Haidar Filho, Gustavo Teles, Gilberto Szarf, Tom Pollard, Alistair Johnson, Leo Anthony Celi, Edson Amaro

BRAX contains 24,959 chest radiography exams and 40,967 images acquired at a large general hospital in Brazil. All images were read by trained radiologists, and 14 labels were derived from the Brazilian Portuguese reports using NLP.

chest x-ray dataset artificial intelligence

Published: June 17, 2022. Version: 1.1.0


Latest Resources


Database Restricted Access

OpenOximetry Repository

Nicholas Fong, Michael Lipnick, Philip Bickler, John Feiner, Tyler Law

A repository of matched arterial oxygen and pulse oximeter readings obtained under controlled conditions, with high-frequency physiologic waveforms and skin color measurements.

Published: Feb. 19, 2025. Version: 1.1.0


Database Restricted Access

DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology

Ke Wang, Jiamu Yang, Ayush Shetty, Jessilyn Dunn

biomedical time series classification wearable sleep disorders

Published: Feb. 5, 2025. Version: 2.0.0


Database Restricted Access

LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays

Elham Ghelichkhan, Tolga Tasdizen

This dataset provides bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for use in training local visual-language models.

eye-tracking chest x-ray dataset automatically generated dataset caption-guided object detection localization image captioning with region-level description grounded radiology report generation phrase grounding xai multi-modal learning local visual-language models

Published: Feb. 4, 2025. Version: 1.0.0


Database Restricted Access

Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels

Kendall Park, Rory Sayres, Andrew Sellergren, Tom Pollard, Fayaz Jamil, Timo Kohlberger, Charles Lau, Atilla Kiraly

This work further refines the CheXpert labels in MIMIC-CXR-JPG 2.0.0 by filtering them with Med-PaLM 2, followed by manual review by three US board-certified radiologists.

mimic-cxr labels

Published: Feb. 4, 2025. Version: 1.0.0


Database Credentialed Access

MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization

Asad Aali, Dave Van Veen, Yamin Arefeen, Jason Hom, Christian Bluethgen, Eduardo Pontes Reis, Sergios Gatidis, Namuun Clifford, Joseph Daws, Arash Tehrani, Jangwon Kim, Akshay Chaudhari

This dataset provides preprocessed and labeled clinical notes derived from "MIMIC-IV-Note" to facilitate the development of ML models that summarize brief hospital courses (BHC) from clinical notes.

natural language processing clinical notes brief hospital course text summarization machine learning

Published: Feb. 3, 2025. Version: 1.2.0


Database Open Access

Synthetic Mention Corpora for Disease Entity Recognition and Normalization

Kuleen Sasse, John David Osborne

We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128,000 LLM-generated disease mentions from the UMLS disorder group. The corpus aims to improve disease entity recognition and normalization in biomedical and clinical texts.

nlp named entity recognition machine learning data augmentation entity normalization

Published: Feb. 3, 2025. Version: 1.0.0