This page displays an alphabetical list of all the databases on PhysioNet. To search content on PhysioNet, visit the search page. Enter the search terms, add a filter for resource type if needed, and select how you would like the results to be ordered (for example, by relevance, by date, or by title).
Each project is made available under one of the following access policies:
- Open Access: Accessible by all users, with minimal restrictions on reuse.
- Restricted Access: Accessible by registered users who sign a Data Use Agreement.
- Credentialed Access: Accessible by registered users who complete the credentialing process and sign a Data Use Agreement.
Open databases
- Abdominal and Direct Fetal ECG Database: Multichannel fetal electrocardiogram recordings obtained from 5 different women in labor, between 38 and 41 weeks of gestation.
- A Comprehensive Dataset of Pattern Electroretinograms for Ocular Electrophysiology Research: The PERG-IOBA Dataset: 336 CSV records with 1354 PERG responses (microvolts) from 304 subjects at IOBA. Includes age (years), gender, diagnoses, and visual acuity in logMAR scale.
- AF Termination Challenge Database: ECG recordings created for the Computers in Cardiology Challenge 2004, which focused on predicting spontaneous termination of atrial fibrillation.
- AHA Database Sample Excluded Record: Two ECG signals that were excluded from the 1980 American Heart Association database.
- A large scale 12-lead electrocardiogram database for arrhythmia study: A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients
- A multi-camera and multimodal dataset for posture and gait analysis: Multimodal dataset with 166k samples for vision-based applications with a smart walker used in gait and posture rehabilitation. The walker is equipped with a pair of depth cameras, and their data are synchronized with an inertial MoCap system worn by the participant.
- A Multi-Modal Satellite Imagery Dataset for Public Health Analysis in Colombia: Multi-Modal Satellite Imagery Dataset in Colombia: A public health analysis with spatiotemporally aligned satellite images and their corresponding metadata across 81 municipalities (2016-2018), facilitating multimodal AI applications.
- ANSI/AAMI EC13 Test Waveforms: The files in this set can be used for testing a variety of devices that monitor the electrocardiogram. The recordings include both synthetic and real waveforms.
- Apnea-ECG Database: Seventy ECG signals with expert-labelled apnea annotations and machine-generated QRS annotations.
- A Pressure Map Dataset for In-bed Posture Classification: Pressure sensor data captured from 13 participants in various sleeping postures.
- Auditory evoked potential EEG-Biometric dataset: Recording of electroencephalogram (EEG) signals with the aim to develop an EEG-based Biometric. The data include resting-state and auditory stimuli experiments.
- Autonomic Aging: A dataset to quantify changes of cardiovascular autonomic function during healthy aging: This database contains resting recordings of ECG and continuous noninvasive blood pressure of 1,104 healthy volunteers
- A Wearable Exam Stress Dataset for Predicting Cognitive Performance in Real-World Settings: The data contains electrodermal activity, heart rate, blood volume pulse, skin surface temperature, inter beat interval and accelerometer data recorded during three exam sessions (midterm 1, midterm 2 and finals) as well as their corresponding grades
- Behavioral and autonomic dynamics during propofol-induced unconsciousness: Multimodal point process indices for heart rate variability and electrodermal activity for 9 subjects who are undergoing a controlled propofol sedation experiment where the concentration was increased and then decreased in stages.
- BIDMC Congestive Heart Failure Database: Long-term ECG recordings from 15 subjects with severe congestive heart failure.
- BIDMC PPG and Respiration Dataset: ECG signals extracted from the MIMIC-II Matched Waveform Database, with manual breath annotations added by annotators using impedance respiratory signal.
- BIG IDEAs Lab Glycemic Variability and Wearable Device Data: Glucose measurements and wrist-worn wearable sensor data from high-normoglycemic participants.
- Blood Pressure in Salt-Sensitive Dahl Rats: This database contains continuous blood pressure recordings for 9 Dahl SS rats and 6 Dahl SS.13BN rats, under high and low salt conditions.
- Body Sway When Standing and Listening to Music Modified to Reinforce Virtual Reality Environment Motion: These data were intended to show that music manipulated to match VR motion provided by an Oculus Rift head-mounted display increased body sway when standing still.
- Brain Hemorrhage Extended (BHX): Bounding box extrapolation from thick to thin slice CT images: The first version of this dataset was made available in the forum of Kaggle competition 'RSNA Intracranial Hemorrhage Detection' (v1.0). Then minor corrections were implemented (v1.1).
- Brno University of Technology ECG Quality Database (BUT QDB): The database is intended for the development and objective comparison of algorithms designed to assess the quality of ECG records. It also enables objective comparison of results between authors.
- Brno University of Technology ECG Signal Database with Annotations of P Wave (BUT PDB): BUT PDB is an ECG signal database with marked peaks of P waves, created for the development and objective comparison of P wave detection algorithms. The database consists of 50 2-minute 2-lead ECG signal records with various types of pathology.
- Brno University of Technology Smartphone PPG Database (BUT PPG): BUT PPG is a database created for the purpose of evaluating PPG signal quality and estimation of heart rate. The data comprises 3,888 10s recordings of PPGs recorded by smartphone and associated ECG and ACC signals and annotations.
- CAP Sleep Database: The CAP Sleep Database is a collection of 108 polysomnographic recordings registered at the Sleep Disorders Center of the Ospedale Maggiore of Parma, Italy. The waveforms (contained in the .edf files…
- Cardiorespiratory measurement from graded cycloergometer exercise testing: Cardiorespiratory measurements acquired during 18 exercise tests performed at French West Indies University
- CAST RR Interval Sub-Study Database: Data from the Cardiac Arrhythmia Suppression Trial (CAST), a study designed to test the hypothesis that suppression of ventricular premature complexes would improve survival.
- Cerebral perfusion and cognitive decline in type 2 diabetes: Dataset collected during a study of the effects of type 2 diabetes on brain blood flow, vasoreactivity and functional outcomes (gait and balance) using TCD, MRI perfusion and foot pressure distribution and gait measures.
- Cerebral Vasoregulation in Diabetes: Diabetes is a risk factor for cerebral hypoperfusion and microvascular disease. This study assessed the effects of diabetic autonomic neuropathy with OH on cerebral vasoregulation.
- Cerebral Vasoregulation in Elderly with Stroke: Multimodal data from a large study investigating the effects of ischemic stroke on cerebral vasoregulation.
- Cerebromicrovascular Disease in Elderly with Diabetes: Type 2 diabetes increases the risk of cerebrovascular disease and of cognitive and mobility decline in older people. This project evaluated the relationship between diabetes, inflammation, cerebrovascular reactivity, and functional outcomes.
- CGMacros: a scientific dataset for personalized nutrition and diet monitoring: CGMacros contains information from two continuous glucose monitors (CGM), food macronutrients, food photographs, physical activity, and anonymized participant demographics, anthropometric measurements and health parameters.
- CHARIS database: Multi-channel recordings of ECG, arterial blood pressure (ABP), and intracranial pressure (ICP) of patients diagnosed with traumatic brain injury.
- CHB-MIT Scalp EEG Database: EEG recordings from pediatric subjects with intractable seizures, collected at the Children’s Hospital Boston.
- CheXmask Database: a large-scale dataset of anatomical segmentation masks for chest x-ray images: CheXmask Database is a dataset of 657,566 uniformly annotated chest radiographs with segmentation masks. Images were segmented using HybridGNet, with automatic quality control indicated by RCA scores.
- CiPA ECG Validation Study: ECG effects of ranolazine, verapamil, lopinavir+ritonavir, chloroquine, dofetilide, diltiazem, and dofetilide+diltiazem in a small clinical study.
- Clinical data from the MIMIC-II database for a case study on indwelling arterial catheters: Dataset extracted from MIMIC-II for a tutorial on effectiveness of indwelling arterial catheters in hemodynamically stable patients with respiratory failure for mortality outcomes.
- CogWear: Can we detect cognitive effort with consumer-grade wearables?: Physiological data captured under experimental conditions by three wearable devices.
- Combined measurement of ECG, Breathing and Seismocardiograms: ECG and seismocardiogram data collected from 20 presumed healthy volunteers.
- Complex Upper-Limb Movements: Hand trajectory data collected from ten subjects as they performed various upper-limb motor tasks.
- Congestive Heart Failure RR Interval Database: This database includes beat annotation files for 29 long-term ECG recordings of subjects aged 34 to 79, with congestive heart failure (NYHA classes I, II, and III).
- Continuous Cuffless Monitoring of Arterial Blood Pressure via Graphene Bioimpedance Tattoos: Cuffless blood pressure data repository that includes raw time data for 4-channel Bioimpedance signals using Graphene Tattoos from the wrist with synchronized continuous blood pressure and PPG signals from 7 subjects
- CPAP Pressure and Flow Data from a Local Trial of 30 Adults at the University of Canterbury: A pressure and flow dataset was collected from a trial of 30 adults at the University of Canterbury undergoing CPAP therapy for a variety of instructed breath rates at PEEP levels of 4cmH2O and 7cmH2O.
- CTU-CHB Intrapartum Cardiotocography Database: 552 cardiotocography records collected between 2010 and 2012 at the Czech Technical University and University Hospital in Brno.
- CUILESS2016: A corpus of Concept Unique Identifier concepts taken from the SemEval2015 Task 14.
- CU Ventricular Tachyarrhythmia Database: This database includes 35 eight-minute ECG recordings of human subjects who experienced episodes of sustained ventricular tachycardia, ventricular flutter, and ventricular fibrillation.
- ECG Effects of Dofetilide, Moxifloxacin, Dofetilide+Mexiletine, Dofetilide+Lidocaine and Moxifloxacin+Diltiazem: ECG from 22 subjects for a study on response of hERG potassium channel blocking drugs with and without the addition of late sodium or calcium channel blocking drugs.
- ECG Effects of Ranolazine, Dofetilide, Verapamil, and Quinidine: ECGs of 22 subjects for a study aimed at comparing the effects of QT prolonging drugs versus placebo on electrophysiological parameters.
- ECG Fragment Database for the Exploration of Dangerous Arrhythmia: Dataset derived from the MIT-BIH Malignant Ventricular Ectopy Database.
- ECG-ID Database: ECG recordings from 90 volunteers.
- EEG During Mental Arithmetic Tasks: The database contains EEG recordings of subjects before and during the performance of mental arithmetic tasks.
- EEG Motor Movement/Imagery Dataset: EEG recordings obtained from 109 volunteers.
- EEG Signals from an RSVP Task: This project contains EEG data from 11 healthy participants upon rapid presentation of images through the Rapid Serial Visual Presentation (RSVP) protocol at speeds of 5, 6, and 10 Hz.
- Effect of 24-hour sleep deprivation on cerebral hemodynamics and cognitive performance: We assessed the impact of 24-hour sleep deprivation on the global properties of frontal lobe functional networks and on cognitive performance. fNIRS measurements were carried out in the resting state and in response to a finger-tapping paradigm.
- Effect of Deep Brain Stimulation on Parkinsonian Tremor: Rest tremor velocity in the index finger of 16 subjects with Parkinson's disease who receive chronic high frequency electrical deep brain stimulation.
- eICU Collaborative Research Database Demo: An openly available subset of the eICU Collaborative Research Database.
- Electrocardiogram, skin conductance and respiration from spider-fearful individuals watching spider video clips: Dataset used for development of an algorithm for on-line anxiety level detection from biosignals.
- Electroencephalogram and eye-gaze datasets for robot-assisted surgery performance evaluation: The brain activity and eye gaze data were recorded from 25 participants performing surgical tasks using a robot simulator. The performance score was created by the simulator. Data can be used to evaluate surgical performance.
- EPHNOGRAM: A Simultaneous Electrocardiogram and Phonocardiogram Database: An open-access database recorded during the EPHNOGRAM project, consisting of simultaneous electrocardiogram (ECG) and phonocardiogram (PCG) recordings from young healthy adults, during stress-test experiments.
- Epicardially attached cardiac accelerometer data from canines and porcines: The dataset contains data recorded from accelerometers attached epicardially to canine and porcine hearts. The data comprise tri-axial acceleration signals, left ventricular pressure (LVP), rate of change of LVP and, in some cases, ECG as well.
- ERP-based Brain-Computer Interface recordings: Data generated as part of a study aimed at identifying the factors limiting the performance of brain-computer interfaces based on event-related potentials.
- European ST-T Database: Annotated excerpts of ambulatory ECG recordings from 79 subjects, designed for evaluation of algorithms for analysis of ST and T-wave changes.
- Evoked Auditory Responses in Hearing Impaired: Auditory Brainstem Response and Otoacoustic Emission recordings in eight hearing impaired listeners.
- Evoked Auditory Responses in Normals: Auditory Brainstem Response and Otoacoustic Emission recordings generated as part of a study examining evoked potentials and loudness growth.
- Examples of Electromyograms: An electromyogram (EMG) is a common clinical test used to assess function of muscles and the nerves that control them. EMG studies are used to help in the diagnosis and management of disorders such a…
- Eye Tracking Dataset for the 12-Lead Electrocardiogram Interpretation of Medical Practitioners and Students: The project aims at collecting a dataset using eye-tracking technology to understand the 12-lead electrocardiogram interpretation visual behavior for medical practitioners and students with different expertise levels.
- Facial and oral temperature data from a large set of human subject volunteers: Data for each subject include temperatures measured at 29 facial locations over four rounds with two IRTs, oral temperatures measured with a thermometer in two modes, subject demographics (gender, age, ethnicity), environmental conditions, etc.
- Fantasia Database: ECG and respiration signals collected from 40 young and elderly subjects during supine resting.
- Fetal ECG Synthetic Database: The FECGSYNDB is a large database of simulated adult and non-invasive fetal ECG (NI-FECG) signals, which provides a robust resource that enables reproducible research in the field. The data is gene…
- Fetal PCG Database: The project collects a series of 26 fetal phonocardiographic (PCG) signals from different pregnant women during the last months of their singleton physiological pregnancies (gestational week between …
- Gait in Aging and Disease Database: Walking stride interval time series from 15 subjects.
- Gait in Neurodegenerative Disease Database: Recordings from force-sensitive resistors (with the output roughly proportional to the force under the foot) from patients with Parkinson's disease (n=15), Huntington's disease (n=20), amyotrophic lateral sclerosis (n=13), and healthy subjects (n=16).
- Gait in Parkinson's Disease: This database contains measures of gait from 93 patients with idiopathic PD (mean age: 66.3 years; 63% men), and 73 healthy controls (mean age: 66.3 years; 55% men). The database includes the vertica…
- Gait Maturation Database: The data contained here are from 50 healthy children ranging in age from 40 months to 163 months. Each data file is named with a subject identifier (1-50) and the subject's age (e.g., the file name 2…
- Gesture Recognition and Biometrics ElectroMyogram (GRABMyo): Open-access dataset of electromyogram (EMG) recordings collected from the wrist and forearm muscles of 43 people while they performed hand gestures.
- Haaglanden Medisch Centrum sleep staging database: A collection of 151 whole-night PolySomnoGraphic (PSG) sleep recordings from the Haaglanden Medisch Centrum (HMC, The Netherlands) sleep center containing different traces of ExG activity and expert's scorings of sleep stages
- Heart and lung segmentations for MIMIC-CXR/MIMIC-CXR-JPG and Montgomery County TB databases: Heart and lung segmentations for 200 MIMIC-CXR/MIMIC-CXR-JPG chest x-rays and heart segmentations for 138 Montgomery County tuberculosis chest X-rays.
- Heart Rate Oscillations during Meditation: Heart rate time series for 5 different groups of healthy subjects performing meditation techniques.
- Human Balance Evaluation Database: Force platform recordings from 163 subjects undergoing stabilography tests.
- I-CARE: International Cardiac Arrest REsearch consortium Database: The clinical and EEG data for this dataset originate from seven academic hospitals in the U.S. and Europe, led by investigators who are part of the International Cardiac Arrest REsearch consortium (I-CARE).
- Icelandic 16-electrode Electrohysterogram Database: This database consists of 122 16-electrode EHG recordings performed on 45 pregnant women. The recordings were performed between 2008 and 2010 in Iceland.
- Icentia11k Single Lead Continuous Raw Electrocardiogram Dataset: This is a dataset of continuous raw electrocardiogram (ECG) signals for representation learning containing 11 thousand patients and 2 billion labelled beats.
- Image-derived cardiomegaly biomarker values for 96K chest X-rays in MIMIC-CXR/MIMIC-CXR-JPG: Automatically extracted cardiomegaly biomarkers - cardiothoracic ratio (CTR) and cardiopulmonary area ratio (CPAR) - for all posterior-anterior chest x-ray scans in MIMIC-CXR/MIMIC-CXR-JPG.
- Indian Institute of Science Fetal Heart Sound Database (IIScFHSDB): The IIScFHSDB has 60 fetal phonocardiography recordings obtained in a hospital setting with the objective to provide fPCG recordings with clinical noise settings for development of signal processing algorithms for FHR determination and denoising fPCG
- Induced Cesarean EHG DataSet (ICEHG DS): An open dataset with electrohysterogram records of pregnancies ending in induced and cesarean section delivery: The design and development of ICEHG DS was funded by the Slovenian Research Agency (ARRS) under the research project Metabolic and inborn factors of reproductive health, birth III.
- Influence of the MHD effect on 12-lead and 3-lead ECGs recorded in 1T to 7T MRI scanners: ECG signals were acquired in various MRI scanners to enable the study of the magnetohydrodynamic (MHD) effect. The MHD effect, which is caused by an interaction of the blood flow and the MRI’s high static magnetic field, is superimposed on the ECG signal.
- In-Gauge and En-Gage: Understanding Occupants' Behaviour, Engagement, Emotion, and Comfort Indoors with Heterogeneous Sensors and Wearables: The project aims to understand occupants’ behaviour, engagement, emotion, and comfort indoors with heterogeneous sensors and wearables.
- Integration of Electroencephalogram and Eye-Gaze Datasets for Performance Evaluation in Fundamentals of Laparoscopic Surgery (FLS) Tasks: Brain activity and eye gaze data were collected from a group of 25 participants who completed the FLS tasks using a trainer box (Pyxus®). Each participant performed the tasks five times, and their performance was evaluated by an expert rater.
- Intracardiac Atrial Fibrillation Database: This database consists of endocardial recordings from the right atria of 8 patients in atrial fibrillation or flutter. A decapolar catheter with 2-5-2mm spacing (7mm spacing between bipoles) was plac…
- Kiel Cardio Database: The Kiel Cardio Database (KCD) contains one-minute 8-lead magnetocardiographic (MCG) measurements from seven subjects. Each subject underwent 25 consecutive measurements using a sensor array comprising four QuSpin QZFMs.
- KINECAL: A dataset for balance falls-risk assessment and balance impairment analysis
- Labeled raw accelerometry data captured during walking, stair climbing and driving: Labeled raw accelerometry data collected during outdoor walking, stair climbing, and driving for 32 healthy adults. Data were collected simultaneously at four body locations: left wrist, left hip, both ankles.
- Lobachevsky University Electrocardiography Database: ECG signal database that consists of 200 10-second 12-lead records. The boundaries and peaks of P, T waves and QRS complexes were manually annotated by cardiologists. Each record is annotated with the corresponding diagnosis.
- Long Term AF Database: This database includes 84 long-term ECG recordings of subjects with paroxysmal or sustained atrial fibrillation (AF). Each record contains two simultaneously recorded ECG signals digitized at 128 Hz …
- Long Term Movement Monitoring Database: The LTMM database contains 3-day 3D accelerometer recordings of 71 elderly community residents, used to study gait, stability, and fall risk.
- Long-term Recordings of Gait Dynamics: Stride interval fluctuations were studied in ten young, healthy men. Participants had no history of any neuromuscular, respiratory or cardiovascular disorders, and were taking no medications. Mean ag…
- Long Term ST Database: The Long-Term ST Database contains 86 lengthy ECG recordings of 80 human subjects, chosen to exhibit a variety of events of ST segment changes, including ischemic ST episodes, axis-related non-ischem…
- MAMEM SSVEP Database: Released by the Information Technologies Institute (CERTH-ITI) and powered by MAMEM HORIZON 2020, the MSSVEP database contains EEG recordings of 11 subjects under the stimulation of flickering lights…
- Mental workload during n-back task captured by TransCranial Doppler (TCD) sonography and functional Near-Infrared Spectroscopy (fNIRS) monitoring: Functional near-infrared spectroscopy and transcranial doppler sonography were used to measure changes in cerebral hemodynamics during cognitive stimulation.
- MGH/MF Waveform Database: The Massachusetts General Hospital/Marquette Foundation (MGH/MF) Waveform Database is a comprehensive collection of electronic recordings of hemodynamic and electrocardiographic waveforms of stable a…
- MICRO Motion capture data from groups of participants standing still to auditory stimuli (2012): How and why does music make us move? This dataset was collected as part of a project to investigate how music influences small magnitude motion observed when people try to stand still.
- MICRO Motion capture data from groups of participants standing still to auditory stimuli (2015): How and why does music make us move? This project investigates how music influences small magnitude motion observed when people try to stand still.
- MIMIC Database: The MIMIC Database includes data recorded from over 90 ICU patients. The data in each case include signals and periodic measurements obtained from a bedside monitor as well as clinical data obtained …
- MIMIC-III Clinical Database Demo: An open source demo of the MIMIC-III Clinical Database
- MIMIC-III Waveform Database: The MIMIC-III Waveform Database contains numerous physiological signals (including continuous ECG, PPG, ABP, and other signals) and periodic measurements, recorded by bedside patient monitors from about 30,000 patients in intensive care units.
- MIMIC-III Waveform Database Matched Subset: Physiological signals (including continuous ECG, PPG, ABP, and other signals) that are associated with patients in the MIMIC-III Clinical Database.
- MIMIC-IV Clinical Database Demo: An openly available subset of patients in the MIMIC-IV database.
- MIMIC-IV Clinical Database Demo on FHIR: MIMIC-IV-on-FHIR is a hundred-patient demo of MIMIC-IV v2.0 in the Fast Healthcare Interoperability Resources (FHIR) format. MIMIC-IV-on-FHIR provides implementers with a real-world FHIR datastore to aid in FHIR research and development.
- MIMIC-IV demo data in the OMOP Common Data Model: Preliminary work to transform a MIMIC-IV demo dataset to the OMOP Common Data Model
- MIMIC-IV-ECG Demo - Diagnostic Electrocardiogram Matched Subset Demo: The MIMIC-IV ECG Demo module contains 659 diagnostic electrocardiograms across 92 unique patients. These 92 patients overlap with the patients from the MIMIC-IV Clinical Demo and are also part of the MIMIC-IV Clinical Database.
- MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset: The MIMIC-IV ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These patients overlap with the patients from the MIMIC-IV Clinical Database.
- MIMIC-IV-ED Demo: An openly available subset of the MIMIC-IV-ED database
- MIMIC-IV Waveform Database: The MIMIC-IV Waveform Database collects physiological signals and measurements from ICU bedside monitors. Coupled with the clinical information available in MIMIC-IV, it provides a detailed view into the physiology of critically ill patients.
- Minute level step counts and physical activity data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014: Minute level step counts obtained from five step counting algorithms for raw accelerometry data, and minute level Activity Counts, MIMS, wear predictions, and wear flags for all participants who wore accelerometers in NHANES 2011-2014.
- MIT-BIH Arrhythmia Database: Two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979.
- MIT-BIH Arrhythmia Database P-Wave Annotations: P-wave annotations for twelve signals from the MIT-BIH Arrhythmia Database.
- MIT-BIH Atrial Fibrillation Database: This database includes 25 long-term ECG recordings of human subjects with atrial fibrillation (mostly paroxysmal).
- MIT-BIH ECG Compression Test Database: This database contains 168 short ECG recordings (20.48 seconds each) selected to pose a variety of challenges for ECG compressors, in particular for lossy compression methods.
- MIT-BIH Long-Term ECG Database: The MIT-BIH Long-Term ECG Database is a collection of 7 long-duration electrocardiogram (ECG) recordings (14 to 22 hours each), with manually reviewed beat annotations. The database was primarily dev…
- MIT-BIH Malignant Ventricular Ectopy Database: This database includes 22 half-hour ECG recordings of subjects who experienced episodes of sustained ventricular tachycardia, ventricular flutter, and ventricular fibrillation.
- MIT-BIH Noise Stress Test Database: This database includes 12 half-hour ECG recordings and 3 half-hour recordings of noise typical in ambulatory ECG recordings. The noise recordings were made using physically active volunteers and stan…
- MIT-BIH Normal Sinus Rhythm Database: Long-term ECG recordings of 18 subjects referred to the Arrhythmia Laboratory at Boston's Beth Israel Hospital.
- MIT-BIH Polysomnographic Database: Recordings of multiple physiologic signals during sleep, collected from 18 subjects monitored at Boston's Beth Israel Hospital Sleep Laboratory.
- MIT-BIH ST Change Database: Twenty-eight ECG recordings of varying lengths, most of which were recorded during exercise stress tests and which exhibit transient ST depression.
- MIT-BIH Supraventricular Arrhythmia Database: This database includes 78 half-hour ECG recordings chosen to supplement the examples of supraventricular arrhythmias in the MIT-BIH Arrhythmia Database.
- MMG Database: Uterine magnetomyographic signals from 25 subjects recorded using a 151 channel Reproductive Assessment system.
- Modulation of Plantar Pressure and Muscle During Gait: Plantar pressure distribution and muscle activity during gait from 20 healthy male adults.
- Motion and heart rate from a wrist-worn wearable and labeled sleep from polysomnography: Motion data and heart rate measurements from Apple Watches from sleeping people undergoing polysomnography.
- Motion Artifact Contaminated ECG Database: Short duration ECG signals are recorded from a healthy 25-year-old male performing different physical activities to study the effect of motion artifacts on ECG signals and their sparsity.
- Motion Artifact Contaminated fNIRS and EEG Data: Examples of functional near-infrared spectroscopy and electroencephalogram recordings that have been created for evaluating artifact removal methods.
- Multilevel Monitoring of Activity and Sleep in Healthy People: The Multilevel Monitoring of Activity and Sleep in Healthy people (MMASH) dataset provides 24 hours of continuous beat-to-beat heart data, triaxial accelerometer data, sleep quality, physical activity, psychological characteristics and salivary samples.
- MUSIC (Sudden Cardiac Death in Chronic Heart Failure): The MUSIC study is a prospective, multicentre, longitudinal study designed to assess risk predictors of cardiac mortality and sudden cardiac death in ambulatory patients with chronic heart failure.
- neuroQWERTY MIT-CSXPD Dataset: Keystroke logs collected from 85 subjects with and without Parkinson's disease.
- NInFEA: Non-Invasive Multimodal Foetal ECG-Doppler Dataset for Antenatal Cardiology Research: Open dataset featuring non-invasive electrophysiological recordings, fetal pulsed-wave Doppler and maternal respiration signals. It provides a ground truth on the fetal heart activity when an invasive scalp lead is unavailable.
- Noise Enhancement of Sensorimotor Function: Postural sway measurements for 27 healthy young and elderly volunteers.
- Non-EEG Dataset for Assessment of Neurological Status: Non-EEG physiological signals collected using non-invasive wrist-worn biosensors, consisting of electrodermal activity, temperature, acceleration, heart rate, and arterial oxygen level.
- Non-Invasive Fetal ECG Arrhythmia Database: Fetal cardiac arrhythmias are defined as any irregular fetal cardiac rhythm or regular rhythm at a rate outside the reference range of 100 to 200 beats per minute (bpm). Arrhythmias are discovered in …
- Non-Invasive Fetal ECG Database: Fifty-five multichannel abdominal non-invasive fetal electrocardiogram recordings, taken from a single subject between 21 and 40 weeks of pregnancy.
- Normal Sinus Rhythm RR Interval Database: Beat annotation files for 54 long-term ECG recordings of subjects in normal sinus rhythm.
- Norwegian Endurance Athlete ECG Database: This project contains 28 ECGs from 28 healthy elite athletes. The ECGs have been interpreted by the Marquette SL12 (version 23) algorithm and by a cardiologist using the International Criteria for ECG interpretation (2018).
- OB-1 Fetal ECG Database: This project is developing a set of recordings of fetal scalp electrograms and uterine muscular activity, with beat-by-beat annotations of the fetal ECG, to support studies of fetal heart rate variab…
- Open Access Dataset and Toolbox of High-Density Surface Electromyogram Recordings: We provide an open access dataset of High Density Surface Electromyogram (HD-sEMG). Our dataset can be used for both hand gesture classification-based neuroprosthetic control and EMG-force regression based proportional neuroprosthetic control.
- PADS - Parkinsons Disease Smartwatch dataset: The PADS dataset contains smartwatch-based records from interactive neurological assessments of Parkinson's disease patients, differential diagnoses and healthy controls. The data is complemented with non-motor symptoms and medical history information
- PAF Prediction Challenge Database: ECG recordings created for use in the Computers in Cardiology Challenge 2001, a competition with the goal of developing automated methods for predicting paroxysmal atrial fibrillation.
- Patient-level dataset to study the effect of COVID-19 in people with Multiple Sclerosis: This dataset is part of the Global Data Sharing Initiative. The data was acquired by people with MS (PwMS) and clinicians using a fast data entry tool. The dataset includes demographics, comorbidities, hospital stays, and COVID-19 symptoms of PwMS.
- Pattern Analysis of Oxygen Saturation Variability: This database contains one hour oxygen saturation measurements of 36 patients, used for the analysis of oxygen saturation variability.
- Permittivity of Healthy and Diseased Skeletal Muscle: Conductivity and relative permittivity of healthy and diseased skeletal muscle.
- Physiologic Response to Changes in Posture: A collection of physiological signals in ten healthy subjects in response to a slow tilt, a fast tilt, and a standing-up maneuver.
- PhysioZoo - mammalian NSR databases: PhysioZoo is a collaborative platform dedicated to the study of the heart rate variability in electrophysiological recordings from mammals
- Post-Ictal Heart Rate Oscillations in Partial Epilepsy: This database contains post-ictal heart rate oscillations in a heterogeneous group of patients with partial epilepsy.
- Pressure, flow, and dynamic thoraco-abdominal circumferences data for adults breathing under CPAP therapy: Dataset of pressure, flow, and dynamic abdominal and chest circumference for healthy people breathing with CPAP. Data was collected with PEEP settings of 0 (ZEEP), 4, and 8 cmH2O at normal/resting, panting/short, and deep/long breath patterns/rates.
- Preterm Infant Cardio-Respiratory Signals Database: ECG and respiration recordings of ten preterm infants collected from a Neonatal Intensive Care Unit.
- PTB Diagnostic ECG Database: ECGs obtained from 290 subjects using a non-commercial, prototype recorder developed at Physikalisch-Technische Bundesanstalt.
- PTB-XL+, a comprehensive electrocardiographic feature dataset: ECG feature dataset accompanying the PTB-XL ECG dataset
- PTB-XL, a large publicly available electrocardiography dataset: The PTB-XL ECG dataset is a large dataset of 21801 clinical 12-lead ECGs of 10 second length from 18869 patients. The raw signal data has been annotated by up to two cardiologists with 71 different ECG statements and is supplemented by rich metadata.
- Pulse Amplitudes from electrodermal activity collected from healthy volunteer subjects at rest and under controlled sedation: This database of pulse times and amplitudes from electrodermal activity was collected from 11 healthy volunteer subjects who were awake and at rest and 11 (different) healthy volunteer subjects who were under controlled propofol sedation.
- Pulse Transit Time PPG Dataset: Time synchronised multi-site PPG dataset for PTT including sensors’ attachment pressures, temperatures, inertial data from accelerometer and gyroscope, annotated ECG data, blood pressures, as well as blood oxygenation saturation levels (SpO2)
- Q-Pain: A Question Answering Dataset to Measure Social Bias in Pain Management: Q-Pain, a medical QA dataset designed to enable the substitution of multiple different racial and gender "profiles" for patients and to evaluate whether bias is present when deciding whether to prescribe pain medication or not.
- QT Database: One hundred two-lead ECG recordings, many extracted from other databases, with onset, peak, and end markers for P, QRS, T, and U waves.
- Quantitative Dehydration Estimation: Quantitative estimation of dehydration (total body water loss) using bioimpedance measurements, temperature measurements, salivary samples, and sweat samples.
- Radiology Report Generation Models Evaluation Dataset For Chest X-rays (RadEvalX): RadEvalX is a publicly available dataset developed similarly to the ReXVal dataset. RadEvalX focuses on radiologist evaluations of errors found in automatically generated radiology reports.
- Recordings excluded from the NSR DB: The recordings in this collection were originally selected for inclusion in the MIT-BIH Normal Sinus Rhythm Database, but were excluded after study revealed low-grade arrhythmias.
- Regulation of Brain Cognitive States through Auditory, Gustatory, and Olfactory Stimulation with Wearable Monitoring: This dataset explores safe actuation (music, perfume, coffee) for enhancing cognitive states. It includes subjects’ responses, reaction times, and physiological data (EDA, HR, BVP, PPG, temperature, accelerometer, EEG) during n-back memory tasks.
- Respiratory and heart rate monitoring dataset from aeration study: Respiratory and cardiovascular data collected from 20 subjects. Pressure, flow, aeration, and heart-rate data were collected during trials which included resting breathing, CPAP at varied PEEP settings, breath-holds, and forced expiratory manoeuvres.
- Respiratory dataset from PEEP study with expiratory occlusion: Outlined is a pressure, flow, volume, dynamic circumference, and EIT assessed aeration dataset from resting breathing with REO at increasing CPAP PEEP settings. Vapers, asthmatics, smokers, and otherwise healthy people were included in the trial.
- Response to Valsalva Maneuver in Humans: Functional metrics of autonomic control of heart rate, including baroreflex sensitivity, have been shown to be strongly associated with cardiovascular risk. A decrease in baroreflex sensitivity with …
- RR interval time series from healthy subjects: This database contains RR interval time series from healthy subjects aged between 1 month and 55 years.
- Safety and Preliminary Efficacy of Intranasal Insulin for Cognitive Impairment in Parkinson Disease and Multiple System Atrophy: Dataset collected as part of a study that aimed to determine the effects of intranasal insulin on cognition and motor performance in Parkinson's disease.
- Samples of MR Images: These magnetic resonance angiography (MRA) images show coronal slices acquired from consecutive anteroposterior positions within the torso. The study was performed on a 1.5T General Electric (GE) Sig…
- Santa Fe Time Series Competition Data Set B: This is a multivariate data set recorded from a patient in the sleep laboratory of the Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) in Boston, Massachusetts. This data set was …
- SCG-RHC: Wearable Seismocardiogram Signal and Right Heart Catheter Database: This is the first public dataset that contains simultaneous recordings of Right Heart Catheter data (pressure) and chest-worn wearable patch data (electrocardiogram and seismocardiogram signals).
- ScientISST MOVE: Annotated Wearable Multimodal Biosignals recorded during Everyday Life Activities in Naturalistic Environments: Multimodal (ECG, EMG, EDA, PPG, TEMP, ACC) biosignal dataset of everyday activities. Created with 3 wearable devices based on ScientISST Sense and Empatica E4.
- SensSmartTech database of cardiovascular signals synchronously recorded by an electrocardiograph, phonocardiograph, photoplethysmograph and accelerometer: SensSmartTech is a unique multiparametric dataset recorded systematically at rest and during the relaxation after activity. It contains the simultaneously recorded electrocardiogram, phonocardiogram, arterial plethysmograms and seismocardiogram.
- SHDB-AF: a Japanese Holter ECG database of atrial fibrillation: Holter ECG database from Japan, containing data from 100 unique patients with paroxysmal AF, including expert annotations of supraventricular arrhythmias at the beat level.
- Shiraz University Fetal Heart Sounds Database: The Shiraz University (SU) fetal heart sounds database (SUFHSDB) contains fetal and maternal phonocardiogram (PCG) recordings from 109 pregnant women in single and twin pregnancies.
- Siena Scalp EEG Database: The database consists of EEG recordings of 14 epileptic patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58).
- Simulated Fetal Phonocardiograms: This data set is a series of synthetic fetal phonocardiographic signals (PCGs) relative to different fetal states and recording conditions.
- Simulated Obstructive Disease Respiratory Pressure and Flow: Outlined is a pressure, flow, and volume dataset collected using a modular device to simulate the effects of obstructive pulmonary disease in healthy people. 20 healthy subjects were included in this dataset.
- Simultaneous physiological measurements with five devices at different cognitive and physical loads: Dataset to support comparison of usability and accuracy from simultaneous measurements collected from 13 subjects including five devices: NeXus-10 MKII, eMotion Faros 360°, Hexoskin Hx1, SOMNOTouch NIBP, Polar RS800 Multi.
- Sleep Bioradiolocation Database: The database contains 32 records of non-contact sleep monitoring by a bioradar. The records are accompanied by results of sleep scoring, based on polysomnography according to the rules of the America…
- Sleep-EDF Database: The sleep-edf database contains whole-night PolySomnoGraphic sleep recordings, containing EEG, EOG, chin EMG, and event markers.
- Sleep-EDF Database Expanded: The sleep-edf database contains 197 whole-night PolySomnoGraphic sleep recordings, containing EEG, EOG, chin EMG, and event markers. Some records also contain respiration and body temperature. Corres…
- Sleep Heart Health Study PSG Database: Data collected for a prospective cohort study designed to investigate the relationship between sleep disordered breathing and cardiovascular disease.
- Smart Health for Assessing the Risk of Events via ECG Database: Holter recordings of 139 hypertensive patients recruited at the Centre of Hypertension of the University Hospital of Naples Federico II.
- Spontaneous Ventricular Tachyarrhythmia Database: RR interval time series, recorded by implanted cardioverter defibrillators in 78 subjects.
- Squid Giant Axon Membrane Potential: The SGAMP database contains single-unit neuronal recordings of North Atlantic squid (Loligo pealei) giant axons in response to stimulus currents. The membrane potential and stimulus current are given…
- STAFF III Database: The STAFF III database was acquired during 1995-96 at Charleston Area Medical Center (WV, USA) where single prolonged balloon inflation had been introduced to achieve optimal results of percutaneous …
- St Petersburg INCART 12-lead Arrhythmia Database: Annotated ECG recordings extracted from 32 Holter records.
- Stress Recognition in Automobile Drivers: This database, contributed to PhysioNet by its creator, Jennifer Healey, contains a collection of multiparameter recordings from healthy volunteers, taken while they were driving on a prescribed rout…
- St. Vincent's University Hospital / University College Dublin Sleep Apnea Database: Overnight polysomnograms with simultaneous three-channel Holter ECG, from 25 adult subjects with suspected sleep-disordered breathing.
- Sudden Cardiac Death Holter Database: PhysioNet has inaugurated a Sudden Cardiac Death Database to support research and to stimulate progress in this important area of electrophysiology. We initiate this database with 23 complete Holter …
- Surface electromyographic signals collected during long-lasting ground walking of young able-bodied subjects: The dataset is composed of long-lasting surface electromyographic (sEMG) signals recorded from ten muscles during ground walking of 31 young able-bodied subjects in Movement Analysis Lab, Università Politecnica delle Marche, Ancona, Italy.
- Surrogate Data with Correlations, Trends, and Nonstationarities: Data collected for a study on a scaling analysis method used to estimate long-range power-law correlation exponents in noisy signals.
- Synthetic Mention Corpora for Disease Entity Recognition and Normalization: We present the Synthetic Mention Corpora for Disease Entity Recognition and Normalization, containing 128000 disease mentions from the UMLS disorder group, generated by an LLM. This corpus aims to improve these tasks in biomedical and clinical texts.
- Tai Chi, Physiological Complexity, and Healthy Aging - Gait: This project includes gait data collected with footswitches and electromyography data from subjects who walked for 10 minutes under normal conditions and for 90 seconds under dual-task conditions (walking while performing serial subtractions).
- Tappy Keystroke Data: This is the keystroke dataset for the study titled 'High-accuracy detection of early Parkinson's Disease using multiple characteristics of finger movement while typing'. This research report is curre…
- Term-Preterm EHG Database: Electrohysterogram records during regular check-ups at the University Medical Centre Ljubljana between 1997 and 2005.
- Term-Preterm EHG DataSet with Tocogram: Electrohysterogram signals accompanied by a simultaneously recorded external tocogram.
- The CirCor DigiScope Phonocardiogram Dataset: A large collection of multi-location heart sound signals, with 5272 records collected from 1568 subjects. Heart murmurs have been annotated by a human annotator based on their time, shape, pitch, grading, quality, location and location intensity.
- Treadmill Maximal Exercise Tests from the Exercise Physiology and Human Performance Lab of the University of Malaga: Cardiorespiratory measurements of 992 treadmill maximal graded exercise tests. Heart rate, oxygen consumption, carbon dioxide generation, and pulmonary ventilation are provided.
- T-Wave Alternans Challenge Database: Multichannel ECG records collected for the 2008 Computers In Cardiology Challenge.
- Two-tiered response of cardiorespiratory-cerebrovascular networks to orthostatic challenge: Dataset comprising the following physiological signals: blood pressure, respiratory rate, blood flow velocity in the middle cerebral arteries and tissue hemoglobin concentration in the prefrontal cortex.
- UniCA ElectroTastegram Database (PROP): Differential biopotential measurements recorded from the tongues of 39 healthy voluntary human subjects.
- Video Pulse Signals in Stationary and Motion Conditions: Pulse signal recordings obtained from 15 healthy volunteers.
- Visceral adipose tissue measurements during pregnancy: Maternal visceral adipose tissue measurements collected as part of a cohort study of 154 pregnant women.
- VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients: VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients
- VOICED Database: This database includes 208 voice samples: 150 from pathological voices and 58 from healthy voices.
- VTaC: A Benchmark Dataset of Ventricular Tachycardia Alarms from ICU Monitors: VTaC is an annotated ventricular tachycardia (VT) arrhythmia alarm database containing over 5,000 waveform recordings with VT alarms from ICU monitors, with each alarm labeled as either true or false by at least two human expert annotators.
- Wearable-based signals during physical exercises from patients with frailty after open-heart surgery: A data collection containing wearable-based electrocardiogram and triaxial acceleration signals from 80 elderly patients with frailty after open-heart surgery. The signals were collected while the patients were performing a series of exercise tests.
- Wearable Device Dataset from Induced Stress and Structured Exercise Sessions: Physiological signals (Electrodermal Activity, Blood Volume Pulse, Heart Rate, Temperature, etc.) from 36 healthy volunteers collected during structured acute stress induction and aerobic/anaerobic exercise sessions using the Empatica E4 wearable device.
- Wide-field calcium imaging sleep state database: Wide-field calcium imaging database that consists of annotated sleep recordings collected from transgenic mice at Washington University in St. Louis School of Medicine.
- Wilson Central Terminal ECG Database: Wilson Central Terminal ECG signals recorded from 92 patients.
- Wrist PPG During Exercise: Photoplethysmogram recorded from 8 volunteers during walking, running and bike riding.
Restricted databases
- A database of hand kinematics, high-density sEMG of forearm and wrist for motion intent recognition: A database of hand kinematics, high-density sEMG of forearm and wrist.
- Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels: This work further refines the labels associated with CheXpert in MIMIC-CXR-JPG 2.0.0 by filtering with Med-PaLM 2 followed by verification by manual review by three US board-certified radiologists.
- BigIdeasLab_STEP: Heart rate measurements captured by smartwatches for differing skin tones: Comparison of HR values reported by ECG (Bittium Faros) and wearables including Biovotion Everion, Empatica E4, Apple Watch, Garmin, Fitbit, and Xiaomi Miband.
- Bridge2AI-Voice: An ethically-sourced, diverse voice dataset linked to health information: A dataset of voice recordings and metadata to enable the development, benchmarking, and validation of clinically applicable machine-learning models for diagnosing a wide range of health conditions.
- CheXchoNet: A Chest Radiograph Dataset with Gold Standard Echocardiography Labels: Early detection of heart failure is vital for improving outcomes. The dataset contains 71,589 CXRs paired with gold standard labels from echocardiograms to enable the training of models to detect pathologies indicative of early stage heart failure.
- Community-Acquired Pneumonia, Endotypes and Phenotypes (NACef): Prospective, observational cohort study of Translational Medicine: Community-Acquired Pneumonia (CAP) poses a significant health risk, linked to high in-hospital morbidity and mortality rates. The dataset includes clinical details of 768 CAP patients at Clinica Universidad de La Sabana, Colombia.
- Computed Tomography Images for Intracranial Hemorrhage Detection and Segmentation: Head computed tomography (CT) scans with intracranial hemorrhage (ICH) segmentation, ICH subtypes and skull fracture.
- CXRGraph: Using Information Extraction to Normalize the Training Data for Automatic Radiology Report Generation: CXRGraph is a structured radiology report dataset built upon RadGraph and tailored for the Automatic Radiology Report Generation task. It can identify more task-relevant information such as abnormalities and hallucinated prior references.
- DREAMT: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology: Dataset for Real-time sleep stage EstimAtion using Multisensor wearable Technology
- Endoscapes2023, A Critical View of Safety and Surgical Scene Segmentation Dataset for Laparoscopic Cholecystectomy: Endoscapes2023 enables the development of models for object detection, semantic and instance segmentation, and Critical View of Safety (CVS) prediction, contributing to safe laparoscopic cholecystectomy.
- Flatten: COVID-19 Survey Data on Symptoms, Demographics and Mental Health in Canada: Freely accessible COVID-19 symptom dataset surveying Canadians, gathered by the global humanitarian aid non-profit Flatten. The dataset comprises 294,106 surveys collected from March 23rd to July 30th, 2020.
- Gout Emergency Department Chief Complaint Corpora: A corpus of chief complaints tagged with predicted gout flare status and chart reviewed gout flare status. Ideal for input to masked language model training to supplement lengthy clinical text notes.
- Hospitalized patients with heart failure: integrating electronic healthcare records and external outcome data: The new version added beta blockers in the dat_md.csv file. Dataset comprising hospital-level data on patients who were admitted with heart failure to Zigong Fourth People’s Hospital, Sichuan, China between 2016 and 2019.
- In-hospital physical activity measured with a new Bosch accelerometer sensor system: Measurements of physical activity with wrist-worn Bosch sensor platform to test predictive performance for the duration of hospitalization and readmission in 58 patients with acute illnesses in internal medicine
- Kinematic dataset of actors expressing emotions: 1402 kinematic recordings of twenty-two semi-professional actors expressing emotions such as happiness, sadness, anger, fear, disgust, and surprise.
- KURIAS-ECG: a 12-lead electrocardiogram database with standardized diagnosis ontology: The KURIAS-ECG database is a high-quality 12-lead ECG database that includes standard vocabularies (SNOMED CT, OMOP-CDM). Its ECG diagnoses are grouped into 10 categories by applying the Minnesota Code.
- LATTE-CXR: Locally Aligned TexT and imagE, Explainable dataset for Chest X-Rays: This dataset includes bounding box-statement pairs for chest X-ray images, derived from radiologists’ eye-tracking data (for explainability) and annotations, for local visual-language models.
- MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications: MIMIC-Eye: Integrating MIMIC Datasets with REFLACX and Eye Gaze for Multimodal Deep Learning Applications
- MIMIC-IV-Ext-DiReCT: A diagnostic reasoning dataset designed to evaluate the performance of large language models in aligning with human doctors when making diagnoses from clinical notes.
- Multimodal Physiological Indices During Surgery Under Anesthesia: Multimodal physiological indices collected during surgery when patients were under anesthesia
- Multimodal Physiological Monitoring During Virtual Reality Piloting Tasks: Physiologic and flight performance data from a virtual reality task. Data for the CogPilot Data Challenge.
- Multitaper spectra recorded during GABAergic anesthetic unconsciousness: EEG power spectra recorded during anesthesia
- OpenOximetry Repository: A repository of matched arterial oxygen and pulse oximeter readings obtained under controlled conditions, with high-frequency physiologic waveforms and skin color measurements.
- Organ Retrieval and Collection of Health Information for Donation (ORCHID): Multi-center dataset on organ procurement in the United States
- Pulmonary Edema Severity Grades Based on MIMIC-CXR: Pulmonary edema metadata and labels for MIMIC-CXR
- REFLACX: Reports and eye-tracking data for localization of abnormalities in chest x-rays: This dataset contains 3032 cases of eye-tracking data collected while five radiologists dictated reports for frontal chest x-rays, synchronized timestamped dictation transcription, and manual labels for validation of localization of abnormalities.
- Smartphone-Captured Chest X-Ray Photographs: Smartphone-captured CXR images including photographs taken from MIMIC-CXR and CheXpert, photographs taken by resident doctors, and photographs taken with different devices.
- TAME Pain: Trustworthy AssessMEnt of Pain from Speech and Audio for the Empowerment of Patients: TAME Pain is a dataset that captures acoustic signals of pain and is augmented by annotating every sentence the participant speaks with details such as background and foreground noise, speech errors, and non-speech vocal features.
- Upper body thermal images and associated clinical data from a pilot cohort study of COVID-19: Thermal videos of people with positive and negative COVID-19 tests.
- VinDr-Mammo: A large-scale benchmark dataset for computer-aided detection and diagnosis in full-field digital mammography: A large-scale benchmark dataset for computer-aided detection and diagnosis in mammography
- VinDr-PCXR: An open, large-scale pediatric chest X-ray dataset for interpretation of common thoracic diseases: An open, large-scale pediatric chest X-ray dataset that contains both lesion-level labels and image-level labels for multiple findings and diseases for interpretation of common thoracic diseases.
- VinDr-SpineXR: A large annotated medical image dataset for spinal lesions detection and classification from radiographs: VinDr-SpineXR: A large annotated medical image dataset for spinal lesions detection and classification from radiographs
- Visual Question Answering evaluation dataset for MIMIC CXR: This dataset provides 224 VQAs for 40 test set cases, and 111 VQAs for 23 validation set cases of the MIMIC CXR dataset.
Credentialed databases
- A Brazilian Multilabel Ophthalmological Dataset (BRSET): This is the first Brazilian Multilabel Ophthalmological Dataset with demographic information and retinal photos labeled according to anatomical parameters, quality control, and presumed diagnosis.
- AMR-UTI: Antimicrobial Resistance in Urinary Tract Infections: AMR-UTI is a freely accessible dataset, derived from electronic health record (EHR) information on over 100,000 urinary tract infections (UTI) treated at Massachusetts General Hospital and Brigham & Women's Hospital in Boston, MA, USA.
- Annotated MIMIC-IV discharge summaries for a study on deidentification of names: Annotated MIMIC-IV discharge summaries used to explore deidentification of names
- Annotated Question-Answer Pairs for Clinical Notes in the MIMIC-III Database: Annotated Question Answering Pairs for Clinical Notes in the MIMIC-III Database
- Annotation dataset of problematic opioid use and related contexts from MIMIC-III Critical Care Database discharge summaries: The database contains a corpus of annotated data from the MIMIC-III Critical Care Database from a study that aimed to develop and apply an annotation schema to characterize opioid use disorder and related contextual factors.
- Annotation dataset of social determinants of health from MIMIC-III Clinical Care Database: Annotation dataset of social determinants of health from MIMIC-III Clinical Care Database notes.
- A Temporal Dataset for Respiratory Support in Critically Ill Patients: A benchmark dataset offering hourly records over a 90-day period for 50,920 ICU subjects, including dynamic pulmonary function data and a spectrum of covariates for respiratory intervention analyses.
- BOLD, a blood-gas and oximetry linked dataset: An open-source pulse oximetry and arterial blood gas dataset, derived from MIMIC-III, MIMIC-IV, and eICU-CRD
- BRAX, a Brazilian labeled chest X-ray dataset: BRAX contains 24,959 chest radiography exams and 40,967 images acquired in a large general Brazilian hospital. All images have been read by trained radiologists and 14 labels were derived from Brazilian Portuguese reports using NLP.
- CAD-Chest: Comprehensive Annotation of Diseases based on MIMIC-CXR Radiology Report: The CAD-Chest dataset provides comprehensive annotations of disease, including disease severity, uncertainty, and location based on the MIMIC-CXR radiologist reports.
- Chest ImaGenome Dataset: The Chest ImaGenome dataset is a scene graph dataset with additional chronological comparison relations for chest X-rays. It is automatically derived from the MIMIC-CXR dataset. A manually annotated gold standard is also available for 500 patients.
- Chest X-ray Dataset with Lung Segmentation: CXLSeg dataset: Chest X-ray with Lung Segmentation, a comparatively large dataset of segmented Chest X-ray radiographs based on the MIMIC-CXR dataset. This contains segmentation results of 243,324 frontal view images and corresponding masks.
- Chest X-ray segmentation images based on MIMIC-CXR: A chest X-ray segmentation dataset derived from MIMIC-CXR using a deep learning algorithm and human examination.
- CHIFIR: Cytology and Histopathology Invasive Fungal Infection Reports: A corpus of cytology and histopathology reports annotated for terminology relevant to fungal infections. Ideal for validation of named entity recognition and relation extraction methods.
- CLIP: A Dataset for Extracting Action Items for Physicians from Hospital Discharge Notes: Clinical action items annotated over MIMIC-III. 718 discharge summaries are labeled at a sentence- and character-level with multiple action labels including Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other.
- Comprehensive Polysomnography (CPS) Dataset: A Resource for Sleep-Related Arousal Research: This dataset includes polysomnographic sleep recordings from a study on sleep-related arousal diagnostics, featuring raw and derived data channels, annotated event types, and questionnaire data.
- CORAL: expert-Curated medical Oncology Reports to Advance Language model inference: Medical oncology progress notes annotated with advanced, comprehensive oncology-relevant concepts and relationships.
- CovIdentify Dataset: This contains wearable device data from Fitbit, Garmin, and Apple Watch users. The data is from April 2nd, 2020 to March 21st, 2021 and has been date-shifted. Test dates for each user have also been shifted by a corresponding amount.
- C-REACT: Contextualized Race and Ethnicity Annotations for Clinical Text: Two sets of gold-standard annotations for race and ethnicity information from clinical notes in MIMIC-III. Contains race and ethnicity label assignments and related information such as country of origin and spoken language.
- Critical care database comprising patients with infection at Zigong Fourth People's Hospital: Routinely collected data from critical care units at Zigong Fourth People’s Hospital, Sichuan, China for patients admitted between January 2019 and December 2020. Missing temperature information has been updated in the new version.
- Curated Data for Describing Blood Glucose Management in the Intensive Care Unit: The data subsets consist of time series files that include all the curated entries of glucose readings and insulin inputs from the MIMIC-III database.
- CXR-PRO: MIMIC-CXR with Prior References Omitted: CXR-PRO is an adaptation of the MIMIC-CXR dataset (consisting of chest radiographs and their associated free-text radiology reports) with references to non-existent priors removed.
- Deidentified Medical Text: Gold standard corpus of 2,434 deidentified nursing notes
- DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries: DrugEHRQA is a QA dataset containing question-answers from MIMIC-III tables and discharge summaries.
- EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM): A structured echocardiogram database derived from 43,472 observational notes obtained during echocardiogram studies conducted in the intensive care unit at the Beth Israel Deaconess Medical Center between 2001 and 2012.
- EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records
- EHR-DS-QA: A Synthetic QA Dataset Derived from Medical Discharge Summaries for Enhanced Medical Information Retrieval Systems: Dataset consisting of question and answer pairs synthetically generated from medical discharge summaries, designed to facilitate the training and development of large language models specifically tailored for healthcare applications
- EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
- EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images: We present EHRXQA, the first multi-modal EHR QA dataset combining structured patient records with aligned chest X-ray images. EHRXQA contains a comprehensive set of QA pairs covering image-related, table-related, and image+table-related questions.
- eICU Collaborative Research Database: Multi-center database comprising deidentified health data associated with over 200,000 admissions to ICUs across the United States between 2014-2015.
- Electrodermal Activity of Healthy Volunteers while Awake and at Rest: This database of electrodermal activity was collected from 11 healthy volunteer subjects who were awake and at rest in a seated position and 11 (different) healthy volunteer subjects who were under computer-controlled propofol sedation.
- ENCoDE, mEasuring skiN Color to correct pulse Oximetry DisparitiEs: skin tone and clinical data from a prospective trial on acute care patients.: A prospectively collected, EHR-linked database of skin tone measurements in OMOP format, with emphasis on pulse oximetry disparities.
- Establishment of a Chinese critical care database from electronic healthcare records in a tertiary care medical center: Chinese critical care database from electronic healthcare records in a tertiary care medical center
- Eye Gaze Data for Chest X-rays: This dataset was collected using an eye tracking system while a radiologist interpreted and read 1,083 public CXR images. The dataset contains the following aligned modalities: image, transcribed report text, dictation audio, and eye gaze data.
- FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark: Benchmark dataset for report generation based on fundus fluorescein angiography images and reports.
- Generalized Image Embeddings for the MIMIC Chest X-Ray dataset: This database contains compact information-rich embeddings of the MIMIC-CXR Database v2.0.0 using the CXR Foundation API v1.0.
- GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization: GLOBEM datasets contain the first released multi-year mobile and wearable sensing datasets from 2018 to 2021, containing 705 person-years and 497 unique participants.
- GOSSIS-1-eICU, the eICU-CRD subset of the Global Open Source Severity of Illness Score (GOSSIS-1) dataset: GOSSIS-1 is an in-hospital mortality prediction algorithm for critical care patients. GOSSIS-1 was trained using data from three countries. This dataset corresponds with the USA subset of the GOSSIS-1 dataset for the 2022 publication below.
- INSPIRE, a publicly available research dataset for perioperative medicine: A public dataset that contains information related to surgery, anesthesia, laboratory results, medications, diagnosis, and outcomes from 50% of the patients who received surgery at Seoul National University Hospital between 2011 and 2020.
- Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset: Dataset of questions asked by medical experts about patients. Medical experts will read a discharge summary line-by-line and (1) ask any question that they may have and (2) record what in the text "triggered" them to ask their question.
- LLaVA-Rad MIMIC-CXR Annotations: This dataset provides GPT-4 extracted sections of radiology reports from MIMIC-CXR, complementing rule-based section extractions with additional reports with findings, and removing references to priors from findings.
- Maternal fat ultrasound measurement and nutritional assessment during pregnancy: A dataset centered in gestational outcomes: Dataset collected as part of a prospective study in which abdominal maternal fat tissue measurements were compared with outcomes during hospitalization for labor and delivery.
- mBRSET, a Mobile Brazilian Retinal Dataset: mBRSET - a Mobile Brazilian Retinal Dataset
- MedDec: Medical Decisions for Discharge Summaries in the MIMIC-III Database: Annotations of ten types of medical decisions from discharge summaries in the MIMIC-III database.
- Medical-CXR-VQA dataset: A Large-Scale LLM-Enhanced Medical Dataset for Visual Question Answering on Chest X-Ray Images: Medical-CXR-VQA provides a large-scale LLM-enhanced dataset for visual question answering in medical chest x-ray images.
- Medical-Diff-VQA: A Large-Scale Medical Dataset for Difference Visual Question Answering on Chest X-Ray Images: MIMIC-Diff-VQA provides a large-scale dataset for Difference visual question answering in medical chest x-ray images.
- Medical Expert Annotations of Unsupported Facts in Doctor-Written and LLM-Generated Patient Summaries: Annotations for unsupported facts in 100 original MIMIC patient summaries (discharge instructions) and hallucinations in 100 Large Language Model (LLM) generated patient summaries labeled by two medical experts.
- Medication Extraction Labels for MIMIC-IV-Note Clinical Database: Medication extraction NLP labels for 600 discharge summaries in MIMIC-IV-Note dataset.
- MedNLI - A Natural Language Inference Dataset For The Clinical Domain: This is a resource for training machine learning models for language inference in the medical domain.
- MedNLI for Shared Task at ACL BioNLP 2019: Data for the MedNLI Shared Task at the ACL BioNLP 2019 Workshop on Biomedical Language Processing
- MIMIC-CXR Database: Chest radiographs in DICOM format with associated free-text reports.
- MIMIC-CXR-JPG - chest radiographs with structured labels: Chest x-rays in JPG format with structured labels derived from the associated radiology report.
- MIMICEL: MIMIC-IV Event Log for Emergency Department: MIMIC-IV Event Log for Emergency Department
- MIMIC-Ext-MIMIC-CXR-VQA: A Complex, Diverse, And Large-Scale Visual Question Answering Dataset for Chest X-ray Images: We introduce MIMIC-Ext-MIMIC-CXR-VQA, a complex, diverse, and large-scale dataset designed for Visual Question Answering (VQA) tasks within the medical domain, focusing primarily on chest radiographs.
- MIMIC-II Clinical Database: Electronic health record data collected from >30,000 patients admitted to ICUs at a single tertiary care hospital.
- MIMIC-III and eICU-CRD: Feature Representation by FIDDLE Preprocessing: Features and labels from MIMIC-III and eICU-CRD produced by FIDDLE, an EHR preprocessing pipeline.
- MIMIC-III Clinical Database: MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess…
- MIMIC-III Clinical Database CareVue subset: A subset of the MIMIC-III Clinical Database containing only patients admitted from 2001 - 2008.
- MIMIC-III - SequenceExamples for TensorFlow modeling: MIMIC-III data converted into TensorFlow SequenceExample format, for use in modeling pipelines.
- MIMIC-IV: Large database of de-identified health information from patients admitted to Beth Israel Deaconess Medical Center
- MIMIC-IV-ECG-Ext-ICD: Diagnostic labels for MIMIC-IV-ECG: Dataset that links ECG records from MIMIC-IV-ECG to ED discharge and hospital discharge diagnoses, which enables the training of general ECG prediction models based on clinical labels and facilitates the retrieval of further clinical metadata from MIMIC-IV.
- MIMIC-IV-ECHO: Echocardiogram Matched Subset: The MIMIC-IV-ECHO module contains more than 500,000 echocardiograms across more than 4,500 unique patients. These patients overlap with the patients from the MIMIC-IV Clinical Database.
- MIMIC-IV-ED: A large database of emergency department admissions.
- MIMIC-IV-Ext-BHC: Labeled Clinical Notes Dataset for Hospital Course Summarization: This dataset presents a collection of preprocessed and labeled clinical notes derived from "MIMIC-IV-Note", and aims to facilitate the development of ML models focused on summarizing brief hospital courses (BHC) from clinical notes.
- MIMIC-IV-Ext Clinical Decision Making: A MIMIC-IV Derived Dataset for Evaluation of Large Language Models on the Task of Clinical Decision Making for Abdominal Pathologies: A curated set of ED clinical decision making cases for four abdominal pathologies. Each case contains the exams required to diagnose including HPI, physical examination, laboratory tests, and imaging. Relevant treatment information is also included.
- MIMIC-IV-Ext-GPT-3_5-Generated-Discharge-Summaries-for-Low-Resource-Codes: 9,606 Synthetic Discharge Summaries generated by GPT-3.5 based on combinations of ICD-10-code descriptions associated with real discharge summaries in MIMIC-IV. Focus on low resource codes.
- MIMIC-IV-Ext-MDS-ED: Multimodal Decision Support in the Emergency Department - a Benchmark Dataset for Diagnoses and Deterioration Prediction in Emergency Medicine: MIMIC-IV-ext-MDS-ED proposes a dataset to benchmark multimodal decision support in the emergency department. It features multimodal input (including ECG waveforms) and a comprehensive set of prediction targets (diagnoses and deterioration prediction)
- MIMIC-IV-Note: Deidentified free-text clinical notes: Deidentified free-text clinical notes for patients in the MIMIC-IV Clinical Database.
- MIMIC-IV on FHIR: MIMIC-IV and MIMIC-IV-ED data mapped into FHIR resources.
- MS-CXR: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing: MS-CXR is a new dataset containing 1162 chest X-ray bounding box labels paired with radiology text descriptions, annotated and verified by two board-certified radiologists.
- MS-CXR-T: Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing: The MS-CXR-T is a multimodal benchmark that enhances the MIMIC-CXR v2 dataset by including expert-verified annotations. Its goal is to evaluate biomedical visual-language processing models in terms of temporal semantics extracted from image and text.
- National Institutes of Health Stroke Scale (NIHSS) Annotations for the MIMIC-III Database: A dataset of annotated NIHSS scale items and corresponding scores from stroke patients' discharge summaries in MIMIC-III.
- NCH Sleep DataBank: A Large Collection of Real-world Pediatric Sleep Studies with Longitudinal Clinical Data: The NCH Sleep DataBank includes 3,984 pediatric sleep studies on 3,673 unique patients conducted at Nationwide Children's Hospital between 2017 and 2019. It contains polysomnography (PSG), clinical annotations, and longitudinal clinical data.
- Neurocritical care waveform recordings in pediatric patients: The database contains waveform recordings, including arterial blood pressure, intracranial pressure, and cerebral blood flow velocity, from pediatric patients in neurocritical care.
- Northwestern ICU (NWICU) database: A freely available COVID-rich ICU database comprising de-identified health-related data from Northwestern Memorial Health Center (NHMC).
- Nosocomial Risk Datasets from MIMIC-III: Text-based Longitudinal Data for Predicting Nosocomial Disease Risk as used by CANTRIP.
- ODD: A Benchmark Dataset for the NLP-based Opioid Related Aberrant Behavior Detection: The Opioid-related aberrant behaviors (ORABs) detection Dataset (ODD) is a large, expert-annotated, multi-label classification benchmark dataset for the ORAB detection task
- Paediatric Intensive Care database: PIC (Paediatric Intensive Care) is a large paediatric-specific, single-centre, bilingual database comprising information relating to children admitted to critical care units at a large children’s hospital in China.
- Phenotype Annotations for Patient Notes in the MIMIC-III Database: Clinical notes, annotated by at least two expert annotators for over ten patient phenotypes, including advanced cancer, substance abuse, and treatment non-adherence.
- RadCoref: Fine-tuning coreference resolution for different styles of clinical narratives: RadCoref is a small subset of MIMIC-CXR with manually annotated coreference mentions and clusters. Based on the annotated data, we fine-tuned a deep neural model and used it to annotate the whole MIMIC-CXR dataset. Both sets of annotations are available.
- RadGraph2: Tracking Findings Over Time in Radiology Reports: RadGraph2 is a dataset of 800 chest radiology reports annotated using a fine-grained entity-relationship schema, which captures key findings as well as mentions of changes that occurred in comparison with the previous radiology studies.
- RadGraph: Extracting Clinical Entities and Relations from Radiology Reports: RadGraph is a dataset of entities and relations in full-text chest X-ray radiology reports, which are obtained using a novel information extraction (IE) schema to capture clinically relevant information in a radiology report.
- RaDialog Instruct Dataset: Image-based instruct data for Chest X-Ray understanding and analysis.
- Radiology Report Expert Evaluation (ReXVal) Dataset: The Radiology Report Expert Evaluation (ReXVal) Dataset is a publicly available dataset of radiologist evaluations of errors in automatically generated radiology reports.
- RadNLI: A natural language inference dataset for the radiology domain: A radiology NLI dataset introduced in the paper: Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation
- RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports: RadQA is an electronic health record question answering dataset containing clinical questions that can be answered using the Findings and Impressions sections of radiology reports
- ReFiSco: Report Fix and Score Dataset for Radiology Report Generation: Preliminary human expert evaluation study on 60 MIMIC-CXR radiology reports
- ReXPref-Prior: A MIMIC-CXR Preference Dataset for Reducing Hallucinated Prior Exams in Radiology Report Generation: We propose ReXPref-Prior, an adapted version of MIMIC-CXR where GPT-4 has removed references to prior exams from both findings and impression sections of chest X-ray reports.
- RuMedNLI: A Russian Natural Language Inference Dataset For The Clinical Domain: RuMedNLI is the full counterpart dataset of MedNLI in Russian language.
- SCRIPT CarpeDiem Dataset: demographics, outcomes, and per-day clinical parameters for critically ill patients with suspected pneumonia: SCRIPT seeks to delineate the host/pathogen interactions during pneumonia using multiomic analysis of bronchoalveolar lavage fluid joined with clinical data and physician adjudication.
- SCRIPT X2B8 Dataset: per-day clinical features to model successful next-day extubation: This dataset contains electronic health record (EHR) data from ICU patients receiving mechanical ventilation, aggregated on a daily basis, along with annotations of intubation, extubation, tracheostomy days, and cases of failed extubation. Data can b
- Symile-MIMIC: a multimodal clinical dataset of chest X-rays, electrocardiograms, and blood labs from MIMIC-IV: A multimodal clinical dataset consisting of CXRs, ECGs, and blood labs, designed to evaluate Symile, a simple contrastive loss that accommodates any number of modalities and allows any model to produce representations for each modality.
- Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project: This repository hosts the original Health Gym datasets of Acute Hypotension and Sepsis
- Tasks 1 and 3 from Progress Note Understanding Suite of Tasks: SOAP Note Tagging and Problem List Summarization: We introduce a hierarchical annotation suite of tasks addressing clinical text understanding, reasoning and abstraction over evidence, and diagnosis summarization. One task is tagging the major sections of a note, and the other is diagnosis generation.
- TherLid: A Thermometry Linked Dataset: TherLiD is an open-source dataset of 13,251 paired temperature readings (contact and infrared) from MIMIC-IV and eICU databases. With added demographics and derived data, it supports research on racial and ethnic disparities in infrared thermometry.
- VinDr-CXR: An open dataset of chest X-rays with radiologist annotations: VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations
Contributor Review databases
- A multimodal dental dataset facilitating machine learning research and clinic services: A new dental dataset that contains 169 patients, three commonly used dental image models, and images of various health conditions of the oral cavity.
- BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language: Brazilian clinical dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states.
- CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools: CARMEN-I is a Spanish corpus of 2,000 clinical records from Hospital Clínic, Barcelona. It covers COVID-19 patients and comorbidities, serving as a resource for training clinical NLP models and researchers in NLP applied to clinical documents.
- Chest Computed Tomography for patients with sepsis in the Emergency Department: The database is intended to support a wide array of research studies involving radiomics in sepsis patients, helping to reduce barriers to the reproducibility of clinical research.
- COVID Data for Shared Learning (CDSL): A comprehensive, multimodal COVID-19 dataset from HM Hospitales: COVID Data for Shared Learning (CDSL) is a multimodal database comprising de-identified structured health data and radiological images from 4,479 patients with COVID-19, as a comprehensive toolkit for developing predictive models.
- Electroencephalogram dynamics during unconsciousness mediated by GABAergic-anesthetics: Electroencephalogram dynamics during unconsciousness mediated by GABAergic anesthetics.
- HiRID, a high time-resolution ICU dataset: The HiRID database contains a large selection of all routinely collected data relating to patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU).
- Salzburg Intensive Care database (SICdb), a freely accessible intensive care database: The SICdb dataset, version 1.0.8, contains 27,350 admissions to an ICU in an Austrian tertiary care institution.