Name: TherLid: A Thermometry Linked Dataset
Published: Jan. 21, 2025
License: https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

Database Credentialed Access

Jeremy Tan , Inês Martins , João Matos , Tiago Filipe Sousa Gonçalves , Tetsu Ohnuma , Jaime dos Santos Cardoso , Leo Anthony Celi , Vijay Krishnamoorthy , Andrea Lane , An Kwok Wong

Published: Jan. 21, 2025. Version: 1.0.0

When using this resource, please cite:
Tan, J., Martins, I., Matos, J., Sousa Gonçalves, T. F., Ohnuma, T., dos Santos Cardoso, J., Celi, L. A., Krishnamoorthy, V., Lane, A., & Wong, A. K. (2025). TherLid: A Thermometry Linked Dataset (version 1.0.0). PhysioNet. https://doi.org/10.13026/tkww-8b64.

MLA	Tan, Jeremy, et al. "TherLid: A Thermometry Linked Dataset" (version 1.0.0). PhysioNet (2025), https://doi.org/10.13026/tkww-8b64.
APA	Tan, J., Martins, I., Matos, J., Sousa Gonçalves, T. F., Ohnuma, T., dos Santos Cardoso, J., Celi, L. A., Krishnamoorthy, V., Lane, A., & Wong, A. K. (2025). TherLid: A Thermometry Linked Dataset (version 1.0.0). PhysioNet. https://doi.org/10.13026/tkww-8b64.
Chicago	Tan, Jeremy, Martins, Inês, Matos, João, Sousa Gonçalves, Tiago Filipe, Ohnuma, Tetsu, dos Santos Cardoso, Jaime, Celi, Leo Anthony, Krishnamoorthy, Vijay, Lane, Andrea, and An Kwok Wong. "TherLid: A Thermometry Linked Dataset" (version 1.0.0). PhysioNet (2025). https://doi.org/10.13026/tkww-8b64.
Harvard	Tan, J., Martins, I., Matos, J., Sousa Gonçalves, T. F., Ohnuma, T., dos Santos Cardoso, J., Celi, L. A., Krishnamoorthy, V., Lane, A., and Wong, A. K. (2025) 'TherLid: A Thermometry Linked Dataset' (version 1.0.0), PhysioNet. Available at: https://doi.org/10.13026/tkww-8b64.
Vancouver	Tan J, Martins I, Matos J, Sousa Gonçalves T F, Ohnuma T, dos Santos Cardoso J, Celi L A, Krishnamoorthy V, Lane A, Wong A K. TherLid: A Thermometry Linked Dataset (version 1.0.0). PhysioNet. 2025. Available from: https://doi.org/10.13026/tkww-8b64.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

APA	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

A recent study showed that infrared (IR) sensors may be prone to calibration discrepancies among darker-pigmented patients. Similar disparities have already been verified in pulse oximetry. This raises questions about whether thermometry measurements may inadvertently overlook hypothermia, fever, or sepsis cases, potentially leading to delayed diagnoses and ultimately exacerbating poorer outcomes among vulnerable subpopulations.

TherLiD is a dataset derived from three Electronic Health Record databases: MIMIC-IV, eICU-CRD-1, and eICU-CRD-2. It consists of 13,251 temperature pairs, along with comprehensive demographic data and time-synchronized hospital information, offering a detailed profile of each patient. Each pair has one reference value (contact thermometers: oral, core, and rectal) and one infrared-based (temporal) value measured within a 1-hour time window, with temperature values between 30°C and 45°C. TherLiD not only provides high-quality, clinically relevant data but also offers a reproducible framework, allowing researchers to tailor the dataset to their specific research needs, including training machine learning models. This dataset was also built to facilitate temperature-related retrospective studies and promote research on racial and ethnic healthcare disparities.

Background

Body temperature monitoring and management is a key factor in the treatment of critically ill patients. Temperature can be measured using contact or non-contact (infrared [IR]-based) thermometers. During the COVID-19 pandemic, IR thermometers gained popularity in both hospital and non-hospital settings due to their ease of use, cleanliness, and ability to quickly provide contactless temperature readings.

However, a recent study by Bhavani and colleagues showed that IR sensors may be prone to calibration discrepancies among darker-pigmented patients. Unlike oral (contact) measurements, temporal (IR-based) measurements were associated with lower odds of identifying fever among Black patients when compared to White patients. Similar disparities have already been verified in pulse oximetry, with associated inequities in oxygen therapies and increased mortality rates among these subpopulations.[1,2] Additionally, as highlighted in melanoma research, racial disparities in healthcare outcomes are often linked to delayed detection and reduced access to appropriate interventions.[3]

Finally, a comprehensive study of over 70 million patients has established that minority racial groups in the United States experience significantly higher rates of sepsis and elevated mortality compared to their White counterparts.[4] This disparity is multifactorial, with contributing factors including differences in clinical characteristics, disparities in hospital quality, genomic variations, and socioeconomic and environmental determinants.[5-7] While these factors have been well documented, device biases have not been investigated to the same extent.[8]

This raises questions about whether thermometry measurements may inadvertently overlook hypothermia, fever, or sepsis cases, potentially leading to delayed diagnoses and ultimately exacerbating poorer outcomes among vulnerable subpopulations.[9]

The main purpose of this study is to provide a standardized dataset with paired temperatures and time-aligned patient contextualization information. Since existing Electronic Health Record (EHR) databases require extensive transformation and analysis, this dataset lowers the barrier to entry for researchers.[10,11] Users can easily change the available code and add information to the current dataset to fulfill their research requirements. A comprehensive walkthrough of the extraction, processing, and analysis of temperature differences between contact and non-contact thermometers is included to validate the dataset's clinical relevance.

The study harmonizes three EHR databases: MIMIC-IV, eICU-CRD-1, and eICU-CRD-2. MIMIC-IV extends MIMIC-III to include patients admitted to the ICU from 2008 to 2022 and holds approximately 80,000 medical records.[10] The eICU-CRD-1 database is a multi-center database collected through the Philips Healthcare eICU Telehealth Program, containing over 200,000 admissions from 208 hospitals or ICUs from eICU programs across the United States between 2014 and 2015.[11] The eICU-CRD-2 database expands on eICU-CRD-1 to include patients from 2019 to 2022. All records are de-identified.

Because the dataset is publicly available, researchers can develop models to expand on previous studies and address racial and ethnic disparities.

Methods

TherLiD is a derived dataset from MIMIC-IV, eICU-CRD-1, and eICU-CRD-2 databases. It was created with the following steps and inspired by BOLD, a blood-gas and oximetry-linked dataset [12]:

Temperature mapping: In the eICU-CRD-1 and eICU-CRD-2 databases, temperature location labels exhibit variations of the same label. To address this, a dictionary was created to standardize these labels, ensuring consistency for accurate data pairing.
Temperature pairing: Readings from reference (contact thermometers: oral, core, and rectal) and IR (temporal) thermometers were paired when both measurements occurred within a 1-hour time window, without requiring one to occur before the other. To achieve this, timestamps were derived from offsets that have already been shifted in both the MIMIC and eICU databases, with a base date determined by the earliest year referenced in the database (e.g., 2014 for eICU-CRD-1). These dates are shifted and do not reflect the actual day of measurement, ensuring compliance with HIPAA regulations regarding protected health information. Only values ranging from 30°C to 45°C were considered, and readings with a missing temperature value, timestamp, or measurement site were excluded. In the MIMIC-IV database, temperature pairs are always matched while the patient is in the ICU. However, in the eICU databases, temperature pairs can be created even if the patient is admitted to the hospital but not currently in the ICU.
Patient contextualization: Individual characteristics and time-varying data were aligned with the temperature pairs using the later timestamp of the two paired temperature measurements to ensure complete information about the patient. Patient admission characteristics, temperature measurement details, vital signs, laboratory values, arterial blood gas metrics, and Sequential Organ Failure Assessment (SOFA) scores were recorded.
Integration of the three databases: To maintain consistency across all databases, we standardized key variables by aligning them to a common format (e.g., mapping variations of hospital identifiers like 'hospital_id' and 'hadm_id' to a unified variable). Only variables that were present in all databases were retained for analysis, which helped minimize missing data and ensure uniformity. The original race and ethnicity information was mapped to one of the following categories: "American Indian / Alaska Native", "Asian", "Black", "Hispanic or Latino", "More Than One Race", "Native Hawaiian / Pacific Islander", "Unknown", and "White". Finally, the standardized databases were vertically concatenated.
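
The label standardization and pairing steps above can be illustrated with a minimal Python sketch. It is not the project's code: the raw label spellings in SITE_LABELS, the record field names, and the choice to keep every temporal/reference combination inside the window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical label dictionary: raw spellings are illustrative only.
SITE_LABELS = {
    "oral": "oral", "po": "oral",
    "rectal": "rectal", "pr": "rectal",
    "core": "core",
    "temporal": "temporal", "temporal artery": "temporal",
}
REFERENCE_SITES = {"oral", "core", "rectal"}
WINDOW = timedelta(hours=1)


def standardize_site(raw_label):
    """Map a raw site label to a standard one, or None if unrecognized."""
    if raw_label is None:
        return None
    return SITE_LABELS.get(raw_label.strip().lower())


def is_valid(reading):
    """Require a value in 30-45 degrees C, a timestamp, and a known site."""
    value, time, site = reading["value_c"], reading["time"], reading["site"]
    if value is None or time is None or site is None:
        return False
    return 30.0 <= value <= 45.0


def pair_temperatures(readings):
    """Return (temporal, reference) pairs measured within one hour.

    Order does not matter: the absolute time difference is compared.
    Assumption: every combination inside the window is kept.
    """
    cleaned = [dict(r, site=standardize_site(r["site"])) for r in readings]
    cleaned = [r for r in cleaned if is_valid(r)]
    temporal = [r for r in cleaned if r["site"] == "temporal"]
    reference = [r for r in cleaned if r["site"] in REFERENCE_SITES]
    return [
        (t, ref)
        for t in temporal
        for ref in reference
        if abs(t["time"] - ref["time"]) <= WINDOW
    ]
```

A stricter variant could keep only the nearest reference reading for each temporal reading; the text above does not say which rule was applied, so the sketch takes the simplest option.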

The open-source code that comes along with this methodology can be used as the basis for projects with different requirements.[18]
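
As a rough illustration of the integration step, the sketch below renames source-specific columns, maps race and ethnicity to the standard categories, keeps only columns shared by every source, and stacks the rows. The raw race strings in RACE_MAP, the column names, and the fallback to "Unknown" for unmapped values are assumptions, not the project's actual mapping.

```python
# Hypothetical raw-to-standard mapping (illustrative values only).
RACE_MAP = {
    "BLACK/AFRICAN AMERICAN": "Black",
    "African American": "Black",
    "WHITE": "White",
    "Caucasian": "White",
    "ASIAN": "Asian",
}


def harmonize(rows, source, renames):
    """Rename columns, standardize race/ethnicity, and tag the source."""
    harmonized = []
    for row in rows:
        new_row = {renames.get(key, key): value for key, value in row.items()}
        raw_race = new_row.get("race_ethnicity")
        # Assumption: anything not in the mapping becomes "Unknown".
        new_row["race_ethnicity"] = RACE_MAP.get(raw_race, "Unknown")
        new_row["source_db"] = source
        harmonized.append(new_row)
    return harmonized


def concatenate(tables):
    """Vertically stack tables, keeping only columns present in all rows."""
    all_rows = [row for table in tables for row in table]
    common = set.intersection(*(set(row) for row in all_rows))
    return [{key: row[key] for key in sorted(common)} for row in all_rows]
```

Restricting the result to the shared columns mirrors the statement that only variables present in all databases were retained.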

Data Description

Variables can be included in one of the following categories: patient admission characteristics, temperature measurement and pair information, vital signs, laboratory values, and SOFA scores.

A "last_temp_datetime" timestamp is derived from the two paired timestamps, representing the latest timestamp between the pair, to use when retrieving sociodemographic and clinical variables. Columns with a "delta_" prefix contain information about the time difference between the temperature and variable measurements.
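
A small sketch of how the anchor timestamp and a "delta_" value could be computed is shown here. The unit (minutes) and the sign convention (anchor minus measurement time) are assumptions; the dataset's own columns may use a different unit or sign.

```python
from datetime import datetime


def last_temp_datetime(temporal_time, reference_time):
    """Return the later of the two paired timestamps."""
    return max(temporal_time, reference_time)


def delta_minutes(anchor, measured_at):
    """Minutes between a variable's measurement and the anchor.

    Assumption: positive values mean the variable was measured before
    the anchor.
    """
    return (anchor - measured_at).total_seconds() / 60
```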

Sample Size: The final dataset includes 13,251 pairs, representing 8,511 patients. Out of these pairs, 1,552 pairs (11.7%) were sourced from MIMIC-IV, 7,307 pairs (55.1%) from eICU-CRD-1, and 4,392 pairs (33.1%) from eICU-CRD-2. Similarly, 916 patients (10.8%) came from MIMIC-IV, 4,853 patients (57.0%) from eICU-CRD-1, and 2,742 patients (32.2%) from eICU-CRD-2. Note that there are more pairs than patients, as a single patient can contribute more than one pair, for example across multiple hospital stays.

Identifiers: Each row of the dataset has three identifiers, at different levels: patient, hospital, and ICU admission. The original identifiers are kept to allow linking the data with the original databases and eventually pull other variables of interest. However, to avoid overlap among the databases, we created new, unique identifiers with the prefix "unique_" for our dataset that reflect each of these three identifiers. Each encounter also has an identifier to reflect the source database.

Temperature measurement and pair information: This contains the timestamp, value, and site of a temperature measurement. The columns with the suffix "_temporal" hold the IR-based thermometry values and timestamps, while the columns with the suffix "_reference" hold the reference temperature values and timestamps. The pair combination is displayed in a column for additional information, indicating "temporal-oral," "temporal-rectal," or "temporal-core." For patients with multiple pairs, pair identifiers are used with a numerical suffix (e.g. "-1") in ascending order based on the "last_temp_datetime" timestamp.
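
The suffix numbering for patients with several pairs can be sketched as follows. The field names "unique_subject_id" and "pair_id", and the use of the patient identifier as the base of the pair identifier, are hypothetical; only the ascending order by "last_temp_datetime" comes from the description above.

```python
from collections import defaultdict
from datetime import datetime


def assign_pair_ids(pairs):
    """Add a 'pair_id' such as '<patient>-1' to each pair, in place.

    Pairs of the same patient are numbered in ascending order of
    'last_temp_datetime'.
    """
    by_patient = defaultdict(list)
    for pair in pairs:
        by_patient[pair["unique_subject_id"]].append(pair)
    for patient_id, group in by_patient.items():
        group.sort(key=lambda pair: pair["last_temp_datetime"])
        for number, pair in enumerate(group, start=1):
            pair["pair_id"] = f"{patient_id}-{number}"
    return pairs
```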

Unlike MIMIC-IV, eICU-CRD-1 and eICU-CRD-2 contain information about several hospitals. Each hospital has a unique identifier and other hospital-related information. To ensure consistency, MIMIC-IV data was given the hospital index 9999 (which does not correspond to any eICU hospital identifier), the number of beds as ">= 500", the US region as "Northeast", and the teaching status as "True".

Patient and admission characteristics: This category contains information about age, sex, weight, height, BMI, race and ethnicity, comorbidities, in-hospital mortality, and length of stay of each hospital and ICU admission. Ages higher than 90 were replaced by 90. BMI at admission time was calculated from the respective weight and height. All databases use the Charlson Comorbidity Index. Admission age was unified, with ages between 18 and 89 used directly.
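
The age cap and BMI calculation described above amount to two short formulas, sketched here. The units (kilograms and centimeters) are an assumption, as the text does not state how weight and height are stored.

```python
def cap_age(age, cap=90):
    """Replace ages above the cap with the cap itself."""
    if age is None:
        return None
    return min(age, cap)


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, or None when an input is missing.

    Assumption: weight is in kilograms and height in centimeters.
    """
    if not weight_kg or not height_cm:
        return None
    height_m = height_cm / 100
    return weight_kg / height_m ** 2
```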

Vital signs: Heart and respiratory rates; systolic, diastolic and mean blood pressure (both invasive and non-invasive); and blood oxygen saturation levels (SpO2) were extracted. Temperature values were recorded in the "Temperature Measurement Information" section, where the patient's temperature was assessed within a specific time window, occurring X hours prior to the most recent paired measurement timestamp ("last_temp_datetime"). These variables can be easily identified by the prefix "vitals_". They were pulled from the chartevents and nursecharting tables of the original MIMIC and eICU databases, respectively.

Laboratory Test Values: These values are crucial for characterizing each patient's health status and were categorized as follows:

Complete Blood Count (prefix: "cbc_"):
- Hemoglobin
- Hematocrit
- Mean Corpuscular Hemoglobin (MCH)
- Mean Corpuscular Hemoglobin Concentration (MCHC)
- Mean Corpuscular Volume (MCV)
- Platelet
- Red Blood Cells (RBC)
- Red Cell Distribution Width (RDW)
- White Blood Cells (WBC)

Coagulation (prefix: "coag_"):
- Fibrinogen
- International Normalized Ratio (INR)
- Prothrombin Time (PT)
- Partial Thromboplastin Time (PTT)

Basic Metabolic Panel (prefix: "bmp_"):
- Sodium
- Potassium
- Chloride
- Bicarbonate
- Blood Urea Nitrogen (BUN)
- Creatinine
- Glucose
- Anion Gap
- Calcium
- Lactate

Hepatic Function Panel (prefix: "hfp_"):
- Alanine Aminotransferase (ALT)
- Alkaline Phosphatase (ALP)
- Aspartate Aminotransferase (AST)
- Total Bilirubin
- Direct Bilirubin
- Albumin

Other Enzymes (prefix: "other_"):
- Creatine Kinase (CK/CPK)
- Creatine Kinase-MB (CK-MB)
- Lactate Dehydrogenase (LD/LDH)

Arterial blood gas (prefix: "abg_"):
- Carboxyhemoglobin
- Methemoglobin
- SaO2
- pH
- paCO2
- paO2

Data were collected from the labevents and labs tables of the original MIMIC and eICU databases, respectively.

SOFA scores: The SOFA score describes organ dysfunction and allows us to quantify patient morbidity. Coagulation, liver, cardiovascular, central nervous system, renal, respiration, and global scores were extracted from the pivoted_sofa, sofa, and derived tables of the MIMIC-IV and eICU databases, respectively. These selected values correspond to the scores one hour before the "last_temp_datetime" timestamp (with "sofa_past_" prefix) or one hour after that time (with "sofa_future_" prefix). SOFA day one scores have also been added and correspond to a patient staying in the ICU for a full 24 hours.
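
One way to read the past and future SOFA selection is sketched below. The rule used here (most recent score within the hour before the anchor, earliest score within the hour after it) and the output key names are assumptions; the extraction notebook may select scores differently.

```python
from datetime import datetime, timedelta


def sofa_around(scores, anchor):
    """Pick SOFA scores around 'last_temp_datetime' (the anchor).

    'scores' is a list of (timestamp, value) tuples. Key names in the
    returned dict are hypothetical.
    """
    hour = timedelta(hours=1)
    past = [s for s in scores if anchor - hour <= s[0] <= anchor]
    future = [s for s in scores if anchor < s[0] <= anchor + hour]
    latest_past = max(past, key=lambda s: s[0]) if past else None
    earliest_future = min(future, key=lambda s: s[0]) if future else None
    return {
        "sofa_past_global": latest_past[1] if latest_past else None,
        "sofa_future_global": earliest_future[1] if earliest_future else None,
    }
```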

Information is stored in different ways in each database. The methodology steps previously described can now be carried out with the code in file 1_TemperatureDataset.ipynb. The data description can be made with the files 2_ConsortDiagrams.ipynb and 3_TableOne.ipynb, which plot the consort diagram of the dataset and generate its descriptive table, respectively. An example notebook running through a simple use-case can be found in 6_Example.ipynb.

Usage Notes

The code needed to generate the derived database is made available.[18] BigQuery (standard SQL) through Google Colaboratory (Python 3.10) was used for data extraction. Users must be credentialed on PhysioNet, have completed the CITI "Data or Specimens Only Research" training, have signed the data use agreement of each database to access them, and have created their own project on BigQuery.

The final data was stored in a single comma-separated values (CSV) file. It can be recreated by running the 1_TemperatureDataset.ipynb notebook. The 2_ConsortDiagrams.ipynb and 3_TableOne.ipynb notebooks can be used for data description and analysis. 6_Example.ipynb runs through an example of loading the dataset and making a predictive model.
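
For readers who want a quick look at the CSV before opening the notebooks, the following sketch reads rows with Python's standard csv module and groups column names by the prefixes described in the Data Description. The sample column names in the inline text are hypothetical, and an in-memory string stands in for the released file.

```python
import csv
import io

# Prefixes named in the Data Description section.
PREFIXES = (
    "unique_", "delta_", "vitals_", "cbc_", "coag_", "bmp_",
    "hfp_", "other_", "abg_", "sofa_past_", "sofa_future_",
)


def read_rows(handle):
    """Return (column names, rows as dicts) from an open CSV handle."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return reader.fieldnames, rows


def group_columns(fieldnames):
    """Group column names by the first matching prefix."""
    groups = {prefix: [] for prefix in PREFIXES}
    for name in fieldnames:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
    return groups


# Hypothetical two-line sample; replace io.StringIO(...) with
# open("<path to the dataset CSV>", newline="") for the real file.
SAMPLE = (
    "unique_subject_id,vitals_heart_rate,cbc_hemoglobin,delta_cbc_hemoglobin\n"
    "A,88,12.1,35\n"
)
columns, rows = read_rows(io.StringIO(SAMPLE))
groups = group_columns(columns)
```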

The final dataset removes a significant barrier to entry for researchers in the field of critical care data science. Importantly, it sets the stage for researchers to find innovative and data-driven approaches to improve outcomes for critically ill patients. We have curated 13,251 paired IR thermometer (temporal) readings and reference standard thermometers (rectal, oral, and core) under strict, clinically relevant real-world conditions. Its development required collaboration from specialized, multidisciplinary teams of data scientists and clinicians skilled in navigating complex electronic health record (EHR) databases from various healthcare systems. With its public release, we hope the dataset will serve as a valuable resource for future generations of researchers and trainees.

A key limitation of the dataset lies in its timing constraints. The sample size is restricted by the 60-minute time window established by Bhavani et al.[1] Researchers and clinicians may need to adjust based on their specific requirements to ensure reliable insights.[13] Additionally, a broader limitation of the EHR data, reflected in the preprocessed dataset, is the lack of objective information on patients' skin tone and the manufacturer or model of the thermometer devices used. These gaps limit the scope of future research opportunities. Furthermore, there may be impacts from undocumented confounders in clinical practice, such as heating solutions and warm blankets, that may affect the assessment of surface temperatures. Studies have also demonstrated that skin tone can influence the accuracy of IR thermometers, revealing biases in certain medical devices.[14,15] Finally, thermometer accuracy varies significantly across different models and manufacturers, which can impact the reliability of temperature readings.[16,17] Without detailed information on these factors, it is challenging to fully assess and address potential biases in temperature measurements.

Ethics

The data used in this research came from MIMIC-IV, eICU-CRD-1, and eICU-CRD-2: all fully de-identified databases (containing no protected health information) that we received permission to use under a PhysioNet Credentialed Health Data Use Agreement (v1.5.0). The study was determined to be exempt from human subjects research. All experiments need to follow the PhysioNet Credentialed Health Data License Agreement. Medical charting by providers in the electronic health record is at risk of multiple types of bias.

Conflicts of Interest

AIW holds equity and management roles in Ataia Medical. All other authors report no conflicts of interest.

References

1. Bhavani SV, Wiley Z, Verhoef PA, Coopersmith CM, Ofotokun I. Racial Differences in Detection of Fever Using Temporal vs Oral Temperature Measurements in Hospitalized Patients. JAMA 2022;328:885–6. https://doi.org/10.1001/jama.2022.12290.
2. Sjoding MW, Dickson RP, Iwashyna TJ, Gay SE, Valley TS. Racial Bias in Pulse Oximetry Measurement. New England Journal of Medicine 2020;383:2477–8. https://doi.org/10.1056/NEJMc2029240.
3. Brunsgaard EK, Jensen J, Grossman D. Melanoma in skin of color: Part II. Racial disparities, role of UV, and interventions for earlier detection. Journal of the American Academy of Dermatology 2023;89:459–68. https://doi.org/10.1016/j.jaad.2022.04.057.
4. Barnato AE, Alexander SL, Linde-Zwirble WT, Angus DC. Racial variation in the incidence, care, and outcomes of severe sepsis: Analysis of population, patient, and hospital characteristics. American Journal of Respiratory and Critical Care Medicine 2008;177:279–84. https://doi.org/10.1164/rccm.200703-480OC.
5. Black LP, Hopson C, Puskarich MA, Modave F, Booker SQ, DeVos E, et al. Racial disparities in septic shock mortality: A retrospective cohort study. The Lancet Regional Health -- Americas 2023;29:100646. https://doi.org/10.1016/j.lana.2023.100646.
6. DiMeglio M, Dubensky J, Schadt S, Potdar R, Laudanski K. Factors underlying racial disparities in sepsis management. Healthcare (Basel) 2018;6:133. https://doi.org/10.3390/healthcare6040133.
7. Ko R-E, Suh GY. Factors underlying racial and gender disparities in sepsis management. In: Borges M, Hidalgo J, Perez-Fernandez J, editors. The Sepsis Codex, Elsevier; 2023, p. 247–55. https://doi.org/10.1016/B978-0-323-88271-2.00035-3.
8. Charpignon ML, Byers J, Cabral S, Celi LA, Fernandes C, Gallifant J, et al. Critical bias in critical care devices. Critical Care Clinics 2023;39:795–813. https://doi.org/10.1016/j.ccc.2023.02.005.
9. Wong AI, Charpignon M, Kim H, Others. Analysis of Discrepancies Between Pulse Oximetry and Arterial Oxygen Saturation Measurements by Race and Ethnicity and Association With Organ Dysfunction and Mortality. JAMA Network Open 2021;4:e2131674. https://doi.org/10.1001/jamanetworkopen.2021.31674.
10. Johnson A, Bulgarelli L, Pollard T, Horng S, Celi LA, Mark R. MIMIC-IV (version 2.2) 2023. https://doi.org/10.13026/6mm1-ek67.
11. Pollard T, Johnson A, Raffa J, Celi LA, Badawi O, Mark R. eICU Collaborative Research Database (version 2.0). PhysioNet. 2019. Available from: https://doi.org/10.13026/C2WM1R.
12. Matos J, Struja T, Gallifant J, Nakayama L, Charpignon M-L, Liu X, et al. BOLD: Blood-gas and Oximetry Linked Dataset. Scientific Data 2024;11:535. https://doi.org/10.1038/s41597-024-03225-z.
13. Guo C, Lu M, Chen J. An evaluation of time series summary statistics as features for clinical prediction tasks. BMC Med Inform Decis Mak 2020;20:48. https://doi.org/10.1186/s12911-020-1063-x.
14. Hao S, Dempsey K, Matos J, Cox CE, Rotemberg V, Gichoya JW, et al. Utility of skin tone on pulse oximetry in critically ill patients: A prospective cohort study. Crit Care Explor 2024;6:e1133. https://doi.org/10.1097/CCE.0000000000001133.
15. Adams S, Bucknall T, Kouzani A. A Study on the Agreement of Body Temperatures Measured by Infrared Cameras and Oral Thermometry. Res Sq 2020. https://doi.org/10.21203/rs.3.rs-123101/v1.
16. Mah AJ, Ghazi Zadeh L, Khoshnam Tehrani M, Askari S, Gandjbakhche AH, Shadgan B. Studying the accuracy and function of different thermometry techniques for measuring body temperature. Biology (Basel) 2021;10:1327. https://doi.org/10.3390/biology10121327.
17. Allyn W. n.d. https://www.hillrom.eu/content/dam/hillrom-aem/us/en/marketing/knowledge/content-marketing/case-studies/SM2557-EN-rA_SureTemp-Plus_Accuracy_Case-Study_LR.pdf. Accessed 4 November 2024.
18. "TherLiD: a thermometry linked dataset". GitHub, https://github.com/joamats/therlid. Accessed 25 September 2024.