Database Contributor Review

Salzburg Intensive Care database (SICdb), a freely accessible intensive care database

Niklas Rodemund Andreas Kokoefer Bernhard Wernly Crispiana Cozowicz

Published: Sept. 10, 2024. Version: 1.0.8


When using this resource, please cite:
Rodemund, N., Kokoefer, A., Wernly, B., & Cozowicz, C. (2024). Salzburg Intensive Care database (SICdb), a freely accessible intensive care database (version 1.0.8). PhysioNet. https://doi.org/10.13026/8m72-6j83.

Additionally, please cite the original publication:

Rodemund, N., Wernly, B., Jung, C. et al. Harnessing Big Data in Critical Care: Exploring a new European Dataset. Sci Data 11, 320 (2024). https://doi.org/10.1038/s41597-024-03164-9

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

The SICdb dataset offers insights into more than 27,000 intensive care admissions, including therapies and data on preceding surgeries. Data were collected between 2013 and 2021 from four intensive care units at the University Hospital Salzburg, which together have 41 beds and more than 3,000 intensive care admissions per year. The dataset is deidentified and contains, among other things, case information, vital signs, laboratory results, and medication data. SICdb provides both aggregated once-per-hour and highly granular once-per-minute data, making it suitable for computational and machine learning-based research.


Background

Over the past decade, the use of advanced statistical methodology and artificial intelligence in data-driven research has become increasingly popular in many areas of medicine, particularly in critical care [1-2]. Medical datasets can be used to develop, validate, and improve machine learning models and algorithms. Major application areas for machine learning in healthcare include risk stratification, mortality prediction, early detection of sepsis and septic shock, prediction of cardiac events, and prediction of acute kidney injury, among others [3]. Although numerous openly available datasets have been published to support research [4-7] and offer considerable benefits [8], high-resolution data remain scarce. To address this gap, we present SICdb, a new, highly granular, real-world dataset that makes more medical data publicly available. The data were collected at the University Hospital Salzburg in Austria, a tertiary care center with 58 intensive and intermediate care beds. Data from 41 of these beds have been processed into a high-resolution dataset that includes vital signs, scores, laboratory results, and medication data, among others.


Methods

The dataset includes data collected with MetaVision patient data management software (iMDSoft, Tel Aviv, Israel). Exports from ORBIS (Dedalus Healthcare GmbH, Bonn, Germany), containing ICD-10 codes and in-hospital mortality, were also merged into the dataset.

We developed software to export, deidentify, and process the data obtained from MetaVision. The export and processing procedures are repeatable and allow incremental updates. The data tables were restructured to facilitate analysis. The raw signal data, which comprise over 1.5 billion data points, were reorganized so that they can be shared while preserving their once-per-minute resolution. Absolute time information other than the admission year was removed, and geographical information was omitted to increase anonymity. Additional datasets from other sources, such as ORBIS and governmental mortality data, were matched using a lookup database. Only deidentified data were processed into the final database. The SICdb software provides the data in multiple formats and prepares them for export to various statistical software packages.

All data are anonymized as defined by Article 4(5) of the European General Data Protection Regulation (European Parliament, 2016). The deidentification strategy additionally complies with US regulations for health data, namely the HIPAA Privacy Rule (“Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule”). SICdb is fully approved by the local ethics committee of the Land Salzburg, Austria (EK Nr: 1115/2021).


Data Description

As of version 1.0.7, the dataset includes data from more than 27,350 admissions to the Department of Anesthesiology and Intensive Care Medicine at the General Hospital Salzburg and Paracelsus Medical University.

Data Tables

The SICdb dataset consists of billions of data entries across 7 data tables. The main table, cases, contains a single entry for each intensive care admission and includes information about the patient (such as age, weight, and sex) and the case (such as diagnosis, scores, and ICD-10 codes). The TimeOfStay field gives the time from the first admission to a MetaVision-enabled ward until the final closing of the case, including any preceding surgery. Personal data such as age, weight, and height have been grouped into bins of width 5, with all ages over 90 placed in the final bin. The OffsetOfDeath field gives the time from admission to death in seconds and is capped at one year; it therefore represents one-year mortality. Mortality data were obtained from multiple sources, including governmental mortality data, and therefore include out-of-hospital deaths.
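
To make the binning and censoring concrete, here is a minimal Python sketch. It assumes that bins are labelled by their lower bound, that the one-year cap equals 365 days, and that `OffsetOfDeath` is empty for patients without a recorded death within that window. The helper names are hypothetical, and Documentation.pdf remains the reference for how SICdb actually encodes these fields.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400
ONE_YEAR_SECONDS = 365 * SECONDS_PER_DAY  # assumed cap; verify in Documentation.pdf


def bin_lower_bound(value: float, width: int = 5, cap: Optional[float] = None) -> int:
    """Return the lower bound of the width-sized bin containing value.

    Values above cap are clamped to cap before binning, so with cap=90
    all ages over 90 land in the 90 bin. Lower-bound labels are an assumption.
    """
    if cap is not None:
        value = min(value, cap)
    return int(value // width) * width


def died_within_one_year(offset_of_death: Optional[float]) -> bool:
    """True if a death is recorded within one year of admission.

    Assumes OffsetOfDeath (seconds from admission) is None when no death
    was recorded inside the capped one-year window.
    """
    return offset_of_death is not None and offset_of_death <= ONE_YEAR_SECONDS


print(bin_lower_bound(37))                        # 35
print(bin_lower_bound(96, cap=90))                # 90
print(died_within_one_year(5 * SECONDS_PER_DAY))  # True
print(died_within_one_year(None))                 # False
```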

All other data tables are linked to the cases table through the CaseID field. Most data carry timing information: the offset field gives the number of seconds from admission to the event. The laboratory table contains laboratory values, and the medication table provides data on administered drugs. Several generic data tables hold data sorted by type. The data_ref table contains additional nominal/categorical data, one entry per admission, and the data_range table documents items with a start and end time, such as central lines or drainages. The data_float_h table contains float data aggregated once per hour, including most signal data. To reduce table size, the once-per-minute data are serialized as a stream of IEEE 754 floats in the data_float_h.rawdata field. For further instructions on using these data, see Documentation.pdf or the online documentation [3], and refer to the unpacking script example in our GitHub repository [4].
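
The sketch below shows one way to turn a `rawdata` value back into a per-minute series in Python. It starts from raw bytes, because how those bytes are represented inside the CSV text is specified in Documentation.pdf and the official unpacking script, not here. The 4-byte single-precision width, the default little-endian byte order, and the assumption that samples are spaced one minute apart starting at the row's offset are all assumptions to check against those sources; the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def decode_rawdata(raw: bytes, little_endian: bool = True) -> List[float]:
    """Decode a byte stream of 32-bit IEEE 754 floats.

    Byte order is a parameter because it must be confirmed against the
    official SICdb unpacking script.
    """
    if len(raw) % 4:
        raise ValueError("rawdata length is not a multiple of 4 bytes")
    fmt = ("<" if little_endian else ">") + f"{len(raw) // 4}f"
    return list(struct.unpack(fmt, raw))


def to_minute_series(offset: int, values: List[float]) -> List[Tuple[int, float]]:
    """Attach an offset (seconds from admission) to each value, assuming one
    value per minute starting at the row's own offset."""
    return [(offset + 60 * i, v) for i, v in enumerate(values)]


# Synthetic bytes standing in for one decoded rawdata field
packed = struct.pack("<3f", 72.0, 74.5, 71.0)
print(to_minute_series(3600, decode_rawdata(packed)))
# [(3600, 72.0), (3660, 74.5), (3720, 71.0)]
```

Keeping decoding and timestamping separate makes it easy to swap in a different byte order or sampling interval if the documentation says otherwise.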

Reference Table

Nominal data are stored as codes, and the reference table d_references resolves each code to its meaning. The referencing fields in all data tables correspond to the primary key of d_references, ReferenceGlobalID. The ReferenceValue field in d_references gives the variable's value, and the ReferenceUnit field holds the unit of measurement, if applicable.
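
A minimal Python sketch of resolving codes through d_references, assuming the table has already been loaded as rows of strings keyed by column name (for example with `csv.DictReader`). The column names come from the description above; the function names and the example IDs and values are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

# (ReferenceValue, ReferenceUnit) pair; unit is None when not applicable
Reference = Tuple[str, Optional[str]]


def build_reference_lookup(rows: Iterable[Mapping[str, str]]) -> Dict[int, Reference]:
    """Index d_references rows by their primary key, ReferenceGlobalID."""
    lookup: Dict[int, Reference] = {}
    for row in rows:
        unit = row.get("ReferenceUnit") or None  # treat empty strings as "no unit"
        lookup[int(row["ReferenceGlobalID"])] = (row["ReferenceValue"], unit)
    return lookup


def describe(code: int, lookup: Dict[int, Reference]) -> str:
    """Render an encoded field as readable text, appending the unit if present."""
    value, unit = lookup[code]
    return f"{value} [{unit}]" if unit else value


# Hypothetical example rows; real IDs and values come from d_references.
rows = [
    {"ReferenceGlobalID": "1001", "ReferenceValue": "Heart rate", "ReferenceUnit": "bpm"},
    {"ReferenceGlobalID": "1002", "ReferenceValue": "Male", "ReferenceUnit": ""},
]
lookup = build_reference_lookup(rows)
print(describe(1001, lookup))  # Heart rate [bpm]
print(describe(1002, lookup))  # Male
```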

Data Format

The data are provided as gzip-compressed comma-separated values (CSV) files following RFC 4180. The most current documentation, including table schemas, is available online [3], and an offline copy is included in the files as Documentation.pdf. A GitHub repository has been created to share code, report issues, and discuss the dataset [4].
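
Because the files are gzip-compressed CSV, they can be streamed row by row with Python's standard library without first decompressing them to disk. This sketch assumes UTF-8 encoding and a header row in each file; both should be checked against Documentation.pdf, and the function name is hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_rows(path: str, encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Stream rows from a gzip-compressed RFC 4180 CSV file as dicts.

    The first line is treated as the header; the encoding is an assumption.
    """
    # newline="" lets the csv module handle quoted fields that contain line breaks
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as handle:
        yield from csv.DictReader(handle)
```

For example, `next(iter_rows("cases.csv.gz"))["CaseID"]` would return the CaseID of the first admission, assuming the cases table is stored under that file name. The default `csv` dialect (comma delimiter, double-quote quoting, doubled quotes as escapes) matches RFC 4180 conventions, so no custom dialect is needed.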


Usage Notes

Access to SICdb requires approval by the contributors. In addition, users must be credentialed PhysioNet users, must request access by stating a specific research question, and must accept a data use agreement. We would like to remind all users that this database contains sensitive information related to the clinical care of patients and must therefore be treated with the utmost care and respect. Any attempt to identify individual patients using this database is illegal.

Documentation for the dataset, including the table schema and detailed descriptions of the fields and data, is available in Documentation.pdf and online. The most up-to-date information can be found on our website [3]. We have also created a GitHub repository to share code and discuss the dataset [4].

Additional Notes

SICdb is a real-world dataset created from data automatically collected in clinical practice. As such, it may contain implausible values.

Due to technical limitations, only 41 of the 58 beds are included in SICdb. Most notably, this excludes a large proportion of internal medicine patients. As a result, perioperative cases are relatively overrepresented, which explains the low overall mortality in the dataset.


Release Notes

SICdb v1.0.8

SICdb v1.0.8 was released in September 2024.

  • Mapped LOINC codes to laboratory references for better standardization and interoperability
  • Introduced `d_references`.`LOINC_code` field containing the mapped LOINC code.
  • Introduced `d_references`.`LOINC_short` field containing the abbreviated LOINC text.
  • Introduced `d_references`.`LOINC_long` field containing the full LOINC text.
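
The new `LOINC_code` field makes it possible to select laboratory references by standard code rather than by local name, which helps when pooling SICdb with other LOINC-coded datasets. A minimal Python sketch, assuming d_references rows loaded as dictionaries of strings; the function name and example rows are invented, although 2160-0 is the real LOINC code for creatinine (mass/volume) in serum or plasma.

```python
from typing import Iterable, List, Mapping


def find_by_loinc(rows: Iterable[Mapping[str, str]], loinc_code: str) -> List[int]:
    """Return the ReferenceGlobalIDs of d_references rows mapped to loinc_code.

    Rows without a LOINC mapping (missing or empty LOINC_code) are skipped.
    """
    return [
        int(row["ReferenceGlobalID"])
        for row in rows
        if (row.get("LOINC_code") or "").strip() == loinc_code
    ]


# Hypothetical rows for illustration only
rows = [
    {"ReferenceGlobalID": "2001", "ReferenceValue": "Creatinine", "LOINC_code": "2160-0"},
    {"ReferenceGlobalID": "2002", "ReferenceValue": "Male", "LOINC_code": ""},
]
print(find_by_loinc(rows, "2160-0"))  # [2001]
```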

SICdb v1.0.7

SICdb v1.0.7 was released in April 2024.

  • Added fields `cases`.`HospitalDischargeDay` and `cases`.`HospitalStayDays`, representing the day of hospital discharge after admission and the total length of hospital stay, respectively.
  • Added field `cases`.`AdmissionUrgency`, indicating the urgency of admission
  • Added High Flow (HFNC) therapy data
  • Added Richmond Agitation-Sedation Scale (RASS) score
  • Added Numeric Rating Scale (NRS-11)
  • Added SOFA Score
  • Removed 36 invalid cases
  • Recalculated field `cases`.`OffsetAfterFirstAdmission`, fixing an issue that occasionally led to invalid values

SICdb v1.0.6

SICdb v1.0.6 was released in June 2023.

  • Fixed the invalid mapping of the ECG heart rate signal in data_float_h.csv where DataID = 707.
  • Fixed several invalid weight/height values.
  • Added `data_ref`.`FieldID` indicating the name of the referenced field, related to `d_references`.`ReferenceGlobalID`. In older versions this was indicated in another file, which was removed to reduce file count. Examples are "PreconditionDiabetes", "PreconditionArtHypertension", "PreconditionLungDisease", and "PreconditionRenalDysfunction".
  • Added KDIGO_AKI_168, indicating the stage of AKI as defined by the KDIGO Clinical Practice Guideline [5] within the first 168 hours of stay. The algorithm has been published in our repository [4].
  • Added column `cases`.`ICUOffset`, indicating the time in seconds of the first ICU bed assignment, for better comparability with other ICU datasets.
  • Added more signal data to data_float_h.csv.
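
KDIGO stages acute kidney injury by serum creatinine and urine output. Purely as an illustration of the creatinine criteria, and not as a reproduction of the KDIGO_AKI_168 algorithm published in the repository, a simplified Python sketch follows. It assumes creatinine in mg/dL (1 mg/dL ≈ 88.4 µmol/L, so check `ReferenceUnit` and convert if needed) and ignores both the 48-hour and 7-day time windows and the urine-output criteria; the function name is hypothetical.

```python
EPS = 1e-9  # tolerance so values exactly on a threshold are not lost to float rounding


def _at_least(value: float, threshold: float) -> bool:
    return value >= threshold - EPS


def kdigo_creatinine_stage(baseline_mg_dl: float, current_mg_dl: float,
                           on_rrt: bool = False) -> int:
    """Return a KDIGO AKI stage (0-3) from serum creatinine in mg/dL.

    Simplified sketch: ignores the 48-hour and 7-day time windows and the
    urine-output criteria, so it is not the SICdb KDIGO_AKI_168 algorithm.
    """
    if baseline_mg_dl <= 0:
        raise ValueError("baseline creatinine must be positive")
    ratio = current_mg_dl / baseline_mg_dl
    # Stage 3: >= 3.0x baseline, creatinine >= 4.0 mg/dL, or renal replacement therapy
    if on_rrt or _at_least(ratio, 3.0) or _at_least(current_mg_dl, 4.0):
        return 3
    # Stage 2: 2.0-2.9x baseline
    if _at_least(ratio, 2.0):
        return 2
    # Stage 1: 1.5-1.9x baseline or an absolute rise of >= 0.3 mg/dL
    if _at_least(ratio, 1.5) or _at_least(current_mg_dl - baseline_mg_dl, 0.3):
        return 1
    return 0


print(kdigo_creatinine_stage(0.9, 1.2))  # 1 (rise of 0.3 mg/dL)
print(kdigo_creatinine_stage(1.0, 2.5))  # 2
print(kdigo_creatinine_stage(2.0, 4.1))  # 3 (creatinine >= 4.0 mg/dL)
```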

Ethics

SICdb is fully approved by the local ethics committee of the Land Salzburg, Austria (EK Nr: 1115/2021). The requirement for individual patient consent was waived because the project did not affect clinical care and all protected health information was deidentified.


Conflicts of Interest

The authors declare that they have no competing interests.


References

  1. Elbers P. Amsterdam Medical Data Science. https://amsterdammedicaldatascience.nl/#amsterdamumcdb [Accessed May 14, 2021].
  2. Alberto IR, Alberto NR, Ghosh AK, Jain B, Jayakumar S, Martinez-Martin N, McCague N, Moukheiber D, Moukheiber L, Moukheiber M, Moukheiber S. The impact of commercial health datasets on medical research and health-care algorithms. The Lancet Digital Health. 2023 May 1;5(5):e288-94.
  3. SICdb online documentation. https://www.sicdb.com/Documentation/ [Accessed June 6, 2022].
  4. SICdb GitHub repository. https://github.com/nrodemund/sicdb [Accessed June 6, 2022].
  5. Khwaja A. KDIGO clinical practice guidelines for acute kidney injury. Nephron Clin Pract. 2012;120(4):c179-84. doi: 10.1159/000339789. Epub 2012 Aug 7. PMID: 22890468.
  6. Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035. doi:10.1038/sdata.2016.35
  7. Faltys M, Zimmermann M, Lyu X, Hüser M, Hyland S, Rätsch G, Merz T. HiRID, a high time-resolution ICU dataset (version 1.1.1). PhysioNet. 2021. doi:10.13026/nkwc-js72.
  8. Celi LA, Mark RG, Stone DJ, Montgomery RA. “Big data” in the intensive care unit. Closing the data loop. Am J Respir Crit Care Med. 2013;187(11):1157-1160. doi:10.1164/rccm.201212-2311ED
  9. Cooke CR, Iwashyna TJ. Using existing data to address important clinical questions in critical care. Crit Care Med. 2013;41(3):886-896. doi:10.1097/CCM.0b013e31827bfc3c
  10. Syed M, Syed S, Sexton K, Syeda HB, Garza M, Zozus M, Syed F, Begum S, Syed AU, Sanford J, Prior F. Application of Machine Learning in Intensive Care Unit (ICU) Settings Using MIMIC Dataset: Systematic Review. Informatics (MDPI). 2021 Mar;8(1):16. doi: 10.3390/informatics8010016. Epub 2021 Mar 3. PMID: 33981592; PMCID: PMC8112729.
  11. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5:180178. doi:10.1038/sdata.2018.178

Access

Access Policy:
Only credentialed users who sign the DUA can access the files. In addition, users must have individual studies reviewed by the contributor.

License (for files):
PhysioNet Contributor Review Health Data License 1.5.0

Data Use Agreement:
PhysioNet Contributor Review Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research
