Database Credentialed Access
GOSSIS-1-eICU, the eICU-CRD subset of the Global Open Source Severity of Illness Score (GOSSIS-1) dataset
Jesse Raffa, Alistair Johnson, Tom Pollard, Omar Badawi
Published: July 20, 2022. Version: 1.0.0
When using this resource, please cite:
Raffa, J., Johnson, A., Pollard, T., & Badawi, O. (2022). GOSSIS-1-eICU, the eICU-CRD subset of the Global Open Source Severity of Illness Score (GOSSIS-1) dataset (version 1.0.0). PhysioNet. https://doi.org/10.13026/gbmg-a531.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
GOSSIS-1 is a modern, free, open-source in-hospital mortality prediction algorithm for critical care patients, achieving excellent discrimination and calibration across three countries (Australia, New Zealand and the USA). GOSSIS-1 was developed on two large datasets of critical care patients. This project contains the USA subset of patients derived from the eICU Collaborative Research Database (eICU-CRD). The dataset, which we call GOSSIS-1-eICU, consists of 131,051 unique patients from 204 hospitals, drawn from ICU admissions discharged in 2014-15. The code to create the dataset from eICU-CRD and to generate GOSSIS-1 predictions is also available. This project contains: 1) the derived dataset from eICU-CRD, 2) the dataset with required missing data imputed, and 3) the GOSSIS-1 in-hospital mortality predictions (probabilities). The patientunitstayid and hospitalid eICU-CRD identifiers are included to allow linking back to eICU-CRD. Training and test sets are identified to allow for direct comparisons of performance.
Background
GOSSIS-1 [1] was developed as the first version of a series of global open-source severity of illness scores by the GOSSIS consortium [2]. The consortium aims to create a database of critical care datasets from ICUs around the globe and to use these datasets to develop a family of open-source scoring systems for assessing the severity of illness of critical care patients internationally. GOSSIS-1 was developed using two well-known datasets: ANZICS-APD [3], covering Australia and New Zealand, and eICU-CRD [4], covering the USA. The GOSSIS-1 model achieved high discrimination and calibration in all countries and relevant subsets [1]. This project contains the USA subset of data that was used to train the GOSSIS-1 model. The data originate from eICU-CRD, and we call this dataset GOSSIS-1-eICU.
Methods
The GOSSIS-1-eICU data were extracted from the eICU-CRD database [4]. The eICU-CRD is a relational database consisting of about 200,000 ICU admissions from over 200 hospitals throughout the USA. Importantly, the GOSSIS-1-eICU data consist of critical care admissions from 2014-15 where the length of ICU stay was >6 hours. Data, including physiologic and vital signs, were collected from the first 24 hours of the ICU stay. Readmissions to the ICU, patients <16 years old, and those with a missing outcome or with no heart rate recorded were excluded. The code [5] used to extract the GOSSIS-1-eICU dataset from eICU-CRD is available on GitHub. Further details about the extraction can be found in the GOSSIS-1 paper [1] and the paper’s supplementary materials. Further details about eICU-CRD can be found in its data description and on the eICU-CRD website [6].
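The inclusion and exclusion rules above can be restated as a single predicate. The following is only an illustrative sketch: the IcuStay class and all of its field names are hypothetical, are not eICU-CRD columns, and do not reproduce the actual extraction code in the GOSSIS repository [5].

```python
from dataclasses import dataclass


@dataclass
class IcuStay:
    # Hypothetical fields for illustration only; not eICU-CRD column names.
    age_years: float
    icu_los_hours: float
    is_readmission: bool
    has_outcome: bool        # in-hospital mortality outcome is recorded
    has_heart_rate: bool     # at least one heart rate was recorded
    discharge_year: int


def meets_inclusion_criteria(stay: IcuStay) -> bool:
    """Return True if a stay passes every rule stated in the Methods text."""
    return (
        stay.discharge_year in (2014, 2015)   # admissions discharged in 2014-15
        and stay.icu_los_hours > 6            # ICU stay longer than 6 hours
        and not stay.is_readmission           # ICU readmissions are excluded
        and stay.age_years >= 16              # patients under 16 are excluded
        and stay.has_outcome                  # missing outcome is excluded
        and stay.has_heart_rate               # no heart rate recorded is excluded
    )
```

Writing the rules as one boolean expression keeps each criterion on its own line, so it is easy to compare against the prose description.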
Data Description
This project contains three data files:
gossis-1-eicu-only.csv.gz
gossis-1-eicu-only-model-ready.csv.gz
gossis-1-eicu-predictions.csv.gz
Each dataset includes the patientunitstayid identifier, which allows linking back to eICU-CRD. Each dataset has 131,051 data rows (one per admission), plus a header row.
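As a quick sanity check after download, the files can be streamed with the Python standard library and their row counts compared with the figure above. This is a minimal sketch that assumes the file sits in the working directory and is UTF-8 encoded; the helper names are ours and are not part of the project.

```python
import csv
import gzip


def read_gzipped_csv(path):
    """Yield each data row of a gzip-compressed CSV as a dict keyed by header name."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def count_rows_and_check_ids(path, id_column="patientunitstayid"):
    """Return (number of data rows, True if every identifier value is distinct)."""
    seen = set()
    n_rows = 0
    for row in read_gzipped_csv(path):
        n_rows += 1
        seen.add(row[id_column])
    return n_rows, len(seen) == n_rows
```

For example, count_rows_and_check_ids("gossis-1-eicu-only.csv.gz") should report 131,051 rows if the download is complete. Since each row is described as one ICU admission, the identifiers would also be expected to be distinct.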
The first file, gossis-1-eicu-only.csv.gz, is a gzip-compressed CSV file containing the features and outcomes from eICU-CRD patients used to train the GOSSIS-1 model. Each row of the CSV file corresponds to one ICU admission. The file includes a header specifying the variable names. A data specification file, variable-definitions.yaml, is also included; it lists each variable by name with its valid values, ranges, and a short description. The same information is largely contained in Supplementary Tables 1 and 2 of the GOSSIS-1 paper [1]. For the diagnosis variables apache_3j_bodysystem, apache_2_bodysystem, apache_3j_diagnosis and apache_2_diagnosis, a mapping file called apache_diagnosis_map.csv defines each of the codes. This dataset has undergone minimal data cleaning and contains 216 columns.
The second file, gossis-1-eicu-only-model-ready.csv.gz, is a gzip-compressed CSV file containing the specific features and outcomes from eICU-CRD patients used to train the GOSSIS-1 model, after preprocessing and imputation. Importantly, most physiological *_apache variables have been excluded; the Glasgow Coma Scale variables, ventilated_apache and intubated_apache have been transformed; and all d1_*_min and d1_*_max variables have been transformed into the midpoint (d1_*_avg) and range (d1_*_diff). The partition variable indicates whether the admission was in the training set (70%) or the test set (30%).
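To make the midpoint and range variables concrete, the sketch below assumes the usual definitions: the midpoint is (min + max) / 2 and the range is max - min. These formulas are our reading of "midpoint" and "range"; the rGOSSIS1 source [7] remains the authority on the exact computation.

```python
def midpoint_and_range(d1_min, d1_max):
    """Return (avg, diff) for one d1_*_min / d1_*_max pair.

    Assumes avg = (min + max) / 2 and diff = max - min.
    """
    return (d1_min + d1_max) / 2.0, d1_max - d1_min


def recover_min_max(d1_avg, d1_diff):
    """Invert the transformation above: recover (min, max) from (avg, diff)."""
    return d1_avg - d1_diff / 2.0, d1_avg + d1_diff / 2.0
```

Under these assumptions, a first-day heart rate with a minimum of 60 and a maximum of 100 becomes a midpoint of 80 and a range of 40, and the original pair can be recovered from those two values, so no information is lost.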
The last file, gossis-1-eicu-predictions.csv.gz, is a gzip-compressed CSV file containing only two columns, patientunitstayid and gossis1_ihm_pred, corresponding to the eICU-CRD identifier and the GOSSIS-1 in-hospital mortality prediction (a probability). Please note that this dataset contains both training and test set patients. In the 39,318 test set patients, we have reported an AUROC of 0.904 (0.900–0.909), an SMR of 0.992 (0.959–1.024) and a Brier score of 0.055. The R package rGOSSIS1, which generates GOSSIS-1 predictions, is available on GitHub [7].
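For readers who want to compute the same kinds of metrics on their own predictions, the sketch below gives standard textbook definitions of the Brier score, the standardized mortality ratio (observed deaths divided by the sum of predicted probabilities) and a rank-based AUROC. It is not the authors' evaluation code and does not compute confidence intervals. The outcome labels are not in the predictions file, so they would need to be taken from one of the other two files and matched on patientunitstayid.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def standardized_mortality_ratio(y_true, y_prob):
    """Observed deaths divided by expected deaths (sum of predicted probabilities)."""
    return sum(y_true) / sum(y_prob)


def auroc(y_true, y_prob):
    """Rank-based (Mann-Whitney) AUROC; tied predictions share an average rank."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    ranks = [0.0] * len(y_prob)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied predictions starting at i.
        while j + 1 < len(order) and y_prob[order[j + 1]] == y_prob[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

All three functions take a list of 0/1 outcomes and a list of predicted probabilities in the same order. To compare with the figures reported above, restrict both lists to the test set patients first.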
Usage Notes
This project contains the USA subset of data that was used to train the GOSSIS-1 model, as described in the Global Open Source Severity of Illness Score (GOSSIS) publication [1]. The data originate from eICU-CRD, and we call this dataset GOSSIS-1-eICU. All data should be handled as if they were eICU-CRD data and are covered under the same terms of use. In particular, data cannot be shared with non-approved users.
gossis-1-eicu-only.csv.gz: This dataset can be generated using code from the GOSSIS GitHub repository [5]. The dataset contains missing data, several patient outcomes, and demographic variables (e.g., ethnicity) that were used to assess model performance in subsets but were not used in the GOSSIS-1 model itself. This dataset is also available to approved users on BigQuery under the name gossis1_eicu_raw.
gossis-1-eicu-only-model-ready.csv.gz: This dataset is derived from gossis-1-eicu-only.csv.gz after running the preprocess_data, impute_data (using algorithm 3) and prepare_fit functions in the rGOSSIS1 package [7]. Extraneous columns that are not used in GOSSIS-1 predictions have been removed. This dataset can be fed into the GOSSIS-1 prediction function (gpredict) or used to fit a new model. This dataset is also available to approved users on BigQuery under the name gossis1_eicu_predvar.
gossis-1-eicu-predictions.csv.gz: This dataset is derived from running the gpredict function on gossis-1-eicu-only-model-ready.csv.gz. This dataset may be suitable for performance comparisons with other models. Alternatively, gossis1_ihm_pred can be used as one would currently use the APACHE IVa in-hospital mortality prediction column, predictedhospitalmortality, in the apachepatientresult table of eICU-CRD. This dataset is also available to approved users on BigQuery under the name gossis1_eicu_ihmp_pred.
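To compare gossis1_ihm_pred with the APACHE IVa prediction, the two sources can be paired on patientunitstayid. The sketch below works on rows already loaded as dictionaries (for example with csv.DictReader). It assumes that apachepatientresult carries an apacheversion column distinguishing APACHE versions and that negative values of predictedhospitalmortality denote unavailable predictions. Both points are assumptions to verify against the eICU-CRD documentation [6], and the function name is ours.

```python
def pair_predictions(gossis_rows, apache_rows, apache_version="IVa"):
    """Pair GOSSIS-1 and APACHE predictions by patientunitstayid.

    Returns a list of (patientunitstayid, gossis_pred, apache_pred) tuples
    for stays present in both inputs.
    """
    apache_by_stay = {}
    for row in apache_rows:
        # Assumption: an 'apacheversion' column separates APACHE IV from IVa rows.
        if row.get("apacheversion") != apache_version:
            continue
        value = float(row["predictedhospitalmortality"])
        # Assumption: negative values mark predictions that could not be computed.
        if value < 0:
            continue
        apache_by_stay[row["patientunitstayid"]] = value

    paired = []
    for row in gossis_rows:
        stay_id = row["patientunitstayid"]
        if stay_id in apache_by_stay:
            paired.append((stay_id, float(row["gossis1_ihm_pred"]), apache_by_stay[stay_id]))
    return paired
```

Only stays with a usable APACHE prediction are returned, so any comparison of the two scores is made on the same set of patients.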
Release Notes
Version 1.0.0: This initial release corresponds with the publication [1] of “The Global Open Source Severity of Illness Score (GOSSIS)” in Critical Care Medicine.
Ethics
This dataset is entirely derived from eICU-CRD. Within eICU-CRD, all tables are deidentified to meet the safe harbor provision of the US Health Insurance Portability and Accountability Act (HIPAA). These provisions include the removal of all protected health information. Hospital and unit identifiers have also been removed to protect the privacy of contributing organizations. The schema was established in collaboration with Privacert (Cambridge, MA), who certified the re-identification risk as meeting safe harbor standards (HIPAA Certification no. 1031219-2).
Acknowledgements
We wish to thank the GOSSIS consortium, Philips and ANZICS for all their help in developing these datasets and GOSSIS-1.
Conflicts of Interest
The authors have no conflicts of interest to declare.
References
- Raffa JD, Johnson AEW, O’Brien Z, Pollard TJ, Mark RG, Celi LA, et al. The Global Open Source Severity of Illness Score (GOSSIS). Crit Care Med. doi:10.1097/CCM.0000000000005518.
- GOSSIS: Global Open Source Severity of Illness Score: International Benchmarking for Critical Care [Internet]. [cited 2022 Jun 24]. Available from: https://gossis.mit.edu/
- Stow PJ, Hart GK, Higlett T, George C, Herkes R, McWilliam D, et al. Development and implementation of a high-quality clinical database: the Australian and New Zealand Intensive Care Society Adult Patient Database. J Crit Care. 2006 Jun;21(2):133–41.
- Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018 Dec;5(1):180178.
- GOSSIS: The Global Open Source Severity of Illness Score [Internet]. MIT Laboratory for Computational Physiology; 2022 [cited 2022 Jun 24]. Available from: https://github.com/MIT-LCP/gossis
- eICU [Internet]. [cited 2022 Jun 23]. Available from: https://eicu-crd.mit.edu/about/eicu/
- Raffa JD. rGOSSIS1 [Internet]. 2020 [cited 2022 Jun 24]. Available from: https://github.com/jraffa/rGOSSIS1
Access
Access Policy: Only credentialed users who sign the DUA can access the files.
License (for files): PhysioNet Credentialed Health Data License 1.5.0
Data Use Agreement: PhysioNet Credentialed Health Data Use Agreement 1.5.0
Required training: CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0): https://doi.org/10.13026/gbmg-a531
DOI (latest version): https://doi.org/10.13026/drts-zb06
Topics: icu, critical care, severity of illness, global, gossis, apache, mortality prediction, benchmarking
Project Website: https://gossis.mit.edu/
Files
To access the files, you must:
- be a credentialed user
- complete the required training: CITI Data or Specimens Only Research
- sign the data use agreement for the project