Database Contributor Review
Chest Computed Tomography for patients with sepsis in the Emergency Department
Published: Oct. 28, 2024. Version: 1.0.0
When using this resource, please cite:
Jin, S., & Zhang, Z. (2024). Chest Computed Tomography for patients with sepsis in the Emergency Department (version 1.0.0). PhysioNet. https://doi.org/10.13026/zne5-qh18.
Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Abstract
Sepsis is a systemic inflammatory response syndrome that can affect all vital organs. The lung is the most commonly involved organ, and sepsis can cause lung injury. Lung injury has a variety of clinical presentations and can be captured by chest imaging studies. Computed tomography (CT) resolves lung parenchymal structure in fine detail and is a valuable source of data for studying the structural changes of sepsis-induced lung injury. This dataset provides high-resolution clinical tabular data and chest CT for a total of 728 admissions with sepsis. Investigators in clinical medicine, computer vision and data science will find this database useful for studying sepsis-induced lung injury.
Background
Sepsis is a systemic inflammatory response syndrome (SIRS) caused by infection, and it is a leading cause of mortality and morbidity in hospitalized patients [1]. SIRS can lead to multiple organ dysfunction in susceptible patients, and the most frequently involved organs and systems include the lung, kidney and circulation [2]. While the inflammatory response and immune profile of sepsis have been extensively investigated [3,4], integrated analysis of the structural changes of lung parenchyma together with clinical features is rarely reported.
Computed tomography (CT) provides high spatial resolution of the structural changes of lung parenchyma in response to SIRS [5]. Sepsis-induced acute lung injury is a well-established form of lung involvement during sepsis, and it presents as bilateral infiltrates on chest CT. However, more subtle changes that are not visible to the human eye might be missed. With the development of computer vision and deep learning technologies, complex features can be represented and extracted from CT images [6]. These features have been shown to provide valuable insights into disease prognosis, subtyping and medical decision-making [7–9]. However, because well-curated, publicly available CT datasets for sepsis patients are lacking, studies exploring lung CT radiomics are scarce. The current study therefore aims to establish a publicly available lung CT dataset, together with high-granularity clinical tabular data. The dataset is intended to encourage studies on sepsis that integrate medical images and clinical tabular data.
Methods
Study setting and population
The study was conducted at Zhejiang Provincial People's Hospital, Zhejiang, China, from January 2019 to December 2022. All sepsis patients admitted to the emergency department of the hospital were eligible. Sepsis was defined in accordance with the Sepsis-3 criteria, which require suspected or documented infection plus an acute rise in the SOFA score of 2 points or more. The study was approved by the ethics committee of Zhejiang Provincial People's Hospital (approval number: 2023-397). The study was conducted in accordance with the Declaration of Helsinki.
Informed consent was waived as determined by the institutional review board, due to the retrospective design of the study.
Database structure and development
The database comprises two types of data. The first is the clinical tabular data, which is distributed as comma-separated value (CSV) files that can be loaded into any relational database and queried with a language such as SQL. The second is the CT image data, which is distributed as NIfTI files with the nii.gz suffix. The CT image files can be linked to the clinical data through the CT2hospitalID table. Each individual patient is identified by a series number (patient_SN) that combines digits and letters, such as “5810787d01cf52e6973eef9819b7d2ac”. The patient_SN is deidentified. Each unique hospital stay is denoted by a Hospital_ID, for example "337016968172517". ICU stays can be identified from the HospitalTransfer table, which contains intrahospital transfer events. All tables are linked by Hospital_ID, so that sequential medical events during an individual hospital stay can be identified.
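The linkage by Hospital_ID described above can be sketched in a few lines. This is only an illustration using the Python standard library: the sample rows and identifier values are invented, and only the column names are taken from the table descriptions in this document.

```python
import csv
import io

# Invented sample rows standing in for CT2hospitalID.csv and PtAdmiTable.csv.
CT_CSV = """serialID,CTexame_DateTime,patient_SN,Hospital_ID
ct_0001,-0.2,p01,1001
ct_0002,3.5,p01,1001
ct_0003,1.0,p02,1002
"""
ADMISSION_CSV = """patient_SN,Hospital_ID,Sex,Age_cut
p01,1001,Male,"(60,70]"
p02,1002,Female,"(70,80]"
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_scans_to_admissions(ct_rows, admission_rows):
    """Attach admission-level columns to every CT scan via Hospital_ID."""
    by_stay = {row["Hospital_ID"]: row for row in admission_rows}
    linked = []
    for scan in ct_rows:
        admission = by_stay.get(scan["Hospital_ID"])
        if admission is None:
            continue  # scan without a matching admission record
        linked.append({**admission, **scan})
    return linked


linked = link_scans_to_admissions(read_rows(CT_CSV), read_rows(ADMISSION_CSV))
for row in linked:
    print(row["serialID"], row["Hospital_ID"], row["Sex"], row["Age_cut"])
```

With the real files, the same function could be fed rows read from disk with `csv.DictReader`; the other tables can be attached in the same way because they all carry Hospital_ID.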
De-identification
The Health Insurance Portability and Accountability Act (HIPAA) was used as the standard for de-identification. All protected information was removed, including addresses, dates of birth, dates of hospital admission and discharge, dates of medical orders, personal identifiers (e.g. name, phone number, social security number, and hospital number), and exact age on admission (age is discretized into bins). When the dataset was created, patients were randomly assigned unique identifiers (patient_SN and Hospital_ID), and the original hospital identifiers were not retained. As a result, the identifiers in the tables cannot be linked back to the original, identifiable data. All identifiers related to doctors, nurses, and pharmacists have been removed to protect the privacy of contributing providers. The CT images do not contain PHI. The serialID used to match images and tabular data is a surrogate ID. Date-time variables are de-identified by reporting only the number of days relative to hospital admission. Dates and times in free text, such as those in the CT reports, are replaced with "****". Text containing locations and names has been removed from the dataset.
Data Description
The database comprises 728 hospital visits (i.e. including outpatient visits) for 337 unique admissions from January 2012 to December 2022. The database is available at the PhysioNet repository. Table 1 shows the baseline demographics of hospital admissions (outpatient visits are excluded). There are 103 female and 234 male patients in the dataset. The median length of hospital stay was 22 days (Q1 to Q3: 12 to 36). Male patients had a slightly longer median stay than female patients (22 vs. 20 days), although the difference was not statistically significant (p = 0.224).
Table 1 Demographics and discharge status of the 337 hospital admissions in the database.
| Variables | Total (n = 337) | Female (n = 103) | Male (n = 234) | p |
| --- | --- | --- | --- | --- |
| Age_cut, n (%) | | | | 0.673 |
| (0,18] | 2 (1) | 0 (0) | 2 (1) | |
| (18,30] | 8 (2) | 2 (2) | 6 (3) | |
| (30,40] | 10 (3) | 2 (2) | 8 (3) | |
| (40,50] | 23 (7) | 4 (4) | 19 (8) | |
| (50,60] | 41 (12) | 11 (11) | 30 (13) | |
| (60,70] | 87 (26) | 26 (25) | 61 (26) | |
| (70,80] | 95 (28) | 30 (29) | 65 (28) | |
| (80,90] | 52 (15) | 21 (20) | 31 (13) | |
| (90,150] | 19 (6) | 7 (7) | 12 (5) | |
| DaysHospitalStay, Median (Q1,Q3) | 22 (12, 36) | 20 (12, 30) | 22 (12.75, 38) | 0.224 |
| StatusOnDischarge, n (%) | | | | 0.427 |
| Cured | 185 (56) | 55 (53) | 130 (57) | |
| Not cured | 121 (36) | 37 (36) | 84 (37) | |
| Unknown | 26 (8) | 11 (11) | 15 (7) | |
Individual chest CT files are distributed in the NIfTI format, a popular file format for storing medical imaging data that is widely used in medical research and related fields. The CT files were converted from the original DICOM files with the SimpleITK package (v2.2.0). If a CT examination contains multiple series, the one containing the chest CT is extracted. Patients can have multiple CT scans during a hospital stay, and all scans are included in the database. There are 836 CT scans for 327 hospital admissions. Many packages can handle this file type; for instance, the RNifti package can be used to manipulate and visualize the CT images.
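As a quick sanity check on a downloaded file, the image dimensions and voxel spacing can be read straight from the NIfTI-1 header. The sketch below uses only the Python standard library and assumes the files are gzip-compressed NIfTI-1 images; the function name is hypothetical. For real analysis, a dedicated reader such as RNifti or SimpleITK is the better choice.

```python
import gzip
import struct


def read_nifti1_header(path):
    """Return shape, voxel spacing and data type code of a .nii.gz file."""
    with gzip.open(path, "rb") as handle:
        header = handle.read(348)  # a NIfTI-1 header is 348 bytes long
    if len(header) < 348:
        raise ValueError("file is too short to hold a NIfTI-1 header")
    # The first field (sizeof_hdr) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[0:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack(endian + "8h", header[40:56])      # dim[0] = number of axes
    datatype = struct.unpack(endian + "h", header[70:72])[0]
    pixdim = struct.unpack(endian + "8f", header[76:108])  # pixdim[1..] = spacing
    ndim = dim[0]
    return {
        "shape": dim[1:1 + ndim],
        "spacing": pixdim[1:1 + ndim],
        "datatype": datatype,  # e.g. 4 = signed 16-bit integer, 16 = 32-bit float
    }
```

Calling `read_nifti1_header` on a file from the CTImage folder would return the matrix size and the voxel spacing in the units recorded in the header, which is usually enough to spot truncated downloads or unexpected slice thickness before loading the full volume.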
Classes of data
The data are organized into two categories: clinical tabular data and NIfTI files. The structure of the clinical data is similar to that of previously reported databases. To keep this data descriptor self-contained, we describe the tables again in the supplemental digital content, with more focus on their associations with lung CT and sepsis. There are a total of 14 tables comprising patient demographic data, serial IDs of chest CT images, medical orders, laboratory findings, imaging studies, microbiology and hospital transfer events (as shown below).
- CT2hospitalID.csv: Map CT file name to the hospital ID
- Diagnosis.csv: Diagnosis
- DrugSens.csv: Sensitivity of pathogen to antibiotics for cultured bacteria
- ExamReport.csv: Examination report including CT, ultrasound and MRI
- HospitalTransfer.csv: Intrahospital transfer events
- Lab_dictionary.csv: Dictionary for laboratory events
- Lab.csv: Laboratory findings
- Medication.csv: Medication events
- MedOrder.csv: Medical order
- MicrobiologyCulture.csv: Microbiology culture
- NursingChart_IO.csv: Fluid Input and output
- NursingChart_VitalSign.csv: Vital Sign from Nursing chart
- PtAdmiTable.csv: Patient admission table
- VitalSign.csv: Vital signs
The first column gives the file names for each table. The MD5_hashes column gives the MD5 hashes. MD5 (message-digest algorithm 5) is a hash function that produces a 128-bit digest of a file's contents. Comparing the digest of a downloaded file with the published hash verifies that the file you received matches the file that was distributed. The NumObs column describes the number of rows in each table.
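The check described above can be done with Python's built-in `hashlib` module. This is a minimal sketch; the function names are hypothetical, and the expected hash must be taken from the MD5_hashes column for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """True when the file's digest equals the published hash (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat, which matters for the larger tables and for the CT volumes.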
The CT2hospitalID table
The CT2hospitalID table maps CT file names to hospital IDs. The serialID column gives the CT serial ID, which is also the file name in the CTImage folder. CTexame_DateTime gives the time of the CT examination in days relative to hospital admission. Some values in this column are negative because some CT scans were performed in the emergency department before hospital admission. The patient_SN and Hospital_ID columns give the unique identifiers for each patient and hospital admission. The STUDYRESULT column gives the free-text description of the CT findings. The DIAGRESULT column gives the diagnostic impression reported by radiologists.
- serialID: CT serial ID corresponding to the file name in the CTImage folder
- CTexame_DateTime: The time of CT examination relative to the hospital admission time
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- STUDYRESULT: Description of the CT finding in text
- DIAGRESULT: Diagnosis for the CT finding in text
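Because CTexame_DateTime is an offset from admission, scans taken in the emergency department before admission can be separated from later scans by the sign of the offset. The sketch below assumes the offset is stored as a decimal number of days; the sample rows are invented.

```python
def split_by_admission(ct_rows):
    """Separate scans acquired before admission (negative offset) from the rest."""
    before, after = [], []
    for row in ct_rows:
        offset_days = float(row["CTexame_DateTime"])
        (before if offset_days < 0 else after).append(row["serialID"])
    return before, after


def first_scan_per_stay(ct_rows):
    """Pick the earliest CT scan of every hospital stay."""
    earliest = {}
    for row in ct_rows:
        stay = row["Hospital_ID"]
        offset = float(row["CTexame_DateTime"])
        if stay not in earliest or offset < earliest[stay][0]:
            earliest[stay] = (offset, row["serialID"])
    return {stay: serial for stay, (_, serial) in earliest.items()}


# Invented rows in the shape of CT2hospitalID.csv.
ct_rows = [
    {"serialID": "ct_0001", "Hospital_ID": "1001", "CTexame_DateTime": "-0.2"},
    {"serialID": "ct_0002", "Hospital_ID": "1001", "CTexame_DateTime": "3.5"},
    {"serialID": "ct_0003", "Hospital_ID": "1002", "CTexame_DateTime": "1.0"},
]
print(split_by_admission(ct_rows))
print(first_scan_per_stay(ct_rows))
```

Selecting the first scan per stay is one common way to define a baseline image; studies of disease progression would keep all scans and their offsets instead.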
Patient admission record table
The patient admission table records general information about patients and their hospital admissions. Note that each patient can have multiple hospital encounters, including emergency, inpatient and outpatient visits; the encounter type is recorded in the EncounterType column. The Med_history column contains a large amount of free text, which is left in the original Chinese and can be useful for natural language processing. The columns ending with the _24hr suffix apply to patients who were discharged within 24 hours of admission, so these columns are empty for most admissions. The chief complaint is an important part of each hospital admission, and it is provided in both Chinese and English.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- Sex: Sex: Female and Male
- EncounterType: Hospital encounter type: inpatient and outpatient
- ChiefComplain_24hr: Chief complaint for patients who were discharged within 24 hours after hospital admission
- AdmissionStatus_24hr: Admission status for patients who were discharged within 24 hours after hospital admission
- ChiefComplain_24hr_dead: Chief complaint for patients who died within 24 hours after hospital admission
- AdmissionStatus_24hr_dead: Admission status for patients who died within 24 hours after hospital admission
- ChiefComplain: Chief complaint in Chinese
- StatusOnDischarge: Status on hospital discharge
- DiagnosisOnDeath: Diagnosis On Death
- StatusOnDischarge_DESC: Status On Discharge described in text
- Discharge_DateTime: Discharge time relative to hospital admission time as the time zero in days
- DaysHospitalStay: Days of Hospital Stay
- ChiefComplain_Eng: Chief complaint in English
- Age_cut: Age is categorized into bins for confidentiality
- PastHistory: Past history/comorbidities
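Since exact ages are replaced by bins, analyses that need a numeric age have to work with the bin bounds. The sketch below parses labels of the form shown in Table 1, such as "(60,70]"; it assumes every Age_cut value follows that pattern, and the helper names are hypothetical.

```python
import re

# Matches half-open interval labels such as "(60,70]".
_BIN_PATTERN = re.compile(r"^\((\d+),(\d+)\]$")


def parse_age_bin(label):
    """Return (lower, upper) for a label meaning lower < age <= upper."""
    match = _BIN_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected Age_cut label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def is_older_than(label, threshold=60):
    """True when every age in the bin is strictly above the threshold."""
    lower, _ = parse_age_bin(label)
    return lower >= threshold


print(parse_age_bin("(70,80]"))
print(is_older_than("(60,70]"), is_older_than("(50,60]"))
```

Note that the top bin, "(90,150]", is open-ended in practice, so its upper bound should not be used as a realistic age.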
Diagnosis table
This table provides the ICD diagnoses for each hospital admission. Both the ICD-10 code and the free-text diagnosis are provided. Each hospitalization can have multiple diagnoses.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- Diagnosis_DESC: Description of diagnosis in free text
- ICD10_code: ICD-10 code
- Diagnosis_DateTime: Time for making the diagnosis relative to hospital admission time as the time zero in days
- DiagnosisName: ICD-10 name for the diagnosis
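Admissions can be grouped by diagnosis by matching on the leading characters of ICD10_code. In the sketch below, the prefixes A40 and A41 (ICD-10 sepsis categories) are only an illustrative choice, the sample rows are invented, and the exact code format used in the table is an assumption.

```python
SEPSIS_PREFIXES = ("A40", "A41")  # illustrative ICD-10 sepsis categories


def admissions_with_code(diagnosis_rows, prefixes=SEPSIS_PREFIXES):
    """Return the Hospital_IDs that carry at least one matching ICD-10 code."""
    matched = set()
    for row in diagnosis_rows:
        code = row["ICD10_code"].strip().upper()
        if code.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            matched.add(row["Hospital_ID"])
    return matched


# Invented rows in the shape of Diagnosis.csv.
diagnosis_rows = [
    {"Hospital_ID": "1001", "ICD10_code": "A41.9"},
    {"Hospital_ID": "1001", "ICD10_code": "J18.9"},
    {"Hospital_ID": "1002", "ICD10_code": "N17.9"},
]
print(admissions_with_code(diagnosis_rows))
```

Prefix matching tolerates local extensions of the code (extra digits after the category), which is why it is used here instead of exact comparison.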
Hospital transfer table
The hospital transfer table gives intrahospital transfer events. The departments and datetime of the transfer events are given. Because the emergency ICU (EICU) is in the emergency department, the department names denoted by “Emergency medical department” or “Emergency Department” refer to the EICU.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- TransferIn_DateTime: The date time of transfer into a department, recorded in days relative to hospital admission
- TransferOut_DateTime: The date time of transfer out of a department, recorded in days relative to hospital admission
- TransferTo_Dept_Eng: The department the patient is transferred into
- TransferFrom_Dept_Eng: The department the patient is transferred out of
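The EICU rule described above can be turned into a simple filter. This sketch assumes that TransferIn_DateTime and TransferOut_DateTime bracket the stay in the department named in TransferTo_Dept_Eng, and that offsets are decimal days; both are assumptions that should be checked against the data. The sample rows are invented.

```python
# Department names that, according to the table description, refer to the EICU.
EICU_NAMES = {"emergency medical department", "emergency department"}


def eicu_stays(transfer_rows):
    """Return the duration in days of every EICU episode with both timestamps."""
    stays = []
    for row in transfer_rows:
        if row["TransferTo_Dept_Eng"].strip().lower() not in EICU_NAMES:
            continue
        start, end = row["TransferIn_DateTime"], row["TransferOut_DateTime"]
        if not start or not end:
            continue  # skip episodes with a missing timestamp
        stays.append({"Hospital_ID": row["Hospital_ID"],
                      "days": float(end) - float(start)})
    return stays


# Invented rows in the shape of HospitalTransfer.csv.
transfer_rows = [
    {"Hospital_ID": "1001", "TransferTo_Dept_Eng": "Emergency Department",
     "TransferIn_DateTime": "0.0", "TransferOut_DateTime": "2.5"},
    {"Hospital_ID": "1001", "TransferTo_Dept_Eng": "Respiratory Department",
     "TransferIn_DateTime": "2.5", "TransferOut_DateTime": "10.0"},
    {"Hospital_ID": "1002", "TransferTo_Dept_Eng": "Emergency medical department",
     "TransferIn_DateTime": "0.0", "TransferOut_DateTime": ""},
]
print(eicu_stays(transfer_rows))
```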
The Lab table
The laboratory table contains the results of laboratory tests. As with the previous tables, laboratory items can be matched by Hospital_ID. There are two date-time columns, Lab_DateTime and LabSampleCollect_time. The former refers to the date and time of the report, and the latter to the sample collection time.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- Lab_category: Category of lab item
- Lab_DateTime: Time of lab in days relative to hospital admission
- Lab_results: Results of the lab finding
- Unit_measure: Unit of measurement
- LabSampleCollect_time: Sample collection time in days relative to hospital admission
- Lab_ItemName: Name of lab item
- Lab_SampleName: Sample name
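The two time columns make it possible to order results by when the sample was drawn and to measure the delay until the report. The sketch below assumes decimal-day offsets and that Lab_results may hold non-numeric text; the item name "Lactate" and the sample rows are invented for illustration.

```python
def numeric_series(lab_rows, hospital_id, item_name):
    """Time-ordered (collection_day, value) pairs for one lab item in one stay."""
    series = []
    for row in lab_rows:
        if row["Hospital_ID"] != hospital_id or row["Lab_ItemName"] != item_name:
            continue
        try:
            value = float(row["Lab_results"])
        except ValueError:
            continue  # skip non-numeric results
        series.append((float(row["LabSampleCollect_time"]), value))
    return sorted(series)


def turnaround_days(row):
    """Delay between sample collection and report, in days."""
    return float(row["Lab_DateTime"]) - float(row["LabSampleCollect_time"])


# Invented rows in the shape of Lab.csv.
lab_rows = [
    {"Hospital_ID": "1001", "Lab_ItemName": "Lactate", "Lab_results": "4.2",
     "LabSampleCollect_time": "0.5", "Lab_DateTime": "0.75"},
    {"Hospital_ID": "1001", "Lab_ItemName": "Lactate", "Lab_results": "2.1",
     "LabSampleCollect_time": "0.25", "Lab_DateTime": "0.5"},
    {"Hospital_ID": "1001", "Lab_ItemName": "Lactate", "Lab_results": "hemolyzed",
     "LabSampleCollect_time": "1.0", "Lab_DateTime": "1.25"},
]
print(numeric_series(lab_rows, "1001", "Lactate"))
print(turnaround_days(lab_rows[0]))
```

Using the collection time for ordering reflects the patient's state when the sample was taken; the report time is the better choice when modelling what clinicians knew at a given moment.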
Microbiology culture table
The MicrobiologyCulture table contains microbiology culture results, including the sample, the pathogens found, the culture time and a description of the culture. The MicrobiologyCulture_DESC_Eng column provides free-text information on the culture results, for example "Poor quality (WBC<10/LP, LEC>25/LP)" or "No anaerobic bacteria growth after 5 days of culture". The microbiology information might be useful for exploring the heterogeneity of sepsis and related lung structural changes.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- MicrobiologyCulture_Finding: Microbiology Culture finding
- MicrobiologyCulture_DateTime: Microbiology Culture time in days relative to hospital admission
- MicrobiologyCulture_sample_Eng: Microbiology Culture sample
- MicrobiologyCulture_Category_Eng: Microbiology Culture Category
- MicrobiologyCulture_DESC_Eng: Description of Microbiology Culture
Drug sensitivity table
The DrugSens table contains the drug sensitivity results for cultured bacteria, including the sample, microorganism, culture time and drug name. The negative and positive values in the DrugSens_result column refer to the results of the extended-spectrum β-lactamase (ESBL) test or the D-test. This table may be useful for exploring whether lung CT structure differs between infections with drug-sensitive and drug-insensitive pathogens.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- Drug_Code: Code of the drug for sensitivity analysis
- DrugSens_result: Results for Drug Sensitivity test
- MIC: Minimum inhibitory concentration
- DrugSens_DateTime: Time for the results relative to hospital admission time as the time zero in days
- Drug_name_Eng: Name of the tested drug
- DrugSens_Microbiology_Eng: Microorganism for testing
- DrugSens_Category_Eng: Category for the test
- DrugSens_sample_Eng: Sample name
Examination report table
The ExamReport table contains reports from a variety of medical imaging studies, including computed tomography (CT), X-ray and ultrasonography. It would be of great interest to study the correlation between CT image features and the free-text descriptions. Deep learning techniques are applicable to both natural language and image features.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- ExamReport_Category: Category of examination
- ExamReport_DESC: Description of the examination in free form text
- ExamReport_Finding: Result finding
- ExamReport_DateTime: Time for the examination results relative to hospital admission time as the time zero in days
- ExamReport_item_Eng: Name of the Examination
Medical order table
The MedOrder table gives regular and stat medical orders (MedOrder_Type) prescribed by clinicians. The contents of the medical order can be found in the MedOrder_DESC column.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- MedOrder_Type: Type of medical order: regular or stat
- MedOrder_DESC: Description of medical order in free text
- MedOrder_Start_DateTime: Start time of medication in days relative to hospital admission
- MedOrder_Stop_DateTime: Stop time of medication in days relative to hospital admission
Medication table
The medication table contains the medications prescribed by physicians. Unlike the MedOrder table, which contains miscellaneous medical orders, this table is designed specifically for medication orders and contains columns for drug dose, frequency, unit of drug dose and route of administration. It is useful for extracting information on drug usage, such as the type, dosing and timing of corticosteroid use.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- Med_category: Category of medication
- SingleDose: Single dose
- Med_Freq: Frequency of administration
- Med_unit: Unit of measurement
- Med_startTime: Start time of medication in days relative to hospital admission
- Med_stopTime: Stop time of medication in days relative to hospital admission
- Med_route_Eng: Route of administration
- Med_DESC_Eng: Medication name in text
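The corticosteroid example mentioned above could be approached with a keyword search on Med_DESC_Eng. The keyword list below is a small illustrative set of corticosteroid names, not a complete one, and the spelling used in the table is an assumption; the sample rows are invented.

```python
# Illustrative, incomplete list of corticosteroid names (lower case).
STEROID_KEYWORDS = (
    "hydrocortisone",
    "methylprednisolone",
    "dexamethasone",
    "prednisolone",
    "prednisone",
)


def steroid_orders(med_rows):
    """Return medication rows naming a corticosteroid, ordered by start time."""
    hits = [
        row for row in med_rows
        if any(keyword in row["Med_DESC_Eng"].lower() for keyword in STEROID_KEYWORDS)
    ]
    return sorted(hits, key=lambda row: float(row["Med_startTime"]))


# Invented rows in the shape of Medication.csv.
med_rows = [
    {"Hospital_ID": "1001", "Med_startTime": "2.0",
     "Med_DESC_Eng": "Methylprednisolone sodium succinate for injection"},
    {"Hospital_ID": "1001", "Med_startTime": "0.1",
     "Med_DESC_Eng": "Meropenem for injection"},
    {"Hospital_ID": "1001", "Med_startTime": "0.5",
     "Med_DESC_Eng": "Hydrocortisone injection"},
]
for row in steroid_orders(med_rows):
    print(row["Med_startTime"], row["Med_DESC_Eng"])
```

The first returned row gives the time of first corticosteroid exposure, which can then be compared with CT offsets from the CT2hospitalID table for the same Hospital_ID.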
Vital sign table
The VitalSign table provides vital sign data for each hospital admission. The VitalSign_DESC column provides categories of vital signs including diastolic blood pressure, temperature, heart rate and respiratory rate.
- patient_SN: Patient series number: unique to each individual subject
- Hospital_ID: unique to each hospital admission
- VitalSign_DESC: Vital Sign Description
- VitalSign_value: Vital Sign value
- VitalSign_unit: Vital Sign unit of measurement
- VitalSign_DateTime: Vital Sign measurement time in days relative to hospital admission
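The vital sign table is in long format, with one row per measurement. For modelling it is often convenient to reshape it so that each time point has one record with a field per vital sign. The sketch below does this with the standard library; the category labels and sample values are invented, and numeric VitalSign_value entries are assumed.

```python
from collections import defaultdict


def pivot_vitals(vital_rows):
    """Group long-format rows into {(Hospital_ID, time): {vital sign: value}}."""
    wide = defaultdict(dict)
    for row in vital_rows:
        key = (row["Hospital_ID"], float(row["VitalSign_DateTime"]))
        wide[key][row["VitalSign_DESC"]] = float(row["VitalSign_value"])
    return dict(wide)


# Invented rows in the shape of VitalSign.csv.
vital_rows = [
    {"Hospital_ID": "1001", "VitalSign_DateTime": "0.5",
     "VitalSign_DESC": "heart rate", "VitalSign_value": "112"},
    {"Hospital_ID": "1001", "VitalSign_DateTime": "0.5",
     "VitalSign_DESC": "temperature", "VitalSign_value": "38.6"},
    {"Hospital_ID": "1001", "VitalSign_DateTime": "1.0",
     "VitalSign_DESC": "heart rate", "VitalSign_value": "98"},
]
for key, values in sorted(pivot_vitals(vital_rows).items()):
    print(key, values)
```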
Usage Notes
This dataset holds promise for advancing sepsis research by providing a resource that integrates medical images and clinical tabular data. Potential uses of this dataset include, but are not limited to, extraction of radiomic features from medical images, predictive analytics, and exploration of the heterogeneity of sepsis-induced lung injury. Example code for reading medical images and selecting series can be found on GitHub [10].
Ethics
The study was approved by the ethics committee of Zhejiang Provincial People's Hospital (Approval number: 2023-397; 浙人医人审2023其它第(397)号).
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Rudd KE, Johnson SC, Agesa KM, Shackelford KA, Tsoi D, Kievlan DR, et al. Global, regional, and national sepsis incidence and mortality, 1990-2017: analysis for the Global Burden of Disease Study. Lancet. 2020 Jan 18;395(10219):200–11.
2. Li W, Li D, Chen Y, Abudou H, Wang H, Cai J, et al. Classic Signaling Pathways in Alveolar Injury and Repair Involved in Sepsis-Induced ALI/ARDS: New Research Progress and Prospect. Dis Markers. 2022;2022:6362344.
3. Michels EHA, Butler JM, Reijnders TDY, Cremer OL, Scicluna BP, Uhel F, et al. Association between age and the host response in critically ill patients with sepsis. Crit Care. 2022 Dec 13;26(1):385.
4. Zhang Z, Chen L, Liu H, Sun Y, Shui P, Gao J, et al. Gene signature for the prediction of the trajectories of sepsis-induced acute kidney injury. Crit Care. 2022 Dec 21;26(1):398.
5. Vliegenthart R, Fouras A, Jacobs C, Papanikolaou N. Innovations in thoracic imaging: CT, radiomics, AI and x-ray velocimetry. Respirology. 2022 Oct;27(10):818–33.
6. Suri JS, Agarwal S, Gupta SK, Puvvula A, Biswas M, Saba L, et al. A narrative review on characterization of acute respiratory distress syndrome in COVID-19-infected lungs using artificial intelligence. Comput Biol Med. 2021 Mar;130:104210.
7. Bouchareb Y, Moradi Khaniabadi P, Al Kindi F, Al Dhuhli H, Shiri I, Zaidi H, et al. Artificial intelligence-driven assessment of radiological images for COVID-19. Comput Biol Med. 2021 Sep;136:104665.
8. Ter Maat LS, van Duin IAJ, Elias SG, van Diest PJ, Pluim JPW, Verhoeff JJC, et al. Imaging to predict checkpoint inhibitor outcomes in cancer. A systematic review. Eur J Cancer. 2022 Nov;175:60–76.
9. Röhrich S, Hofmanninger J, Prayer F, Müller H, Prosch H, Langs G. Prospects and Challenges of Radiomics by Using Nononcologic Routine Chest CT. Radiology: Cardiothoracic Imaging. 2020 Aug 1;2(4):e190190.
10. GitHub. https://github.com/zh-zhang1984/ChestCT_sepsis/blob/main/ChestCT_Zhejiang.Rmd [Accessed 3rd Feb 2024].
Access
Access Policy:
Only credentialed users who sign the DUA can access the files. In addition, users must have individual studies reviewed by the contributor.
License (for files):
PhysioNet Contributor Review Health Data License 1.5.0
Data Use Agreement:
PhysioNet Contributor Review Health Data Use Agreement 1.5.0
Required training:
CITI Data or Specimens Only Research
Discovery
DOI (version 1.0.0):
https://doi.org/10.13026/zne5-qh18
DOI (latest version):
https://doi.org/10.13026/ct3n-e938
Topics:
sepsis
Files
To access the files, you must:
- be a credentialed user
- complete required training: CITI Data or Specimens Only Research
- submit a request to the authors to use the data for your project