MIMICEL: MIMIC-IV Event Log for Emergency Department

Jia Wei Zhipeng He Chun Ouyang Catarina Moreira

Published: June 16, 2023. Version: 2.1.0


When using this resource, please cite:
Wei, J., He, Z., Ouyang, C., & Moreira, C. (2023). MIMICEL: MIMIC-IV Event Log for Emergency Department (version 2.1.0). PhysioNet. https://doi.org/10.13026/c9yj-1t90.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

In this work, we extract an event log from the MIMIC-IV-ED dataset by adopting a well-established event log generation methodology, and we name this event log MIMICEL. The data tables in the MIMIC-IV-ED dataset relate to each other based on the existing relational database schema, and each table records the individual activities of patients along their journey in the emergency department (ED). While the data tables in the MIMIC-IV-ED dataset capture snapshots of a patient journey in the ED, the extracted event log MIMICEL aims to capture an end-to-end process of the patient journey. This will enable the analysis of existing patient flows, with the aim of improving the efficiency of ED processes.


Background

Overcrowding has been a significant problem in emergency departments worldwide since services provided to patients fail to meet the growing needs of emergency care [1]. Therefore, it is necessary to analyse the flow of patients through emergency departments to improve efficiency and reduce overcrowding [2]. However, the nature of a healthcare process, such as a process of activities during a patient’s stay in the emergency department (referred to as an ED process), is often dynamic and complex [3]. Traditional analysis methods are time-consuming and costly [4] when used to analyse ED processes.

Process mining techniques are increasingly adopted in the healthcare domain for analysing healthcare processes and patient flows [5,6]. Process mining aims to discover, monitor, and improve processes by extracting knowledge from process data recorded in information systems [7], e.g., health information systems (HIS). Such process data is captured in so-called event logs, the availability of which is a prerequisite for applying process mining techniques.

An event log is a collection of cases, and each case consists of a sequence of events (ordered according to when they occurred) [7]. An example of a case in the context of an ED process is a patient’s single stay in the ED (which is uniquely identified by stay_id). An event corresponds to an instance of an activity that occurred during a patient’s stay in the ED (i.e., during a case). For example, an event could be a patient entering the emergency room at a particular time during a specific stay. Each event is described by a set of attributes, where case ID, activity name, and timestamp are the three mandatory attributes of an event. An event may have additional attributes. Some of these are static attributes, such as the patient’s unique identifier (recorded as subject_id), which carry the same value within the same case. Others are dynamic attributes, such as the patient’s temperature and heart rate, whose values can change from event to event. For an event log, static attributes are often referred to as case attributes, and dynamic attributes are known as event attributes [8].
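The structure described above can be sketched in a few lines of Python. This is only an illustration: the class name Event, the helper build_log, and all values are hypothetical and are not part of MIMICEL or MIMIC-IV-ED.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    case_id: int          # mandatory: case ID (stay_id for an ED process)
    activity: str         # mandatory: activity name
    timestamp: datetime   # mandatory: time at which the event occurred
    attributes: dict = field(default_factory=dict)  # optional event attributes


def build_log(events):
    """Group events into cases and order each case by timestamp."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    return {cid: sorted(evs, key=lambda e: e.timestamp) for cid, evs in cases.items()}


# Made-up values for illustration only; they are not taken from MIMIC-IV-ED.
events = [
    Event(1001, "Take routine vital signs", datetime(2100, 1, 1, 10, 30), {"heartrate": 88}),
    Event(1001, "Enter the ED", datetime(2100, 1, 1, 10, 0)),
    Event(1002, "Enter the ED", datetime(2100, 1, 2, 8, 15)),
]
log = build_log(events)
trace = [e.activity for e in log[1001]]
```

Here trace lists the activities of case 1001 in the order in which they occurred, regardless of the order in which the events were supplied.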

MIMIC-IV [9,10] is an extensive, freely available database composed of de-identified health-related data of patients at Beth Israel Deaconess Medical Center. As this work aims to extract event logs capturing the execution of ED processes, the MIMIC-IV-ED dataset is used, which “contains data for emergency department patients collected while they are in the ED”.

The data tables in the MIMIC-IV-ED dataset record individual activities of patients along their journey in the ED and are related via the existing relational database schema [9]. While these data tables capture snapshots of a patient journey, they cannot demonstrate an end-to-end patient flow through the ED. Thus, to understand and analyse an ED process, we first need to extract an event log from the MIMIC-IV-ED dataset before any process mining techniques can be applied. The extracted event log MIMICEL can then be used as a key input for data-driven process analysis.

Our motivations for this work are to:

  • Have a clear understanding of a patient’s end-to-end process in the ED.
  • Promote the extensive use of MIMIC-IV-ED data by making it accessible in the form of an event log.
  • Enable the use of process mining techniques on the MIMIC-IV-ED data.
  • Provide the process mining and PhysioNet research communities with a dataset for demonstration and experiments.
  • Investigate the feasibility of converting data from the healthcare domain into the data format required by the event log standard.

Methods

Existing research [11-14] has proposed ways to generate event logs from relational databases. In this work, we adopt the approach that Jans et al. [14] introduced for event log extraction, which we consider to be the most systematic and complete. The method consists of nine steps with the aim of extracting event logs for a specific objective. We aim to derive a full event log that captures the end-to-end process of a patient journey in the ED by incorporating the data recorded in the MIMIC-IV-ED dataset [9]. Below, we discuss how we follow the guidelines proposed in this approach to extract an event log capturing the execution of ED processes from the MIMIC-IV-ED dataset.

Step 1: Set a primary business goal

This step intends to “establish the business goal that is considered as a ‘must have’ by the project sponsor” [14], which determines the purpose of event log generation. Overcrowding in emergency departments is considered a significant problem in the healthcare system; thus, improving patient flow through the ED has become a top priority for healthcare providers [15]. However, before improving the efficiency of ED processes, we first need to obtain an overall understanding of a patient journey in the ED. This has motivated the main objective of this work, which is to capture a patient’s end-to-end journey in the ED based on the data recorded in the MIMIC-IV-ED dataset.

Step 2: Identify key process cornerstones

This step corresponds to “determining the boundaries of the process under investigation and the core activities that are of interest to the project stakeholders” [14]. This step identifies the key activities of a process as informed by domain knowledge. In our work, we refer to a widely-adopted conceptual model of emergency department crowding [16] as relevant domain knowledge to identify key activities of an ED process.

Step 3: Identify key tables

In this step, the key process activities identified in Step 2 are used to guide the selection of key tables. In our work, all tables in the MIMIC-IV-ED dataset are relevant to the key activities of an ED process and therefore are included. 

Step 4: Identify relationships between tables

This step focuses on determining the relationships between the tables selected in Step 3. In this work, the relationships between the tables are given by the existing relational database schema of the MIMIC-IV-ED dataset. 

Step 5: Select the process instance (case) document

This step aims to determine the boundaries of process instances (i.e., cases) by identifying the start document that triggers an instance of the process and/or the end document that completes the process instance. In this work, most of the tables in the MIMIC-IV-ED dataset contain temporal information, which helps us to identify the start and end activity in an ED process.

At the start of an ED stay, different activities may occur. For example, if a patient’s arrival in the ED happens first, then the edstays table, which “tracks patient admissions to the ED,” is considered a start document that triggers an instance of the ED process. In another example, the measurement of a patient’s routine vital signs, the medicine reconciliation, or the medicine dispensation occurs first, e.g., in an ambulance before the patient arrives in the ED. In this scenario, the corresponding table vitalsign, medrecon, or pyxis will be considered the start document.

At the end of an ED stay, patients are discharged from the ED. Information about the patient’s diagnoses is provided upon discharge and is used for billing purposes. This information is recorded in the diagnosis table, which can be regarded as the end document of a patient’s ED process.
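As a minimal sketch of the rule described in this step, the start document of a stay can be read off as the source table of its earliest event. The function name start_document and the sample values are hypothetical; the table names are those of MIMIC-IV-ED.

```python
from datetime import datetime


def start_document(events):
    """Return the source table of the earliest event of one ED stay.

    events: list of (source_table, timestamp) tuples for a single stay.
    """
    table, _ = min(events, key=lambda item: item[1])
    return table


# Made-up stay in which vital signs are taken before the recorded arrival,
# e.g. in an ambulance.
stay = [
    ("edstays", datetime(2100, 1, 1, 10, 0)),
    ("vitalsign", datetime(2100, 1, 1, 9, 50)),
    ("pyxis", datetime(2100, 1, 1, 10, 20)),
]
first_table = start_document(stay)
```

In this made-up stay the vitalsign table acts as the start document, because its event precedes the arrival recorded in edstays.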

Step 6: Select process instance (case) level

As mentioned above, an event log is a collection of process instances (i.e., cases) [7]. This step focuses on deciding the granularity of case levels in the extracted event log. In an ED scenario, a case can relate to a single patient, who may have multiple ED stays, or to a single ED stay. The focus of this work is on single ED stays, where each ED stay is uniquely identified by stay_id (i.e., the case ID attribute of the extracted event log). The process instance documents identified in Step 5 represent the start and/or end activity of a single ED stay.
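The effect of this choice can be illustrated with a small sketch that groups the same rows once per ED stay and once per patient. The helper group_cases and the sample rows are hypothetical; stay_id and subject_id are the identifiers named in the text.

```python
from collections import defaultdict


def group_cases(rows, case_key):
    """Group activity names into cases using the given column as the case ID."""
    cases = defaultdict(list)
    for row in rows:
        cases[row[case_key]].append(row["activity"])
    return dict(cases)


# Made-up rows: one patient (subject_id 7) with two separate ED stays.
rows = [
    {"subject_id": 7, "stay_id": 1001, "activity": "Enter the ED"},
    {"subject_id": 7, "stay_id": 1001, "activity": "Discharge from the ED"},
    {"subject_id": 7, "stay_id": 1002, "activity": "Enter the ED"},
    {"subject_id": 7, "stay_id": 1002, "activity": "Discharge from the ED"},
]
by_stay = group_cases(rows, "stay_id")        # the case level used in this work
by_patient = group_cases(rows, "subject_id")  # the alternative, coarser case level
```

Grouping by stay_id yields two cases of two events each, whereas grouping by subject_id merges both stays into a single case of four events.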

Step 7: Identify activities

This step aims to select relevant activities for the case level (identified in Step 6) given the data recorded in the key tables (identified in Step 3). In this work, we identify all potential activities in a single ED stay, using the data stored in all tables in the MIMIC-IV-ED dataset.

According to the guidelines [14], candidate activities are expected to have their time information stored in the database. Whilst triage is one of the key activities in an ED process [16], the triage table does not provide timestamps. The description of the triage table in the MIMIC-IV-ED documentation states that “the closest approximation to triage time is the intime of the patient from the edstays table” [9]. Based on this statement, we assign an artificial timestamp to the triage activity: the time when the patient enters the ED (i.e., intime of the edstays table) plus one second. We choose this one-second offset since it will not affect the time of any subsequent activities.
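The timestamp assumption amounts to a single addition. The sketch below is illustrative only: the function name triage_timestamp and the sample intime are hypothetical, and the project's extraction itself is implemented in SQL scripts.

```python
from datetime import datetime, timedelta


def triage_timestamp(intime: datetime) -> datetime:
    """Artificial triage time: one second after the patient enters the ED."""
    return intime + timedelta(seconds=1)


# Made-up intime chosen to show that the offset also rolls over midnight.
intime = datetime(2100, 1, 1, 23, 59, 59)
triage_time = triage_timestamp(intime)
```

Because the triage time is strictly later than intime, sorting a stay by timestamp places “Triage in the ED” after “Enter the ED”.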

Step 8: Identify attributes

This step focuses on identifying all relevant attributes in addition to the three mandatory attributes (i.e., case ID, activity name and timestamp) of an event log, given the data recorded in the datasets. In this work, we include all data attributes stored in the MIMIC-IV-ED dataset. 

Step 9: Relate attributes to activities

The final step relates the attributes identified in Step 8 to a case or an event, i.e., case attribute or event attribute (of an event log). A detailed description of the case attributes and event attributes of the extracted event log is provided in the next section.


Data Description

In this work, we generate the event log MIMICEL in two formats: one as a CSV file and the other in XES (eXtensible Event Stream) format [17].

mimicel.csv contains 7,568,824 events and 425,028 cases, capturing the ED stays of 205,466 patients recorded in the MIMIC-IV-ED dataset [9]. Each row of the CSV file represents an execution of an event during an ED stay, and each column corresponds to an event’s attribute. We describe each of the columns in detail.

Firstly, MIMICEL has three mandatory attributes represented by the following three columns.

stay_id: the unique identifier of an ED stay, which is the case ID of MIMICEL

activity: the activity name of an event during an ED stay. The names of all activities in MIMICEL are listed below.

  • “Enter the ED”, recorded in the edstays table
  • “Triage in the ED”, recorded in the triage table
  • “Take routine vital signs”, recorded in the vitalsign table
  • “Medicine reconciliation”, recorded in the medrecon table
  • “Medicine dispensation”, recorded in the pyxis table
  • “Discharge from the ED”, recorded in the edstays table

timestamps: the time at which an event (i.e., an instance of one of the above activities) was executed. The timestamps of each of the above activities are listed below.

  • Timestamp of activity “Enter the ED”, given by intime in the edstays table
  • Timestamp of activity “Triage in the ED”, given by (intime + 1 second) (refer to Step 7 in the Methods section)
  • Timestamp of activity “Take routine vital signs”, given by charttime in the vitalsign table
  • Timestamp of activity “Medicine reconciliation”, given by charttime in the medrecon table
  • Timestamp of activity “Medicine dispensation”, given by charttime in the pyxis table
  • Timestamp of activity “Discharge from the ED”, given by outtime in the edstays table 
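The timestamp rules listed above can be written down as a lookup table. This is a sketch under stated assumptions: the names TIMESTAMP_RULES and event_timestamp are hypothetical, and each row is assumed to be a dictionary holding the relevant timestamp column as a datetime.

```python
from datetime import datetime, timedelta

# activity name -> (table providing the timestamp, timestamp column, offset)
TIMESTAMP_RULES = {
    "Enter the ED": ("edstays", "intime", timedelta(0)),
    "Triage in the ED": ("edstays", "intime", timedelta(seconds=1)),
    "Take routine vital signs": ("vitalsign", "charttime", timedelta(0)),
    "Medicine reconciliation": ("medrecon", "charttime", timedelta(0)),
    "Medicine dispensation": ("pyxis", "charttime", timedelta(0)),
    "Discharge from the ED": ("edstays", "outtime", timedelta(0)),
}


def event_timestamp(activity, row):
    """Compute an event timestamp from a row of the table named in the rule."""
    _table, column, offset = TIMESTAMP_RULES[activity]
    return row[column] + offset


# Made-up edstays row for illustration only.
edstays_row = {"intime": datetime(2100, 1, 1, 10, 0), "outtime": datetime(2100, 1, 1, 12, 0)}
enter_time = event_timestamp("Enter the ED", edstays_row)
triage_time = event_timestamp("Triage in the ED", edstays_row)
discharge_time = event_timestamp("Discharge from the ED", edstays_row)
```

Note that the triage activity is recorded in the triage table, but its timestamp is derived from the edstays table, which is why the rule for “Triage in the ED” points at edstays.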

Secondly, MIMICEL has case attributes represented by the following columns taken directly from the corresponding data tables in MIMIC-IV-ED.

  • subject_id, recorded in all tables
  • hadm_id, recorded in the edstays table
  • gender, recorded in the edstays table
  • race, recorded in the edstays table
  • acuity, recorded in the triage table
  • chiefcomplaint, recorded in the triage table

Thirdly, MIMICEL has event attributes represented by the following columns taken directly from the corresponding data tables in MIMIC-IV-ED.

  • arrival_transport, recorded in the edstays table
  • disposition, recorded in the edstays table
  • temperature, recorded in the triage and vitalsign tables
  • heartrate, recorded in the triage and vitalsign tables
  • resprate, recorded in the triage and vitalsign tables
  • o2sat, recorded in the triage and vitalsign tables
  • sbp, recorded in the triage and vitalsign tables
  • dbp, recorded in the triage and vitalsign tables
  • pain, recorded in the triage and vitalsign tables
  • rhythm, recorded in the vitalsign table
  • ndc, recorded in the medrecon table
  • etc_rn, recorded in the medrecon table
  • etccode, recorded in the medrecon table
  • etcdescription, recorded in the medrecon table
  • name, recorded in the medrecon and pyxis tables
  • gsn, recorded in the medrecon and pyxis tables
  • med_rn, recorded in the pyxis table
  • gsn_rn, recorded in the pyxis table
  • seq_num, recorded in the diagnosis table
  • icd_code, recorded in the diagnosis table
  • icd_version, recorded in the diagnosis table
  • icd_title, recorded in the diagnosis table

The above columns’ descriptions can be found on PhysioNet [9,18].
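Since case attributes are expected to carry the same value within a case, a simple sanity check is to look for case attributes that take more than one distinct value within the same stay_id. The sketch below assumes rows are dictionaries keyed by column name and that missing values appear as empty strings or None; both are assumptions about how the data is loaded, not documented properties of mimicel.csv. The function name varying_case_attributes is hypothetical.

```python
from collections import defaultdict

CASE_ATTRIBUTES = ["subject_id", "hadm_id", "gender", "race", "acuity", "chiefcomplaint"]


def varying_case_attributes(rows, case_attributes=CASE_ATTRIBUTES):
    """Return (stay_id, attribute) pairs whose non-missing values differ within a case."""
    seen = defaultdict(set)
    for row in rows:
        for attr in case_attributes:
            value = row.get(attr)
            if value not in (None, ""):  # assumption: missing values are None or ""
                seen[(row["stay_id"], attr)].add(value)
    return sorted(key for key, values in seen.items() if len(values) > 1)


# Made-up rows for illustration only.
rows = [
    {"stay_id": "1", "gender": "F", "acuity": "3"},
    {"stay_id": "1", "gender": "F", "acuity": ""},
    {"stay_id": "2", "gender": "M"},
    {"stay_id": "2", "gender": "F"},
]
problems = varying_case_attributes(rows)
```

An empty result indicates that every case attribute is constant within each stay for the rows examined.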

mimicel.xes applies XES [17], an XML schema for event logs. XES is an open, machine-readable standard for storing and managing event log data. XES maintains the general structure of an event log and uses the term “trace” instead of “case”. It is supported by the majority of process mining tools.

XES defines the following schema:

<trace>
     <!-- Trace attributes -->
     <event>
           <!-- Event attributes -->
     </event>
      ...
</trace>

mimicel.xes is converted from mimicel.csv using the Python library PM4Py [19], where the case attributes in mimicel.csv become the trace attributes in mimicel.xes.
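To make the trace/event nesting concrete, the following sketch builds a tiny XES-like document with Python's standard library. It is not the PM4Py conversion used for mimicel.xes. The function name to_xes and the input layout are hypothetical, all extra attributes are written as strings for simplicity (XES also defines other attribute types), and the keys concept:name and time:timestamp are the standard XES names for the case/activity name and the timestamp.

```python
import xml.etree.ElementTree as ET


def to_xes(cases):
    """Build a minimal XES-like tree.

    cases: dict mapping case ID -> {"attributes": {...}, "events": [{...}, ...]},
    where each event has the keys "activity" and "timestamp" (ISO 8601 string).
    """
    log = ET.Element("log")
    for case_id, case in cases.items():
        trace = ET.SubElement(log, "trace")
        # Trace (case) attributes
        ET.SubElement(trace, "string", key="concept:name", value=str(case_id))
        for key, value in case["attributes"].items():
            ET.SubElement(trace, "string", key=key, value=str(value))
        for event in case["events"]:
            node = ET.SubElement(trace, "event")
            # Event attributes
            ET.SubElement(node, "string", key="concept:name", value=event["activity"])
            ET.SubElement(node, "date", key="time:timestamp", value=event["timestamp"])
    return log


# Made-up case for illustration only.
cases = {
    1001: {
        "attributes": {"gender": "F"},
        "events": [
            {"activity": "Enter the ED", "timestamp": "2100-01-01T10:00:00"},
            {"activity": "Triage in the ED", "timestamp": "2100-01-01T10:00:01"},
        ],
    }
}
xes = to_xes(cases)
xml_text = ET.tostring(xes, encoding="unicode")
```

The resulting xml_text contains one trace element holding the case attributes, followed by one event element per event.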


Usage Notes

The MIMICEL event log generated in this study originates from the MIMIC-IV-ED dataset [9], so valid access to the MIMIC-IV-ED dataset is required to make sensible use of the MIMICEL event log. Missing data in MIMICEL is inherited from MIMIC-IV-ED: no data pre-processing was applied to it, which keeps MIMICEL consistent with MIMIC-IV-ED. Documentation and instructions for generating the MIMICEL event log (both CSV and XES files) are available from the project repository on GitHub [20].

The MIMICEL event log, especially mimicel.xes, allows the data to be accessed by many process mining tools. These include commercial process mining tools such as Disco [21] and Celonis Process Mining [22], as well as open-source process mining tools such as ProM [23] and PM4Py [19].
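For a quick look at the CSV file without a process mining tool, the sketch below counts trace variants (distinct activity sequences) using only the standard library. It relies on the column names stay_id, activity and timestamps described above and assumes that timestamps are stored in an ISO-like format such as 2100-01-01 10:00:00; that format is an assumption. The function name trace_variants and the in-memory sample are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime


def trace_variants(csv_file):
    """Count how many cases follow each distinct sequence of activities."""
    cases = defaultdict(list)
    for row in csv.DictReader(csv_file):
        when = datetime.fromisoformat(row["timestamps"])  # assumed ISO-like format
        cases[row["stay_id"]].append((when, row["activity"]))
    variants = Counter()
    for events in cases.values():
        events.sort(key=lambda item: item[0])  # order each case by timestamp
        variants[tuple(activity for _, activity in events)] += 1
    return variants


# Made-up sample standing in for mimicel.csv.
sample = io.StringIO(
    "stay_id,activity,timestamps\n"
    "1,Enter the ED,2100-01-01 10:00:00\n"
    "1,Triage in the ED,2100-01-01 10:00:01\n"
    "1,Discharge from the ED,2100-01-01 12:00:00\n"
    "2,Discharge from the ED,2100-01-02 09:00:00\n"
    "2,Enter the ED,2100-01-02 08:00:00\n"
    "2,Triage in the ED,2100-01-02 08:00:01\n"
)
variants = trace_variants(sample)
```

With the real file, the same function could be called on an open file handle, for example one obtained from open("mimicel.csv", newline="").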


Release Notes

MIMICEL v2.1.0 - 2023-06-01

Changed
  • Removed 59 cases with zero or negative ED length of stay, meaning the event "Enter the ED" occurred at the same time or after the event "Discharge from the ED" in the same ED stay. This cleaning operation is implemented in a new SQL script named 4_clean.sql
Fixed
  • Fixed a bug when integrating the diagnosis table with the activity "Discharge from the ED". LEFT JOIN is now used (instead of INNER JOIN), which affects 1,098 cases. This fix is implemented in 2_activity.sql
  • Fixed a bug by removing events that occurred at the same time as or after the event “Discharge from the ED” in a single ED stay, because discharge should represent the unique end of an ED stay (refer to Step 5 in the Methods section). This fix is implemented in 2_activity.sql
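The two cleaning rules in this release can be paraphrased in Python as follows. This is not a port of 4_clean.sql or 2_activity.sql, only a sketch of the rules as described in these notes. The function name clean_stay is hypothetical, and each stay is assumed to contain both an “Enter the ED” and a “Discharge from the ED” event.

```python
from datetime import datetime

ENTER = "Enter the ED"
DISCHARGE = "Discharge from the ED"


def clean_stay(events):
    """Apply the v2.1.0 cleaning rules to one stay.

    events: list of (activity, timestamp) tuples for a single ED stay.
    Returns the cleaned list, or None if the whole stay should be removed.
    """
    enter = min(t for a, t in events if a == ENTER)
    discharge = max(t for a, t in events if a == DISCHARGE)
    if discharge <= enter:
        return None  # zero or negative ED length of stay
    # Keep discharge events, drop anything else at or after the discharge time.
    return [(a, t) for a, t in events if a == DISCHARGE or t < discharge]


# Made-up stays for illustration only.
stay = [
    (ENTER, datetime(2100, 1, 1, 10, 0)),
    ("Take routine vital signs", datetime(2100, 1, 1, 10, 30)),
    (DISCHARGE, datetime(2100, 1, 1, 12, 0)),
    ("Medicine dispensation", datetime(2100, 1, 1, 12, 0)),
    ("Take routine vital signs", datetime(2100, 1, 1, 12, 5)),
]
cleaned = clean_stay(stay)
dropped = clean_stay([(ENTER, datetime(2100, 1, 1, 10, 0)), (DISCHARGE, datetime(2100, 1, 1, 10, 0))])
```

In the first made-up stay, the two events at or after the discharge time are removed; the second stay has a zero length of stay and is dropped entirely.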

 

MIMICEL v2.0.0 - 2023-01-31

Added
  • Add case attributes (gender and race) and event attributes (arrival_transport and disposition) to MIMICEL based on updates introduced in MIMIC-IV-ED version 2.2.
Fixed
  • Remove “diagnosis (with the value of) seq_num” from the activity name “Discharge from the ED + diagnosis (with the value of) seq_num”, because the diagnosis seq_num is captured as an event attribute in MIMICEL.

MIMICEL v1.1.0 - 2022-11-28

Added
  • Add descriptions of standard name labels for xes in README
  • Add CHANGELOG.md to record the changes
Fixed
  • Fix bugs when converting csv to xes
  • Fix the syntax errors in mimicel.xes

MIMICEL v1.0.0 - 2022-07-06

Added
  • Add sql scripts for extracting event logs into csv
  • Add python code and jupyter notebook for converting csv into xes format
  • Add README.md and LICENSE

Ethics

MIMICEL is a reconstructed version of MIMIC-IV-ED and exists under the same IRB. 


Conflicts of Interest

The author(s) have no conflicts of interest to declare.


References

  1. Savioli G, Ceresa IF, Gri N, Bavestrello Piccini G, Longhitano Y, Zanza C, et al. (2022). Emergency Department Overcrowding: Understanding the Factors to Find Corresponding Solutions. Journal of Personalized Medicine. 12(2):279.
  2. Brenner S, Zeng Z, Liu Y, Wang J, Li J, Howard PK. (2010). Modeling and analysis of the emergency department at University of Kentucky Chandler Hospital using simulations. Journal of Emergency Nursing. 36(4):303–310.
  3. Rebuge Á, Ferreira DR. (2012). Business process analysis in healthcare environments: A methodology based on process mining. Information Systems. 37(2):99–116.
  4. Delias P, Manolitzas P, Grigoroudis E, Matsatsinis N. (2014). Applying Process Mining to the Emergency Department. In: Encyclopedia of Business Analytics and Optimization. IGI Global. p. 168–178.
  5. Munoz-Gama J, Martin N, Fernández-Llatas C, Johnson OA, Sepúlveda M, Helm E, et al. (2022). Process mining for healthcare: Characteristics and challenges. Journal of Biomedical Informatics. 127:103994.
  6. Martin N, De Weerdt J, Fernández-Llatas C, Gal A, Gatta R, Ibáñez G, et al. (2020). Recommendations for enhancing the usability and understandability of process mining in healthcare. Artificial Intelligence in Medicine. 109:101962.
  7. van der Aalst WMP (2016). Process Mining: Data Science in Action. Springer.
  8. Teinemaa I, Dumas M, Rosa ML, Maggi FM. (2019). Outcome-Oriented Predictive Process Monitoring: Review and Benchmark. ACM Transactions on Knowledge Discovery from Data. 13(2):1–57.
  9. Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S. (2023). MIMIC-IV-ED (version 2.2). PhysioNet. Available from: https://doi.org/10.13026/77z6-9w59.
  10. Johnson A, Bulgarelli L, Pollard T, Horng S, Celi LA, Mark R. (2021). MIMIC-IV (version 1.0). PhysioNet. Available from: https://doi.org/10.13026/s6n6-xd98.
  11. Remy S, Pufahl L, Sachs JP, Böttinger E, Weske M. (2020). Event log generation in a health system: A case study. In: International Conference on Business Process Management. Springer. p. 505–522.
  12. Andrews R, van Dun CG, Wynn MT, Kratsch W, Röglinger M, ter Hofstede AH. (2020). Quality-informed semi-automated event log generation for process mining. Decision Support Systems. 132:113265.
  13. Rojas E, Sepúlveda M, Munoz-Gama J, Capurro D, Traver V, Fernandez-Llatas C. (2017). Question-driven methodology for analyzing emergency room processes using process mining. Applied Sciences. 7(3):302.
  14. Jans M, Soffer P, Jouck T. (2019). Building a valuable event log for process mining: an experimental exploration of a guided process. Enterprise Information Systems. 13(5):601–630.
  15. Andrews R, Suriadi S, Wynn M, ter Hofstede AH, Rothwell S. (2018). Improving Patient Flows at St. Andrew’s War Memorial Hospital’s Emergency Department Through Process Mining. In: Business Process Management Cases. Springer. p. 311–333.
  16. Asplin BR, Magid DJ, Rhodes KV, Solberg LI, Lurie N, Camargo Jr CA. (2003). A conceptual model of emergency department crowding. Annals of Emergency Medicine. 42(2):173–180.
  17. IEEE. (2016). IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams. IEEE Std 1849-2016. p. 1–50.
  18. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation [Online]. 101(23):e215–e220.
  19. Berti A, van Zelst SJ, van der Aalst W. (2019). Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science. In: Proceedings of the ICPM Demo Track 2019, co-located with the 1st International Conference on Process Mining (ICPM 2019). p. 13–16.
  20. He Z. (2022). MIMIC-IV Event Log Extraction for ED. Available from: https://github.com/ZhipengHe/MIMIC-IV-event-log-extraction-for-ED.
  21. Fluxicon (2022). Disco [Internet]. Available from: https://fluxicon.com/disco/.
  22. Celonis (2022). Celonis [Internet]. Available from: https://www.celonis.com/.
  23. ProM (2022). ProM [Internet]. Available from: http://www.promtools.org/doku.php.

Parent Projects
MIMICEL: MIMIC-IV Event Log for Emergency Department was derived from: Please cite them when using this project.
Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research

Discovery

DOI (version 2.1.0):
https://doi.org/10.13026/c9yj-1t90

DOI (latest version):
https://doi.org/10.13026/j8yc-gg94
