Commit

Merge pull request #190 from MIT-LCP/mimiciv_v2_1_updates
MIMIC-IV v2.1 updates
tompollard authored Nov 14, 2022
2 parents 9fc66cf + e008ff1 commit 9d2f661
Showing 37 changed files with 532 additions and 535 deletions.
2 changes: 1 addition & 1 deletion content/en/docs/IV/_index.md
Expand Up @@ -25,6 +25,6 @@ MIMIC-IV is separated into "modules" to reflect the provenance of the data. Ther
MIMIC-Note is currently not publicly available and the structure is subject to change.
{{% /pageinfo %}}

All patients across all datasets are in `mimic_core`. However, not all ICU patients have ED data, not all ICU patients have CXRs, not all ED patients have hospital data, and so on. Within an individual dataset, there are also incomplete tables as certain electronic systems did not exist in the past. For example, eMAR data is only available from 2015 onward.
All patients across all datasets are in the [hosp](/docs/iv/modules/hosp) module. However, not all ICU patients have ED data, not all ICU patients have CXRs, not all ED patients have hospital data, and so on. Within an individual dataset, there are also incomplete tables as certain electronic systems did not exist in the past, particularly the eMAR system.

Tables for each module are detailed in the respective sections.
54 changes: 52 additions & 2 deletions content/en/docs/IV/about/changelog.md
Expand Up @@ -7,9 +7,59 @@ description: >
Changes between releases of MIMIC-IV.
---

The latest version of MIMIC-IV is v1.0.
The latest version of MIMIC-IV is v2.1.

This page lists changes implemented in sequential updates to the MIMIC-IV database. Issues are tracked using a unique issue number, usually of the form #100, #101, etc (this issue number relates to a private 'building' repository).
This page lists changes implemented in sequential updates to the MIMIC-IV database. Issues are tracked using a unique issue number, usually of the form #100, #101, etc. Note that some of these issues are only accessible in a private 'building' repository.

### MIMIC-IV v2.1

MIMIC-IV v2.1 was released on November 14, 2022. It removed a subset of subject_id which will be retained internally as a test set. Future data releases will exclude these patients.

#### Major changes

* A subset of patients was removed from the dataset: 15,748 `subject_id` were removed from the _patients_ table, 23,093 `hadm_id` from the _admissions_ table, and 3,762 `stay_id` from the _icustays_ table.

### MIMIC-IV v2.0

MIMIC-IV v2.0 was released on June 12, 2022. It focused on expanding the data elements available for patients within MIMIC-IV v1.0. Additional data available includes out-of-hospital date of death, information from the online medical record system (which includes height and weight), and more detail for continuous infusions in the ICU.

#### Major changes

* The core module has been removed to simplify the schema. The _admissions_, _patients_, and _transfers_ tables are now in the hosp module.
* Neonates have been removed from the dataset. Neonatal data will be released in a separate project with data from the neonatal intensive care unit.

#### icu module

* _icustays_
    * Around 700 stays (~1%) have changed due to the changes in the _patients_ table.
* _chartevents, d\_items_
    * The problem list from MetaVision has been added. All problems are documented with the same `itemid` now present in _d\_items_: 220001. There are just over 1,000 unique problems. Most documented problems are related to the care plan for the patient and documented during nurse shift changes (either 7am or 7pm). Less frequently, the ongoing issues are documented here.
* _ingredientevents_
    * This is a new table associated with _inputevents_. Each intravenous administration tracked in _inputevents_ is associated with a set of ingredients. These ingredients include water content, caloric information, and so on. The goal of the _ingredientevents_ table is to support nutrition research and to provide a mechanism for estimating fluid input via summing all instances of the water ingredient. These ingredients have been separated from the _inputevents_ table to simplify analysis and reduce the size of _inputevents_.
* _inputevents_
    * Removed a single column which contained only null values: `cancelreason`.
* _procedureevents_
    * Removed columns which contained only null values: `totalamount`, `totalamountuom`, `cancelreason`, `comments_editedby`, `comments_canceledby`, `comments_date`, `secondaryordercategoryname`.
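
As a rough illustration of the fluid-input estimate described for _ingredientevents_, the sketch below sums the water ingredient per ICU stay. The row layout (`stay_id`, `itemid`, `amount`), the `WATER_ITEMID` value, and the sample numbers are hypothetical placeholders rather than values taken from MIMIC-IV; the real `itemid` for water would need to be looked up in _d\_items_.

```python
from collections import defaultdict

# Hypothetical placeholder: look up the real itemid for the water ingredient in d_items.
WATER_ITEMID = 1

# Made-up rows shaped like (stay_id, itemid, amount in mL); not real MIMIC-IV data.
rows = [
    (101, WATER_ITEMID, 250.0),
    (101, 2, 120.0),  # a different ingredient, ignored by the sum
    (101, WATER_ITEMID, 500.0),
    (102, WATER_ITEMID, 100.0),
]


def water_input_by_stay(rows, water_itemid):
    """Sum the amount of the water ingredient for each stay_id."""
    totals = defaultdict(float)
    for stay_id, itemid, amount in rows:
        # Skip other ingredients and rows with no recorded amount.
        if itemid == water_itemid and amount is not None:
            totals[stay_id] += amount
    return dict(totals)


totals = water_input_by_stay(rows, WATER_ITEMID)
print(totals)
```

In a real analysis the same aggregation would more likely be written as a `GROUP BY stay_id` query against the database; the plain-Python version is used here only to keep the example self-contained.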

#### hosp module

* _admissions_
    * Fixed an issue where hospitalizations were missing _edregtime_ and _edouttime_ when the patient was admitted via the ED (reported in [#1247](https://github.com/MIT-LCP/mimic-code/issues/1247), thanks [@MEladawi](https://github.com/MEladawi)).
* _patients_
    * `dod` is now populated with out-of-hospital mortality from state death records. For patients admitted to the ICU, this change has increased capture of date of death from 8,223 records to 23,844 (i.e. we now have out-of-hospital mortality for an additional 15,621 ICU patients).
    * The mechanism for determining patients included in MIMIC was changed. For the most part this has resulted in an improvement, particularly regarding the logic for merging patients who had distinct medical record numbers. As a result of this change, most tables have had a change in the data content. Approximately 1% of stays were affected.
* _transfers_
    * Fixed a bug where the `outtime` for ED stays with no associated `hadm_id` (i.e. an ED stay where the individual was not admitted to the hospital) was incorrect. This resulted in all _transfers_ rows with a NULL `hadm_id` having an apparent stay of minutes or less. The `outtime` column has now been corrected.
* _labevents, d\_labitems_
    * The `itemid` for _d\_labitems_ has been changed for 43 items. These are extremely infrequently documented and each `itemid` has fewer than 100 observations in _labevents_. The exact `itemid` are provided in the changelog file CHANGELOG.txt.
    * Errors were found in the current values of `loinc_code` (reported in [#938](https://github.com/MIT-LCP/mimic-code/issues/938), thanks [@Mauvila](https://github.com/Mauvila)). In order to enable collaborative improvement, the `loinc_code` column has been removed, and will now be collaboratively developed in the [MIMIC Code Repository](https://github.com/MIT-LCP/mimic-code/). Initial values will be sourced from the hospital system.
    * A number of labs which previously had the value in the comments field now have the value in the value field (reported in [#941](https://github.com/MIT-LCP/mimic-code/issues/941), thanks [@Mauvila](https://github.com/Mauvila)). This change makes the _labevents_ table more consistent with MIMIC-III, which had these values in the value field.
* _microbiologyevents_
    * New organisms, tests, specimens, and antibiotics have been added.
* _omr_
    * A new table has been added: _omr_. The source of this data is the Online Medical Record, and it contains miscellaneous information useful for understanding an individual's health. As of v2.0, the _omr_ table has the following information: blood pressure, height, weight, body mass index, and Estimated Glomerular Filtration Rate (eGFR). These values are available from both inpatient and outpatient visits, and in many cases a "baseline" value from before a patient's hospitalization is available.
* _prescriptions_
    * The `formulary_drug_cd` column has been added back (was previously in MIMIC-III). This column has the same set of values as the `product_code` column of emar\_detail.

### MIMIC-IV v1.0

Expand Down
3 changes: 2 additions & 1 deletion content/en/docs/IV/about/concepts.md
Expand Up @@ -41,6 +41,7 @@ The *transfers* table contains information for each unique `transfer_id`. `trans
## `stay_id`

The *transfers* table also contains the `stay_id`. This is an artificially generated identifier which groups reasonably contiguous episodes of care.
The `stay_id` present in *icustays* is derived from the `stay_id` values in the *transfers* table.

# date and times

Expand Down Expand Up @@ -83,7 +84,7 @@ For events which occur over a period of time, `starttime` and `endtime` provide

### `dod`

`dod` is the patient's date of death sourced from the hospital database.
`dod` is the patient's date of death sourced from one of two sources: the hospital database or a state death database. See the [*patients*](/docs/iv/modules/hosp/patients) documentation for more detail.

### `transfertime`

Expand Down
14 changes: 12 additions & 2 deletions content/en/docs/IV/modules/_index.md
@@ -1,8 +1,18 @@
---
title: "Tables"
title: "Modules"
linkTitle: "Modules"
weight: 3
date: 2020-08-10
description: >
Description of the data contained in the MIMIC-IV tables.
Description of the data contained in each of the MIMIC-IV modules.
---

Data within the modules are available on PhysioNet:

* hosp, icu: [MIMIC-IV](https://physionet.org/content/mimiciv/)
* ed: [MIMIC-IV-ED](https://physionet.org/content/mimic-iv-ed/)
* note: [MIMIC-IV-Note](https://physionet.org/content/mimic-iv-note/)
* cxr: [MIMIC-CXR](https://physionet.org/content/mimic-cxr/)

The sections below describe data within each module.
<!-- Subfolder content is automatically placed here. -->
2 changes: 1 addition & 1 deletion content/en/docs/IV/modules/ed/_index.md
Expand Up @@ -4,5 +4,5 @@ linkTitle: "ED"
date: 2020-08-10
weight: 40
description: >
The ED module contains data for emergency department patients collected while they are in the ED. Information includes reason for admission, triage assessment, vital signs, and medicine reconciliation. Patient identifiers allow MIMIC-ED to be linked to other MIMIC-IV modules.
The ED module contains data for emergency department patients collected while they are in the ED. Information includes reason for admission, triage assessment, vital signs, and medicine reconciliation. The `subject_id` and `hadm_id` identifiers allow MIMIC-IV-ED to be linked to other MIMIC-IV modules.
---
4 changes: 2 additions & 2 deletions content/en/docs/IV/modules/ed/diagnosis.md
Expand Up @@ -14,7 +14,7 @@ The *diagnosis* table provides billed diagnoses for patients. Diagnoses are dete

**Table purpose:** Track patient admissions to the emergency department.

**Number of rows:** 949,172
**Number of rows:** 899,050

**Links to:**

Expand All @@ -29,7 +29,7 @@ Name | Postgres data type
`seq_num` | INTEGER NOT NULL
`icd_code` | VARCHAR(10) NOT NULL
`icd_version` | INTEGER NOT NULL
`icd_title` | VARCHAR(255) NOT NULL
`icd_title` | TEXT NOT NULL

## `subject_id`

Expand Down
53 changes: 47 additions & 6 deletions content/en/docs/IV/modules/ed/edstays.md
Expand Up @@ -15,7 +15,7 @@ It provides the time the patient entered the emergency department and the time t

**Table purpose:** Track patient admissions to the emergency department.

**Number of rows:** 448,972
**Number of rows:** 425,087

**Links to:**

Expand All @@ -25,11 +25,16 @@ It provides the time the patient entered the emergency department and the time t

Name | Postgres data type
---- | ----
`subject_id` | INTEGER NOT NULL
`hadm_id` | INTEGER NOT NULL
`stay_id` | INTEGER NOT NULL
`intime` | TIMESTAMP(0) NOT NULL
`outtime` | TIMESTAMP(0) NOT NULL
`subject_id` | INTEGER NOT NULL
`hadm_id` | INTEGER NOT NULL
`stay_id` | INTEGER NOT NULL
`intime` | TIMESTAMP(0) NOT NULL
`outtime` | TIMESTAMP(0) NOT NULL
`gender` | VARCHAR(1) NOT NULL
`race` | VARCHAR(60)
`arrival_transport` | VARCHAR(50) NOT NULL
`disposition` | VARCHAR(255)


## `subject_id`

Expand All @@ -48,3 +53,39 @@ An identifier which uniquely identifies a single emergency department stay for a
## `intime`, `outtime`

The admission datetime (`intime`) and discharge datetime (`outtime`) of the given emergency department stay.
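
A common use of `intime` and `outtime` is to derive the length of an ED stay. The sketch below shows one way to do that in Python; the function name and the two timestamps are made up for illustration and are not taken from the dataset.

```python
from datetime import datetime


def ed_length_of_stay_hours(intime: str, outtime: str) -> float:
    """Return the ED length of stay in hours from two ISO-formatted timestamps."""
    start = datetime.fromisoformat(intime)
    end = datetime.fromisoformat(outtime)
    return (end - start).total_seconds() / 3600


# Hypothetical timestamps in the second-level precision of TIMESTAMP(0).
los = ed_length_of_stay_hours("2180-05-06 19:17:00", "2180-05-06 23:30:00")
print(f"{los:.2f} hours")
```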

## `gender`

The patient's administrative gender as documented in the hospital system.

## `race`

The patient's self-reported race. Race is aggregated into higher level categories for very small groups.
As of MIMIC-IV-ED v2.1, there were 33 unique categories for race.

## `arrival_transport`

The method through which the individual arrived at the ED. A count of the possible entries is provided below.

arrival_transport | count
--- | ---
WALK IN | 251849
AMBULANCE | 155752
UNKNOWN | 15352
OTHER | 1266
HELICOPTER | 868

## `disposition`

The method through which the individual left the ED. Of the non-null methods, the possibilities include:

disposition | count
--- | ---
HOME | 241632
ADMITTED | 158010
TRANSFER | 7025
LEFT WITHOUT BEING SEEN | 6155
OTHER | 4297
LEFT AGAINST MEDICAL ADVICE | 1881
ELOPED | 5710
EXPIRED | 377
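
The category counts above can be checked against the row count stated for *edstays*. The following sketch simply re-adds the published figures; the variable names are arbitrary, and the only inputs are the numbers listed in this page.

```python
# Row count stated for edstays in this release.
EDSTAYS_ROWS = 425_087

# Counts copied from the arrival_transport table above.
arrival_transport = {
    "WALK IN": 251849,
    "AMBULANCE": 155752,
    "UNKNOWN": 15352,
    "OTHER": 1266,
    "HELICOPTER": 868,
}

# Counts copied from the disposition table above.
disposition = {
    "HOME": 241632,
    "ADMITTED": 158010,
    "TRANSFER": 7025,
    "LEFT WITHOUT BEING SEEN": 6155,
    "OTHER": 4297,
    "LEFT AGAINST MEDICAL ADVICE": 1881,
    "ELOPED": 5710,
    "EXPIRED": 377,
}

arrival_total = sum(arrival_transport.values())
disposition_total = sum(disposition.values())

# A gap of zero means every row is accounted for by a listed category.
print("arrival_transport gap:", EDSTAYS_ROWS - arrival_total)
print("disposition gap:", EDSTAYS_ROWS - disposition_total)
```

Both totals come to 425,087, so the listed categories account for every row of the table as documented here.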
2 changes: 1 addition & 1 deletion content/en/docs/IV/modules/ed/medrecon.md
Expand Up @@ -14,7 +14,7 @@ On admission to the emergency departments, staff will ask the patient what curre

**Table purpose:** Document medications a patient is currently taking.

**Number of rows:** 3,147,294
**Number of rows:** 2,987,342

**Links to:**

Expand Down
3 changes: 1 addition & 2 deletions content/en/docs/IV/modules/ed/pyxis.md
Expand Up @@ -16,7 +16,7 @@ Note that as the same medication may have multiple `gsn` values, each row does *

**Table purpose:** Track medicine administrations.

**Number of rows:** 1,674,652
**Number of rows:** 1,586,053

**Links to:**

Expand All @@ -33,7 +33,6 @@ Name | Postgres data type
`charttime` | TIMESTAMP(0)
`med_rn` | SMALLINT NOT NULL
`name` | VARCHAR(255)
`ifu` | VARCHAR(255)
`gsn_rn` | SMALLINT NOT NULL
`gsn` | VARCHAR(10)

Expand Down
59 changes: 51 additions & 8 deletions content/en/docs/IV/modules/ed/triage.md
Expand Up @@ -16,18 +16,61 @@ All fields in *triage* were originally free-text. For deidentification purposes,

**Table source:** Emergency department information system.

**Table purpose:**
**Table purpose:** Store information collected on triage to the emergency department.

**Number of rows:**
**Number of rows:** 425,087

**Links to:**

* *edstays* on `stay_id`

# Important considerations

* There is no time associated with triage observations. The closest approximation to triage time is the `intime` of the patient from the *edstays* table.

## Important considerations

There is no time associated with triage observations. The closest approximation to triage time is the `intime` of the patient from the *edstays* table.

The numeric entries in this table were originally stored as free-text. As a result, the columns required deidentification. Free-text entries which could not be converted trivially were removed. Normally, the application of deidentification in MIMIC-IV is indicated using three underscores (`___`) to make it clear to users that we have modified the data. However, because these columns have a numeric data type, we were unable to do so here. As a result, **missing data in the numeric columns indicates either deidentified data or no data recorded**. In practice, removal for deidentification is rare. Below is a table demonstrating how often data were removed for deidentification purposes:

Column | Number of NULL values inserted for deidentification | Number of rows missing data in v2.1
--- | --- | ---
`temperature` | 680 | 23415
`heartrate` | 292 | 17090
`resprate` | 223 | 20353
`o2sat` | 414 | 20596
`sbp` | 238 | 18291
`dbp` | 214 | 19091
`acuity` | 0 | 6987

From the above, we can see that of the 23415 rows missing a `temperature` value, only 680 had a free-text value which was deleted during deidentification (~3%).
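
The same ratio can be worked out for every column in the table. The short sketch below does the arithmetic using only the figures listed above; the dictionary and function names are arbitrary.

```python
# (NULLs inserted for deidentification, rows missing data in v2.1), copied from the table above.
triage_nulls = {
    "temperature": (680, 23415),
    "heartrate": (292, 17090),
    "resprate": (223, 20353),
    "o2sat": (414, 20596),
    "sbp": (238, 18291),
    "dbp": (214, 19091),
    "acuity": (0, 6987),
}


def deid_share(removed: int, missing: int) -> float:
    """Percentage of missing values that are attributable to deidentification."""
    return 100 * removed / missing if missing else 0.0


shares = {col: deid_share(*counts) for col, counts in triage_nulls.items()}
for col, pct in shares.items():
    print(f"{col}: {pct:.1f}%")
```

For every column the share stays at or below roughly 3%, with `temperature` the highest.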

<!--
SQL queries to generate the above:
select
COUNT(tr_phi.temp) - COUNT(tr.temperature) AS temperature
, COUNT(tr_phi.hr) - COUNT(tr.heartrate) AS heartrate
, COUNT(tr_phi.rr) - COUNT(tr.resprate) AS resprate
, COUNT(tr_phi.sao2) - COUNT(tr.o2sat) AS o2sat
, COUNT(tr_phi.sbp) - COUNT(tr.sbp) AS sbp
, COUNT(tr_phi.dbp) - COUNT(tr.dbp) AS dbp
, COUNT(tr_phi.acuity) - COUNT(tr.acuity) AS acuity
from ed_phi.triage tr
left join sh.triage tr_phi
using (fiscal_num_ed)
-- if you want total rows, union to the below
UNION ALL
select
COUNT(*) - COUNT(tr.temperature) AS temperature
, COUNT(*) - COUNT(tr.heartrate) AS heartrate
, COUNT(*) - COUNT(tr.resprate) AS resprate
, COUNT(*) - COUNT(tr.o2sat) AS o2sat
, COUNT(*) - COUNT(tr.sbp) AS sbp
, COUNT(*) - COUNT(tr.dbp) AS dbp
, COUNT(*) - COUNT(tr.acuity) AS acuity
from ed_phi.triage tr
;
-->
# Table columns

Name | Postgres data type
Expand All @@ -40,9 +83,9 @@ Name | Postgres data type
`o2sat` | NUMERIC(10, 4)
`sbp` | NUMERIC(10, 4)
`dbp` | NUMERIC(10, 4)
`pain` | NUMERIC(10, 4)
`pain` | TEXT
`acuity` | NUMERIC(10, 4)
`chiefcomplaint` | TEXT
`chiefcomplaint` | VARCHAR(255)

## `subject_id`

Expand Down
49 changes: 45 additions & 4 deletions content/en/docs/IV/modules/ed/vitalsign.md
Expand Up @@ -14,15 +14,56 @@ Patients admitted to the emergency department have routine vital signs taken eve

**Table purpose:** Provides nurse documented vital signs.

**Number of rows:** 1,651,119.
**Number of rows:** 1,564,610

**Links to:**

* *edstays* on `stay_id`

<!-- # Important considerations -->

# Table columns
## Important considerations

The numeric entries in this table were originally stored as free-text. As a result, the columns required deidentification. Free-text entries which could not be converted trivially were removed. Normally, the application of deidentification in MIMIC-IV is indicated using three underscores (`___`) to make it clear to users that we have modified the data. We decided it was better to omit this marker than to burden users by sharing the mostly numeric data as text. As a result, **missing data in the numeric columns indicates either deidentified data or no data recorded**. However, for the most part, missing data indicates that no information was documented. Below is a table demonstrating how often data were removed for deidentification purposes:

Column | Number of NULL values inserted for deidentification | Number of rows missing data in v2.1
--- | --- | ---
`temperature` | 11048 | 564968
`heartrate` | 3282 | 69710
`resprate` | 1330 | 89393
`o2sat` | 46620 | 135836
`sbp` | 2854 | 81256
`dbp` | 2854 | 81256

From the above, we can see that of the 564968 rows missing a `temperature` value, 11048 had a free-text value which was deleted during deidentification (~2%).
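
The "rows missing data" column relies on the fact that `COUNT(column)` skips NULLs while `COUNT(*)` counts every row. The sketch below demonstrates that idiom on a toy table; it uses Python's built-in SQLite as a stand-in for Postgres, and the table contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vitalsign (stay_id INTEGER, temperature REAL, heartrate REAL)"
)

# Made-up rows; None is stored as NULL.
conn.executemany(
    "INSERT INTO vitalsign VALUES (?, ?, ?)",
    [(1, 98.6, 80.0), (1, None, 82.0), (2, None, None), (2, 99.1, 77.0)],
)

# COUNT(*) counts all rows, COUNT(col) only non-NULL values, so the difference is the NULL count.
missing_temperature, missing_heartrate = conn.execute(
    "SELECT COUNT(*) - COUNT(temperature), COUNT(*) - COUNT(heartrate) FROM vitalsign"
).fetchone()
conn.close()

print("missing temperature:", missing_temperature)
print("missing heartrate:", missing_heartrate)
```

Note that this only reproduces the total number of missing values; separating deidentified entries from undocumented ones requires the original free-text source, which is not part of the public release.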

<!--
SQL queries to generate the above:
select
COUNT(vs_phi.temp) - COUNT(vs.temperature) AS temperature
, COUNT(vs_phi.pulse) - COUNT(vs.heartrate) AS heartrate
, COUNT(vs_phi.rr) - COUNT(vs.resprate) AS resprate
, COUNT(vs_phi.o2sat) - COUNT(vs.o2sat) AS o2sat
, COUNT(vs_phi.bp) - COUNT(vs.sbp) AS sbp
, COUNT(vs_phi.bp) - COUNT(vs.dbp) AS dbp
from ed_phi.vitalsign vs
left join sh.vitalsign vs_phi
using (fiscal_num_ed, charttime)
-- if you want total rows, union to the below
UNION ALL
select
COUNT(*) - COUNT(vs.temperature) AS temperature
, COUNT(*) - COUNT(vs.heartrate) AS heartrate
, COUNT(*) - COUNT(vs.resprate) AS resprate
, COUNT(*) - COUNT(vs.o2sat) AS o2sat
, COUNT(*) - COUNT(vs.sbp) AS sbp
, COUNT(*) - COUNT(vs.dbp) AS dbp
from ed_phi.vitalsign vs
;
-->

## Table columns

Name | Postgres data type
---- | ----
Expand Down