Unified-ClinicalBERT-VGNN

This repository contains the scripts for preprocessing data and training a Variationally Regularized Graph Neural Network (VGNN) model¹ that integrates ClinicalBERT² features. It is designed to process electronic health record (EHR) data, specifically the MIMIC-III dataset, and uses features extracted by ClinicalBERT as additional input to the VGNN.

Repository Structure

Below is a brief overview of the main components of the repository:

Unified-ClinicalBERT-VGNN
├── model.py
├── train_combo.py
└── utils.py

preprocess
├── preprocess_GNN
│   └── preprocess_mimic.py
├── preprocess_BERT
│   ├── mk_X_BERT_matched.py
│   └── preprocess_NOTEEVENTS.py
└── get_cls.py

BERT
└── train_BERT.py

data
└── MIMIC-III data

Files and Directories

Unified-ClinicalBERT-VGNN: Folder containing the training script for the combined ClinicalBERT and VGNN model.
model.py: Defines the main structure of the Graph Neural Network (GNN) model.
train_combo.py: Trains the combined ClinicalBERT + VGNN model.
utils.py: Contains utility functions used throughout the repository.
preprocess: This directory contains the preprocessing files for the GNN and ClinicalBERT (MIMIC-III dataset).
get_cls.py: Obtains the [CLS] token embeddings from the MIMIC data, which are fed to the GNN model.
preprocess_GNN: This directory contains the script (preprocess_mimic.py) for preprocessing MIMIC-III data for the GNN and creating label mappings for ClinicalBERT.
preprocess_BERT: This directory contains the scripts (mk_X_BERT.py, mk_X_BERT_matched.py, preprocess_NOTEEVENTS.py) for preprocessing data for ClinicalBERT.
BERT: This directory contains the script (train_BERT.py) used for fine-tuning the ClinicalBERT model.
data: This directory should contain your input data (MIMIC-III dataset).
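
To make the role of get_cls.py more concrete, here is a minimal sketch of what taking the [CLS] embedding means. It is not the repository's implementation: the function name cls_embeddings and the toy data are hypothetical, and plain nested lists stand in for the (batch, seq_len, hidden_size) tensor a BERT encoder would return.

```python
from typing import List, Sequence

Vector = List[float]


def cls_embeddings(hidden_states: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Return the hidden vector at token position 0 ([CLS]) for each sequence.

    hidden_states is indexed as [sequence][token][hidden_dim], mirroring the
    (batch, seq_len, hidden_size) layout of a BERT encoder's final layer.
    """
    return [list(tokens[0]) for tokens in hidden_states]


# Toy batch (assumption for illustration): 2 notes, 3 tokens each, hidden size 2.
toy_hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]],
]

features = cls_embeddings(toy_hidden)
print(features)
```

Each note is reduced to one fixed-length vector, which is the form in which it can serve as an additional input feature for the GNN.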

Usage

First, make sure you download your MIMIC-III data into the data directory. The scripts are designed to read from this location. Then, follow the sequence of steps below:

Run preprocess_mimic.py in preprocess_GNN, and preprocess_NOTEEVENTS.py and mk_X_BERT_matched.py in preprocess_BERT, to preprocess your MIMIC-III data.
Run train_BERT.py in BERT to fine-tune ClinicalBERT for the same prediction task as the VGNN.
Run get_cls.py in preprocess to obtain the additional features generated by ClinicalBERT.
Run train_combo.py in Unified-ClinicalBERT-VGNN to train your ClinicalBERT + VGNN combo model.
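
The steps above can be sketched as a small driver script. This is only an illustration under several assumptions: that it is run from the repository root, that the paths follow the tree shown in Repository Structure, that each script can be run without command-line arguments, and that the three preprocessing scripts can be run in the order they are listed. The names PIPELINE, build_commands, and run_pipeline are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script order follows the Usage steps; paths follow the repository tree.
# Assumption: every script runs without command-line arguments.
PIPELINE = [
    "preprocess/preprocess_GNN/preprocess_mimic.py",
    "preprocess/preprocess_BERT/preprocess_NOTEEVENTS.py",
    "preprocess/preprocess_BERT/mk_X_BERT_matched.py",
    "BERT/train_BERT.py",
    "preprocess/get_cls.py",
    "Unified-ClinicalBERT-VGNN/train_combo.py",
]


def build_commands(repo_root: str = ".") -> List[List[str]]:
    """Build one interpreter command per pipeline script, in order."""
    root = Path(repo_root)
    return [[sys.executable, str(root / script)] for script in PIPELINE]


def run_pipeline(repo_root: str = ".", dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(repo_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run by default: lists the commands without executing anything.
run_pipeline(dry_run=True)
```

Calling run_pipeline(dry_run=False) would execute the steps in sequence. If a script expects to be launched from its own directory, the command would need a matching cwd argument to subprocess.run.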

Requirements

Please ensure you have the necessary dependencies installed. If not, install them with:

pip install -r requirements.txt

Footnotes

¹ This code was modified from the following repository: https://github.com/NYUMedML/GNN_for_EHR/tree/master
² This code was modified from the following repository: https://github.com/EmilyAlsentzer/clinicalBERT
