eMOSAIC: Multi-modal Out-of-distribution Uncertainty Quantification Streamlines Large-scale Polypharmacology

This repository contains the code for training and testing eMOSAIC, our proposed model for predicting binding affinity and the uncertainty associated with each prediction.

Dataset

eMOSAIC requires the following for training:

A CSV file containing the protein (with its UniProt ID and Pfam family), the SMILES string, and the pKi value (binding affinity).
A directory of ESM-2 and/or ESMFold generated embeddings.

The dataset folder contains more information and scripts to help generate the embeddings and create the dataset.
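Before training, it can help to confirm that the CSV file has the expected fields. The sketch below is only an illustration: the column names (`uniprot_id`, `pfam`, `smiles`, `pki`) and the `check_dataset` helper are hypothetical and not taken from the repository, so adjust them to the headers your file actually uses.

```python
import csv
import io

# Hypothetical column names; the repository's actual headers may differ.
REQUIRED_COLUMNS = ("uniprot_id", "pfam", "smiles", "pki")


def check_dataset(handle):
    """Return (row_count, problems) for an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return 0, ["missing columns: " + ", ".join(missing)]

    problems = []
    count = 0
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        count += 1
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        try:
            float(row["pki"])  # pKi must be numeric
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: pki is not a number")
    return count, problems


# Made-up rows for illustration only (placeholder Pfam ID and pKi value).
sample = io.StringIO(
    "uniprot_id,pfam,smiles,pki\n"
    "Q07817,PF00000,CCO,6.5\n"
)
row_count, issues = check_dataset(sample)
```

For a real file, pass an open handle instead, for example `check_dataset(open(path, newline=""))`.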

Environment

CUDA: 11.3
Python: 3.10.9
The other dependencies can be installed from the requirements.txt file in the environment folder.

Usage

First, train the binding affinity prediction model, located in the code/BindingAffinityModule/ folder, with:

python main.py

Once the binding affinity prediction model is trained, train eMOSAIC for uncertainty quantification (from code/AnomalyDetection/). This step extracts the embeddings, clusters them, and then learns the residue for accurate uncertainty quantification:

python main.py
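To make the extract, cluster, and learn sequence more concrete, here is a toy sketch of one way cluster-based uncertainty can work. It is not the eMOSAIC implementation. It assumes that "residue" refers to the prediction error |true pKi - predicted pKi|, swaps in a plain k-means for the clustering step, and uses hypothetical function names throughout.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=10, seed=42):
    """Plain k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_residuals(points, y_true, y_pred, centroids):
    """Mean absolute prediction error per cluster (assumed meaning of 'residue')."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p, t, q in zip(points, y_true, y_pred):
        i = nearest(p, centroids)
        sums[i] += abs(t - q)
        counts[i] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def estimate_uncertainty(point, centroids, residuals):
    """Use the residual of the nearest cluster as the uncertainty estimate."""
    return residuals[nearest(point, centroids)]


# Toy 2-D "embeddings": the model is accurate in one region, poor in the other.
embeddings = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y_true = [5.0, 5.0, 5.0, 7.0, 7.0, 7.0]
y_pred = [5.5, 5.5, 5.5, 9.0, 9.0, 9.0]

centroids = kmeans(embeddings, k=2)
residuals = cluster_residuals(embeddings, y_true, y_pred, centroids)
```

A new sample that falls near the second group of embeddings would then receive the larger uncertainty, since the model's errors were larger in that region.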

To use a pretrained model, run the following (from code/BindingAffinityModule/):

python predict.py --smiles_list "Cc1cc(Oc2ccc(/C=C3\\SC(=O)N([C@@H](Cc4ccccc4)C(=O)O)C3=O)cc2)cc(C)c1Cl, Cc1cc(Oc2ccc(/C=C3\\SC(=O)N([C@@H](Cc4ccccc4)C(=O)O)C3=O)cc2)cc(C)c1Cl, COC(=O)c1cccc(COc2ccc3[nH]c(SCC(=O)c4ccc(O)c(O)c4)nc3c2)c1" --uniprot_ids "Q07817, Q07820, P47871"
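Both arguments are comma-separated strings with the same number of entries. The sketch below shows one plausible way such strings can be split and paired by position, assuming the first SMILES goes with the first UniProt ID and so on. The helpers `split_arg` and `pair_inputs` are hypothetical, and predict.py may parse its inputs differently.

```python
def split_arg(value):
    """Split a comma-separated CLI string and drop surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pair_inputs(smiles_list, uniprot_ids):
    """Pair each SMILES string with the UniProt ID at the same position."""
    smiles = split_arg(smiles_list)
    proteins = split_arg(uniprot_ids)
    if len(smiles) != len(proteins):
        raise ValueError(
            f"got {len(smiles)} SMILES strings but {len(proteins)} UniProt IDs"
        )
    return list(zip(smiles, proteins))


# Short placeholder SMILES strings stand in for the longer ones in the command.
pairs = pair_inputs("CCO, CCN, CCC", "Q07817, Q07820, P47871")
```

Checking the two lengths up front turns a mismatched command line into a clear error instead of a silently truncated batch.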

For uncertainty quantification as well as pKi prediction, use the predict_pKi_uncertainty.py file:

python predict_pki_uncertainty.py --smiles_list "Cc1cc(Oc2ccc(/C=C3\\SC(=O)N([C@@H](Cc4ccccc4)C(=O)O)C3=O)cc2)cc(C)c1Cl, Cc1cc(Oc2ccc(/C=C3\\SC(=O)N([C@@H](Cc4ccccc4)C(=O)O)C3=O)cc2)cc(C)c1Cl, COC(=O)c1cccc(COc2ccc3[nH]c(SCC(=O)c4ccc(O)c(O)c4)nc3c2)c1" --uniprot_ids "Q07817, Q07820, P47871" --data_split=scaffold --num_clusters=50 --iters=10 --scaling=True --seed=42 --checkpoint_dir="/results/logs/exp08-02-2024-05-02-20/"

XieResearchGroup/eMOSAIC
