PortalCG: Structure-enhanced Deep Meta-learning Predicts Billions of Uncharted Chemical-Protein Interactions on a Genome-scale
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This repository provides scripts to replicate the experiments in the PortalCG paper.
Our paper focuses on proposing a new out-of-distribution (OOD) learning algorithm. We did not carry out extensive tuning or cherry-pick a single best set of model parameters. If you want to use the model for a specific application, we encourage you to follow the training procedure described below and tune the model with a hyperparameter sweep for the best performance.
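As a rough illustration of what such a sweep can look like, the sketch below loops over a small grid of candidate settings in Python; `train_and_evaluate`, the grid values, and the use of dev-set AUC as the selection metric are hypothetical placeholders rather than anything shipped with this repository.

```python
# A minimal sketch of a grid-style hyperparameter sweep.
# `train_and_evaluate` is a hypothetical placeholder for one run of the
# training procedure described below; it is NOT a function provided by this repository.
import itertools

search_space = {
    "learning_rate": [1e-5, 3e-5, 1e-4],   # candidate values; adjust to your application
    "batch_size": [32, 64, 128],
}

def train_and_evaluate(config):
    """Placeholder: run one full training job with `config` and return a dev-set AUC."""
    raise NotImplementedError

best_auc, best_config = float("-inf"), None
keys = list(search_space)
for values in itertools.product(*(search_space[k] for k in keys)):
    config = dict(zip(keys, values))
    auc = train_and_evaluate(config)        # one full training run per configuration
    if auc > best_auc:
        best_auc, best_config = auc, config

print("best config:", best_config, "dev AUC:", best_auc)
```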
The required packages are listed in environment_setup.txt. A Dockerfile can also be found in the environment/ folder.
You can download the data and pretrained weights from here.
NOTE: PortalCG has three steps with two rounds of transfer learning, and it uses three large databases: Pfam, PDB, and ChEMBL. A full replication from scratch takes roughly a month on a single GPU. Here we provide a demo with trained weights for the final step, as well as instructions to replicate each step from scratch.
PortalCG has two major components, (a) STL and (b) OOC-ML, organized into three steps in total. Please run the three steps in order.
The first step builds on a published work, DISAE (published in JCIM), which comes with its own replication instructions. In this step, a protein language model is trained on Pfam using an MSA-distilled triplet representation. The knowledge learned from Pfam is then transferred to Step 2.
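To make the hand-off concrete, here is a self-contained, assumed-PyTorch sketch of loading a Step-1 protein encoder checkpoint into a Step-2 model; the class names, layer sizes, and checkpoint path are illustrative placeholders, not the actual DISAE/PortalCG code.

```python
# A sketch (assumed PyTorch) of carrying the Step-1 protein language model into the
# Step-2 model. Module names, dimensions, and the checkpoint path are placeholders.
import os
import torch
import torch.nn as nn

class ProteinEncoder(nn.Module):
    """Stand-in for the Pfam-pretrained protein language model over triplet tokens."""
    def __init__(self, vocab_size=8000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):
        return self.layers(self.embed(tokens))   # per-triplet embeddings

class Step2Model(nn.Module):
    """Step-2 model reuses the pretrained encoder and adds a prediction head."""
    def __init__(self, encoder, dim=256):
        super().__init__()
        self.protein_encoder = encoder            # fine-tuned further in Step 2
        self.head = nn.Linear(dim, 1)             # illustrative head

encoder = ProteinEncoder()
ckpt = "step1_protein_lm.pt"                      # placeholder checkpoint name
if os.path.exists(ckpt):
    # strict=False tolerates pretraining-only heads that the encoder does not keep.
    encoder.load_state_dict(torch.load(ckpt, map_location="cpu"), strict=False)
model = Step2Model(encoder)
```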
The second step trains on the PDB dataset to predict binding-site residue-atom contact maps, using the protein descriptor pretrained in Step 1, which is further fine-tuned in this step; a generic sketch of such a contact-map head is given after the command below.
- To run from scratch:
python train_DTI_distance.py --batch_size=128 --eval_at=200 --global_step=40000
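The following assumed-PyTorch sketch shows one generic way a residue-atom contact-map head can be wired, scoring every (residue, atom) pair from the protein and chemical embeddings; the layer choices, dimensions, and toy labels are illustrative and do not mirror the exact architecture in `train_DTI_distance.py`.

```python
# A generic sketch (assumed PyTorch) of a residue-atom contact-map head of the kind
# Step 2 trains: every (protein residue, ligand atom) pair is scored as contact vs.
# non-contact. Dimensions and layers are illustrative placeholders.
import torch
import torch.nn as nn

class ContactMapHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.proj_res = nn.Linear(dim, dim)
        self.proj_atom = nn.Linear(dim, dim)

    def forward(self, residue_emb, atom_emb):
        # residue_emb: (batch, n_res, dim)  from the Step-1 protein descriptor
        # atom_emb:    (batch, n_atom, dim) from the chemical descriptor
        r = self.proj_res(residue_emb)
        a = self.proj_atom(atom_emb)
        return torch.einsum("brd,bad->bra", r, a)  # (batch, n_res, n_atom) contact logits

head = ContactMapHead()
residue_emb = torch.randn(2, 100, 256)
atom_emb = torch.randn(2, 30, 256)
logits = head(residue_emb, atom_emb)
# Toy labels: 1 where the residue-atom distance falls below a cutoff in the PDB structure.
labels = torch.randint(0, 2, logits.shape).float()
loss = nn.BCEWithLogitsLoss()(logits, labels)
```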
In this final step, there are four data splits: OOD-train, iid-dev, OOD-dev, and OOD-test. The protein descriptor and chemical descriptor used here are pretrained in Step 2. A generic sketch of the meta-learning update behind this step follows the commands below.
- To run a short demo of only 20 steps with the trained weights, checking the OOD-test AUC scores reported in Table 2:
python train_MAML_4split.py
- To run from scratch:
python train_MAML_4split.py --fr_scratch=True --global_step=60000 --global_eval_step=80
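For readers unfamiliar with the meta-learning loop that `train_MAML_4split.py` implements, the following sketch shows a generic MAML-style inner/outer update in PyTorch (using `torch.func.functional_call`, available in PyTorch 2.x); the toy model, task sampler, and hyperparameters are placeholders and are not the repository's actual OOC-ML implementation.

```python
# A generic sketch of a MAML-style inner/outer update, shown only to illustrate the kind
# of meta-learning loop behind train_MAML_4split.py. The toy model, task sampler, and
# hyperparameters are placeholders, not the repository's actual OOC-ML implementation.
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))  # toy CPI scorer
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 1e-2
loss_fn = nn.BCEWithLogitsLoss()

def sample_task():
    """Placeholder: sample a task's support/query batches (e.g. from the OOD-train split)."""
    x_s, y_s = torch.randn(16, 128), torch.randint(0, 2, (16, 1)).float()
    x_q, y_q = torch.randn(16, 128), torch.randint(0, 2, (16, 1)).float()
    return (x_s, y_s), (x_q, y_q)

for step in range(10):                        # a few meta-steps, for illustration only
    (x_s, y_s), (x_q, y_q) = sample_task()
    params = dict(model.named_parameters())

    # Inner loop: one gradient step on the task's support set.
    support_loss = loss_fn(functional_call(model, params, (x_s,)), y_s)
    grads = torch.autograd.grad(support_loss, tuple(params.values()), create_graph=True)
    fast_params = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer loop: evaluate the adapted parameters on the query set and update the
    # meta-parameters. Model selection would use the dev splits; final numbers are
    # reported on OOD-test.
    query_loss = loss_fn(functional_call(model, fast_params, (x_q,)), y_q)
    meta_opt.zero_grad()
    query_loss.backward()
    meta_opt.step()
```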