ProteinSolver is a deep neural network which learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results both on a toy problem of learning how to solve Sudoku puzzles and on a real-world problem of designing protein sequences that fold into a predetermined geometric shape.
The following notebooks can be used to explore the basic functionality of proteinsolver.
Other notebooks in the notebooks/ directory show how to perform more extensive validations of the networks and how to train new networks.
Docker images with all required dependencies are provided at: https://gitlab.com/ostrokach/proteinsolver/container_registry.
To evaluate a proteinsolver network from a Jupyter notebook, we can run the following:
docker run -it --rm -p 8000:8000 registry.gitlab.com/ostrokach/proteinsolver:v0.1.25 jupyter notebook --ip 0.0.0.0 --port 8000
We recommend installing proteinsolver into a clean conda environment using the following commands:
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
First, use conda to install proteinsolver into a new conda environment. This will also install all dependencies.
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
Second, run pip install --editable . inside the root directory of this package. This will force Python to use the development version of our code.
cd path/to/proteinsolver
pip install --editable .
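To confirm that Python now resolves the package to the development checkout rather than to the conda-installed copy, one option is to ask the import system where it would load the package from. The sketch below uses only the standard library; the helper name locate_package is hypothetical and is not part of proteinsolver.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package(name: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from.

    Returns None if the package cannot be found or has no file origin
    (for example, a namespace package).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


# Hypothetical check: after an editable install, the printed directory is
# expected to sit inside the cloned repository, not inside site-packages.
print(locate_package("proteinsolver"))
```

If the printed path points into site-packages, the editable install was probably run in a different environment from the one that is currently active.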
Pre-trained models can be downloaded using wget by running the following command in the root folder of the proteinsolver repository:
wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"
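The -nH and --cut-dirs options decide where the downloaded files land. -nH drops the host-name directory and --cut-dirs 1 drops the first remote directory component (here v0.1), so files are written relative to the current folder. The sketch below approximates that mapping for simple URLs. The function wget_local_path and the file name in the example are hypothetical, and query strings and other wget naming rules are ignored.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def wget_local_path(url: str, no_host: bool = False, cut_dirs: int = 0) -> PurePosixPath:
    """Approximate the path, relative to the working directory, where
    `wget -r` would save `url` (simple URLs only)."""
    parts = urlsplit(url)
    path = parts.path
    # wget stores a directory listing as index.html.
    if path.endswith("/") or not path:
        path += "index.html"
    *dirs, filename = [segment for segment in path.split("/") if segment]
    # --cut-dirs N removes the first N remote directory components.
    dirs = dirs[cut_dirs:]
    # -nH removes the leading host-name directory.
    prefix = [] if no_host else [parts.hostname]
    return PurePosixPath(*prefix, *dirs, filename)


# Hypothetical file name, used only to illustrate the mapping.
example = "http://models.proteinsolver.org/v0.1/some_dir/model.state"
print(wget_local_path(example, no_host=True, cut_dirs=1))
```

Running the command from the repository root therefore places the model files directly under the repository, without a models.proteinsolver.org/ or v0.1/ prefix.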
For an example of how to use pretrained ProteinSolver models in downstream applications (such as mutation ΔΔG prediction), see the elaspic/elaspic2 repository, and in particular the src/elaspic2/plugins/proteinsolver module.
Data used to train and validate the "proteinsolver" network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from http://deep-protein-gen.data.proteinsolver.org/:
wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"
The generation of the training and validation datasets was carried out in our predecessor project: ostrokach/protein-adjacency-net.
- DATAPKG_DATA_DIR: Location of training and validation data.
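A notebook or script can read this variable up front and fail with a clear message when it is missing. The sketch below uses only the standard library. The helper get_data_dir is hypothetical, and how proteinsolver itself consumes DATAPKG_DATA_DIR is not described here.

```python
import os
from pathlib import Path


def get_data_dir(env_var: str = "DATAPKG_DATA_DIR") -> Path:
    """Return the data directory named by `env_var`.

    Raises RuntimeError if the variable is unset or empty.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(
            f"{env_var} is not set; point it at the folder that holds "
            "the training and validation data."
        )
    return Path(value).expanduser()
```

A typical session would export the variable in the shell before starting Jupyter and then call get_data_dir() at the top of the notebook.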
- Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and flexible protein design using deep graph neural networks. Cell Systems (2020); 11: 1–10. doi: 10.1016/j.cels.2020.08.016