Official implementation of ICLR2024 Oral Unified Generative Modeling of 3D Molecules with Bayesian Flow Networks
Please refer to our recent work for applying GeoBFN on Structure-based Drug Design(SBDD) at MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space (ICML2024) with code available at https://github.com/AlgoMole/MolCRAFT.
You will need to have a host machine with gpu, and have a docker with nvidia-container-runtime
enabled.
Tip
- This repo provide an easy to use script to install docker and nvidia-container-runtime, in
./GeoBFN/docker
runsudo ./setup_docker_for_host.sh
to setup your host machine. - You can also refer to install guide if you don't have them installed.
Clone the repo with git clone
,
git clone https://github.com/AlgoMole/GeoBFN.git
setup environment with docker,
cd ./GeoBFN/docker
make # a make is all you need
Note
-
The
make
will automatically build the docker image and run the container. with your host home directory mounted to the${HOME}/home
directory inside the container. highly recommended -
If you need to setup the environment manually, please refer to files
docker/Dockerfile
,docker/asset/requirements.txt
anddocker/asset/apt_packages.txt
.
Inside container, find path to your repo. inside GeoBFN/
run
make -f train.mk
Note
- this command will automatically attempt to download dataset if not exist, and run training script
python geobfn_train.py --config_file configs/bfn4molgen.yaml --epochs 3000
on a default gpu, if you want to change the default gpu, you runexport CUDA_VISIBLE_DEVICES=<gpu_id>
before themake -f train.mk
command. - Comment/delete the
--no_wandb
option intrain.mk
if you want to use wandb to log the training process. You probably will be prompted to enter your wandb api key.
Caution
-
You could encounter connection error if your server is in China, you can manually download the dataset from baidu netdisk and put it in
./GeoBFN
directory withscp <path/to/local/qm9.tar.gz> <username>@<remotehost>:<path/to/remote/GeoBFN/>
. run the script block again after the dataset is downloaded. -
Alternatively you can use a proxy to alow the script download the dataset automatically.
Tip
-
Better run the training command inside a tmux session, as it takes long time to finish training.
-
exiting from container wound't stop the container, run
make
from host atGeoBFN/docker
to log in the running container again. if really need to kill the container runmake kill
fromGeoBFN/docker
.
If you find the idea or code useful for your research, please consider citing
@article{song2024unified,
title={Unified Generative Modeling of 3D Molecules via Bayesian Flow Networks},
author={Song, Yuxuan and Gong, Jingjing and Qu, Yanru and Zhou, Hao and Zheng, Mingyue and Liu, Jingjing and Ma, Wei-Ying},
journal={arXiv preprint arXiv:2403.15441},
year={2024}}