MolPLA

Abstract (submitted to ISMB 2024 Proceedings Track)

Molecular core structures and R-groups are essential concepts in drug development. Integrating these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning to understand the underlying decomposable parts of molecules that correspond to their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that enables MolPLA to help chemists find replaceable R-groups in lead optimization scenarios. Experimental results on molecular property prediction show that MolPLA's predictive performance is comparable to that of current state-of-the-art models. Qualitative analysis indicates that MolPLA is capable of distinguishing core and R-group sub-structures, identifying decomposable regions in molecules, and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates.

How to run the experiments

Step 1. Edit the configuration file settings.yaml.

example:
  dev_mode: 
    debugging: 
    toy_test: 

  wandb:
    project_name: example_project_name
    session_name: example_session_name
    group_name: 

  ddp:
    port: 13000

  path:
    dataset: /path/to/folder/named/datasets
    checkpoint: /path/to/folder/named/checkpoints

  dataprep:
    dataset: geom
    version: v11
    subsample: 1.0

  experiment:
    testing_mode: false
    random_seed: 911012
    which_best: loss

  model_params: 
    model_type: molpla
    hidden_dim: 300
    dropout_rate: 0.0
    graph_encoder: GNN
    gnn_params:
      aggr: add
      JK: concat
      gnn_type: gin
      num_layer: 3

    graph_pooling: add
    graph_projector: mlp
    link_decoder: mlp

    stop_gradient_arms:    False
    stop_gradient_core:    False
    separate_linker_nodes: False

    prop_conditioned: arms

    faiss_metric: inner_product

  train_params:
    batch_size: 4096
    num_epochs: 200

    optimizer: adam
    scheduler: CyclicLR

    learning_rate: 0.00001
    weight_decay: 0.0

    early_stopping: loss
    early_patience: 30

    pretraining:
      main_graph_contrastive:
        loss_coef: 0.1
        score_func: dualentropy
        tau: 0.1
      dcpd_graph_contrastive:
        loss_coef: 0.1
        score_func: dualentropy
        tau: 0.05
      linker_node_contrastive:
        loss_coef: 0.8
        score_func: dualentropy
        tau: 0.01

example_bench:
  dataprep:
    dataset: 
    version: 
    subsample:

  experiment:
    testing_mode: false
    random_seed: 8888
    which_best: loss

  model_params:
    dropout_rate: 0.1

  train_params:
    batch_size: 256
    num_epochs: 100

    optimizer: adam
    scheduler: dummy

    learning_rate: 0.0001
    weight_decay: 0.0

    early_stopping: 
    early_patience: 100

    finetuning:
      from_pretrained:   pretrained_geom_v11
      freeze_pretrained: False
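
The example_bench session above lists only a few keys (for instance, model_params contains just dropout_rate), which suggests that it overrides values from the example session rather than replacing it. The following is a minimal sketch of that reading, not the repository's actual loading code. The function name merge_session and the trimmed-down dictionaries are hypothetical and stand in for the parsed settings.yaml.

```python
import copy


def merge_session(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_session(merged[key], value)
        else:
            # Scalars (or mismatched types): the override wins.
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down stand-ins for the two sessions shown above.
example = {
    "model_params": {"model_type": "molpla", "hidden_dim": 300, "dropout_rate": 0.0},
    "train_params": {"batch_size": 4096, "num_epochs": 200, "learning_rate": 0.00001},
}
example_bench = {
    "model_params": {"dropout_rate": 0.1},
    "train_params": {"batch_size": 256, "num_epochs": 100, "learning_rate": 0.0001},
}

bench_config = merge_session(example, example_bench)
print(bench_config["model_params"])
```

Under this assumption, the benchmark run keeps model_type and hidden_dim from the pretraining session and changes only the keys it lists explicitly.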

Possible values for selected settings:
- example.model_params.model_type: molpla
- example.train_params.scheduler: dummy, CyclicLR
- example.train_params.pretraining.linker_node_contrastive: dualentropy

All experiment reports are uploaded to your Weights & Biases (wandb) account.
You can download the datasets from our Google Drive. The current dataset version is v11.
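
The allowed values listed above can be checked before launching a long run. The sketch below is one way to do that and is not part of the repository. The names ALLOWED, lookup and find_invalid are hypothetical, and the third entry is read here as referring to the score_func key under linker_node_contrastive, which is an assumption.

```python
# Allowed values, copied from the list above. Keys are paths inside one session.
ALLOWED = {
    ("model_params", "model_type"): {"molpla"},
    ("train_params", "scheduler"): {"dummy", "CyclicLR"},
    # Assumption: the documented value applies to the score_func key.
    ("train_params", "pretraining", "linker_node_contrastive", "score_func"): {"dualentropy"},
}


def lookup(config, path):
    """Walk nested dictionaries along `path`; return None if any key is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_invalid(session):
    """Return human-readable messages for settings outside the allowed values."""
    problems = []
    for path, allowed in ALLOWED.items():
        value = lookup(session, path)
        if value is not None and value not in allowed:
            problems.append(f"{'.'.join(path)}: {value!r} not in {sorted(allowed)}")
    return problems


session = {
    "model_params": {"model_type": "molpla"},
    "train_params": {"scheduler": "cosine"},  # deliberately invalid
}
for message in find_invalid(session):
    print(message)
```

Missing keys are skipped rather than reported, since sessions such as example_bench leave several settings empty.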

Step 2. Run the following script

python run.py -sn main -mg {GPU indices separated by comma}

This script will pretrain the molecule representation model and then perform benchmark experiments (finetune-and-test) on various molecule property prediction datasets including freesolv, lipophilicity, esol, toxcast, tox21, sider, bbbp, bace and clintox.
If you want to skip the pretraining phase, add -sp to the above script.
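
As a rough picture of what the flags in the command above mean, here is an argparse sketch. Only the short flags -sn, -mg and -sp come from this README. The long option names, help texts and the parse_gpu_indices helper are hypothetical, and run.py may parse its arguments differently.

```python
import argparse
from typing import List


def parse_gpu_indices(text: str) -> List[int]:
    """Turn a comma-separated string such as "0,1" into [0, 1]."""
    return [int(part) for part in text.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical stand-in for the run.py CLI")
    # -sn: name of the session in settings.yaml (e.g. main or example).
    parser.add_argument("-sn", "--session_name", required=True)
    # -mg: GPU indices separated by commas.
    parser.add_argument("-mg", "--multi_gpu", type=parse_gpu_indices, default=[])
    # -sp: skip the pretraining phase and go straight to the benchmarks.
    parser.add_argument("-sp", "--skip_pretrain", action="store_true")
    return parser


args = build_parser().parse_args(["-sn", "main", "-mg", "0,1", "-sp"])
print(args.session_name, args.multi_gpu, args.skip_pretrain)
```
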
If you want to run only the pretraining code, either to adjust the hyperparameters or to look into the R-Group Retrieval Task, run this command instead:

python run_pretrain.py -sn example -mg {GPU indices separated by comma}
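
The configuration sets faiss_metric to inner_product, so R-group retrieval presumably ranks candidate R-groups by the inner product between embeddings. The sketch below illustrates that ranking in plain Python as a stand-in for a Faiss index. The function names, the labels R1 to R3 and the toy vectors are invented for illustration and are not taken from the trained model.

```python
from typing import Dict, List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidates by inner product with the query, highest score first."""
    scored = [(name, inner_product(query, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings, invented for illustration only.
rgroup_embeddings = {
    "R1": [0.9, 0.1, 0.0],
    "R2": [0.2, 0.8, 0.1],
    "R3": [0.7, 0.3, 0.2],
}
query_core = [1.0, 0.0, 0.5]

for name, score in retrieve_top_k(query_core, rgroup_embeddings, k=2):
    print(name, round(score, 3))
```

An exhaustive scan like this is only practical for small candidate sets. A Faiss index performs the same kind of search at scale.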

Downloading the Preprocessed Dataset and Pretrained Model

The dataset contains all pre-processed data that was used to pre-train MolPLA and to run the benchmark tests on molecule property prediction. GOOGLE DRIVE DOWNLOAD LINK
This repository in Google Drive contains all the files including the model checkpoints containing pre-trained parameters. Note that you might have to edit the directory configuration inside model_config.pkl. GOOGLE DRIVE DOWNLOAD LINK
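
Editing the directory configuration inside model_config.pkl could look like the sketch below. It assumes that the file unpickles to a nested dictionary whose path entries are plain strings. The actual layout is not documented here, and if the file stores custom objects, the repository's src package must be importable when loading. The function repoint_paths and the demo values are hypothetical. Only unpickle files from a source you trust.

```python
import os
import pickle
import tempfile


def repoint_paths(config_path: str, old_root: str, new_root: str) -> dict:
    """Replace `old_root` with `new_root` in every string that starts with it."""
    with open(config_path, "rb") as fh:
        config = pickle.load(fh)

    def rewrite(value):
        if isinstance(value, dict):
            return {key: rewrite(item) for key, item in value.items()}
        if isinstance(value, str) and value.startswith(old_root):
            return new_root + value[len(old_root):]
        return value

    updated = rewrite(config)
    with open(config_path, "wb") as fh:
        pickle.dump(updated, fh)
    return updated


# Demo on a throwaway file with an invented layout.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "model_config.pkl")
    with open(demo_path, "wb") as fh:
        pickle.dump({"path": {"checkpoint": "/old/root/checkpoints"}, "hidden_dim": 300}, fh)
    demo_result = repoint_paths(demo_path, "/old/root", "/new/root")
    print(demo_result["path"]["checkpoint"])
```

Keeping a backup copy of the original file before overwriting it is a sensible precaution.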

Contributors

Name	Affiliation	Email
Mogan Gim†	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Jueon Park†	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Soyon Park	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Sanghoon Lee	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Seungheun Baek	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Junhyun Lee	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Ngoc-Quang Nguyen	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]
Jaewoo Kang*	Data Mining and Information Systems Lab, Korea University, Seoul, South Korea	[email protected]

†: Equal Contributors
*: Corresponding Author
