
unimpl_tools issue about “molecule property prediction” #242

Open
Golden-proteogenomics opened this issue Jun 27, 2024 · 8 comments

@Golden-proteogenomics

Hello,
I would like to understand this code from the unimol_tools molecule property prediction example:
`
from unimol_tools import MolTrain, MolPredict
clf = MolTrain(task='classification',
               data_type='molecule',
               epochs=10,
               batch_size=16,
               metrics='auc',
               )
pred = clf.fit(data = data)
# currently support data with smiles based csv/txt file, and
# custom dict of {'atoms':[['C','C'],['C','H','O']], 'coordinates':[coordinates_1,coordinates_2]}

clf = MolPredict(load_model='../exp')
res = clf.predict(data = data)
`
This code is an API for using Uni-Mol, and I find it confusing.
My other question is about the "molecule property prediction" function: why are there several versions of the code for it, with no description of how they differ?

@Naplessss
Collaborator

MolTrain trains models on different types of input data, either SMILES-based or 3D-coordinate-based. For example, in bioactivity prediction you can use docking or FEP conformations as input, which is more suitable than SMILES-based input.
MolPredict runs predictions with models trained by MolTrain: you train your model with MolTrain and then use MolPredict for inference.
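
As a rough sketch of the coordinate-based input mentioned above, the helper below assembles the custom dict format quoted in the question (`atoms` and `coordinates`). The function name `build_conformer_dict`, its validation rules, and the use of nested lists for coordinates are assumptions for illustration, not part of unimol_tools; whether training also needs extra keys such as labels is not covered in this thread.

```python
from typing import Dict, List, Sequence


def build_conformer_dict(
    atoms: Sequence[Sequence[str]],
    coordinates: Sequence[Sequence[Sequence[float]]],
) -> Dict[str, List]:
    """Assemble {'atoms': [...], 'coordinates': [...]} for 3D-based input.

    Hypothetical helper: checks that each molecule has one (x, y, z)
    triple per atom before the data is handed to MolTrain or MolPredict.
    """
    if len(atoms) != len(coordinates):
        raise ValueError("atoms and coordinates must describe the same number of molecules")
    for i, (mol_atoms, mol_coords) in enumerate(zip(atoms, coordinates)):
        if len(mol_atoms) != len(mol_coords):
            raise ValueError(
                f"molecule {i}: {len(mol_atoms)} atoms but {len(mol_coords)} coordinates"
            )
        if any(len(xyz) != 3 for xyz in mol_coords):
            raise ValueError(f"molecule {i}: every coordinate must be an (x, y, z) triple")
    return {
        "atoms": [list(a) for a in atoms],
        "coordinates": [[[float(v) for v in xyz] for xyz in c] for c in coordinates],
    }


# Two toy molecules with made-up coordinates (illustration only).
data = build_conformer_dict(
    atoms=[["C", "C"], ["C", "H", "O"]],
    coordinates=[
        [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [-0.6, 1.0, 0.0]],
    ],
)
```

In a real run, the coordinates would come from docking or FEP poses rather than hand-typed numbers, and the resulting dict would be passed as `data` in place of a CSV path.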

@Golden-proteogenomics
Author

I get an error when I use this code to predict on 'mol_test.csv'. The details are below. How can I fix this?
[image]
python shi.py
2024-06-27 06:08:16.615493: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-27 06:08:16.662706: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "shi.py", line 22, in <module>
    clf = MolPredict(load_model='./weights')
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/predict.py", line 34, in __init__
    self.config = YamlHandler(config_path).read_yaml()
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/utils/config_handler.py", line 24, in __init__
    raise FileExistsError(OSError)
FileExistsError: <class 'OSError'>

@Naplessss
Collaborator

You should load the model from your save_path:
MolPredict(load_model='./exp')
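
The traceback shows `MolPredict` failing while it reads a YAML config from the `load_model` directory, which suggests the directory needs to be one written by `MolTrain` rather than a folder of pretrained weights. A minimal pre-flight check might look like the sketch below. The file name `config.yaml` and the helpers `looks_like_trained_model_dir` and `load_predictor` are assumptions: the traceback does not show the config file's name, so adjust it to whatever your save_path actually contains.

```python
from pathlib import Path


def looks_like_trained_model_dir(path: str, config_name: str = "config.yaml") -> bool:
    """Return True if `path` is a directory that contains the config file.

    `config_name` is an assumed file name, not confirmed by the traceback.
    """
    directory = Path(path)
    return directory.is_dir() and (directory / config_name).is_file()


def load_predictor(path: str):
    """Create a MolPredict only after the directory check passes."""
    if not looks_like_trained_model_dir(path):
        raise FileNotFoundError(
            f"{path!r} has no config file; point load_model at the MolTrain save_path"
        )
    # Imported lazily so the directory check works without unimol_tools installed.
    from unimol_tools import MolPredict

    return MolPredict(load_model=path)
```

Calling `load_predictor('./weights')` on a folder that only holds pretrained weights would then fail early with a message that names the likely fix, instead of the less descriptive `FileExistsError` above.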

@Golden-proteogenomics
Author

Yes, "./weights" is my model directory.
[image]
Would it be better to use the "./exp" directory, as you advise?
Or is there anything else I haven't considered?

@Naplessss
Collaborator

Use './weights' for the initial pretrained weights, which are the default weights provided by UniMol. For your fine-tuned model weights, use './exp'. If you only need to utilize the representation capabilities of UniMol, you can simply use UniMolRepr:

from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)

If you want to train a model on your own dataset, the best practice is:

  1. fit your own data with MolTrain;
  2. predict with your trained model by using MolPredict, loading from your save path, such as the './exp' folder here.
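
The two steps above can be sketched roughly as follows. The CSV column names `SMILES` and `TARGET`, the file names, and the helper functions `write_smiles_csv` and `train_then_predict` are assumptions for illustration. `save_path='./exp'` matches the directory the collaborator refers to, and the other `MolTrain` arguments are copied from the snippet in the original question.

```python
import csv
from typing import Iterable, Tuple


def write_smiles_csv(rows: Iterable[Tuple[str, int]], path: str) -> None:
    """Write (smiles, label) pairs to a CSV with SMILES and TARGET columns.

    The column names are an assumption about what unimol_tools expects.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["SMILES", "TARGET"])
        writer.writerows(rows)


def train_then_predict(train_csv: str, test_csv: str, save_path: str = "./exp"):
    """Step 1: fit with MolTrain. Step 2: predict with MolPredict from save_path."""
    # Lazy import: this function needs unimol_tools installed, the CSV helper does not.
    from unimol_tools import MolTrain, MolPredict

    trainer = MolTrain(
        task="classification",
        data_type="molecule",
        epochs=10,
        batch_size=16,
        metrics="auc",
        save_path=save_path,
    )
    trainer.fit(data=train_csv)

    # Load from the same directory that training wrote to.
    predictor = MolPredict(load_model=save_path)
    return predictor.predict(data=test_csv)


# Toy data for illustration only; the labels are made up.
write_smiles_csv([("CCO", 1), ("c1ccccc1", 0)], "mol_train.csv")
```

The point of passing the same `save_path` to both steps is that `MolPredict` then reads exactly what `MolTrain` wrote, rather than the pretrained weights folder.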

@Golden-proteogenomics
Author

Yes, I used that code:
`
from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
`
There is an error:
[image]
Is this right? How can I fix it?

@Naplessss
Collaborator

It seems the SMILES is invalid for generating conformations.
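
One cheap thing to rule out before conformer generation is copy-paste damage to the SMILES string, for instance brackets swallowed when the string passed through Markdown. The sketch below is a crude structural check, not a SMILES parser; the function name `quick_smiles_sanity_check` is hypothetical, and whether this kind of damage caused the error above is not established in the thread. A fuller check would parse the string with RDKit, where `Chem.MolFromSmiles` returns `None` for a string it cannot parse.

```python
def quick_smiles_sanity_check(smiles: str) -> bool:
    """Crude check: balanced (), balanced [], and '+' only inside [].

    This only catches obvious copy-paste damage; passing it does not
    mean the SMILES is chemically valid or that conformers can be built.
    """
    paren_depth = 0
    in_bracket = False
    for char in smiles:
        if char == "[":
            if in_bracket:
                return False  # nested '[' is never valid
            in_bracket = True
        elif char == "]":
            if not in_bracket:
                return False  # ']' without a matching '['
            in_bracket = False
        elif char == "(" and not in_bracket:
            paren_depth += 1
        elif char == ")" and not in_bracket:
            paren_depth -= 1
            if paren_depth < 0:
                return False  # ')' before its '('
        elif char == "+" and not in_bracket:
            return False  # a positive charge must sit inside a bracket atom
    return paren_depth == 0 and not in_bracket


intact = "c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]"
damaged = "c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)N+[O]"  # '[N+](=O)' mangled to 'N+'
intact_ok = quick_smiles_sanity_check(intact)
damaged_ok = quick_smiles_sanity_check(damaged)
```

Here `intact_ok` is `True` and `damaged_ok` is `False`, because the damaged string has a `+` outside any bracket atom.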

@longkunxuluke

Hi @Naplessss , I have a similar question about how to do zero-shot property prediction with pretrained models using unimol_tools. For example, if I want to predict HOMO-LUMO gap of a bunch of molecules with their SMILES in a csv file, can I use a trained model (e.g., mol_pre_all_h_220816.pt in huggingface) without finetuning to directly make predictions? Is there any example code for this? Thanks.
