unimol_tools issue about “molecule property prediction” #242

Golden-proteogenomics · 2024-06-27T01:18:05Z

Hello,
I would like to understand this code from the unimol_tools molecule property prediction example:
`
from unimol_tools import MolTrain, MolPredict
clf = MolTrain(task='classification',
data_type='molecule',
epochs=10,
batch_size=16,
metrics='auc',
)
pred = clf.fit(data = data)
# currently support data with smiles based csv/txt file, and
# custom dict of {'atoms':[['C','C'],['C','H','O']], 'coordinates':[coordinates_1,coordinates_2]}

clf = MolPredict(load_model='../exp')
res = clf.predict(data = data)
`
This code is an API for using Uni-Mol, and it confuses me.
My other question is about the "molecule property prediction" function: why are there several versions of the code for it, with no description of how they differ?

Naplessss · 2024-06-27T03:32:53Z

MolTrain is used for training models on different types of data, including SMILES-based and 3D-coordinate-based inputs. For example, in bioactivity prediction you can use docking or FEP conformations as input, which is more suitable than SMILES-based input.
MolPredict provides prediction using models trained with MolTrain. This means you can train your model with MolTrain and then use MolPredict for inference.
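
The two input forms mentioned in the question (a SMILES-based CSV and a custom dict of atoms and coordinates) and the train-then-predict flow could be sketched roughly as follows. This is a minimal sketch, not the library's documented recipe: the column names `SMILES` and `TARGET`, the `target` key in the dict, the `save_path` argument, and all helper names are assumptions for illustration. The `unimol_tools` import is kept inside the last function so the data helpers run without it installed.

```python
import csv


def write_smiles_csv(path, smiles, targets):
    """Write a SMILES-based table; the SMILES/TARGET column names are assumptions."""
    if len(smiles) != len(targets):
        raise ValueError("smiles and targets must have the same length")
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["SMILES", "TARGET"])
        writer.writerows(zip(smiles, targets))
    return path


def build_coordinate_dict(atoms, coordinates, targets):
    """Build the custom dict form; the 'target' key is an assumption."""
    if not (len(atoms) == len(coordinates) == len(targets)):
        raise ValueError("atoms, coordinates and targets must have the same length")
    for symbols, xyz in zip(atoms, coordinates):
        # Each molecule needs one (x, y, z) triple per atom symbol.
        if len(symbols) != len(xyz) or any(len(p) != 3 for p in xyz):
            raise ValueError("each atom needs exactly one (x, y, z) coordinate")
    return {"atoms": atoms, "coordinates": coordinates, "target": targets}


def train_then_predict(train_data, test_data, save_path="./exp"):
    """Fit with MolTrain, then load the saved run with MolPredict."""
    # Imported lazily so the helpers above work without unimol_tools installed.
    from unimol_tools import MolTrain, MolPredict

    clf = MolTrain(task="classification", data_type="molecule",
                   epochs=10, batch_size=16, metrics="auc",
                   save_path=save_path)
    clf.fit(data=train_data)
    # MolPredict reads the model from the directory MolTrain saved to.
    return MolPredict(load_model=save_path).predict(data=test_data)
```

Either the CSV path or the dict can then be passed as `train_data`; docking or FEP conformations would go in through the dict form.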

Golden-proteogenomics · 2024-06-27T06:16:08Z

I get an error when I use this code to predict on 'mol_test.csv'. The details are below. How can I fix this?

python shi.py
2024-06-27 06:08:16.615493: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-06-27 06:08:16.662706: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "shi.py", line 22, in <module>
    clf = MolPredict(load_model='./weights')
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/predict.py", line 34, in __init__
    self.config = YamlHandler(config_path).read_yaml()
  File "/sunjs/Softwares/Uni-Mol-main/unimol_tools/unimol_tools/utils/config_handler.py", line 24, in __init__
    raise FileExistsError(OSError)
FileExistsError: <class 'OSError'>

Naplessss · 2024-06-28T07:16:52Z

You should load the model from your save_path:
MolPredict(load_model='./exp')
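
The traceback above is raised from `YamlHandler(config_path)`, which suggests the directory passed to `load_model` did not contain the config file that `MolPredict` looks for. A small pre-flight check can make that failure easier to read. This is a sketch under assumptions: the file name `config.yaml` is a guess at what a `MolTrain` run writes into its save path, and `check_model_dir` and `load_predictor` are hypothetical helpers, not part of unimol_tools.

```python
import os


def check_model_dir(load_model, config_name="config.yaml"):
    """Return the config path if the directory looks like a saved run.

    config_name is an assumption about what MolTrain writes to its save path.
    """
    if not os.path.isdir(load_model):
        raise NotADirectoryError(f"{load_model!r} is not a directory")
    config_path = os.path.join(load_model, config_name)
    if not os.path.isfile(config_path):
        raise FileNotFoundError(
            f"{config_path!r} not found; point load_model at the save path "
            "of a MolTrain run rather than at the pretrained weights"
        )
    return config_path


def load_predictor(load_model="./exp"):
    """Validate the directory first, then build MolPredict."""
    check_model_dir(load_model)
    from unimol_tools import MolPredict  # lazy import; needs unimol_tools
    return MolPredict(load_model=load_model)
```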

Golden-proteogenomics · 2024-07-05T02:21:33Z

Yes, "./weights" is my model directory.

Would it be better to use the "./exp" directory, as you advise?
Or is there anything else that I haven't considered?

Naplessss · 2024-07-05T02:58:47Z

Use './weights' for the initial pretrained weights, which are the default weights provided by UniMol. For your fine-tuned model weights, use './exp'. If you only need to utilize the representation capabilities of UniMol, you can simply use UniMolRepr:

from unimol_tools import UniMolRepr
# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
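
To inspect what `get_repr` returns, a helper along these lines may be useful. It assumes the result is a dict with a `cls_repr` entry (one vector per molecule) and, when `return_atomic_reprs=True`, an `atomic_reprs` entry (one per-atom matrix per molecule). Those key names are an assumption and should be checked against the installed version; `summarize_repr` is a hypothetical name.

```python
def summarize_repr(unimol_repr):
    """Report sizes of a get_repr-style result without needing numpy.

    Assumes keys 'cls_repr' and (optionally) 'atomic_reprs'.
    """
    cls = unimol_repr["cls_repr"]
    summary = {
        "n_molecules": len(cls),
        # Length of the molecule-level embedding vector.
        "embedding_dim": len(cls[0]) if len(cls) else 0,
    }
    atomic = unimol_repr.get("atomic_reprs")
    if atomic is not None:
        # One entry per molecule; each entry has one row per atom.
        summary["atoms_per_molecule"] = [len(mol) for mol in atomic]
    return summary
```

Calling `summarize_repr(unimol_repr)` on the result above would then show how many molecules were embedded and how many atom rows each one has.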

If you want to train a model on your own dataset, the best practice is:

Fit your own data with MolTrain;
Predict with your trained model by loading it in MolPredict from your save path, such as the './exp' folder here.

Golden-proteogenomics · 2024-07-12T09:26:38Z

Yes, I used that code:
`
from unimol_tools import UniMolRepr

# single smiles unimol representation
clf = UniMolRepr(data_type='molecule', remove_hs=False)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)N+[O]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
`
There is an error. Is this right? How can I fix it?

Naplessss · 2024-07-17T05:22:54Z

It seems the SMILES is invalid for generating conformations.
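
Note that the SMILES as quoted in the 2024-07-12 post differs from the one in the earlier reply: the nitro group reads `N+[O]` instead of `[N+](=O)[O]`, and a `+` outside a bracket atom is not valid SMILES. Whether that is what was actually run, or only how the post rendered, cannot be told from the thread. A quick syntactic pre-check like the sketch below can catch this kind of damage before conformer generation. It is not a full SMILES validator, and `smiles_precheck` is a hypothetical helper; a real parser such as RDKit's `Chem.MolFromSmiles`, which returns `None` for unparsable input, is a stronger check when RDKit is available.

```python
def smiles_precheck(smiles):
    """Return a list of syntactic problems found in a SMILES string.

    Only checks bracket/parenthesis balance and misplaced '+' charges;
    an empty list does not prove the SMILES is chemically valid.
    """
    problems = []
    paren_depth = 0
    in_bracket = False
    for i, ch in enumerate(smiles):
        if ch == "[":
            if in_bracket:
                problems.append(f"nested '[' at position {i}")
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                problems.append(f"unmatched ']' at position {i}")
            in_bracket = False
        elif ch == "(":
            paren_depth += 1
        elif ch == ")":
            paren_depth -= 1
            if paren_depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                paren_depth = 0
        elif ch == "+" and not in_bracket:
            # Charges are only allowed inside bracket atoms, e.g. [N+].
            problems.append(f"'+' outside a bracket atom at position {i}")
    if in_bracket:
        problems.append("unclosed '['")
    if paren_depth > 0:
        problems.append("unclosed '('")
    return problems
```

Filtering a SMILES list with `[s for s in smiles_list if not smiles_precheck(s)]` before calling `get_repr` would drop strings that fail these basic checks.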

longkunxuluke · 2024-10-24T06:58:56Z

Hi @Naplessss, I have a similar question about how to do zero-shot property prediction with pretrained models using unimol_tools. For example, if I want to predict the HOMO-LUMO gap of a set of molecules from their SMILES in a csv file, can I use a trained model (e.g., mol_pre_all_h_220816.pt on Hugging Face) without fine-tuning to make predictions directly? Is there any example code for this? Thanks.
