Reproduction of table 1 #2

mrTsjolder · 2023-05-05T15:23:57Z

I stumbled upon this paper and would like to reproduce some of the results in table 1.
However, when running the code as indicated in the README, the values seem to be quite off.
Should it be possible to reproduce the results in table 1 with this codebase?
If yes, what arguments are necessary to get these results?

Concretely, I tried to reproduce the ZINC results by following the README (as closely as possible).
After setting up the environment and downloading the zinc250k.csv file from moflow, I was able to run the data_preprocess.py script.

After downloading the models, I managed to run the following scripts (if I remember correctly):

python chemspace.py --gpu 0 --data_name zinc250k --random
python train_boundary_zinc.py
python chemspace.py --gpu 0 --data_name zinc250k --traverse

However, it might be that I already had to fix the mflow import statements at this stage and ran the generate_prop_ranges.py script at this point.

After creating the zinc250k.txt file from zinc250k.csv and running generate_prop_ranges.py, I should have been able to run calculate_statistics_single_prop.py --mani_range 1, although this might also have required some changes to the original code.
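
For reference, the conversion from zinc250k.csv to zinc250k.txt can be sketched as below. This is only a minimal sketch under assumptions that the repository does not confirm: that the CSV has a header row with a column named `smiles`, and that the scripts expect a plain text file with one SMILES string per line. The function name `csv_to_smiles_txt` and the `smiles_column` parameter are illustrative, not part of the codebase.

```python
import csv
from pathlib import Path


def csv_to_smiles_txt(csv_path, txt_path, smiles_column="smiles"):
    """Write one SMILES string per line, taken from `smiles_column` of a CSV file.

    Assumptions (not confirmed by the repository): the CSV has a header row
    containing `smiles_column`, and the .txt file should hold one SMILES
    string per line without a header.
    """
    count = 0
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        reader = csv.DictReader(src)
        if reader.fieldnames is None or smiles_column not in reader.fieldnames:
            raise KeyError(f"column {smiles_column!r} not found in {csv_path}")
        for row in reader:
            # Strip whitespace, including newlines embedded in quoted fields.
            smiles = (row.get(smiles_column) or "").strip()
            if smiles:  # skip rows without a SMILES entry
                dst.write(smiles + "\n")
                count += 1
    return count


if __name__ == "__main__":
    source = Path("zinc250k.csv")
    if source.exists():
        written = csv_to_smiles_txt(source, "zinc250k.txt")
        print(f"wrote {written} SMILES strings")
```

If the file uses a different column name, pass it through `smiles_column`.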

After some further modifications (most notably by creating directories that were missing for the code to work), I also managed to run the random and largest baselines as follows:

python chemspace.py --gpu 0 --data_name zinc250k --traverse --baseline random
python chemspace.py --gpu 0 --data_name zinc250k --largest
python chemspace.py --gpu 0 --data_name zinc250k --traverse --baseline largest

which allowed me to run calculate_statistics_single_prop.py on these baselines as well.
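
The directory fix mentioned above can be done up front instead of by hand. The sketch below assumes nothing about which directories the scripts need: `ensure_dirs` is a hypothetical helper, and the paths have to be taken from whatever the scripts report as missing.

```python
import os
import sys


def ensure_dirs(paths):
    """Create each directory in `paths` (including parents) if it is missing.

    Returns the list of directories that were actually created.
    """
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    # Directory names are passed on the command line; none are hard-coded
    # here because the issue does not list which ones were missing.
    for path in ensure_dirs(sys.argv[1:]):
        print("created", path)
```

Running it a second time with the same arguments creates nothing, so it is safe to call before every script.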

All of this eventually provided me with the following results:

| QED | strict | relaxed local | relaxed global |
| --- | --- | --- | --- |
| random | 12.5 | 15.0 | 18.0 |
| largest | 17.0 | 18.0 | 24.5 |
| chemspace | 69.0 | 69.0 | 73.5 |

whereas table 1 (together with tables 5 and 6) in the paper seems to suggest something closer to

| QED | strict | relaxed local | relaxed global |
| --- | --- | --- | --- |
| random | 1.5 | 3.5 | 6.0 |
| largest | 1.5 | 3.0 | 4.5 |
| chemspace | 52.0 | 53.5 | 57.0 |

Any chance you could provide me with some pointers (or explain the discrepancies)?
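
To make the size of the gap explicit, the two sets of QED numbers above can be subtracted entry by entry. The values below are copied from this comment; the names `reproduced`, `reported`, and `differences` are only illustrative.

```python
COLUMNS = ("strict", "relaxed local", "relaxed global")

# QED values obtained in this reproduction attempt.
reproduced = {
    "random": (12.5, 15.0, 18.0),
    "largest": (17.0, 18.0, 24.5),
    "chemspace": (69.0, 69.0, 73.5),
}

# QED values as read from the paper (table 1 together with tables 5 and 6).
reported = {
    "random": (1.5, 3.5, 6.0),
    "largest": (1.5, 3.0, 4.5),
    "chemspace": (52.0, 53.5, 57.0),
}


def differences(a, b):
    """Return a[m] - b[m] per column for every method m in `a`."""
    return {m: tuple(round(x - y, 1) for x, y in zip(a[m], b[m])) for m in a}


if __name__ == "__main__":
    diff = differences(reproduced, reported)
    print(f"{'method':<10}" + "".join(f"{c:>16}" for c in COLUMNS))
    for method, row in diff.items():
        print(f"{method:<10}" + "".join(f"{d:>+16.1f}" for d in row))
```

Every reproduced value is higher than the reported one, by between 11.0 and 20.0 points, so the shift affects the baselines as well as chemspace.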


yuanqidu · 2023-05-12T16:21:15Z

Thanks for your interest in our paper! We refactored the code before releasing it. At first glance, the results make sense: ChemSpacE outperforms the baseline methods by a large margin, as they are very simple. I will try to find some time to look through it, but I think the results are not very surprising despite being different from what we reported in the paper.
