This changelog follows the guidelines of Keep a Changelog.
- Fragment generators using transformers
- CAZP scoring component
- Filename issue on Windows which led to termination
- General scoring component for all 210 RDKit descriptors
- Optional cwd for run_command()
- Collect all names for remote monitoring
- Pass data to request as dict rather than a JSON string
- Pair generator multiprocessing in TL is supported on Linux, Windows, and macOS
- The number of CPUs is optional and can be specified in the TOML/JSON configuration file through the parameter `number_of_cpus`
- Added missing second mask in inception call to scoring function
- Fixed CUDA out-of-memory error for Reinvent batch sampling
- Handle the case when there are no non-cached SMILES and thus the scoring function does not need to run.
- Improved type safety in `value_mapping`
- Number of CPUs can be specified in TOML/JSON config files for TL jobs
- Check for CUDA before checking GPU memory; otherwise the check fails on CPU-only machines
- Removed obsolete code which broke TL with Reinvent
- Windows support: correct signal handling
- Scoring component MolVolume to compute molecular volume via RDKit
- Minimal SMILES pre-processing for scoring: keep stereochemistry, choose only the largest fragment, and apply the general RDKit cleanup/sanitization/hydrogen handling. Heavy filtering on molecule size, allowed atoms, tokens, vocabulary, etc. is skipped. This facilitates situations where only scoring is desired.
- Allow zero weights to only display a component score. This will have no effect on aggregation but the component score is still computed. So, be careful with computationally expensive components.
- Flag to purge diversity filter memories after each staged learning stage. This is useful in multiple stage runs and is equivalent to `use_checkpoint` for single stage reruns.
- The CSV file from RL has controlled output precision: 7 for total score and transformed scores, 4 for all other floating point values
- Critical: all scores of duplicate SMILES, including the first occurrence, were set to zero rather than the computed value
- Scores of duplicates are now correctly copied over from the first occurrence
- All tests support both CPU and GPU
- Conditional import of `resource` to allow running on Windows
- Some rudimentary information on GPU memory usage in staged learning
- Bug in sampling related to the way the sampled.nlls object was treated. Now it is always a PyTorch tensor without gradient on the CPU
- Moved TPSA to a separate component to enable TPSA calculation for polar S and P. The original RDKit implementation does not consider those atoms, and the default is still to leave them out of the TPSA calculation
- Issue with NaNs in the raw and transformed scores that prevented computing the mean
- Explicit serialization of JSON string because the internal one from requests may fail
- Added a patch to fix a bug in the native PyTorch implementation related to the histogram functionality of the TensorBoard report
- Added a check which raises an exception if the user enters scaffolds with multiple attachment points connected to the same atom (Libinvent, Linkinvent). This restriction will be lifted in a future update
- Fixed report format (TL)
- Normalize SMILES before passing to string-based scoring components because the SMILES may still contain labels (Libinvent)
- Fixed fragment effective length ratio error when the fragment has a single atom (e.g. "[*]N[*]") with max graph length=0
- For Compound Sampling, removed explicit dependency on matplotlib: report histogram and scatter plot to TensorBoard only if matplotlib is available.
- For Compound Sampling, introduced a new parameter unique_molecules to deduplicate by canonicalized SMILES instead of using sequence deduplication, which can be confusing
- warn if sequence deduplication is requested
- TL integration test fixed (no impact on GUI or core)
- TL reporting of epochs
- multiple scoring components are reported as means again rather than as lists
- fixed graph length fragment component
- fixes for fragment scoring components
- scores reported for filters and penalties as in REINVENT3
- initial release of REINVENT4
- RL rewards MASCOF, MAULI, SDAP (inefficient in practice)
- sequence deduplication (interferes with diversity filter and SMILES deduplication)
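
The duplicate handling described in the entries above (scores of duplicate SMILES are copied over from the first occurrence instead of being set to zero) can be sketched as follows. This is only an illustration under stated assumptions: the function name `copy_scores_to_duplicates` and the plain-list data layout are hypothetical and are not taken from the REINVENT code base.

```python
from typing import Dict, List, Sequence


def copy_scores_to_duplicates(
    smiles: Sequence[str], scores: Sequence[float]
) -> List[float]:
    """Return scores where each duplicate SMILES gets the score of its first occurrence.

    Hypothetical helper: assumes only the first occurrence of each SMILES
    carries a computed score and later duplicates hold a placeholder.
    """
    first_score: Dict[str, float] = {}
    result: List[float] = []

    for smi, score in zip(smiles, scores):
        # Remember the computed score the first time a SMILES is seen
        if smi not in first_score:
            first_score[smi] = score
        # Every occurrence, first or duplicate, reports that same score
        result.append(first_score[smi])

    return result


if __name__ == "__main__":
    smiles = ["CCO", "c1ccccc1", "CCO"]
    scores = [0.7, 0.2, 0.0]  # the duplicate holds a placeholder of 0.0
    print(copy_scores_to_duplicates(smiles, scores))
```

The first occurrence keeps its computed value, which is the behaviour the critical fix restores.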