Transformer-model-for-prediction-in-low-chemical-data-regimes

This is the code for the paper "Data augmentation and transfer learning strategies for reaction prediction in low chemical data regimes". The preprint is available on ChemRxiv at https://doi.org/10.26434/chemrxiv.13383275.v1

Python 2.7

TensorFlow 1.11

RDKit 2019.03.4

Dataset

The dataset we used is the general chemical reaction dataset, which contains approximately 380,000 chemical reactions. These reaction examples were originally sourced from Lowe's dataset, which was extracted from United States Patent and Trademark Office (USPTO) patents, and were then subjected to a series of pretreatments in which all the reagents and conditions were deleted. The input data for training and validation is in the tmp folder.
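As an illustration of the reagent-removal pretreatment, the sketch below assumes the reactions are stored as reaction SMILES of the form reactants>agents>products, possibly followed by whitespace-separated extra fields. The function name strip_reagents and the example reaction are hypothetical and are not taken from this repository; the authors' actual pretreatment may differ.

```python
def strip_reagents(reaction_smiles):
    """Return 'reactants>>products', dropping the middle (agents) field.

    Assumption: the input is a reaction SMILES 'reactants>agents>products';
    anything after the first whitespace (extra annotations) is ignored.
    """
    core = reaction_smiles.split()[0]
    parts = core.split(">")
    if len(parts) != 3:
        raise ValueError("expected 'reactants>agents>products', got %r" % core)
    reactants, _agents, products = parts
    return reactants + ">>" + products


if __name__ == "__main__":
    # Hypothetical Baeyer-Villiger style example with a solvent listed as an agent.
    print(strip_reagents("O=C1CCCCC1>ClCCl>O=C1CCCCCO1"))
```

A reaction that already has an empty agents field passes through unchanged, so the function can be applied to a whole file without checking each line first.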

Generate data

We preprocess the input data by running the datagen.sh script. The output data is placed in the t2t_data folder.

Data augmentation

We use the Python program data_augmentation.py to perform data augmentation on the training set of the Baeyer-Villiger reaction dataset, which is represented in SMILES form.
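The contents of data_augmentation.py are not reproduced here. As a rough sketch of one common SMILES augmentation technique, the example below uses RDKit to write the same molecule as several different, non-canonical SMILES strings. The function names randomized_smiles and augment_reaction, the n_variants parameter, and the cyclohexanone example are assumptions made for illustration, and the script in this repository may work differently.

```python
from rdkit import Chem


def randomized_smiles(smiles, n_variants=5, max_tries=100):
    """Return up to n_variants distinct SMILES strings for the same molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    variants = set()
    for _ in range(max_tries):
        if len(variants) >= n_variants:
            break
        # doRandom=True starts the SMILES traversal from a random atom.
        variants.add(Chem.MolToSmiles(mol, canonical=False, doRandom=True))
    return sorted(variants)


def augment_reaction(reaction, n_variants=5):
    """Pair randomized reactant SMILES with randomized product SMILES."""
    reactants, products = reaction.split(">>")
    pairs = zip(randomized_smiles(reactants, n_variants),
                randomized_smiles(products, n_variants))
    return [r + ">>" + p for r, p in pairs]


if __name__ == "__main__":
    # Hypothetical example: cyclohexanone to caprolactone (reagents omitted).
    for line in augment_reaction("O=C1CCCCC1>>O=C1CCCCCO1", n_variants=3):
        print(line)
```

Every generated string describes the same molecules as the input, so the augmented examples add only new textual forms of existing reactions. Small or highly symmetric molecules may yield fewer than n_variants distinct strings.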

Train

Run the train.sh script to start training the model.

Test

Run the decode.sh script to start testing the model.


hongliangduan/Transformer-model-for-prediction-in-low-chemical-data-regimes
