Skip to content

Latest commit

 

History

History
137 lines (105 loc) · 4.97 KB

README.md

File metadata and controls

137 lines (105 loc) · 4.97 KB

In silico derivatization

Overview

This package performs in-silico MeOX + TMS derivatization (as described e.g. in https://doi.org/10.1021/acs.analchem.7b01010):

  • Methoximation: ketone R(C=O)R' and aldehyde (-HC=O) carbonyl groups are substituted with -C=NOCH3
  • Trimethylsilylation: the acidic hydrogen in -OH, -SH, -COOH, -NH2, -NHR, =NH, the hydrogen is substituted with -Si(CH3)3 The substitution doesn't always have to happen; its probability is currently hardcoded in the package. Typically, multiple substitution attempts are run on each input molecule, and all distinct results are gathered.

Known limitation is methoximation on cycles which should be broken. This is not implemented yet.

Installation

There are a few ways to install gc-meox-tms:

  1. Install in a new conda environment (recommended):
$ conda create -n gc-meox-tms -c bioconda gc-meox-tms
$ conda activate gc-meox-tms
  1. Install from pip:
$ pip install gc-meox-tms
  1. From source by cloning the repository and installing the package with pip as follows:
$ git clone https://github.com/RECETOX/gc-meox-tms.git

# install the package:
$ python -m pip install gc-meox-tms

# if you want to run examples in the Jupyter notebook, install with this command:
$ python -m pip install gc-meox-tms[eda]

Usage

Command-Line Tool

gc-meox-tms can be used as a command line tool to produce all MeOX/TMS derivatives of given compounds. To use it via the command line you will need one or more txt files with chemical compounds represented as SMILES (one SMILES per line). The tool can output results in flat txt format(one compound per line) or tab separated tsv format (all derivatives of a given molecule per line).

$ python -m gc_meox_tms \
-f <path to write flat txt result> \
-t <path to write tab separated result> \
<paths to input txt files>

More parameters can be specified, such as number of cores or repeats. For more information run:

$ python -m gc_meox_tms --help

Python Package

Package provides functions:

  • is_derivatized() checks whether the molecule contains MeOX or TMS groups that are likely to be result of derivatization
  • remove_derivatization_groups() removes the suspected groups, reconstructing the original molecule
  • add_derivatization_groups() does the substitution above
from gc_meox_tms import add_derivatization_groups, is_derivatized, remove_derivatization_groups
from rdkit.Chem import MolToSmiles

# Example compounds in SMILES format
compounds = ["CC=O", "CC=NOC", "CCO[Si](C)(C)C"]

# Check derivatization
[is_derivatized(smiles=smiles) for smiles in compounds]
>>> [False, True, True]

# Remove derivatization groups from derivatized molecules
underivatized = [remove_derivatization_groups(smiles=smiles) for smiles in compounds[1:]]
print([MolToSmiles(mol) for mol in underivatized])
>>> ["CC=O", "CCO"]

# Convert molecules back to derivatized forms
rederivatized = [add_derivatization_groups(mol=mol) for mol in underivatized]
print([MolToSmiles(mol) for mol in rederivatized])
>>> ['CC=NOC', 'CCO[Si](C)(C)C']

Note that your results may differ from the presented since add_derivatization_groups is not deterministic. If you rerun the function enough times you will get all possible derivatizations. The number of reruns to obtain all possible conformations is individual for each compound (depends on possible conversion degrees etc.).

See also the Jupyter notebook in example/ directory for more examples.

Developer documentation

Installation

Create a virtual environment of your choice (e.g., conda or venv). The development version can be installed with conda or pip as follows:

# 1. Fork and clone the repository
$ git clone https://github.com/<YOUR_GITHUB_USERNAME>/gc-meox-tms.git
$ cd gc-meox-tms

# 2a. To create a conda env run from the package directory:
$ conda env create -f conda/environment-dev.yaml
$ conda activate gc-meox-tms-dev

# 2b. Alternatively, install using python venv:
$ python3 -m venv gc-meox-tms-dev
$ source gc-meox-tms-dev/bin/activate
$ pip install -e .[dev]

Contributing

Before opening a PR make sure all the tests are passing by running pytest from within the package directory:

$ pytest

It may happen that some tests which are dependent on probabilistic logic may fail. If that occurs, try rerunning the tests. Usually one rerun is enough.

We strongly advise you to add new tests for the functionality that you want to contribute. If you want to check whether your changes are covered with tests, run $ pytest --cov and examine the output to see what parts may need better test coverage.

Run linter, to make sure all is nicely formatted:

$ flake8

# if you use venv, exclude venv directory from linting
$ flake8 --exclude 'gc-meox-tms-dev'

Lastly make sure the Python imports are in a proper order:

$ isort gc_meox_tms