
Descriptors notation

Pavel edited this page Jul 20, 2016 · 6 revisions

A descriptor name consists of several sections separated by the vertical bar symbol (`|`).

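| # | Reaction label<sup>a</sup> (optional) | Descriptor type<sup>b</sup> | Descriptor value<sup>c</sup> | (reserved) | (reserved) | Number of atoms | (reserved) | (reserved) | Atom labeling<sup>d</sup> | SMILES |
|---|---|---|---|---|---|---|---|---|---|---|
| Possible values | p/r/pr | S/M | n/p | | | integer | | | elm/none/... | SMILES |
| Examples | | | | | | | | | | |
| 1 | | S | n | | | 6 | | | none | `A-A(-A)-A.A=A` |
| 2 | | M | n | | | 5 | | | elm | `C.C(-C=O)-H` |
| 3 | pr | M | p | | | 4 | | | elm | `C-C-C.O` |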

<sup>a</sup> `p` or `r` marks a product or reactant mixture descriptor; `pr` marks the difference between the product and reactant mixture descriptors.
<sup>b</sup> `S` means a single-compound descriptor; `M` means a mixture descriptor.
<sup>c</sup> `n` means the count of fragments (weighted by compound amount in the case of mixture descriptors); `p` means the probability of the fragment occurring, calculated as the ratio of the descriptor value to the sum of all descriptor values. Note: single and mixture descriptors are processed separately.
<sup>d</sup> `elm` means atoms are labeled by element; `none` means no labels, so only the topology of the molecules is encoded. Other values are allowed, based on user-defined atom properties in the input SDF file.
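
The section layout above can be sketched as a small parser. This is only an illustration, not part of the tool: the function and class names are hypothetical, and it assumes that a full name has exactly ten `|`-separated sections, that reserved sections are empty strings, and that an omitted reaction label leaves an empty first section (for example `|S|n|||6|||none|A-A(-A)-A.A=A`).

```python
from typing import NamedTuple, Optional


class DescriptorName(NamedTuple):
    """Parsed sections of a descriptor name (hypothetical helper type)."""
    reaction_label: Optional[str]  # 'p', 'r', 'pr', or None if omitted
    descriptor_type: str           # 'S' (single) or 'M' (mixture)
    value_kind: str                # 'n' (count) or 'p' (probability)
    n_atoms: int
    atom_labeling: str             # 'elm', 'none', or a user-defined property
    smiles: str


def parse_descriptor_name(name: str) -> DescriptorName:
    """Split a descriptor name into its sections.

    Assumption: ten sections in total, with the reserved ones left empty.
    """
    # maxsplit=9 keeps everything after the ninth bar in the SMILES section.
    fields = name.split("|", 9)
    if len(fields) != 10:
        raise ValueError(f"expected 10 sections, got {len(fields)}: {name!r}")
    reaction, dtype, value, _, _, n_atoms, _, _, labeling, smiles = fields

    if reaction not in ("", "p", "r", "pr"):
        raise ValueError(f"unknown reaction label: {reaction!r}")
    if dtype not in ("S", "M"):
        raise ValueError(f"unknown descriptor type: {dtype!r}")
    if value not in ("n", "p"):
        raise ValueError(f"unknown descriptor value kind: {value!r}")

    return DescriptorName(
        reaction_label=reaction or None,
        descriptor_type=dtype,
        value_kind=value,
        n_atoms=int(n_atoms),
        atom_labeling=labeling,
        smiles=smiles,
    )
```

Under these assumptions, `parse_descriptor_name("pr|M|p|||4|||elm|C-C-C.O")` would yield a record with `reaction_label == "pr"` and `n_atoms == 4`. If the real names lay out their sections differently, only the unpacking line needs to change.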

Description of the examples:

  1. The descriptor is the number of six-atom fragments in a single compound (or in one component of a mixture), disregarding atom labels (a topological fragment descriptor).
  2. The descriptor is the number of co-occurrences of the fragments C and C(C=O)H belonging to two different components of a mixture.
  3. The descriptor is the difference between the product and reactant relative occurrence frequencies of the two fragments CCC and O, belonging to two different reactants and two different products (the reactant and product mixtures are considered separately).
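
The relationship between `n`, `p`, and `pr` values can be sketched as follows. This is a rough illustration with hypothetical function names, not the tool's own code. It assumes that "the sum of all descriptors" means the sum of the count values of all descriptors of the same type (`S` or `M`) for one object, and that the `pr` difference is taken as products minus reactants; the page does not state either detail explicitly.

```python
from collections import defaultdict


def counts_to_probabilities(counts):
    """Turn 'n' values into 'p' values.

    counts maps (descriptor_type, fragment) -> count, where
    descriptor_type is 'S' or 'M'. Single and mixture descriptors
    are normalized separately (assumed: by the sum within each type).
    """
    totals = defaultdict(float)
    for (dtype, _fragment), value in counts.items():
        totals[dtype] += value
    return {
        key: value / totals[key[0]]
        for key, value in counts.items()
        if totals[key[0]] != 0  # skip types whose counts sum to zero
    }


def product_reactant_difference(products, reactants):
    """Compute 'pr' values, assumed here to be products minus reactants.

    A descriptor missing on one side is treated as 0.
    """
    keys = set(products) | set(reactants)
    return {k: products.get(k, 0.0) - reactants.get(k, 0.0) for k in keys}
```

For example, with counts of 3 and 1 for two single-compound fragments, the corresponding `p` values under this assumption are 0.75 and 0.25, regardless of any mixture descriptors computed for the same object.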