Descriptors notation
Pavel edited this page Jul 20, 2016
·
6 revisions
A descriptor name consists of several sections separated by the vertical bar symbol (`|`).
| # | Reaction label<sup>a</sup> (optional) | Descriptor type<sup>b</sup> | Descriptor value<sup>c</sup> | (reserved) | (reserved) | Number of atoms | (reserved) | (reserved) | Atom labeling<sup>d</sup> | SMILES |
|---|---|---|---|---|---|---|---|---|---|---|
| Possible values | p/r/pr | S/M | n/p | | | integer | | | elm/none/... | SMILES |
| Examples | | | | | | | | | | |
| 1 | | S | n | | | 6 | | | none | `A-A(-A)-A.A=A` |
| 2 | | M | n | | | 5 | | | elm | `C.C(-C=O)-H` |
| 3 | pr | M | p | | | 4 | | | elm | `C-C-C.O` |
<sup>a</sup> p or r denotes a product or reactant mixture descriptor; pr denotes the difference between the product and reactant mixture descriptors.
<sup>b</sup> S means a single compound descriptor, M a mixture descriptor.
<sup>c</sup> n means the count of fragments (weighted by compound amount in the case of mixture descriptors); p means the probability of the fragment occurring, calculated as the ratio of the descriptor value to the sum of all descriptors. Note: single and mixture descriptors are processed separately.
<sup>d</sup> elm means atoms are labeled by element; none means no labels, so only the topology of the molecules is encoded. Other values are allowed, based on user-defined atom properties in the input SDF file.
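
The notation above can be illustrated with a small parser. This is only a sketch under assumptions: it supposes that a name always has exactly ten sections in the order given in the table, that reserved sections are left empty, and that example 1 is therefore written as `|S|n|||6|||none|A-A(-A)-A.A=A`. The names `DescriptorName` and `parse_descriptor_name` are hypothetical and not part of the tool.

```python
from typing import NamedTuple, Optional


class DescriptorName(NamedTuple):
    """Non-reserved sections of a descriptor name (hypothetical container)."""
    reaction_label: Optional[str]  # 'p', 'r', 'pr', or None when omitted
    descriptor_type: str           # 'S' (single) or 'M' (mixture)
    value_kind: str                # 'n' (count) or 'p' (probability)
    n_atoms: int                   # total number of atoms in the fragments
    atom_labeling: str             # 'elm', 'none', or a user-defined property
    smiles: str                    # fragment SMILES


def parse_descriptor_name(name: str) -> DescriptorName:
    # Assumption: ten sections; maxsplit keeps any '|' inside the last section.
    fields = name.split("|", 9)
    if len(fields) != 10:
        raise ValueError(f"expected 10 sections, got {len(fields)}: {name!r}")
    return DescriptorName(
        reaction_label=fields[0] or None,  # empty section means no label
        descriptor_type=fields[1],
        value_kind=fields[2],
        # fields[3], fields[4], fields[6], fields[7] are reserved and ignored
        n_atoms=int(fields[5]),
        atom_labeling=fields[8],
        smiles=fields[9],
    )


example = parse_descriptor_name("|S|n|||6|||none|A-A(-A)-A.A=A")
print(example)
```

If the real names use a different number of reserved sections, only the field indices in the sketch would need to change.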
Description of the examples:
- Descriptor 1 denotes the number of six-atom fragments in a single compound (or a component of a mixture), disregarding atom labels (topology fragment descriptor).
- Descriptor 2 denotes the number of co-occurrences of the fragments C and C(C=O)H belonging to two different components of a mixture.
- Descriptor 3 denotes the difference between the product and reactant relative occurrence frequencies of the two fragments CCC and O, which belong to two different reactants and two different products (the reactant and product mixtures are considered separately).
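
How a `p` value and a `pr` difference might be derived from counts can be sketched as follows. This is an assumption-laden illustration, not the tool's implementation: it takes "sum of all descriptors" to mean the sum over one group of descriptors (for example, all mixture descriptors of the reactants), the function names are hypothetical, and the counts are made up for the example.

```python
from typing import Dict


def to_probabilities(counts: Dict[str, float]) -> Dict[str, float]:
    """Divide each descriptor value by the sum over its group (assumed definition of 'p')."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


def pr_difference(reactants: Dict[str, float],
                  products: Dict[str, float]) -> Dict[str, float]:
    """Product probability minus reactant probability, each normalized separately."""
    r = to_probabilities(reactants)
    p = to_probabilities(products)
    names = sorted(set(r) | set(p))
    # A fragment pair missing on one side contributes a probability of 0 there.
    return {name: p.get(name, 0.0) - r.get(name, 0.0) for name in names}


# Made-up mixture counts keyed by fragment SMILES, for illustration only.
reactant_counts = {"C-C-C.O": 2.0, "C-C.O": 2.0}
product_counts = {"C-C-C.O": 3.0, "C-C.O": 1.0}
print(pr_difference(reactant_counts, product_counts))
```

Normalizing reactants and products independently mirrors the remark that the two mixtures are considered separately.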