Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big smiles #358

Closed
wants to merge 25 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
40b89af
infrastructure for big smile parsing
fgrunewald Jan 19, 2024
05ed045
optional dep. for pysmiles
fgrunewald Jan 19, 2024
82a2acc
add a processor that reads a big smile string and returns a full meta…
fgrunewald Jan 19, 2024
257b76b
atest-big-smile parsing part I
fgrunewald Jan 20, 2024
061c8ef
fix hcount for single atom; fix nexted branches
fgrunewald Jan 22, 2024
20e2e49
tests for smile iter and test nested branches
fgrunewald Jan 22, 2024
f505129
add pysmiles to test requrm.
fgrunewald Jan 22, 2024
0c67ecc
add tests for bonding descriptor evaluation
fgrunewald Jan 22, 2024
52235c9
add tests for big smile molecule prc
fgrunewald Jan 23, 2024
9a0a674
allow multiple bonding per atom; fix bugs
fgrunewald Jan 23, 2024
ceccc3d
remove mpl import
fgrunewald Jan 24, 2024
158fd37
add changed tests for multiple bonding per atom
fgrunewald Jan 24, 2024
8f2887f
delete old processor file
fgrunewald Jan 24, 2024
7cb3b4c
add H and ] as special characters in big smile parser
fgrunewald Mar 3, 2024
097ec84
account for explicit hydrogen in the smiles string input
fgrunewald Mar 3, 2024
514ba1b
test accounting for explicit hydrogen in the smiles string input
fgrunewald Mar 3, 2024
d97632d
adjust doc string
fgrunewald Mar 4, 2024
c4f1652
parse force-field in molprocessor, adjust hydrogen reconstruction
fgrunewald Mar 6, 2024
7a5dd1f
fix tests
fgrunewald Mar 6, 2024
2b9e7a9
Apply suggestions from code review
fgrunewald Mar 6, 2024
b6d891f
allow nested branch expansion
fgrunewald Mar 7, 2024
a867329
test branch expansion
fgrunewald Mar 7, 2024
b6f5cc0
add comments all over residue expansion functions
fgrunewald Mar 7, 2024
f965e1d
address comments
fgrunewald Mar 7, 2024
0335956
allow for ionic bonds with . syntax
fgrunewald Mar 7, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 143 additions & 0 deletions polyply/src/big_smile_mol_processor.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
import re
import networkx as nx
import pysmiles
from polyply.src.big_smile_parsing import (res_pattern_to_meta_mol,
force_field_from_fragments)
from polyply.src.map_to_molecule import MapToMolecule

VALENCES = pysmiles.smiles_helper.VALENCES
VALENCES.update({"H":(1,)})
fgrunewald marked this conversation as resolved.
Show resolved Hide resolved

def compatible(left, right):
"""
Check bonding descriptor compatibility according
to the BigSmiles syntax convetions.

Parameters
----------
left: str
right: str

Returns
-------
bool
"""
if left == right and left not in '> <':
return True
l, r = left[0], right[0]
if (l, r) == ('<', '>') or (l, r) == ('>', '<'):
return left[1:] == right[1:]
return False

def generate_edge(source, target, bond_attribute="bonding"):
"""
Given a source and a target graph, which have bonding
descriptors stored as node attributes, find a pair of
matching descriptors and return the respective nodes.
The function also returns the bonding descriptors. If
no bonding descriptor is found an instance of LookupError
is raised.

Parameters
----------
source: :class:`nx.Graph`
target: :class:`nx.Graph`
bond_attribute: `abc.hashable`
under which attribute are the bonding descriptors
stored.

Returns
-------
((abc.hashable, abc.hashable), (str, str))
the nodes as well as bonding descriptors

Raises
------
LookupError
if no match is found
"""
source_nodes = nx.get_node_attributes(source, bond_attribute)
target_nodes = nx.get_node_attributes(target, bond_attribute)
for source_node in source_nodes:
for target_node in target_nodes:
#print(source_node, target_node)
bond_sources = source_nodes[source_node]
bond_targets = target_nodes[target_node]
for bond_source in bond_sources:
for bond_target in bond_targets:
#print(bond_source, bond_target)
if compatible(bond_source, bond_target):
return ((source_node, target_node), (bond_source, bond_target))
fgrunewald marked this conversation as resolved.
Show resolved Hide resolved
raise LookupError

Check warning on line 71 in polyply/src/big_smile_mol_processor.py

View check run for this annotation

Codecov / codecov/patch

polyply/src/big_smile_mol_processor.py#L71

Added line #L71 was not covered by tests

class DefBigSmileParser:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Def?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of class? I like the class organization a bit more

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why is it called DefBigSmileParser? The Def looks strange to me. Also, "smiles" with s ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes; So the thing is that the format is not exactly a BigSMILE. This is a leftover (indeed good catch) of the original idea to call it Defined-Big-SMILE in short DefBigSmile but I think we're gravitating towards another name now. I'll leave this open and once it's decided adjust it

"""
Parse an a string instance of a defined BigSmile,
which describes a polymer molecule.
"""

def __init__(self, force_field):
self.force_field = force_field
self.meta_molecule = None
self.molecule = None

def edges_from_bonding_descrpt(self):
"""
Make edges according to the bonding descriptors stored
in the node attributes of meta_molecule residue graph.
If a bonding descriptor is consumed it is removed from the list,
however, the meta_molecule edge gets an attribute with the
bonding descriptors that formed the edge. Later uncomsumed
fgrunewald marked this conversation as resolved.
Show resolved Hide resolved
bonding descriptors are replaced by hydrogen atoms.
"""
for prev_node, node in nx.dfs_edges(self.meta_molecule):
prev_graph = self.meta_molecule.nodes[prev_node]['graph']
node_graph = self.meta_molecule.nodes[node]['graph']
edge, bonding = generate_edge(prev_graph,
node_graph)
# this is a bit of a workaround because at this stage the
# bonding list is actually shared between all residues of
# of the same type; so we first make a copy then we replace
# the list sans used bonding descriptor
fgrunewald marked this conversation as resolved.
Show resolved Hide resolved
prev_bond_list = prev_graph.nodes[edge[0]]['bonding'].copy()
prev_bond_list.remove(bonding[0])
prev_graph.nodes[edge[0]]['bonding'] = prev_bond_list
node_bond_list = node_graph.nodes[edge[1]]['bonding'].copy()
node_bond_list.remove(bonding[1])
node_graph.nodes[edge[1]]['bonding'] = node_bond_list
self.meta_molecule.molecule.add_edge(edge[0], edge[1], bonding=bonding)

def replace_unconsumed_bonding_descrpt(self):
"""
We allow multiple bonding descriptors per atom, which
however, are not always consumed. In this case the left
over bonding descriptors are replaced by hydrogen atoms.
"""
for node in self.meta_molecule.nodes:
graph = self.meta_molecule.nodes[node]['graph']
bonding = nx.get_node_attributes(graph, "bonding")
for node, bondings in bonding.items():
element = graph.nodes[node]['element']
hcount = VALENCES[element][0] -\
self.meta_molecule.molecule.degree(node) + 1
Comment on lines +121 to +122
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VALENCES[element][0] doesn't always work. You need to select the smallest valence that is larger than the current number of bonds present.

Also, why +1?

attrs = {attr: graph.nodes[node][attr] for attr in ['resname', 'resid']}
attrs['element'] = 'H'
for new_id in range(1, hcount):
new_node = len(self.meta_molecule.molecule.nodes) + 1
graph.add_edge(node, new_node)
attrs['atomname'] = "H" + str(new_id + len(graph.nodes))
fgrunewald marked this conversation as resolved.
Show resolved Hide resolved
graph.nodes[new_node].update(attrs)
self.meta_molecule.molecule.add_edge(node, new_node)
self.meta_molecule.molecule.nodes[new_node].update(attrs)

def parse(self, big_smile_str):
res_pattern, residues = re.findall(r"\{[^\}]+\}", big_smile_str)
self.meta_molecule = res_pattern_to_meta_mol(res_pattern)
self.force_field = force_field_from_fragments(residues)
MapToMolecule(self.force_field).run_molecule(self.meta_molecule)
self.edges_from_bonding_descrpt()
self.replace_unconsumed_bonding_descrpt()
return self.meta_molecule

# ToDo
# - clean copying of bond-list attributes L100
fgrunewald marked this conversation as resolved.
Show resolved Hide resolved
Loading
Loading