Convert itp files to ff files #327

Open

wants to merge 86 commits into base: master

Changes from 20 commits

Commits (86)
5968f83
init draft itp to ff
fgrunewald Jun 13, 2023
c577052
imporve graph matching
fgrunewald Jun 15, 2023
7eff22a
fragment finder with prints
fgrunewald Jun 19, 2023
95c4b87
add tests for fragment finder
fgrunewald Jun 19, 2023
ae2794c
add test for 100% coverage
fgrunewald Jun 20, 2023
101d2b7
refactor graph matchin post isomorph check
fgrunewald Jun 20, 2023
6261186
add check on node naming
fgrunewald Jun 20, 2023
a8ce5a1
add pysmiles to tests
fgrunewald Jun 20, 2023
b8dfa7b
tests for ffoutput
fgrunewald Jun 20, 2023
b3ea5ac
use tmp-file for testing ffoutput
fgrunewald Jun 20, 2023
79c38fb
modify extract block and use in itp_to_ff
fgrunewald Jun 20, 2023
77dfe16
update test for generate templates accordingly
fgrunewald Jun 20, 2023
214f5f2
add isomorphism naming
fgrunewald Jun 21, 2023
ef70012
properly check if interactions are equal
fgrunewald Jun 21, 2023
2410b0a
read itp files
fgrunewald Jun 21, 2023
450ebc4
draft round robin tests
fgrunewald Jun 21, 2023
888515b
fix input types
fgrunewald Jun 21, 2023
44cc867
add test print
fgrunewald Jun 26, 2023
0877210
Merge branch 'master' into itp_to_ff
fgrunewald Jun 29, 2023
32cd8f8
clean up output
fgrunewald Nov 22, 2023
a8d1bb9
methods to deal with charges
fgrunewald Nov 22, 2023
37bad71
methods to deal with charges
fgrunewald Nov 22, 2023
7409098
methods to deal with charges
fgrunewald Nov 22, 2023
362372e
adjust test
fgrunewald Nov 22, 2023
4ed2979
move extract block to molecule utils
fgrunewald Nov 22, 2023
fa32f76
small fix
fgrunewald Nov 22, 2023
5207801
allow for charged residues and make pysmiles optional import
fgrunewald Nov 23, 2023
737b45c
make mass optional
fgrunewald Nov 23, 2023
9a1e800
add doc-strings and rename equalize_charge
fgrunewald Nov 23, 2023
c9621a3
remove print
fgrunewald Nov 24, 2023
fdae3db
remove martini2 from ffoutput test as it fails on GH
fgrunewald Nov 24, 2023
e50e232
add test for extract links
fgrunewald Nov 24, 2023
6c94485
add test for extract links with redundant interaction
fgrunewald Nov 24, 2023
e343211
test for charge balancing
fgrunewald Nov 24, 2023
7db6462
test for charge balancing
fgrunewald Nov 24, 2023
c9dadac
implement tolerances for charge balancing
fgrunewald Nov 24, 2023
b693735
add integration tests itp_to_ff and adjust CLI
fgrunewald Nov 24, 2023
1c542cb
fix bug in integration tests itp_to_ff
fgrunewald Nov 24, 2023
aa0865d
complex integration test itp_to_ff plus charged mol
fgrunewald Nov 24, 2023
a39c3ac
use top file for ACOL test and fix bug in test
fgrunewald Nov 24, 2023
39f7ad4
fix toplevel itp_to_ff parser
fgrunewald Dec 28, 2023
8f80a99
bigsmile_draft
fgrunewald Jan 15, 2024
40b89af
infrastructure for big smile parsing
fgrunewald Jan 19, 2024
05ed045
optional dep. for pysmiles
fgrunewald Jan 19, 2024
82a2acc
add a processor that reads a big smile string and returns a full meta…
fgrunewald Jan 19, 2024
257b76b
atest-big-smile parsing part I
fgrunewald Jan 20, 2024
061c8ef
fix hcount for single atom; fix nexted branches
fgrunewald Jan 22, 2024
20e2e49
tests for smile iter and test nested branches
fgrunewald Jan 22, 2024
f505129
add pysmiles to test requrm.
fgrunewald Jan 22, 2024
0c67ecc
add tests for bonding descriptor evaluation
fgrunewald Jan 22, 2024
52235c9
add tests for big smile molecule prc
fgrunewald Jan 23, 2024
9a0a674
allow multiple bonding per atom; fix bugs
fgrunewald Jan 23, 2024
ceccc3d
remove mpl import
fgrunewald Jan 24, 2024
158fd37
add changed tests for multiple bonding per atom
fgrunewald Jan 24, 2024
8f2887f
delete old processor file
fgrunewald Jan 24, 2024
08021c2
have charge balancing for itps but raise error when bond length is mi…
fgrunewald Jan 24, 2024
8d48cb2
Merge branch 'big_smiles' into itp_to_ff
fgrunewald Jan 25, 2024
681004f
add closing bracket to special characters
fgrunewald Feb 29, 2024
3537239
only balance charges for blocks with at least 2 atoms
fgrunewald Feb 29, 2024
929b5d1
refactor fragment finder
fgrunewald Feb 29, 2024
87510bb
refactor fragment itp_to_ff
fgrunewald Feb 29, 2024
05df2e5
change input for itp_to_ff to allow bigmsiles
fgrunewald Feb 29, 2024
0ebfa6a
take most central fragment
fgrunewald Mar 1, 2024
a7cd590
add special links for terminal modifications
fgrunewald Mar 1, 2024
8e0c257
type the charges to float in itp to ff
fgrunewald Mar 3, 2024
7cb3b4c
add H and ] as special characters in big smile parser
fgrunewald Mar 3, 2024
097ec84
account for explicit hydrogen in the smiles string input
fgrunewald Mar 3, 2024
514ba1b
test accounting for explicit hydrogen in the smiles string input
fgrunewald Mar 3, 2024
3e4a737
read provided ff file and use these blocks instead of making new ones
fgrunewald Mar 4, 2024
d97632d
adjust doc string
fgrunewald Mar 4, 2024
b6acc73
skip termini mods if none atoms are different
fgrunewald Mar 4, 2024
6095dc6
Merge branch 'big_smiles' into itp_to_ff
fgrunewald Mar 5, 2024
9d9ee89
redo hydrogen based on valency not based on how many bonding descript…
fgrunewald Mar 6, 2024
c4f1652
parse force-field in molprocessor, adjust hydrogen reconstruction
fgrunewald Mar 6, 2024
7a5dd1f
fix tests
fgrunewald Mar 6, 2024
2b9e7a9
Apply suggestions from code review
fgrunewald Mar 6, 2024
b6d891f
allow nested branch expansion
fgrunewald Mar 7, 2024
a867329
test branch expansion
fgrunewald Mar 7, 2024
b6f5cc0
add comments all over residue expansion functions
fgrunewald Mar 7, 2024
f965e1d
address comments
fgrunewald Mar 7, 2024
0335956
allow for ionic bonds with . syntax
fgrunewald Mar 7, 2024
1f25c32
Merge branch 'big_smiles' into itp_to_ff
fgrunewald Mar 7, 2024
47fef23
fix previous issue with link appending
fgrunewald Mar 7, 2024
7f7fe21
update itp_to_ff tests
fgrunewald Mar 7, 2024
7268663
update tests for fragment finder
fgrunewald Mar 7, 2024
15be6a6
remove leftover files
fgrunewald Mar 7, 2024

56 changes: 41 additions & 15 deletions polyply/src/big_smile_mol_processor.py
@@ -1,8 +1,13 @@
import re
import networkx as nx
import pysmiles
from polyply.src.big_smile_parsing import (res_pattern_to_meta_mol,
force_field_from_fragments)
from polyply.src.map_to_molecule import MapToMolecule

VALENCES = pysmiles.smiles_helper.VALENCES
VALENCES.update({"H":(1,)})
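
For orientation, an editorial sketch of what the valence table holds; the copy below is only so the snippet does not mutate the pysmiles dict (the module above aliases it directly), and the values are taken from current pysmiles releases, so treat them as an assumption:

```python
import pysmiles

VALENCES = dict(pysmiles.smiles_helper.VALENCES)
VALENCES.update({"H": (1,)})

print(VALENCES["C"])   # (4,)  -> carbon carries up to four bonds
print(VALENCES["H"])   # (1,)  -> hydrogen is added here; it is absent from the default table
```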

def compatible(left, right):
"""
Check bonding descriptor compatibility according
@@ -19,15 +24,12 @@
"""
if left == right and left not in '> <':
return True
if left[0] == "<" and right[0] == ">":
if left[1:] == right[1:]:
return True
if left[0] == ">" and right[0] == "<":
if left[1:] == right[1:]:
return True
l, r = left[0], right[0]
if (l, r) == ('<', '>') or (l, r) == ('>', '<'):
return left[1:] == right[1:]
return False
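
A short illustrative sketch of the compatibility rule above (return values written as comments; the descriptor strings are hypothetical):

```python
compatible("$1", "$1")   # True  - identical, non-directional descriptors match
compatible("<1", ">1")   # True  - opposite directions with the same label match
compatible(">1", "<2")   # False - the labels differ
compatible("<", "<")     # False - a bare '<' only pairs with a '>'
```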

def generate_edge(source, target, bond_type="bonding"):
def generate_edge(source, target, bond_attribute="bonding"):
"""
Given a source and a target graph, which have bonding
descriptors stored as node attributes, find a pair of
@@ -40,7 +42,7 @@
----------
source: :class:`nx.Graph`
target: :class:`nx.Graph`
bond_type: `abc.hashable`
bond_attribute: `abc.hashable`
under which attribute are the bonding descriptors
stored.

@@ -54,8 +56,8 @@
LookupError
if no match is found
"""
source_nodes = nx.get_node_attributes(source, bond_type)
target_nodes = nx.get_node_attributes(target, bond_type)
source_nodes = nx.get_node_attributes(source, bond_attribute)
target_nodes = nx.get_node_attributes(target, bond_attribute)
for source_node in source_nodes:
for target_node in target_nodes:
#print(source_node, target_node)
@@ -66,7 +68,7 @@
#print(bond_source, bond_target)
if compatible(bond_source, bond_target):
return ((source_node, target_node), (bond_source, bond_target))
raise LookupError
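
For illustration, a minimal usage sketch of generate_edge; the two graphs and the list-valued 'bonding' attribute are hypothetical but follow the conventions used elsewhere in this diff, and a LookupError is raised when no compatible pair exists:

```python
import networkx as nx

# two one-atom fragment graphs with open bonding descriptors
source = nx.Graph()
source.add_node(0, element="C", bonding=["<1"])
target = nx.Graph()
target.add_node(0, element="C", bonding=[">1"])

# returns the node pair to connect and the descriptor pair that matched
edge, (bond_source, bond_target) = generate_edge(source, target)
# expected: edge == (0, 0) and (bond_source, bond_target) == ("<1", ">1")
```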

Codecov warning: added line #L71 (polyply/src/big_smile_mol_processor.py) was not covered by tests.

class DefBigSmileParser:
"""
@@ -83,9 +85,10 @@
"""
Make edges according to the bonding descriptors stored
in the node attributes of meta_molecule residue graph.
If a bonding descriptor is consumed it is set to None,
If a bonding descriptor is consumed it is removed from the list,
however, the meta_molecule edge gets an attribute with the
bonding descriptors that formed the edge.
bonding descriptors that formed the edge. Later unconsumed
bonding descriptors are replaced by hydrogen atoms.
"""
for prev_node, node in nx.dfs_edges(self.meta_molecule):
prev_graph = self.meta_molecule.nodes[prev_node]['graph']
@@ -104,14 +107,37 @@
node_graph.nodes[edge[1]]['bonding'] = node_bond_list
self.meta_molecule.molecule.add_edge(edge[0], edge[1], bonding=bonding)

def replace_unconsumed_bonding_descrpt(self):
"""
We allow multiple bonding descriptors per atom, which,
however, are not always consumed. In this case the leftover
bonding descriptors are replaced by hydrogen atoms.
"""
for node in self.meta_molecule.nodes:
graph = self.meta_molecule.nodes[node]['graph']
bonding = nx.get_node_attributes(graph, "bonding")
for node, bondings in bonding.items():
element = graph.nodes[node]['element']
hcount = VALENCES[element][0] -\
self.meta_molecule.molecule.degree(node) + 1
attrs = {attr: graph.nodes[node][attr] for attr in ['resname', 'resid']}
attrs['element'] = 'H'
for new_id in range(1, hcount):
new_node = len(self.meta_molecule.molecule.nodes) + 1
graph.add_edge(node, new_node)
attrs['atomname'] = "H" + str(new_id + len(graph.nodes))
graph.nodes[new_node].update(attrs)
self.meta_molecule.molecule.add_edge(node, new_node)
self.meta_molecule.molecule.nodes[new_node].update(attrs)
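
A quick numeric sketch of the hydrogen count computed above (hypothetical values, not part of the PR); note that the '+ 1' is offset by the loop starting at 1, so a node receives valence minus degree new hydrogens:

```python
valence = 4                              # VALENCES["C"][0]
degree = 2                               # degree of the node in the molecule graph
hcount = valence - degree + 1            # -> 3
added_hydrogens = len(range(1, hcount))  # -> 2 hydrogens actually added
```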

def parse(self, big_smile_str):
res_pattern, residues = big_smile_str.split('.')
res_pattern, residues = re.findall(r"\{[^\}]+\}", big_smile_str)
self.meta_molecule = res_pattern_to_meta_mol(res_pattern)
self.force_field = force_field_from_fragments(residues)
MapToMolecule(self.force_field).run_molecule(self.meta_molecule)
self.edges_from_bonding_descrpt()
self.replace_unconsumed_bonding_descrpt()
return self.meta_molecule

# ToDo
# - replace non consumed bonding descrpt by hydrogen
# -
# - clean copying of bond-list attributes L100
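
To make the parse entry point above concrete: the input is expected to contain two curly-brace groups, the residue pattern and the fragment definitions, which the re.findall call separates. A hedged sketch (the fragment syntax shown is only a placeholder; the real format is whatever force_field_from_fragments expects):

```python
import re

big_smile_str = "{[#PEO]|5}.{#PEO=[<]COC[>]}"   # hypothetical example string
res_pattern, residues = re.findall(r"\{[^\}]+\}", big_smile_str)
# res_pattern -> "{[#PEO]|5}"
# residues    -> "{#PEO=[<]COC[>]}"
```
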
173 changes: 133 additions & 40 deletions polyply/src/big_smile_parsing.py
@@ -3,10 +3,10 @@
import numpy as np
try:
import pysmiles
except ImportError:
except ImportError as error:
msg = ("You are using a functionality that requires "

Codecov warning: added lines #L6-L7 (polyply/src/big_smile_parsing.py) were not covered by tests.
"the pysmiles package. Use pip install pysmiles ")
raise ImportError(msg)
raise ImportError(msg) from error

Codecov warning: added line #L9 (polyply/src/big_smile_parsing.py) was not covered by tests.
import networkx as nx
from vermouth.forcefield import ForceField
from vermouth.molecule import Block
@@ -24,6 +24,21 @@
return idx+start
return np.inf

def _expand_branch(meta_mol, current, anchor, recipe):
prev_node = anchor
for bdx, (resname, n_mon) in enumerate(recipe):
if bdx == 0:
anchor = current
for _ in range(0, n_mon):
connection = [(prev_node, current)]
meta_mol.add_monomer(current,
resname,
connection)
prev_node = current
current += 1
prev_node = anchor
return meta_mol, current, prev_node
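
For illustration, a minimal sketch of how _expand_branch consumes a recipe: a recipe is a list of (resname, n_mon) tuples; in the parser below the first entry is normally the branch anchor, but the helper simply adds whatever it is given. The empty connection list for the first residue mirrors the main loop of res_pattern_to_meta_mol:

```python
meta_mol = MetaMolecule()
meta_mol.add_monomer(0, "PMA", [])   # anchor residue, no prior connection
# graft two PEO monomers onto node 0, numbering new residues from 1
meta_mol, current, prev_node = _expand_branch(meta_mol,
                                              current=1,
                                              anchor=0,
                                              recipe=[("PEO", 2)])
# expected: current == 3 (next free node id), prev_node == 1 (first residue of the branch)
```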

def res_pattern_to_meta_mol(pattern):
"""
Generate a :class:`polyply.MetaMolecule` from a
@@ -41,7 +56,7 @@
'{' + [#resname_1][#resname_2]... + '}'

In addition to plain enumeration any residue may be
followed by a '|' and an integern number that
followed by a '|' and an integer number that
specifies how many times the given residue should
be added within a sequence. For example, a pentamer
of PEO can be written as:
@@ -52,10 +67,10 @@

{[#PEO]|5}

The block syntax also applies to branches. Here the convetion
The block syntax also applies to branches. Here the convention
is that the complete branch including its first anchoring
residue is repeated. For example, to generate a PMA-g-PEG
polymer the following syntax is permitted:
polymer containing 15 residues the following syntax is permitted:

{[#PMA]([#PEO][#PEO])|5}

@@ -70,28 +85,54 @@
"""
meta_mol = MetaMolecule()
current = 0
branch_anchor = 0
# stores one or more branch anchors; each next
# anchor belongs to a nested branch
branch_anchor = []
# used for storing the composition protocol for
# branches; each entry is a list of
# branches extending from the anchor
# point
recipes = defaultdict(list)
# the previous node
prev_node = None
# do we have an open branch
branching = False
# each element in the for loop matches a pattern
# '[' + '#' + some alphanumeric name + ']'
for match in re.finditer(PATTERNS['place_holder'], pattern):
start, stop = match.span()
# new branch here
# we start a new branch when the residue is preceded by '('
# as in ... ([#PEO] ...
if pattern[start-1] == '(':
branching = True
branch_anchor = prev_node
recipie = [(meta_mol.nodes[prev_node]['resname'], 1)]
branch_anchor.append(prev_node)
# the recipe for making the branch includes the anchor, which
# is hence the first residue in the list
recipes[branch_anchor[-1]] = [(meta_mol.nodes[prev_node]['resname'], 1)]
# here we check if the residue is followed by an expansion character '|'
# as in ... [#PEO]|
if stop < len(pattern) and pattern[stop] == '|':
# eon => end of next
# we find the next character that starts a new residue, ends a branch or
# ends the complete pattern
eon = _find_next_character(pattern, ['[', ')', '(', '}'], stop)
# between the expansion character and the eon character
# is a number that corresponds to the number of times (i.e. monomers)
# that this residue should be added
n_mon = int(pattern[stop+1:eon])
else:
n_mon = 1

# the resname starts at the second character and ends
# one before the last according to the above pattern
resname = match.group(0)[2:-1]
# collect all residues in branch
# if this residue is part of a branch we store it in
# the recipe dict together with the anchor residue
# and expansion number
if branching:
recipie.append((resname, n_mon))
recipes[branch_anchor[-1]].append((resname, n_mon))

# add the new residue
# now we add the new residue as often as required
connection = []
for _ in range(0, n_mon):
if prev_node is not None:
@@ -102,58 +143,99 @@
prev_node = current
current += 1

# terminate branch and jump back to anchor
# here we check if the residue considered before is the
# last residue of a branch (i.e. '...[#residue])'
# that is the case if the branch closure comes before
# any new atom begins
branch_stop = _find_next_character(pattern, ['['], stop) >\
_find_next_character(pattern, [')'], stop)
if stop <= len(pattern) and branch_stop and branching:

# if the branch ends we reset the anchor
# and set branching False unless we are in
# a nested branch
if stop <= len(pattern) and branch_stop:
branching = False
prev_node = branch_anchor
# we have to multiply the branch n-times
prev_node = branch_anchor.pop()
if branch_anchor:
branching = True
#========================================
# expansion for branches
#========================================
# We need to know how often the branch has
# to be added so we first identify the branch
# terminal character ')' called eon_a.
eon_a = _find_next_character(pattern, [')'], stop)
# Then we check if the expansion character
# is next.
if stop+1 < len(pattern) and pattern[eon_a+1] == "|":
# If there is one we find the beginning
# of the next branch, residue or end of the string
# As before, all characters in between are a number that
# is how often the branch is expanded.
eon_b = _find_next_character(pattern, ['[', ')', '(', '}'], eon_a+1)
# -1 because one branch has already been added at this point
for _ in range(0,int(pattern[eon_a+2:eon_b])-1):
for bdx, (resname, n_mon) in enumerate(recipie):
if bdx == 0:
anchor = current
for _ in range(0, n_mon):
connection = [(prev_node, current)]
meta_mol.add_monomer(current,
resname,
connection)
prev_node = current
current += 1
prev_node = anchor
# the outermost loop goes over how often the branch has to be
# added to the existing sequence
for idx in range(0,int(pattern[eon_a+2:eon_b])-1):
prev_anchor = None
skip = 0
# in principle each branch can contain any number of nested branches
# each branch is itself a recipe that has an anchor atom
for ref_anchor, recipe in list(recipes.items())[len(branch_anchor):]:
# starting from the first nested branch we have to do some
# math to find the anchor atom relative to the first branch
# we also skip the first residue in recipe, which is the
# anchor residue. Only the outermost branch in an expansion
# is expanded including the anchor. This allows easy description
# of graft polymers.
if prev_anchor:
offset = ref_anchor - prev_anchor
prev_node = prev_node + offset
skip = 1
# this function simply adds the residues of the particular
# branch
meta_mol, current, prev_node = _expand_branch(meta_mol,
current=current,
anchor=prev_node,
recipe=recipe[skip:])
# if this is the first branch we want to set the anchor
# as the base anchor to which we jump back after all nested
# branches have been added
if prev_anchor is None:
base_anchor = prev_node
# store the previous anchor so we can do the math for nested
# branches
prev_anchor = ref_anchor
# all branches added; then go back to the base anchor
prev_node = base_anchor
# once all branches, including nested ones, are
# completed we reset the recipe bookkeeping
if len(branch_anchor) == 0:
recipes = defaultdict(list)
return meta_mol
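
Tying the expansion logic together, the grafted-polymer example from the docstring doubles as a smoke test; the expected counts follow directly from the docstring (five repeats of one PMA anchor plus two PEO grafts):

```python
import networkx as nx

meta_mol = res_pattern_to_meta_mol("{[#PMA]([#PEO][#PEO])|5}")
resnames = nx.get_node_attributes(meta_mol, "resname")
print(len(meta_mol.nodes))                                    # expected: 15 residues
print(sum(1 for name in resnames.values() if name == "PEO"))  # expected: 10 PEO grafts
```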

def _big_smile_iter(smile):
for token in smile:
yield token

def tokenize_big_smile(big_smile):
"""
Processes a BigSmile string by storing the
BigSmile specific bonding descriptors
in a dict with refernce to the atom they
in a dict with reference to the atom they
refer to. Furthermore, a cleaned smile
string is generated with the BigSmile
specific syntax removed.

Parameters
----------
smile: str
a BigSmile smile string
a BigSmile smiles string

Returns
-------
str
a canonical smile string
a canonical smiles string
dict
a dict mapping bonding descriptors
to the nodes within the smile
to the nodes within the smiles string
"""
smile_iter = _big_smile_iter(big_smile)
smile_iter = iter(big_smile)
bonding_descrpt = defaultdict(list)
smile = ""
node_count = 0
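
Although the body of tokenize_big_smile is collapsed in this view, a rough usage sketch based on its docstring may help; the exact indexing and mapping layout are assumptions, not taken from the diff:

```python
cleaned, descriptors = tokenize_big_smile("[<]COC[>]")
# cleaned is the plain smiles string "COC";
# descriptors maps atom indices to the bonding descriptors found next to them,
# e.g. the two terminal carbons carrying '<' and '>' (assumed layout)
```
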
@@ -204,8 +286,15 @@
else:
for node in mol_graph.nodes:
if mol_graph.nodes[node].get('bonding', False):
hcount = mol_graph.nodes[node]['hcount']
mol_graph.nodes[node]['hcount'] = hcount - len(mol_graph.nodes[node]['bonding'])
# get the element
ele = mol_graph.nodes[0]['element']
# hcount is the valence minus the degree minus
# the number of bonding descriptors
hcount = pysmiles.smiles_helper.VALENCES[ele][0] -\
mol_graph.degree(node) -\
len(mol_graph.nodes[node]['bonding'])

mol_graph.nodes[node]['hcount'] = hcount

pysmiles.smiles_helper.add_explicit_hydrogens(mol_graph)
return mol_graph
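
A small numeric sketch of the hydrogen count used in this branch (hypothetical values): a carbon that already has two neighbours in the fragment graph and one open bonding descriptor keeps one implicit hydrogen before add_explicit_hydrogens runs:

```python
valence = 4        # pysmiles VALENCES["C"][0]
degree = 2         # neighbours of the node in mol_graph
n_bonding = 1      # open bonding descriptors on this node
hcount = valence - degree - n_bonding   # -> 1 implicit hydrogen
```
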
@@ -274,3 +363,7 @@
mol_block = Block(mol_graph)
force_field.blocks[resname] = mol_block
return force_field

# ToDos
# - remove special case hydrogen line 327ff
# - check rebuild_h and clean up