Remove duplicate paths #23

Open
espottesmith opened this issue Nov 19, 2020 · 2 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@espottesmith
Collaborator

Because of prerequisite solving and the use of split reaction nodes, there are often multiple chemically equivalent pathways from the same reactants to the same products. Pathfinding frequently returns these duplicate pathways, which are redundant and uninteresting.

Ideally, there would be some procedure to prune the shortest path list to return only unique pathways. This shouldn't require any major changes to how MR.Net conducts pathfinding; rather, this can just be a postprocessing step, where the reactions in each path are compared.

@espottesmith espottesmith added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Nov 19, 2020
@samblau
Member

samblau commented Dec 10, 2020

I know we probably want to be clever here, and I think Daniel is working on a strategy for catching duplicate paths from kMC that would work. But after spending some time digging through paths over the past couple of days, there is something pretty obvious that should work: just prune paths with identical costs. For example:

0 3.4620678646635246894
1 3.4620678646635246894
2 3.4736261363211290221
3 3.75510952395117456
4 3.7861278042505688611
5 3.7861278042505688611
6 4.2238517847547954525
7 4.3857509970409720313
8 4.3857509970409720313
9 4.477374250818638558
10 4.477374250818638558

Paths 0 and 1 are duplicates, as are paths 4 and 5, paths 7 and 8, and paths 9 and 10.
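A minimal sketch of the cost-based pruning described above, in Python. The function name `prune_by_cost` and the tolerance `rel_tol` are hypothetical and not part of MR.Net; the sketch assumes the costs arrive sorted in ascending order, as in the listing above, and that two paths with matching cost are duplicates. Distinct pathways could in principle share a cost by coincidence, so this is a heuristic.

```python
import math


def prune_by_cost(costs, rel_tol=1e-12):
    """Return the indices of paths to keep.

    Assumes `costs` is sorted in ascending order. A path is dropped when
    its cost matches the cost of the most recently kept path (within
    `rel_tol`, an assumed tolerance for floating-point comparison).
    """
    kept = []
    for i, cost in enumerate(costs):
        if kept and math.isclose(cost, costs[kept[-1]], rel_tol=rel_tol):
            continue  # same cost as the last kept path: treat as duplicate
        kept.append(i)
    return kept


# Costs copied from the listing above (path index = list position).
costs = [
    3.4620678646635246894,
    3.4620678646635246894,
    3.4736261363211290221,
    3.75510952395117456,
    3.7861278042505688611,
    3.7861278042505688611,
    4.2238517847547954525,
    4.3857509970409720313,
    4.3857509970409720313,
    4.477374250818638558,
    4.477374250818638558,
]

print(prune_by_cost(costs))  # [0, 2, 3, 4, 6, 7, 9]
```

Paths 1, 5, 8, and 10 are dropped, which matches the duplicates identified by hand.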

@danielbarter
Collaborator

danielbarter commented Jan 8, 2021

Here is how I solve this: I have an integer index for each reaction. A pathway is then a list of indices, which you can convert into a frozen set. Frozen sets are hashable, so you can easily detect duplicates and count how many times each pathway occurs.

I think I like Sam's solution better for pathfinding, though.
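The frozen-set approach described above could look roughly like this in Python. The function names and the example reaction indices are hypothetical, chosen only for illustration. Note that converting to a `frozenset` discards both the order of reactions and any repeats, so two pathways count as duplicates whenever they use the same set of reactions.

```python
from collections import Counter


def count_pathways(pathways):
    """Count how often each pathway occurs, keyed by its set of reaction indices."""
    return Counter(frozenset(path) for path in pathways)


def unique_pathways(pathways):
    """Keep the first occurrence of each distinct set of reaction indices."""
    seen = set()
    unique = []
    for path in pathways:
        key = frozenset(path)  # hashable, order-independent key
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


# Hypothetical pathways, each a list of integer reaction indices.
pathways = [[3, 7, 12], [7, 3, 12], [3, 7, 15]]

print(unique_pathways(pathways))                     # [[3, 7, 12], [3, 7, 15]]
print(count_pathways(pathways)[frozenset({3, 7, 12})])  # 2
```

Compared with cost-based pruning, this compares the reactions themselves, so it does not depend on floating-point cost equality.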


4 participants