-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
extend path algorithm (extend_edgesV2) #2938
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2938 +/- ##
==========================================
+ Coverage 89.71% 89.72% +0.01%
==========================================
Files 29 29
Lines 31355 31567 +212
Branches 6077 6113 +36
==========================================
+ Hits 28130 28324 +194
- Misses 1840 1852 +12
- Partials 1385 1391 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Proposal for the basis for a test: here is a definition of what should be extendable by this method; so we can check if there remain no extendable edges (when run until no further changes are made). A right extendable path from
The nodes Why is this the right definition? Well, first it's clear we can't extend nodes that are in the next tree. Next, suppose that there is a |
I've added two things:
Next step: compare it to the non-naive version. |
cde307c
to
41d14e6
Compare
40e49ad
to
172317d
Compare
This is hopefully ready for you to look at, @nspope! One question is that currently |
|
Okay - now it's ready for reals, @nspope. The 'partial coverage' things that codecov flags are cases where we're like
but then not all four combinations of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, caught a few places where return codes should be checked
Okay - @jeromekelleher, I think this is ready for a quick look (and then we can merge it! 🎉) |
Wait two more things:
|
06c5705
to
e633258
Compare
Okay - now I've legit hit everything we can with tests (and simplified some things along the way). @jeromekelleher - could you just throw a quick eyeball at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Very nice, I like the new design. I spotted a few small problems and a couple of suggestions on layout.
If you're looking for performance, I think there's a modest gain on the table from taking pointers to stuff within a function rather than lots of self->x->y
within loops. I wouldn't bother for now though, except in places where it'll also improve readability.
c/tskit/trees.c
Outdated
|
||
tsk_memset(&edge_list_heap, 0, sizeof(edge_list_heap)); | ||
tsk_memset(&tree_pos, 0, sizeof(tree_pos)); | ||
double *sites_position = tables->sites.position; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably be const
?
c/tskit/trees.c
Outdated
self->edges_out_head = NULL; | ||
self->edges_out_tail = NULL; | ||
|
||
tsk_memset(&self->edge_list_heap, 0, sizeof(self->edge_list_heap)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to memset the edge_list_heap to 0 here as its already been done above when you zero'd the top level struct.
c/tskit/trees.c
Outdated
return ret; | ||
} | ||
|
||
// static void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to make the print_state
a part of the public interface so that it can be tested and it stays up to date. I guess the struct here is not exposed in the header so there's no way to access it. We worked around this for simplify by adding a TSK_DEBUG
option here
} else { | ||
self->near_side = self->edges->right; | ||
self->far_side = self->edges->left; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe allocate the new edge list node here and pass to edge_list_append_entry
rather than passing around the edge_list_heap
? It's not much extra code, and will save you some a few unreachable lines. It seemed odd to me on reading to pass the allocator in rather than the actual pointer.
c/tskit/trees.c
Outdated
|
||
p_out = self->parent_out[c]; | ||
p_in = self->parent_in[c]; | ||
t_out = self->ts->tables->nodes.time[p_out]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - you don't want to be chasing 3 levels of pointers here in a loop. Grab a const pointer at the start of the function and the compiler will be able to do a way better job.
c/tskit/trees.c
Outdated
edge_list_t *ex_in; | ||
tsk_id_t e_in, c, e; | ||
tsk_size_t num_edges; | ||
tsk_bool_t *keep; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to initialise keep
to NULL
Excellent! I think you just need to cast to (int) for the printfs, give the commits a bit of a squash and it's ready to go in. |
Argh, yes - I just missed one cast. Ready now, hopefully! |
renamed to extend_haplotypes
6b66ed3
to
659218e
Compare
New extension (pun intended) to the$a \to b \to c$ in tree $T_1$ and edge $a \to d \to c$ in $T_2$ , assuming $d\notin T_1$ and $b\notin T_2$ and $t(b)>t(d)$ we should have extended path $a\to b \to d \to c$ in tree $T_2$ .
extend_edges
algorithm now calledextend_paths
(subject to change). We noticed that certain examples within unary regions would not be extended simply with extend edges. We propose this algorithm to handle examples like the following:For edge path
Unit tests still need to be built, and the algorithm needs to be cleaned.
We hypothesize that using this alongside extend edges could be the optimal strategy for minimizing edges in tree sequences.