Releases: tskit-dev/tskit
Python 0.6.0
Breaking Changes
- The definitions of TreeSequence.genetic_relatedness and
  TreeSequence.genetic_relatedness_weighted have changed
  to average over sample sets, rather than summing over them.
  For computation with diploid sample sets, this changes the result
  by a factor of four; for larger sample sets it now produces
  sensible values that are comparable between sample sets of different sizes.
  The default for these methods has also changed to polarised=True,
  but the output is unchanged for centre=True (the default).
  See the documentation for these methods for more discussion.
  (@petrelharp, @mmosmond, #1623)
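As a rough illustration of where the factor of four comes from (plain arithmetic, not tskit's implementation): two diploid sample sets give 2 x 2 = 4 cross-set sample pairs, so a sum over pairs is four times the mean over pairs. The per-pair values and the helper names `summed` and `averaged` below are invented for this sketch.

```python
from itertools import product

# Hypothetical per-pair relatedness values between sample nodes.
# Keys are (sample in set A, sample in set B); the numbers are invented.
pair_value = {
    (0, 2): 0.10, (0, 3): 0.30,
    (1, 2): 0.20, (1, 3): 0.40,
}


def summed(set_a, set_b):
    """Sum over all cross-set pairs (the old-style convention)."""
    return sum(pair_value[(a, b)] for a, b in product(set_a, set_b))


def averaged(set_a, set_b):
    """Mean over all cross-set pairs (the new-style convention)."""
    return summed(set_a, set_b) / (len(set_a) * len(set_b))


diploid_a, diploid_b = [0, 1], [2, 3]
old = summed(diploid_a, diploid_b)
new = averaged(diploid_a, diploid_b)
# For two sample sets of size two, old is four times new.
```

Because the mean divides by the number of pairs, values computed this way stay comparable when the sample sets differ in size.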
Bugfixes
- Fix to TreeSequence.genetic_relatedness with indexes=None and
  proportion=True. (@petrelharp, #2984, #1623)
- Fix to TreeSequence.general_stat when using non-strict summary functions
  in the presence of non-ancestral material (very rare).
  (@petrelharp, #2983, #1623)
- Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather
  than None, to avoid confusion. (@hyanwong, #2720)
- Limit output HTML when a tree sequence is displayed that has a large amount of metadata.
  (@benjeffery, #2999)
- Fix warning in draw_svg to use correct warnings module.
  (@duncanMR, #2870, #2871)
Features
- Add the centre option to TreeSequence.genetic_relatedness and
  TreeSequence.genetic_relatedness_weighted.
  (@petrelharp, @mmosmond, #1623)
- Edges now have an .interval attribute returning a tskit.Interval object.
  (@hyanwong, #2531)
- Variants now have a states() method that returns the genotypes as an
  (inefficient) array of strings, rather than integer indexes, to
  aid comparison of genetic variation. (@hyanwong, #2617)
- Added distance_between that calculates the total distance between two nodes in a tree.
  (@Billyzhang1229, #2771)
- Added genetic_relatedness_matrix method to compute
  pairwise genetic relatedness between sample sets.
  (@jeromekelleher, @petrelharp, #2823)
- Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes
  using recombination information, leading to unary nodes in many trees and
  fewer edges. (@petrelharp, @hfr1tz3, @nspope, @avabamf, #2651, #2938)
- Add Table.drop_metadata to make clearing metadata from tables easy.
  (@jeromekelleher, #2944)
- Add Interval.mid and Tree.mid properties to return the midpoint of the interval.
  (@currocam, #2960)
- Added genetic_relatedness_vector method to compute product of genetic relatedness
  matrix and weight vector.
  (@petrelharp, #2980)
- Added pair_coalescence_counts method to calculate coalescence events per node or time
  interval, pair_coalescence_quantiles method to estimate quantiles of pair
  coalescence times using empirical CDF inversion, and pair_coalescence_rates method to
  estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF.
  (@nspope, #2915, #2976, #2985)
- Add provenance information to the HTML notebook representation of a tree sequence.
  (@benjeffery, #3001)
- The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the
  x-axis. (@hyanwong, #3002)
- Added a node_titles and a mutation_titles parameter to .draw_svg() methods
  which assigns a string to node and mutation symbols, commonly shown on mouseover. This
  can reduce label clutter while retaining useful info. (@hyanwong, #3007)
- Added (currently undocumented) use of the order parameter in Tree.draw_svg() to
  pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option
  pack_untracked_polytomies allows large polytomies involving untracked samples to
  be summarised as a dotted line. (@hyanwong, #3011, #3010, #3012)
- Added a title parameter to .draw_svg() methods. (@hyanwong, #3015)
- Add comma separation to all display numbers. (@benjeffery, #3017, #3018)
- Add resources section to provenance schema. (@benjeffery, #3016)
- Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance
  between two trees. (@Billyzhang1229, #995, #2643, #3032)
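To make the last entry concrete, here is a minimal pure-Python sketch of an unweighted Robinson-Foulds distance between two rooted trees on the same leaves. It is a generic textbook version, not the code behind Tree.rf_distance: the child-to-parent dictionaries and the helper names `clades` and `rf_distance` are assumptions made for this example.

```python
def clades(parent):
    """Return the set of clades of a rooted tree.

    `parent` maps every non-root node to its parent. Each clade is the
    frozenset of leaves found below one node of the tree.
    """
    nodes = set(parent) | set(parent.values())
    leaves = nodes - set(parent.values())
    below = {u: set() for u in nodes}
    for leaf in leaves:
        u = leaf
        # Walk up to the root, registering the leaf under every ancestor.
        while u is not None:
            below[u].add(leaf)
            u = parent.get(u)
    return {frozenset(s) for s in below.values()}


def rf_distance(parent_a, parent_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(parent_a) ^ clades(parent_b))


# ((0,1),(2,3)) versus ((0,2),(1,3)); node 6 is the root in both trees.
tree_a = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
tree_b = {0: 4, 2: 4, 1: 5, 3: 5, 4: 6, 5: 6}
distance = rf_distance(tree_a, tree_b)
```

The two example trees share the leaf clades and the root clade but disagree on both internal clades, so four clades appear in only one tree.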
C API C_1.1.3
Features
- Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence
  by extending edges into adjacent trees and thus creating unary nodes in those
  trees (@petrelharp, @hfr1tze, @avabamf, #2651, #2938).
Python 0.5.8
- Add support for numpy 2 (@jeromekelleher, @benjeffery, #2964)
Python 0.5.7
Breaking Changes
- The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with
  position zero is encountered. The VCF spec does not allow zero position sites.
  Suppress this error with the allow_position_zero argument.
  (@benjeffery, #2901, #2838)
Bugfixes
- Fix to the folded, expected allele frequency spectrum (i.e.,
  TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)),
  which was half as big as it should have been. (@petrelharp,
  @nspope, #2933)
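For readers unfamiliar with folding, the sketch below shows the generic operation on a one-dimensional spectrum: the class with k copies is merged with the class with n - k copies. It is not tskit's branch-mode computation, and the function name `fold_spectrum`, the input values, and the shortened output layout are assumptions made for illustration.

```python
def fold_spectrum(afs):
    """Fold a polarised spectrum indexed by allele count 0..n.

    Entry k of the result combines classes k and n - k; the middle class
    (when n is even) is kept once rather than doubled.
    """
    n = len(afs) - 1
    folded = []
    for k in range(n // 2 + 1):
        if k == n - k:
            folded.append(afs[k])
        else:
            folded.append(afs[k] + afs[n - k])
    return folded


polarised = [0.0, 4.0, 2.0, 1.0, 0.0]  # invented values for n = 4 samples
folded = fold_spectrum(polarised)
```

In this generic version the folded entries add up to the same total as the polarised ones.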
Python 0.5.6
Breaking Changes
- tskit now requires Python 3.8, as Python 3.7 became end-of-life on 2023-06-27
Features
- Tree.tmrca now accepts >2 nodes and returns nicer errors
  (@hyanwong, #2808, #2801, #2070, #2611)
- Add TreeSequence.genetic_relatedness_weighted stats method.
  (@petrelharp, @brieuclehmann, @jeromekelleher,
  #2785, #1246)
- Add TreeSequence.impute_unknown_mutations_time method to return an
  array of mutation times based on the times of associated nodes
  (@duncanMR, #2760, #2758)
- Add asdict to all dataclasses. These are returned when you access a row or
  other tree sequence object. (@benjeffery, #2759, #2719)
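One way to picture a most-recent-common-ancestor query over more than two nodes is to reduce it to repeated pairwise queries. The sketch below does that on a toy tree stored as plain dictionaries; the tree, the node times, and the helper names are all invented here, and this is not how tskit implements the method.

```python
from functools import reduce

# Toy tree: 0 and 1 join at 4, then 2 joins at 5, then 3 joins at the root 6.
parent = {0: 4, 1: 4, 4: 5, 2: 5, 5: 6, 3: 6}
time = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0, 5: 2.0, 6: 3.5}


def ancestors(u):
    """Path from u up to the root, including u itself."""
    path = [u]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca_pair(u, v):
    """Lowest node that lies on both root paths."""
    seen = set(ancestors(u))
    for w in ancestors(v):
        if w in seen:
            return w
    raise ValueError(f"nodes {u} and {v} share no ancestor")


def mrca(*nodes):
    """MRCA of two or more nodes, folded pairwise from left to right."""
    if len(nodes) < 2:
        raise ValueError("at least two nodes are required")
    return reduce(mrca_pair, nodes)


def tmrca(*nodes):
    """Time of the MRCA of the given nodes."""
    return time[mrca(*nodes)]
```

For example, `mrca(0, 1, 2)` resolves to node 5 in this toy tree, so `tmrca(0, 1, 2)` is the time stored for node 5.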
Bugfixes
- Fix incompatibility with jsonschema>4.18.6 which caused
  AttributeError: module jsonschema has no attribute _validators
  (@benjeffery, #2844, #2840)
Python 0.5.5
Performance improvements
- Methods like ts.at() which seek to a specified position on the sequence from
a new Tree instance are now much faster (@molpopgen, #2661).
Features
- Add __repr__ for variants to return a string representation of the raw data
  without spewing megabytes of text (@chriscrsmith, #2695, #2694)
- Add keep_rows method to table classes to support efficient in-place
  table subsetting (@jeromekelleher, #2700)
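The idea of in-place subsetting with a boolean mask can be sketched on an ordinary Python list. This is only an analogy for the table method: the function below is a stand-in written for this example, and returning an old-ID to new-ID map (with -1 for dropped rows) is an assumption about a convenient interface rather than a statement of the real signature.

```python
def keep_rows(rows, keep):
    """Filter `rows` in place and return a map from old to new row IDs.

    `keep` holds one boolean per row. Dropped rows map to -1.
    """
    if len(keep) != len(rows):
        raise ValueError("keep must have exactly one entry per row")
    id_map = []
    kept = []
    for row, flag in zip(rows, keep):
        if flag:
            id_map.append(len(kept))  # new ID is the position among kept rows
            kept.append(row)
        else:
            id_map.append(-1)
    rows[:] = kept  # slice assignment mutates the caller's list
    return id_map


rows = ["a", "b", "c", "d"]
id_map = keep_rows(rows, [True, False, True, True])
```

A map like this lets other tables that refer to the filtered rows by ID be updated after the subset.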
Bugfixes
- Fix UnicodeDecodeError when calling Variant.alleles on the emscripten
  platform.
  (@benjeffery, #2754, #2737)
C API C_1.1.2
Performance improvements
- tsk_tree_seek is now much faster at seeking to arbitrary points along
the sequence from the null tree (@molpopgen, #2661).
Features
- The struct tsk_treeseq_t now has the variables min_time and max_time,
  which are the minimum and maximum among the node times and mutation times,
  respectively. min_time and max_time can be accessed using the functions
  tsk_treeseq_get_min_time and tsk_treeseq_get_max_time, respectively.
  (@szhan, #2612, #2271)
- Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced
  nodes to be kept in the output (@jeromekelleher, @hyanwong,
  #2606, #2619).
- Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify which ensures
  no node sample flags are changed to allow calling code to manage sample status.
  (@jeromekelleher, #2662, #2663).
- Guarantee that unfiltered tables are not written to unnecessarily
  during simplify (@jeromekelleher, #2619).
- Add x_table_keep_rows methods to provide efficient in-place table subsetting
  (@jeromekelleher, #2700).
- Add tsk_tree_seek_index function
Python 0.5.4
Features
- A new Tree.is_root method avoids the need to search the potentially
  large list of Tree.roots (@hyanwong, #2669, #2620)
- The TreeSequence object now has the attributes min_time and max_time,
  which are the minimum and maximum among the node times and mutation times,
  respectively. (@szhan, #2612, #2271)
- The draw_svg methods now have a max_num_trees parameter to truncate
  the total number of trees shown, giving a readable display for tree
  sequences with many trees (@hyanwong, #2652)
- The draw_svg methods now accept a canvas_size parameter to allow
  extra room on the canvas e.g. for long labels or repositioned graphical
  elements (@hyanwong, #2646, #2645)
- The Tree object now has the method siblings to get
  the siblings of a node. It returns an empty tuple if the node
  has no siblings, is not a node in the tree, is the virtual root,
  or is an isolated non-sample node.
  (@szhan, #2618, #2616)
- The msprime.RateMap class has been ported into tskit: functionality should
  be identical to the version in msprime, apart from minor changes in the formatting
  of tabular text output (@hyanwong, @jeromekelleher, #2678)
- Tskit now supports and has wheels for Python 3.11. This Python version has a
  significant performance boost. (@benjeffery, #2624, #2248)
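The empty-tuple cases listed for the siblings method can be pictured with a small stand-alone sketch. The toy tree and the function below are invented for illustration; the sketch ignores tskit's virtual root and any multiple-root handling, so it only mirrors the simple cases.

```python
# Toy tree: 0 and 1 are children of 4; 4 and 2 are children of the root 5.
# Node 3 does not appear in the tree at all.
parent = {0: 4, 1: 4, 4: 5, 2: 5}


def siblings(u):
    """Return the other children of u's parent as a sorted tuple.

    Nodes without a parent (the root, or nodes absent from the tree)
    get an empty tuple.
    """
    if u not in parent:
        return ()
    p = parent[u]
    return tuple(sorted(c for c, q in parent.items() if q == p and c != u))
```

Returning an empty tuple in every no-sibling case means callers can loop over the result without special-casing roots or missing nodes.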
Breaking Changes
Python 0.5.3
Fixes
- The Variant object can now be initialized with 64 bit numpy ints as
  returned e.g. from np.where (@hyanwong, #2518, #2514)
- Fix tree.mrca for the case of a tree with multiple roots.
  (@benjeffery, #2533, #2521)
Features
- The ts.nodes method now takes an order parameter so that nodes
  can be visited in time order (@hyanwong, #2471, #2370)
- Add samples argument to TreeSequence.genotype_matrix.
  Default is None, where all the sample nodes are selected.
  (@szhan, #2493, #678)
- ts.draw and the draw_svg methods now have an optional omit_sites
  parameter, aiding drawing large trees with many sites and mutations
  (@hyanwong, #2519, #2516)
Breaking Changes
- Single statistics computed with TreeSequence.general_stat are now
  returned as numpy scalars if windows=None and either samples is a single
  list or None (for a 1-way stat), or indexes is None or a single list of
  length k (instead of a list of length-k lists).
  (@gtsambos, #2417, #2308)
- Accessor methods such as ts.edge(n) and ts.node(n) now allow negative
  indexes (@hyanwong, #2478, #1008)
- ts.subset() produces valid tree sequences even if nodes are shuffled
  out of time order (@hyanwong, #2479, #2473), and the
  same for tables.subset() (@hyanwong, #2489). This involves
  sorting the returned tables, potentially changing the returned edge order.
Performance improvements
Python 0.5.2
Fixes
- Iterating over ts.variants() could cause a segfault in tree sequences
  with large numbers of alleles or very long alleles
  (@jeromekelleher, #2437, #2429).
- Various circular references fixed, lowering peak memory usage
  (@jeromekelleher, #2424, #2423, #2427).
- Fix bugs in VCF output when there isn't a 1-1 mapping between individuals
  and sample nodes (@jeromekelleher, #2442, #2257,
  #2446, #2448).
Performance improvements
- TreeSequence.site position search performance greatly improved, with much lower
  memory overhead (@jeromekelleher, #2424).
- TreeSequence.samples time/population search performance greatly improved, with
  much lower memory overhead (@jeromekelleher, #2424, #1916).
- The timeasc and timedesc orders for Tree.nodes have much
  improved performance and lower memory overhead
  (@jeromekelleher, #2424, #2423).
Features
- Variant objects now have a .num_missing attribute and .counts() and
  .frequencies() methods (@hyanwong, #2390, #2393).
- Add the Tree.num_lineages(t) method to return the number of lineages present
  at time t in the tree (@jeromekelleher, #386, #2422)
- Efficient array access to table data now provided via attributes like
  TreeSequence.nodes_time, etc (@jeromekelleher, #2424).
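Counting lineages at a given time can be sketched as counting the branches that span that time. The toy tree below and the boundary convention (a branch counts when child time <= t < parent time) are assumptions of this example, not a statement of how Tree.num_lineages treats edge cases.

```python
# Toy tree: 0 and 1 coalesce at node 4 (time 1.0); 4 and 2 coalesce at
# the root, node 5 (time 2.0).
parent = {0: 4, 1: 4, 4: 5, 2: 5}
time = {0: 0.0, 1: 0.0, 2: 0.0, 4: 1.0, 5: 2.0}


def num_lineages(t):
    """Count branches (child -> parent) whose time span contains t."""
    return sum(
        1 for child, par in parent.items() if time[child] <= t < time[par]
    )
```

With this convention the count drops by one at each binary coalescence and reaches zero at the root time.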
Breaking Changes
- Previously, accessing (e.g.) tables.edges returned a different instance of
  EdgeTable each time. This has been changed to return the same instance
  for the lifetime of a given TableCollection instance. This is technically
  a breaking change, although it's difficult to see how code would depend
  on the property that (e.g.) tables.edges is not tables.edges.
  (@jeromekelleher, #2441, #2080).
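The difference between the two behaviours can be shown with stand-in classes. Everything below is hypothetical: the class names are invented for this sketch and are not the tskit classes, and the real implementation may cache its table objects differently.

```python
class SketchEdgeTable:
    """Stand-in for a table object (not the real EdgeTable)."""

    def __init__(self):
        self.rows = []


class OldSketchTableCollection:
    """Old pattern: every attribute access builds a fresh wrapper."""

    @property
    def edges(self):
        return SketchEdgeTable()


class SketchTableCollection:
    """New pattern: one wrapper is created up front and reused."""

    def __init__(self):
        self._edges = SketchEdgeTable()

    @property
    def edges(self):
        return self._edges


old_tables = OldSketchTableCollection()
tables = SketchTableCollection()
old_identity = old_tables.edges is old_tables.edges  # distinct objects
new_identity = tables.edges is tables.edges          # same object
```

Only code that relied on getting a distinct object from each access would notice the change.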