Releases · tskit-dev/tskit

Python 0.6.0 (Latest)

Published 16 Oct 15:31 by github-actions; tag 0.6.0, commit 8342e74

Breaking Changes

The definitions of TreeSequence.genetic_relatedness and
TreeSequence.genetic_relatedness_weighted have changed
to average over sample sets, rather than summing over them.
For computation with diploid sample sets, this will change the result
by a factor of four; for larger sample sets it will now produce
sensible values that are comparable between sample sets of different sizes.
The default for these methods has also changed to polarised=True,
but the output is unchanged for centre=True (the default).
See the documentation for these methods for more discussion.
(@petrelharp, @mmosmond, #1623)
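The factor of four follows from counting pairs. Suppose the new definition averages over every pair of sample nodes taken one from each set, as that figure implies. Two diploid sample sets of two nodes each then give 2 * 2 = 4 pairs. The minimal sketch below uses made-up pairwise values and hypothetical helper names. It only illustrates the arithmetic and is not tskit's implementation.

```python
from itertools import product


def summed_relatedness(r, set_a, set_b):
    """Summing aggregate: add r(a, b) over every pair a in set_a, b in set_b."""
    return sum(r(a, b) for a, b in product(set_a, set_b))


def averaged_relatedness(r, set_a, set_b):
    """Averaging aggregate: the same sum divided by the number of pairs."""
    return summed_relatedness(r, set_a, set_b) / (len(set_a) * len(set_b))


# Made-up pairwise values between the sample nodes of two diploid individuals.
pair_values = {(0, 2): 0.5, (0, 3): 0.25, (1, 2): 0.75, (1, 3): 0.5}


def r(a, b):
    return pair_values[(a, b)]


individual_a, individual_b = [0, 1], [2, 3]
summed = summed_relatedness(r, individual_a, individual_b)
averaged = averaged_relatedness(r, individual_a, individual_b)
print(summed, averaged, summed / averaged)  # 2.0 0.5 4.0
```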

Bugfixes

Fix to TreeSequence.genetic_relatedness with indexes=None and
proportion=True. (@petrelharp, #2984, #1623)
Fix to TreeSequence.general_stat when using non-strict summary functions
in the presence of non-ancestral material (very rare).
(@petrelharp, #2983, #1623)
Printing tskit.MetadataSchema(schema=None) now shows "Null_schema" rather
than None, to avoid confusion (@hyanwong, #2720)
Limit the HTML output when displaying a tree sequence that has a large amount of metadata.
(@benjeffery, #2999)
Fix a warning in draw_svg to use the correct warnings module.
(@duncanMR, #2870, #2871)

Features

Add the centre option to TreeSequence.genetic_relatedness and
TreeSequence.genetic_relatedness_weighted.
(@petrelharp, @mmosmond, #1623)
Edges now have an .interval attribute returning a tskit.Interval object.
(@hyanwong, #2531)
Variants now have a states() method that returns the genotypes as an
(inefficient) array of strings, rather than integer indexes, to
aid comparison of genetic variation (@hyanwong, #2617)
Added a distance_between method, which calculates the total distance between two nodes in a tree.
(@Billyzhang1229, #2771)
Added genetic_relatedness_matrix method to compute
pairwise genetic relatedness between sample sets.
(@jeromekelleher, @petrelharp, #2823)
Add TreeSequence.extend_haplotypes method that extends ancestral haplotypes
using recombination information, leading to unary nodes in many trees and
fewer edges. (@petrelharp, @hfr1tz3, @nspope,
@avabamf, #2651, #2938)
Add Table.drop_metadata to make clearing metadata from tables easy.
(@jeromekelleher, #2944)
Add Interval.mid and Tree.mid properties to return the midpoint of the interval.
(@currocam, #2960)
Added genetic_relatedness_vector method to compute the product of the genetic relatedness
matrix and a weight vector.
(@petrelharp, #2980)
Added pair_coalescence_counts method to calculate coalescence events per node or time
interval, pair_coalescence_quantiles method to estimate quantiles of pair
coalescence times using empirical CDF inversion, and pair_coalescence_rates method to
estimate instantaneous rates of pair coalescence within time intervals from the empirical CDF.
(@nspope, #2915, #2976, #2985)
Add provenance information to the HTML notebook representation of a tree sequence.
(@benjeffery, #3001)
The .draw_svg() methods can add annotated genomic regions (e.g. genes) to the
x-axis. (@hyanwong, #3002)
Added node_titles and mutation_titles parameters to the .draw_svg() methods,
which assign a string to each node and mutation symbol, commonly shown on mouseover. This
can reduce label clutter while retaining useful information (@hyanwong, #3007)
Added (currently undocumented) use of the order parameter in Tree.draw_svg() to
pass a subset of nodes, so subtrees can be visually collapsed. Additionally, an option
pack_untracked_polytomies allows large polytomies involving untracked samples to
be summarised as a dotted line (@hyanwong, #3011, #3010, #3012)
Added a title parameter to .draw_svg() methods (@hyanwong, #3015)
Add comma separators to all displayed numbers. (@benjeffery, #3017, #3018)
Added Tree.ancestors(u) method. (@hyanwong, #2706, #3021)
Add resources section to provenance schema. (@benjeffery, #3016)
Add Tree.rf_distance method to calculate the unweighted Robinson-Foulds distance
between two trees. (@Billyzhang1229, #995, #2643, #3032)
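Several of the tree features above, namely Tree.ancestors, distance_between and Tree.rf_distance, reduce to walking parent pointers. The sketch below shows the ideas on plain Python lists rather than tskit Tree objects. Here parent and time are hypothetical per-node arrays, -1 marks a missing parent, and distance is taken to be total branch length. The function names are illustrative stand-ins, not tskit's API. The Robinson-Foulds version compares rooted clades, meaning the set of leaves below each node. tskit's exact conventions, for example for trees with several roots, may differ.

```python
NULL = -1  # tskit also uses -1 (tskit.NULL) to mean "no node"


def ancestors(parent, u):
    """Yield the ancestors of node u, from its parent up to the root."""
    u = parent[u]
    while u != NULL:
        yield u
        u = parent[u]


def distance_between(parent, time, u, v):
    """Total branch length on the path from u to v via their MRCA."""
    on_path_of_u = {u, *ancestors(parent, u)}
    mrca = v
    # Climb from v until we reach a node that is also on u's path to the root.
    while mrca not in on_path_of_u:
        mrca = parent[mrca]
        if mrca == NULL:
            raise ValueError("u and v are not in the same tree")
    # Branch lengths along each path telescope to a difference of node times.
    return (time[mrca] - time[u]) + (time[mrca] - time[v])


def clades(parent, leaves):
    """The set of clades, each the frozenset of leaves below some node."""
    below = {}
    for leaf in leaves:
        for node in ancestors(parent, leaf):
            below.setdefault(node, set()).add(leaf)
    return {frozenset(s) for s in below.values()}


def rf_distance(parent1, parent2, leaves):
    """Unweighted, rooted Robinson-Foulds distance: clades found in only one tree."""
    return len(clades(parent1, leaves) ^ clades(parent2, leaves))


# Two trees on leaves 0-3: ((0,1),(2,3)) and ((0,2),(1,3)); node 6 is the root.
parent1 = [4, 4, 5, 5, 6, 6, NULL]
parent2 = [4, 5, 4, 5, 6, 6, NULL]
time = [0, 0, 0, 0, 1, 2, 3]

print(list(ancestors(parent1, 0)))              # [4, 6]
print(distance_between(parent1, time, 0, 1))    # 2
print(distance_between(parent1, time, 0, 2))    # 6
print(rf_distance(parent1, parent2, range(4)))  # 4
```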


C API C_1.1.3

Published 16 Oct 15:12 by github-actions; tag C_1.1.3, commit 8342e74

Features

Add the tsk_treeseq_extend_haplotypes method that can compress a tree sequence
by extending edges into adjacent trees and thus creating unary nodes in those
trees (@petrelharp, @hfr1tze, @avabamf, #2651, #2938).


Python 0.5.8

Published 27 Jun 13:53 by github-actions; tag 0.5.8, commit c5b7311

Add support for numpy 2 (@jeromekelleher, @benjeffery, #2964)


Python 0.5.7

Published 17 Jun 17:26 by github-actions; tag 0.5.7, commit 308fb01

Breaking Changes

The VCF writing methods (ts.write_vcf, ts.as_vcf) now error if a site with
position zero is encountered. The VCF specification does not allow sites at position zero.
Suppress this error with the allow_position_zero argument.
(@benjeffery, #2901, #2838)

Bugfixes

Fix to the folded, expected allele frequency spectrum (i.e.,
TreeSequence.allele_frequency_spectrum(mode="branch", polarised=False)),
which was half as big as it should have been. (@petrelharp,
@nspope, #2933)
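Folded spectra have a simple general sanity check. Folding pairs the entry for i derived alleles with the entry for n - i and adds them, so the total is preserved rather than halved. This is a generic sketch with made-up values and a hypothetical fold_spectrum helper. tskit's own folded output layout, such as its length, may differ.

```python
def fold_spectrum(unfolded):
    """Fold an unfolded spectrum f[0..n] by minor-allele count.

    Entry i of the result sums the unfolded entries for i and n - i, so the
    folded spectrum keeps the same total as the unfolded one.
    """
    n = len(unfolded) - 1
    folded = [0.0] * (n // 2 + 1)
    for count, value in enumerate(unfolded):
        folded[min(count, n - count)] += value
    return folded


unfolded = [0.0, 4.0, 2.0, 1.0, 0.0]  # made-up values for n = 4 samples
folded = fold_spectrum(unfolded)
print(folded)  # [0.0, 5.0, 2.0]
```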


Python 0.5.6

Published 10 Oct 10:55 by github-actions; tag 0.5.6, commit 4874177

Breaking Changes

tskit now requires Python 3.8 or later, as Python 3.7 reached end-of-life on 2023-06-27.

Features

Tree.tmrca now accepts more than two nodes and returns more informative errors
(@hyanwong, #2808, #2801, #2070, #2611)
Add TreeSequence.genetic_relatedness_weighted stats method.
(@petrelharp, @brieuclehmann, @jeromekelleher,
#2785, #1246)
Add TreeSequence.impute_unknown_mutations_time method to return an
array of mutation times based on the times of associated nodes
(@duncanMR, #2760, #2758)
Add asdict to all dataclasses. These are returned when you access a row or
other tree sequence object. (@benjeffery, #2759, #2719)
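Finding the TMRCA of more than two nodes follows the same pattern as for two. Intersect each node's path to the root, then take the youngest shared ancestor. Below is a minimal sketch on hypothetical parent and time lists, not tskit Tree objects. The helper names, the two-node minimum and the ValueError for nodes with no common ancestor are choices made for this sketch. They do not describe tskit's own error messages.

```python
NULL = -1  # marks "no parent", as tskit.NULL does


def path_to_root(parent, u):
    """Return [u, parent of u, ..., root]."""
    path = [u]
    while parent[u] != NULL:
        u = parent[u]
        path.append(u)
    return path


def tmrca(parent, time, nodes):
    """Time of the most recent common ancestor of two or more nodes."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    common = set(path_to_root(parent, nodes[0]))
    for u in nodes[1:]:
        common &= set(path_to_root(parent, u))
    if not common:
        raise ValueError(f"nodes {nodes} do not share a common ancestor")
    # Ancestors get older towards the root, so the youngest shared one is the MRCA.
    return min(time[x] for x in common)


parent = [4, 4, 5, 5, 6, 6, NULL]  # the tree ((0,1),(2,3)), root 6
time = [0, 0, 0, 0, 1, 2, 3]
print(tmrca(parent, time, [0, 1]))     # 1
print(tmrca(parent, time, [2, 3]))     # 2
print(tmrca(parent, time, [0, 1, 2]))  # 3
```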

Bugfixes

Fix incompatibility with jsonschema>4.18.6 which caused
AttributeError: module jsonschema has no attribute _validators
(@benjeffery, #2844, #2840)


Python 0.5.5

Published 17 May 20:09 by github-actions; tag 0.5.5, commit fd72573

Performance improvements

Methods like ts.at() which seek to a specified position on the sequence from
a new Tree instance are now much faster (@molpopgen, #2661).

Features

Add __repr__ for variants to return a string representation of the raw data
without spewing megabytes of text (@chriscrsmith, #2695, #2694)
Add keep_rows method to table classes to support efficient in-place
table subsetting (@jeromekelleher, #2700)
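In-place subsetting with a boolean mask can be sketched in plain Python as follows. The rows list, the keep_rows helper and its returned mapping are illustrative assumptions, not tskit's API. The mapping gives the new ID for each kept row and -1 for each dropped row. See the tskit documentation for the exact behaviour of the table method. This sketch does not update any IDs that other tables may use to refer to these rows.

```python
def keep_rows(rows, keep):
    """Keep rows[j] where keep[j] is true, in place; return old-to-new IDs.

    Dropped rows map to -1 in the returned list.
    """
    if len(keep) != len(rows):
        raise ValueError("keep must have exactly one entry per row")
    id_map = []
    next_id = 0
    for flag in keep:
        if flag:
            id_map.append(next_id)
            next_id += 1
        else:
            id_map.append(-1)
    # Slice assignment replaces the list contents without creating a new object.
    rows[:] = [row for row, flag in zip(rows, keep) if flag]
    return id_map


rows = ["a", "b", "c", "d"]
id_map = keep_rows(rows, [True, False, True, False])
print(rows)    # ['a', 'c']
print(id_map)  # [0, -1, 1, -1]
```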

Bugfixes

Fix UnicodeDecodeError when calling Variant.alleles on the emscripten platform.
(@benjeffery, #2754, #2737)


C API C_1.1.2

Published 17 May 19:47 by github-actions; tag C_1.1.2, commit fd72573

Performance improvements

tsk_tree_seek is now much faster at seeking to arbitrary points along
the sequence from the null tree (@molpopgen, #2661).

Features

The struct tsk_treeseq_t now has the variables min_time and max_time,
which are the minimum and maximum among the node times and mutation times,
respectively. min_time and max_time can be accessed using the functions
tsk_treeseq_get_min_time and tsk_treeseq_get_max_time, respectively.
(@szhan, #2612, #2271)
Add the TSK_SIMPLIFY_NO_FILTER_NODES option to simplify to allow unreferenced
nodes to be kept in the output (@jeromekelleher, @hyanwong,
#2606, #2619).
Add the TSK_SIMPLIFY_NO_UPDATE_SAMPLE_FLAGS option to simplify which ensures
no node sample flags are changed to allow calling code to manage sample status.
(@jeromekelleher, #2662, #2663).
Guarantee that unfiltered tables are not written to unnecessarily
during simplify (@jeromekelleher #2619).
Add x_table_keep_rows methods to provide efficient in-place table subsetting
(@jeromekelleher, #2700).
Add tsk_tree_seek_index function


Python 0.5.4

Published 13 Jan 20:29 by github-actions; tag 0.5.4, commit 4bad5ec

Features

A new Tree.is_root method avoids the need to search the potentially
large list of Tree.roots (@hyanwong, #2669, #2620)
The TreeSequence object now has the attributes min_time and max_time,
which are the minimum and maximum among the node times and mutation times,
respectively. (@szhan, #2612, #2271)
The draw_svg methods now have a max_num_trees parameter to truncate
the total number of trees shown, giving a readable display for tree
sequences with many trees (@hyanwong, #2652)
The draw_svg methods now accept a canvas_size parameter to allow
extra room on the canvas e.g. for long labels or repositioned graphical
elements (@hyanwong, #2646, #2645)
The Tree object now has the method siblings to get
the siblings of a node. It returns an empty tuple if the node
has no siblings, is not a node in the tree, is the virtual root,
or is an isolated non-sample node.
(@szhan, #2618, #2616)
The msprime.RateMap class has been ported into tskit: functionality should
be identical to the version in msprime, apart from minor changes in the formatting
of tabular text output (@hyanwong, @jeromekelleher, #2678)
tskit now supports, and provides wheels for, Python 3.11, which offers a significant performance boost. (@benjeffery, #2624, #2248)
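The central quantity of a piecewise-constant rate map is the cumulative mass, which is the integral of the rate along the sequence. The SimpleRateMap class below is a hypothetical, minimal sketch of that calculation. It is not tskit.RateMap itself; see the tskit documentation for the real class and its methods.

```python
import bisect


class SimpleRateMap:
    """Hypothetical piecewise-constant rate map (not tskit.RateMap)."""

    def __init__(self, position, rate):
        if len(position) != len(rate) + 1:
            raise ValueError("need exactly one more position than rates")
        self.position = list(position)
        self.rate = list(rate)
        # cumulative[i] is the total mass from position[0] up to position[i].
        self.cumulative = [0.0]
        for i, r in enumerate(self.rate):
            span = self.position[i + 1] - self.position[i]
            self.cumulative.append(self.cumulative[-1] + span * r)

    def cumulative_mass(self, x):
        """Integral of the rate from the start of the map up to x."""
        if not self.position[0] <= x <= self.position[-1]:
            raise ValueError("x is outside the map")
        # Index of the interval containing x (the last interval if x is the end).
        i = min(bisect.bisect_right(self.position, x) - 1, len(self.rate) - 1)
        return self.cumulative[i] + (x - self.position[i]) * self.rate[i]


rate_map = SimpleRateMap(position=[0, 10, 20], rate=[1.0, 2.0])
print(rate_map.cumulative_mass(15))  # 20.0
print(rate_map.cumulative_mass(20))  # 30.0
```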

Breaking Changes

The filter_populations, filter_individuals, and filter_sites
parameters to simplify previously defaulted to True but now default
to None, which is treated as True. Previously, passing None
would result in an error. (@hyanwong, #2609, #2608)


Python 0.5.3

Published 03 Oct 18:50 by github-actions; tag 0.5.3, commit 1919abe

Fixes

The Variant object can now be initialized with 64-bit numpy integers, such as those
returned by np.where (@hyanwong, #2518, #2514)
Fix tree.mrca for the case of a tree with multiple roots.
(@benjeffery, #2533, #2521)

Features

The ts.nodes method now takes an order parameter so that nodes
can be visited in time order (@hyanwong, #2471, #2370)
Add samples argument to TreeSequence.genotype_matrix.
Default is None, where all the sample nodes are selected.
(@szhan, #2493, #678)
ts.draw and the draw_svg methods now have an optional omit_sites
parameter, aiding drawing large trees with many sites and mutations
(@hyanwong, #2519, #2516)

Breaking Changes

Single statistics computed with TreeSequence.general_stat are now
returned as numpy scalars if windows=None and either samples is a single
list or None (for a 1-way stat), or indexes is None or a single list of
length k (instead of a list of length-k lists).
(@gtsambos, #2417, #2308)
Accessor methods such as ts.edge(n) and ts.node(n) now allow negative
indexes (@hyanwong, #2478, #1008)
ts.subset() produces valid tree sequences even if nodes are shuffled
out of time order (@hyanwong, #2479, #2473), and the
same for tables.subset() (@hyanwong, #2489). This involves
sorting the returned tables, potentially changing the returned edge order.

Performance improvements

TreeSequence.link_ancestors no longer continues to process edges once all
of the sample and ancestral nodes have been accounted for, improving memory
overhead and overall performance
(@gtsambos, #2456, #2442)


Python 0.5.2

Published 29 Jul 18:27 by github-actions; tag 0.5.2, commit 6c2e27c

Fixes

Iterating over ts.variants() could cause a segfault in tree sequences
with large numbers of alleles or very long alleles
(@jeromekelleher, #2437, #2429).
Various circular references fixed, lowering peak memory usage
(@jeromekelleher, #2424, #2423, #2427).
Fix bugs in VCF output when there isn't a 1-1 mapping between individuals
and sample nodes (@jeromekelleher, #2442, #2257,
#2446, #2448).

Performance improvements

TreeSequence.site position search performance greatly improved, with much lower
memory overhead (@jeromekelleher, #2424).
TreeSequence.samples time/population search performance greatly improved, with
much lower memory overhead (@jeromekelleher, #2424, #1916).
The timeasc and timedesc orders for Tree.nodes have much
improved performance and lower memory overhead
(@jeromekelleher, #2424, #2423).

Features

Variant objects now have a .num_missing attribute and .counts() and
.frequencies() methods (@hyanwong, #2390, #2393).
Add the Tree.num_lineages(t) method to return the number of lineages present
at time t in the tree (@jeromekelleher, #386, #2422)
Efficient array access to table data now provided via attributes like
TreeSequence.nodes_time, etc (@jeromekelleher, #2424).
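Counting the lineages present at a time t amounts to counting the branches that span t. The minimal sketch below assumes that the branch above node u counts when time[u] <= t < time[parent[u]]. It works on hypothetical parent and time lists rather than a tskit Tree, and it counts every branch in those lists. Check the Tree.num_lineages documentation for tskit's exact convention.

```python
NULL = -1  # marks "no parent"


def num_lineages(parent, time, t):
    """Number of branches (u -> parent[u]) that span time t.

    Assumed convention: the branch above u counts if time[u] <= t < time[parent[u]].
    """
    return sum(
        1
        for u, p in enumerate(parent)
        if p != NULL and time[u] <= t < time[p]
    )


parent = [4, 4, 5, 5, 6, 6, NULL]  # the tree ((0,1),(2,3)), root 6
time = [0, 0, 0, 0, 1, 2, 3]
print([num_lineages(parent, time, t) for t in (0, 1, 2.5, 3)])  # [4, 3, 2, 0]
```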

Breaking Changes

Previously, accessing (e.g.) tables.edges returned a different instance of
EdgeTable each time. This has been changed to return the same instance
for the lifetime of a given TableCollection instance. This is technically
a breaking change, although it's difficult to see how code would depend
on the property that (e.g.) tables.edges is not tables.edges.
(@jeromekelleher, #2441, #2080).
