Releases: bcgsc/abyss
2.1.3
This release fixes a SAM-formatting bug that broke the ABySS-LR pipeline (Tigmint/ARCS).
abyss-bloom:
- Added 'graph' command for visualizing neighbourhoods of the
Bloom filter de Bruijn graph (produces GraphViz)
abyss-fixmate-ssq:
- Fixed missing tab in SAM output which broke ABySS linked reads
pipeline (Tigmint/ARCS)
2.1.2
This release improves scaffold N50 on human by ~10%, due to a new '--median' option for 'DistanceEst' (thanks to @lcoombe!). It also adds a new '--max-cost' option for 'konnector' and 'abyss-sealer' that curbs indeterminately long running times, particularly at low k values.
abyss-pe:
- Use the new 'DistanceEst --median' option as the
default for the scaffolding stage
Dockerfile:
- Fix OpenMPI setup
DistanceEst:
- Added '--median' option
konnector:
- Added '--max-cost' option to bound running time
sealer:
- Added '--max-cost' option to bound running time
2.1.1
This release provides bug fixes and modest improvements to
Bloom filter assembly contiguity/correctness. Parallelization
of Sealer has also been improved, thanks to contributions by
@vlad0x00.
abyss-bloom-dbg:
- upgrade to most recent version of ntHash to reduce
some assembly/hashing artifacts. On a human assembly, this
reduced QUAST major misassemblies by 5% and increased
scaffold contiguity by 10%
- 'kc' parameter now also applies to MPI assemblies (see below)
abyss-fac:
- change N20 and N80 to N25 and N75, respectively
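For readers unfamiliar with these statistics, the following is a minimal sketch of how an Nx value is commonly computed. It is not taken from the abyss-fac source; the function name 'nx' and the toy contig lengths are illustrative assumptions.

```python
def nx(lengths, x):
    """Return the Nx statistic: the largest length L such that contigs of
    length >= L together cover at least x% of the total assembly size."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Toy contig lengths (hypothetical data), total size = 300.
contigs = [100, 80, 60, 40, 20]
summary = {f"N{x}": nx(contigs, x) for x in (25, 50, 75)}
print(summary)  # {'N25': 100, 'N50': 80, 'N75': 60}
```

N25 and N75 are the quartile counterparts of N50, which is presumably why they replaced N20 and N80.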
ABYSS-P:
- add '--kc' option, which implements a hard minimum k-mer
multiplicity cutoff
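The idea of a hard minimum multiplicity cutoff can be sketched as follows. This is an illustration only, not the ABYSS-P implementation: it assumes the cutoff keeps k-mers seen at least 'kc' times, and it ignores reverse complements, which a real assembler would normally fold together. All function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def apply_min_multiplicity(counts, kc):
    """Keep only k-mers observed at least kc times (assumed semantics)."""
    return {kmer: n for kmer, n in counts.items() if n >= kc}


reads = ["ACGTA", "ACGTT", "CGTAC"]  # toy reads
counts = count_kmers(reads, k=3)
solid = apply_min_multiplicity(counts, kc=2)
print(sorted(solid))  # ['ACG', 'CGT', 'GTA']
```

K-mers seen only once (here GTT and TAC) are dropped, which is the usual way to discard likely sequencing errors before graph construction.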
abyss-pe:
- fix 'zsh: no such option: pipefail' error with
old versions of 'zsh' (fallback to 'bash' instead)
- adding 'time=1' now times all assembly commands
abyss-sealer:
- parallelize gap sealing with OpenMP (thanks to @vlad0x00!)
- add '--gap-file' option (thanks to @vlad0x00!)
DistanceEst:
- add support for GFA output
2.1.0
This release adds support for misassembly correction and scaffolding
with linked reads, using Tigmint and ARCS. (Tigmint and
ARCS must be installed separately.) In addition, simultaneous optimization
of 's' (seed length) and 'n' (min supporting read pairs / Chromium barcodes)
is now supported during scaffolding.
abyss-longseqdist:
- Fix hang on input SAM containing no alignments with MAPQ > 0
abyss-pe:
- New 'lr' parameter. Provide linked reads (i.e. 10x Genomics
Chromium reads) via this parameter to perform misassembly
correction and scaffolding using Chromium barcode information.
Requires Tigmint and ARCS tools to be installed in addition
to ABySS.
- Fix bug where 'j' (threads) was not being correctly passed to
'bgzip'/'pigz'
- Fix bug where 'zsh' time/memory profiling was not being used,
even when 'zsh' was available
abyss-scaffold:
- Simultaneous optimization of 'n' and 's' using line search
or grid search [default]
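As a rough illustration of what a grid search over two parameters involves, here is a minimal sketch. It does not reproduce abyss-scaffold's code: 'toy_score' is a hypothetical stand-in for whatever contiguity metric the tool evaluates, and the candidate ranges are made up.

```python
from itertools import product


def grid_search(score, n_values, s_values):
    """Evaluate score(n, s) on every combination and return the best pair."""
    best = max(product(n_values, s_values), key=lambda ns: score(*ns))
    return best, score(*best)


def toy_score(n, s):
    # Hypothetical objective that peaks at n=10, s=1000.
    return -((n - 10) ** 2) - ((s - 1000) / 100) ** 2


best, value = grid_search(toy_score, range(5, 21, 5), range(500, 2001, 500))
print(best)  # (10, 1000)
```

A line search would instead vary one parameter at a time while holding the other fixed. That needs fewer evaluations but can miss optima that only appear when both parameters change together.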
SimpleGraph:
- add options '-s' and '-n' to filter paired-end paths by
seed length and edge weight, respectively
2.0.3
This minor release provides bug fixes and improved reliability for both MPI assemblies and Bloom filter assemblies on large datasets. In addition, many usability improvements have been made to the 'abyss-samtobreak' program for misassembly assessment.
overall:
- Many compiler fixes for GCC >= 6, Boost >= 1.64
- Read and write GFA 2 assembly graphs with 'abyss-pe graph=gfa2'
- Support reading CRAM via samtools
abyss-bloom:
- New 'abyss-bloom build -t rolling-hash' option, to
pre-build input Bloom filters for 'abyss-bloom-dbg'
- Fix incorrect output of 'abyss-bloom kmers -r'
(thanks to @notestaff!)
abyss-bloom-dbg:
- New '-i' option to read Bloom filter files built by
'abyss-bloom build -t rolling-hash'
- Improved error branch trimming (reduces number of
small output sequences)
- Fix intermittent segfaults caused by non-null-terminated
strings
abyss-map:
- Append BX tag to SAM output (Chromium 10x Genomics data)
ABYSS-P:
- Increase default number of sparsehash buckets from
200,000,000 to 1,000,000,000
  - Benefit: Allows larger datasets to be assembled without
  time-consuming sparsehash resize operations (e.g. H. sapiens)
  - Caveat: Increases minimum memory requirement per
  CPU core from 89 MB to 358 MB
abyss-pe:
- Parallelize 'gzip' with 'pigz', if available
- Report time/memory for each program with 'zsh', if available
- Fix: use 'N' instead of 'n' for scaffold stage,
when set by user
abyss-samtobreak:
- New '--alignment-length' ('-a') option to exclude alignments
shorter than a given length
- New '--contig-length' ('-l') option to exclude contigs
shorter than a given length
- New '--genome-size' ('-G') option, for contiguity metrics
that depend on the reference genome size
- New '--mapq' ('-q') option for minimum MAPQ score
- New '--patch-gaps' ('-g') option to join alignments
separated by small gaps
- New TSV output format with additional contiguity
stats (e.g. L50, NG50)
- Fix handling of hard-clipped alignments
abyss-todot:
- New '--add-complements' option
abyss-tofastq:
- New '--bx' option to copy BX tag from SAM/BAM
to FASTQ header comment (Chromium 10x Genomics
data)
2.0.2
2.0.1
Summary
This release resolves some licensing issues that were pointed out in 2.0.0. As of 2.0.1, ABySS is available under a standard GPL-3 license, and the libraries included under 'lib/rolling-hash' and 'lib/bloomfilter' are also licensed under GPL-3. For alternative licensing terms, please contact Patrick Rebstein (prebstein at bccancer.bc.ca).
2.0.0
Summary
This release introduces a new Bloom filter assembly mode that enables large genome assemblies with minimal memory (e.g. 34 GB for H. sapiens with 76X coverage bfc-corrected reads). Bloom filter assemblies are currently less contiguous than the default (MPI) assembly mode but are still of high quality (e.g. 3.5 Mbp vs. 4.8 Mbp scaffold NG50 for H. sapiens). Bloom filter assembly mode is enabled by adding three 'abyss-pe' parameters (B = Bloom filter size, H = number of Bloom filter hash functions, kc = k-mer coverage threshold). See 'README.md' for an example.
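To give a feel for how the filter size and the number of hash functions trade off against accuracy, the sketch below applies the standard textbook approximation for a plain Bloom filter's false positive rate. It is an assumption that this approximation is a useful guide here: the data structure used by ABySS and the exact meaning of 'B' and 'H' may differ, and the example numbers are made up.

```python
import math


def bloom_fpr(num_bits, num_hashes, num_items):
    """Approximate false positive rate of a plain Bloom filter:
    (1 - e^(-h*n/m))^h for m bits, h hash functions, n inserted items."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def optimal_num_hashes(num_bits, num_items):
    """Hash count that minimises the approximation above: (m/n) * ln 2."""
    return max(1, round(num_bits / num_items * math.log(2)))


# Hypothetical example: 1 billion distinct k-mers in a filter of 8 billion bits.
bits, items = 8_000_000_000, 1_000_000_000
hashes = optimal_num_hashes(bits, items)
print(hashes, bloom_fpr(bits, hashes, items))
```

With 8 bits per item the approximation gives a false positive rate of roughly 2%. Doubling the filter size lowers it considerably, which is the memory-versus-accuracy trade-off behind choosing a filter size.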
This release also updates several 'abyss-pe' parameter defaults to be more suitable for large genome assemblies with recent Illumina data. In addition, ABySS 2.0.0 includes minor usability improvements for 'abyss-sealer' and removes an unnecessary build dependency on sqlite3.
ChangeLog
2016-08-30 Ben Vandervalk [email protected]
- Release version 2.0.0
- New Bloom filter mode for assembly => assemble large genomes
with minimal memory (e.g. 34G for H. sapiens)
- Update param defaults for modern Illumina data
- Make sqlite3 an optional dependency
abyss-bloom:
- New 'compare' command for bitwise comparison of Bloom filters
(thanks to @bschiffthaler!)
- New 'kmers' command for printing k-mers that match a Bloom filter
(thanks to @bschiffthaler!)
abyss-bloom-dbg:
- New preunitig assembler that uses Bloom filter
- Add 'B' param (Bloom filter size) to 'abyss-pe' command to enable
Bloom filter mode
- See README.md and '--help' for further instructions
abyss-fatoagp:
- Mask scaftigs shorter than 50bp with 'N's (short scaftigs
were causing problems with NCBI submission)
abyss-pe:
- Update default parameter values for modern Illumina data
- Change 'l=k' => 'l=40'
- Change 's=200' => 's=1000'
- Change 'S=s' => 'S=1000-10000' (do a param sweep of 'S')
- Use 'DistanceEst --mean' for scaffolding stage, instead of
the default '--mle'
abyss-sealer:
- New '--max-gap-length' ('-G') option to replace unintuitive
'--max-frag'; use of '--max-frag' is now deprecated
- Require user to explicitly specify Bloom filter size (e.g.
'-b40G')
- Report false positive rate (FPR) when building/loading Bloom
filters
- Don't require input FASTQ files when using pre-built Bloom
filter files
konnector:
- Fix bug causing output read 2 file to be empty
- New percent sequence identity options ('-x' and '-X')
- New '--alt-paths-mode' option to output alternate connecting
paths between read pairs
README.md:
- Fix documentation of ABYSS and abyss-pe parameters
(thanks to @nsoranzo!)
1.9.0
Summary
This release introduces a new paired de Bruijn graph mode for assembly. In paired de Bruijn graph mode, ordinary k-mers are replaced by k-mer pairs, where each k-mer pair is separated by a fixed-size gap. The primary advantage of paired de Bruijn graph mode is that the span of a k-mer pair can be arbitrarily wide without consuming additional memory, and thus provides improved scalability for assemblies of long sequencing reads.
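The notion of a k-mer pair can be made concrete with a short sketch. It is illustrative only: the function name 'kmer_pairs' is hypothetical, 'span' stands for the full width of a pair and 'K' for the size of each individual k-mer, and the sketch assumes the two k-mers do not overlap.

```python
def kmer_pairs(seq, span, K):
    """Yield (left, right) pairs of K-mers taken from the two ends of every
    window of width `span`; the gap between them is span - 2*K bases."""
    if span < 2 * K:
        # Assumption for this sketch: the two k-mers must not overlap.
        raise ValueError("span must be at least 2*K")
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        yield window[:K], window[-K:]


pairs = list(kmer_pairs("ACGTACGTAC", span=8, K=3))
print(pairs)  # [('ACG', 'CGT'), ('CGT', 'GTA'), ('GTA', 'TAC')]
```

Each pair stores only 2*K bases no matter how large 'span' is, which is why widening the span does not increase memory per graph node.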
This release also introduces a new tool called Sealer for closing scaffold gaps, new Konnector functionality for producing long pseudo-reads, and support for the DIDA (Distributed Indexing Dispatched Alignment) parallel alignment framework.
ChangeLog
2015-05-28 Ben Vandervalk [email protected]
- Release version 1.9.0
- New paired de Bruijn graph mode for assembly.
- First official release of Sealer, a tool for closing
scaffold gaps by navigating a Bloom filter de Bruijn graph.
- New outward extension feature for Konnector to generate
long pseudo-reads.
- Support for the DIDA (Distributed Indexing Dispatched
Alignment) framework, for computing sequence alignments
in parallel across multiple machines.
- Unit tests can now be run easily with 'make check', without
external dependencies.
abyss-bloom:
- abyss-bloom 'build' command now supports '-j' option for
multi-threaded Bloom filter construction.
abyss-map:
- New '--protein' option for mapping protein sequences.
abyss-pe:
- New paired de Bruijn graph mode for assembly. Enable by
setting 'k' to the k-mer pair span and 'K' to the size of an
individual k-mer in a k-mer pair. See README.md for further
details.
- New 'aligner=dida' option for using the DIDA parallel alignment
framework. See the DIDA section of the abyss-pe man page
for usage details.
- New 'graph=gfa' option to use the GFA (Graphical
Fragment Assembly) format for intermediate graph files.
abyss-sealer:
- New tool for closing scaffold gaps by navigating a Bloom
filter de Bruijn graph
- See Sealer/README.md or abyss-sealer man page for details
and examples.
konnector:
- New '--extend' option for extending merged and unmerged
reads outwards in the de Bruijn graph.
1.5.2
Summary
In this release we introduce Konnector, a fast and memory-efficient tool to fill the gap between paired-end reads. Konnector determines the intervening sequence by building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within the graph. A companion tool called abyss-bloom is also provided, which can be used to construct reusable Bloom filter files for input to Konnector; otherwise, Konnector will build an in-memory Bloom filter for one-time use. In addition to Konnector, we have fixed bugs related to compiling with GCC 4.8+ and parsing BWA output SAM files.
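The path-search idea described above can be sketched in a few lines. This is a simplified illustration, not Konnector's algorithm as implemented: a Python set stands in for the Bloom filter (so there are no false positives), both reads are assumed to be on the same strand, and the names 'connect' and 'max_steps' are hypothetical.

```python
from collections import deque


def kmers(seq, k):
    """Return all k-mers of seq in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def connect(read1, read2, kmer_set, k, max_steps=100):
    """Breadth-first search through the implicit de Bruijn graph from the last
    k-mer of read1 to the first k-mer of read2. Returns the merged sequence,
    or None if no path is found within max_steps extensions."""
    start, goal = read1[-k:], read2[:k]
    queue = deque([(start, read1)])
    seen = {start}
    while queue:
        kmer, seq = queue.popleft()
        if kmer == goal:
            return seq + read2[k:]
        if len(seq) - len(read1) >= max_steps:
            continue
        for base in "ACGT":
            nxt = kmer[1:] + base  # neighbours are found by membership queries
            if nxt in kmer_set and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + base))
    return None


fragment = "ACGTTGCATGGCCTA"  # toy fragment from which both reads derive
k = 4
graph = set(kmers(fragment, k))  # stand-in for the Bloom filter
merged = connect(fragment[:6], fragment[-6:], graph, k)
print(merged == fragment)  # True
```

The graph is never stored explicitly: each neighbour is discovered by asking whether a candidate k-mer is in the filter. This is what keeps memory low, at the cost of occasional false branches when a real Bloom filter is used.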
ChangeLog
2014-07-09 Anthony Raymond [email protected]
- Release version 1.5.2
- First official release of Konnector and abyss-bloom.
- More GCC 4.8+ fixes! Modified Boost install instructions.
- Fixed rare bug when parsing output of BWA.
ABYSS:
- New option, --mask-cov, use kmers with lowercased bases, but
don't count them towards multiplicity.
abyss-bloom:
- Construct reusable Bloom filter files for use with Konnector.
- Perform boolean operations on two or more Bloom filters.
Currently supports union and intersection operations.
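Union and intersection of Bloom filters reduce to bitwise OR and AND over the underlying bit arrays, provided both filters have the same size and use the same hash functions. The sketch below shows this on raw byte strings. The function names are hypothetical and nothing here reflects the abyss-bloom file format.

```python
def _check(a, b):
    if len(a) != len(b):
        raise ValueError("Bloom filters must have the same size")


def bloom_union(a, b):
    """Bitwise OR: matches any item inserted into either filter."""
    _check(a, b)
    return bytes(x | y for x, y in zip(a, b))


def bloom_intersection(a, b):
    """Bitwise AND: matches every item inserted into both filters."""
    _check(a, b)
    return bytes(x & y for x, y in zip(a, b))


a = bytes([0b10100000, 0b00001111])  # toy bit arrays
b = bytes([0b11000000, 0b00000101])
print(bloom_union(a, b) == bytes([0b11100000, 0b00001111]))         # True
print(bloom_intersection(a, b) == bytes([0b10000000, 0b00000101]))  # True
```

The OR result is identical to a filter built from the combined input. The AND result can have extra bits set compared with a filter built only from the shared items, so its false positive rate may be somewhat higher.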
abyss-fixmate:
- Check for Boost 1.43+ when using 'unordered_map::quick_erase'.
- New option, --all, to report all alignments.
- Set mate unmapped flag for mateless reads.
abyss-longseqdist:
- Fixed 'error: invalid CIGAR' when reading BWA output.
configure:
- Include mpi and boost libraries as system libraries. Silences
warnings (treated as errors) when compiling with GCC 4.8+.
konnector:
- Merge read pairs into a single sequence (pseudoread) by
building a Bloom filter de Bruijn graph and searching for paths
between the paired end reads. Input reads may be
FASTA/FASTQ/SAM/BAM. The input files must be sorted by read name
and may not contain orphan reads.