Releases: samtools/htsjdk

2.20.2

14 Aug 20:46
2.20.2

This bug fix release fixes a regression introduced in 2.20.0 that caused problems when serializing AbstractFastaSequenceFile with Kryo.

Changelist:

9401637 Make AbstractFastaSequenceFile serializable by Kryo for Spark. (#1408)
cea307e Added option to log with suppliers (#1406)
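
Commit cea307e mentions logging with suppliers. The usual motivation for that pattern is that an expensive message is only built when the log level is enabled. The following is a minimal sketch of the general technique; `SketchLog` and its methods are hypothetical names used for illustration and are not htsjdk's logging API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical logger, used only to illustrate deferred message construction.
final class SketchLog {
    private final boolean debugEnabled;
    private final List<String> lines = new ArrayList<>();

    SketchLog(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // The supplier is invoked only when debug logging is enabled,
    // so a disabled logger never pays for building the message.
    void debug(Supplier<String> message) {
        if (debugEnabled) {
            lines.add("DEBUG " + message.get());
        }
    }

    // Lines recorded so far (kept in memory to keep the sketch self-contained).
    List<String> lines() {
        return lines;
    }
}
```

A caller would pass a lambda such as `log.debug(() -> "read " + describe(record))`, where `describe` stands in for any costly formatting step.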

2.20.1

05 Aug 21:27
2.20.1

This is a bug fix release which fixes #1404, a minor regression introduced in 2.20.0.

833a0e2 fix bug in VariantContextBuilder (#1403)

2.20.0

26 Jul 21:25
1d4a316

This release includes significant new features and numerous bug fixes. We recommend that users upgrade to this version.

Compatibility Notes:

There are a few minor incompatibilities between this and 2.19.0. Most users should have no problems upgrading.
Incompatible changes:

  • There continue to be major changes to the unstable htsjdk.samtools.cram package.
  • BCFVersion became final, invalidating any subclasses.
  • There were several changes to methods in the class CRAMBAIIndexer.

New features:

Support Reading VCF4.3

Htsjdk can now read version 4.3 VCFs produced by other software. It can still only write version 4.2. (#1381, #1359)

Write Compressed References:

You can now write indexed fasta.gz files with FastaReferenceWriterBuilder (#1340)

IntervalList improvements:

  • There's a new IntervalListCodec which lets you parse .interval_list files using tribble (#1327)
  • Improved performance of some interval operations when operating on large IntervalLists (#1356, #1384)
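
The change list for this release describes #1384 as preventing an integer overflow when counting bases. The general issue is that summing many int lengths in an int can wrap around. Below is a minimal sketch of the safer pattern; `BaseCountSketch` and `Span` are hypothetical stand-ins and not htsjdk's Interval or IntervalList classes.

```java
import java.util.List;

final class BaseCountSketch {
    // Hypothetical 1-based, closed interval; not htsjdk's Interval class.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start + 1;
        }
    }

    // Accumulate in a long so the total cannot wrap past Integer.MAX_VALUE.
    static long countBases(List<Span> spans) {
        long total = 0L;
        for (Span span : spans) {
            total += span.length();
        }
        return total;
    }
}
```

Two spans of two billion bases each sum to four billion, which fits in a long but would overflow an int accumulator.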

Developer documentation improvements:

  • We added a Code of Conduct to explicitly state our goals of being a friendly, safe, and professional project where we hope anyone will feel comfortable contributing. (#1390)
  • Added style guides for IntelliJ and Eclipse. Running automated style checking of newly submitted or edited code is encouraged. (#1386, #1391)
  • Added a description of how to interpret the version number and what it means for compatibility (#1392)
  • Updating issue and PR templates. (#1393)
  • We now run Spotbugs as part of our build to identify and reject certain classes of bugs automatically. This is still new and we may continue to experiment with which rules to apply. (#1330, #1331)

Other Improvements:

Complete change list:

1d4a316 Make VariantContextBuilder safer (#1344)
45cfc08 Handle PEDIGREE header lines differently for vcf4.2 vs vcf4.3. (#1401)
c5ed6b7 Insertions accumulating in AbstractLocusIterator bugfix (#903)
edb8c60 Optimize AbstractBAMFileIndex.query() when querying sequentially (#1397)
9aa81ed Add documentation for VariantContext.getStart() regarding telomeric events (#1369)
83da7e7 Make Validation error emit the validation type that was violated (#1395)
3a35b89 CRAM queryAlignmentStart/queryMate fix. (#1164)
765728e Moved file extensions constants to their own class (#1382)
16e9cca Output missing vcf fields as a single . (#1389)
387de12 Adding a code of conduct document for htsjdk. (#1390)
ea1db70 adding README section about style guides (#1391)
a58f432 Describing what our version number means (#1392)
faf3d11 Updating issue and PR templates (#1393)
37ea940 Tolerate mixed case NaNs and Infinities in VCF (#1364)
f68108d Add style guide for IDE's (#1386)
b0decef Substitution matrix refactoring and tests (#1374)
a61e233 Add a function to calculate the value of the OA tag (#1354)
f13d075 Make VariantContextWriterBuilder warn when indexing-on-the-fly is enabled for streams (#1328)
b804a47 VCF codec should handle multiple missing GL fields (#1372)
58199e6 Prevent integer overflow in Interval countBases method. (#1384)
adea7d1 Remove requirement that zero length reads need color space or flow (#1360)
6df21e5 Update readme with info about reading vcf 4.3 (#1381)
d5ac863 Support VCF v4.3 (read only). (#1359)
1e3f1fa Adding a few more semi-colons in shell in travis config. (#1378)
2b83a9f Fix to ensure we only publish snapshots compiled with openjdk-8. (#1376)
1a275e1 Added fasta.gz write support to FastaReferenceWriterBuilder (#1340)
4ae8508 Moved loading of sequence dictionary into an overideable method (#1362)
64e98d6 Removed redundant readability test in unrollPaths method (#1355)
e2c0fdd Support writing a CRAI index from CRAMContainerStreamWriter (#1351)
4747d08 Change SAMTextHeaderCodec to no longer accumulate the entire text of … (#1361)
3b0dd60 SubstitutionMatrix class cleanup - part 1. (#1366)
5f0e045 Small CRAM refactor: common ExternalEncoding Abstract Base Class (#1346)
ae49710 Make interval operations scale better (#1356)
7ce3636 Added name of offending record to error message in SamPairUtil (#1358)
0b9fe0d Remove CramCompressionRecord.tagIds (#1345)
4f62add Allow BCFCodec subclasses to provide custom version compatibility. (#1352)
335f2c1 Improve exception message for unset VCF output type (#1357)
5442f78 CRAM: revert #1326 and fix tests and comments (#1341)
1ece54c Add interval list codec (#1327)
aa89809 Remove multiple versions of Slice/Container getCRAIEntries() (#1329)
9f5d86e Progress Logger prints read names when iterating in queryname order (#1302)
16fecfc Change SortingCollection log statements to DEBUG (#1334)
a82d8ba Test ContainerIO.calculateSliceOffsetsAndSizes() and fix the slice size calculation (#1326)
7925166 Fix or ignore remaining SpotBugs issues (#1331)
f9361ac Fix issues in tests identified by Spotbugs (#1330)
5dcfd73 Fix signing task in build.gradle (#1325)
2b5f3bc Reject BCF files when minor version doesn't match the current implementation. (#1324)
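
One entry above (#1364) concerns tolerating mixed-case NaN and Infinity values in VCF. Java's `Double.parseDouble` only accepts the exact spellings `NaN` and `Infinity`, so a lenient parser has to normalize case first. The sketch below shows one way to do that; `FloatParseSketch` is a hypothetical name, the accepted spellings (`nan`, `inf`, `infinity`) are an assumption, and this is not the code htsjdk uses.

```java
import java.util.Locale;

final class FloatParseSketch {
    // Parses a floating point value, accepting NaN/Inf/Infinity in any case
    // with an optional leading sign. Other input goes to Double.parseDouble.
    static double parseLenient(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        boolean negative = s.startsWith("-");
        String unsigned = (negative || s.startsWith("+")) ? s.substring(1) : s;
        switch (unsigned) {
            case "nan":
                return Double.NaN;
            case "inf":
            case "infinity":
                return negative ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
            default:
                return Double.parseDouble(s);
        }
    }
}
```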

2.19.0

18 Mar 19:58
2.19.0
019465f

This release contains significant new features and many bug fixes.

Compatibility Notes:

This release is neither binary nor source backwards compatible with 2.18.2, but upgrading should be painless for the majority of users. There may be some minor source changes necessary when recompiling against 2.19.0. If you encounter difficulties please contact us.

Binary/Source Compatibility issues:

  • The method SamRecord.getIndexingBin() has been removed. It was a cache of a BAM-format-specific field which could often get out of sync with the correct value. Users with their own subclasses of SamRecord may need to make changes. Uses of getIndexingBin() can be replaced by computeIndexingBin().
  • Various static fields have been made final. These fields are not intended to be updated by user code.
  • There continue to be significant changes in the unstable htsjdk.samtools.cram package.
  • Signature change in SAMSequenceDictionaryCodec:
    SAMSequenceDictionaryCodec(BufferedWriter) -> SAMSequenceDictionaryCodec(Writer)
  • Signature change: AbstractBAMFileIndex.position() returns a long instead of an int now.
  • Removed throws clause from methods:
    • CRAMContainerStreamWriter.flushContainer()
    • SequenceUtil.calculateMD5String
    • multiple overloads of CRAMIterator.CRAMIterator
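
For context on the indexing bin mentioned in the list above: the SAM specification defines a BAM bin purely as a function of an alignment's start and end, which is why it can be recomputed on demand instead of cached. The sketch below is a Java rendering of the `reg2bin` function given in the SAM specification. `BinSketch` is a hypothetical class name, and this is not presented as htsjdk's computeIndexingBin() implementation.

```java
final class BinSketch {
    // Bin for a zero-based, half-open region [beg, end), following the
    // reg2bin function in the SAM specification (five-level binning scheme).
    static int reg2bin(int beg, int end) {
        end--; // switch to the last covered position
        if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
        if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
        if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
        if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
        if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
        return 0; // region spans the top-level bin
    }
}
```

Because the result depends only on the coordinates, any change to a record's alignment start or end changes the correct bin, which is how a cached copy can go stale.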

Changes in behavior:

  • IntervalList.fromFiles() no longer calls unique() on the returned interval list. (#1273)

New Features:

CSI index Support:

We can now use CSI indexes generated by other tools. (#1040, #1314, #447)

Fasta Writing:

We can now write fasta files with FastaReferenceWriter (#1172, #1285)

Java 11 support:

We now build and test with Java 11. This is fairly experimental and our downstream acceptance tests haven't been run on Java 11. We still target Java 8, so no new features from Java 9, 10, or 11 can be used in code yet. (#1291)

IntervalList Improvements:

Reduced the in-memory size of IntervalList and OverlapDetector, and added support for reading and writing them via sorting collections and Paths.

  • New support for writing/reading large interval lists from on disk SortingCollection. (#1288)
  • Reduced memory footprint of IntervalList and OverlapDetector. (#1309)
  • IntervalLists can now be written to Paths. (#1297, #1298)
  • IntervalList.fromFiles() no longer calls unique on the returned interval list. (#1273)

CRAM Improvements:

Many other bug fixes and minor additions.

Complete Change List:

019465f Adding back a removed method (#1321)
fe27e66 Fixing zero-length interval bug in IntervalList.merge (#1318)
b321d91 Fix CSI bug when querying files with long references. (#1314)
dd313de Deprecate TestUtil.deleteRecursively (#1315)
68199fb Revert singletonList optimization in SamRecordSetBuilder (#1317)
d678af3 Fix bug in BlockCompressedInputStream.checkTermination() (#1310)
7b3c7a6 Fix bug when loading indexed bgzip fasta file. (#1311)
9f84b7b Optimizations to reduce in-memory size of IntervalList and OverlapDetector (#1309)
d25bbb4 Add ability to generate reference from SamRecordSetBuilder (#1286)
1509dcc Consolidate common code into CRAMStructureTestUtil: (#1312)
b1cb410 Fix toFile call that prevents IntervalListWriter from writing to Paths (#1298)
37b2e87 CRAM: formalize Slice Type with an enum (#1274)
b58a5a9 CRAM: Only calculate alignmentDelta as needed for records (#1304)
62388a2 A few fixes for issues found by spotbugs (#1278)
0b16296 Simplify CRAM sequence dictionary extractor to not require a fake reference. (#1308)
a6c5837 remove caching of alleles in AbstractVCFCodec (#1282)
2d2922f Load BlockCompressedIndexedFastaSequenceFile and GZIIndex from streams (#1259)
205d5f0 Add support for CRAM in SamSequenceDictionaryExtractor (#1305)
d6043f0 Changing the team name/url in the maven pom (#1300)
0cc762f Changed IntervalList fromFiles() so that it doesn't call .unique() on… (#1273)
3b3d107 adding new overloads to IOUtils with Path for some file only methods (#1296)
5e8b1fa Update java version information in the README (#1299)
061e217 update CramCompressionRecord.isPlaced() to make it APDelta-aware (#1284)
efe4abf Remove redundant and unused BAM_FLAGS from CRAM code. (#1292)
e8e0a6f Add an IntervalCodec that use useful for sorting large sets of Intervals (#1288)
d771b30 rm CramHeader.clone() (#1283)
c0642fb Build and test on Java 11 (#1291)
4c8dfbd Changes to FastaReferenceWriter (#1285)
38bfe65 moving HttpUtilsTest to externalApi test task (#1289)
62fc0b1 Use Integer.parseInt over Integer.valueOf to avoid unnecessary boxing (#1275)
77b3b8f Fix fields that should be final, as reported by SpotBugs (#1268)
52169ed Moving the SRA tests to a separate env (#1272)
3ae552f Move bad coordinates check (#911)
942e3d6 Ported GATK's FastaReferenceWriter (#1172)
e86af96 makeSAMOrBAMWriter accept only .sam/.bam (#834)
c189878 CRAM: Container and Slice states (#1266)
5ef9223 Add support for Sam Header Readgroup Barcode field (#1210)
3c48018 Fix bug in IntervalList.getUniqueIntervals that caused missing interval names (#1265)
d0b1a74 Adding some constants due to additions to hts-spec (#1241)
93250d5 Parse valid VCF 4.2 with ##INFO containing Source + Version (#1248)
5217fe4 Misc CRAM cleanup (#1253)
16a4e37 Support for reading CSI indexes for BAM files (#1040)
15ec7da Only compute BIN on BAM write and on index building (#1258)
94f0967 Immutable CRAIEntry (#1256)

2.18.2

16 Jan 22:24
2.18.2

This is a small release which includes bug fixes and some new methods and performance improvements.

Compatibility notes:

This is compatible with 2.18.1 with the exception of the unstable htsjdk.samtools.cram package which continues to undergo major changes.

Disallowed Contig Names:

The following characters are now disallowed in contig names: \ , " ' ` ( ) < > [ ] { } (#1238)

These characters are being disallowed in future versions of the SAM spec because they can introduce parsing ambiguities. We've chosen to disallow them in all versions of files read or produced by htsjdk going forward, which is stricter than the spec. A scan of a large number of references found that they were used in only a tiny fraction of them. We believe this shouldn't cause problems for users since these characters do not appear in common references. If we're wrong and this change causes you problems, please get in touch and let us know.
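
To check your own references ahead of an upgrade, a simple scan of contig names against the characters listed above is enough. The following is a minimal sketch; `ContigNameSketch` is a hypothetical helper and not htsjdk's validation code. It reads the list as backslash, comma, double quote, single quote, backtick, and the four bracket pairs, and the inclusion of the backtick is an assumption.

```java
final class ContigNameSketch {
    // Characters taken from the release note above; the backtick is an assumption.
    private static final String DISALLOWED = "\\,\"'`()<>[]{}";

    // Returns true when the name contains none of the disallowed characters.
    static boolean isAllowed(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (DISALLOWED.indexOf(name.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

Names such as `chr1` or `HLA-A*01:01` pass this check, while `chr1,chr2` or `scaffold(1)` do not.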

Deprecations:

The classes in the htsjdk.samtools.apps package are deprecated and will be removed in the future.

Complete Change list

68d9fdb Fix to ensure that SeekableStream#available() never returns a negative value. (#1255)
5a0ab68 Fix nitpick in IntervalKeepPairFilterTest (#1254)
2c97cce IntervalKeepPairFilter now filters out single ended reads (#1252)
2737292 Deprecated htsjdk.samtools.apps package. (#1250)
0bf5ff6 Add getters to VariantContextBuilder (#1247)
a4b7da8 resolving some nitpicks from pr #1245 (#1246)
2473407 Added ability to get a VariantContexts from an InputStream (#1245)
4a7cb03 Adding some sanity checking to IntervalList reading. (#1230)
28dde96 Add support for reading and writing splitting BAM index files. (#1138)
37f0789 HttpUtils: set Method to HEAD for HttpUtils.getHeaders (#1191)
fb2f42c Add finals for issues raised in review of PR 1091. (#1240)
8cc1e37 Disallowing bad characters in SamRecord names (#1238)
6ac7a60 Change CRAM validation error reporting granularity from container to record (#1091)
1126e5c CRAM: Refactor Block hierarchy (#1231)
d2360ff Replaced several for-each loops in VariantContext.Make() based on HaplotypeCaller profiling (#1234)
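
Commit 68d9fdb above ensures that SeekableStream#available() never returns a negative value. The release note does not say how the negative value arose, so the following is only a sketch of one general way to keep such a method within the contract of `InputStream.available()`: clamp the remaining byte count to the range 0 to `Integer.MAX_VALUE`. `AvailableSketch` and its parameters are hypothetical.

```java
final class AvailableSketch {
    // Remaining bytes as an int, clamped so that a position past the end
    // yields 0 and very large streams yield Integer.MAX_VALUE.
    static int available(long length, long position) {
        long remaining = length - position;
        if (remaining <= 0) {
            return 0;
        }
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }
}
```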

2.18.1

04 Dec 18:24
2.18.1

A small release which includes a fix for a CRAM bug (#1233) introduced in 2.17.0 by #1199.

This is compatible with 2.18.0 with the exception of the htsjdk.samtools.cram package, which has had major changes.

41c4634 Add isRefOnlyBlock function... (#1215)
4ff5190 Relax Beta Encoding requirement to allow 0 bits (#1233)
c596e6b CRAM: Refactor Encodings and Codecs (#1224)

2.18.0

16 Nov 22:23
2.18.0
698a4c3

This is a smallish release with some new features. The biggest change is that BAMs created by htsjdk are now version 1.6. This should really have happened when we started producing BAMs with long CIGAR support (CG tag), since that's the marquee 1.6 feature.

Compatibility:

This release is not backwards compatible with 2.17.0 but upgrading should be easy for most users.

Incompatible changes

  • Delete long deprecated FixBAMFile and SAMTools classes (#1213, #955, #972)
  • There continue to be major changes to the htsjdk.samtools.cram package.

New Deprecations:

  • Deprecating SAMTagUtil, use SAMTag instead.
  • Deprecating tags in SAMTag that are deprecated in the SAM spec. These are deprecated to discourage use, but will not be removed.

New features and Changes

Sam 1.6 support:

  • Bam output now is designated version 1.6. (#1211)
  • Update the list of tags in SAMTag to include all reserved tags:
  • Deprecating tags that are deprecated in the SAM spec, and deprecating SAMTagUtil; its methods have been merged into SAMTag (#1208, #1227)
  • Added first class Description field to SAMSequenceRecord (#1209)
  • Deprecate SQTagUtil (#1214)

Known issue:

  • Non-ASCII Unicode characters are not supported even though these are now allowed in certain fields. (#1202)

Addition of method to FeatureCodec

  • new method FeatureCodec.getPathToDataFile(String path)
    This allows a new class of codecs that identify a different file as readable from the one that actually contains the data. This has a default implementation and should require no changes from downstream users unless they have custom implementations of FeatureReader that do not go through AbstractFeatureReader (#1223)
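
The reason a default implementation requires no changes from most downstream code can be illustrated with a small interface. This is a sketch under assumptions: `CodecSketch`, `PlainCodec`, and `RedirectingCodec` are hypothetical, the `.ptr`/`.data` extensions are invented for the example, and having the default return its argument unchanged is an assumption about the behaviour rather than a quote of htsjdk's FeatureCodec.

```java
// Hypothetical interface showing how a default method keeps existing
// implementations source compatible when a new method is added.
interface CodecSketch {
    boolean canDecode(String path);

    // Assumed default: the data lives at the path that was passed in.
    default String getPathToDataFile(String path) {
        return path;
    }
}

// An existing implementation compiles unchanged and inherits the default.
final class PlainCodec implements CodecSketch {
    @Override
    public boolean canDecode(String path) {
        return path.endsWith(".txt");
    }
}

// A codec that accepts a pointer file but reads its data from elsewhere.
final class RedirectingCodec implements CodecSketch {
    @Override
    public boolean canDecode(String path) {
        return path.endsWith(".ptr");
    }

    @Override
    public String getPathToDataFile(String path) {
        return path.substring(0, path.length() - ".ptr".length()) + ".data";
    }
}
```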

More CRAM work:

Work on refactoring and improving the cram code continues.

  • Add hashCode() to classes with equals() and clean up equals() (#1222)
  • DataSeries/DataReader/DataWriter refactor (#1219, #453)
  • Be explicit about spec IDs instead of using ordinal() (#1221)
  • More encoding tests + updates (#1203)
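
The point about spec IDs versus ordinal() (#1221) applies to any Java enum that is written to a file format: ordinal() changes when constants are reordered, whereas an explicit ID does not. The sketch below illustrates the difference; `KindSketch` and its ID values are hypothetical and are not taken from the CRAM specification.

```java
// Hypothetical enum; the constant names and ID values are illustrative only.
enum KindSketch {
    ALPHA(0),
    GAMMA(2), // declared second, so ordinal() is 1 while the explicit ID is 2
    BETA(1);

    private final int specId;

    KindSketch(int specId) {
        this.specId = specId;
    }

    // The value that would be serialized; stable under reordering of constants.
    int getSpecId() {
        return specId;
    }

    // Reverse lookup by explicit ID rather than by position in values().
    static KindSketch fromSpecId(int id) {
        for (KindSketch kind : values()) {
            if (kind.specId == id) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unknown spec ID: " + id);
    }
}
```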

Other Updates

Removing cruft leftover from JAXB (#1207)
Update the README with working test examples (#1225, #984)
Getter for generic fields in VCFSimpleHeaderLine (#1212)

2.17.0

31 Oct 17:17
2.17.0

This is a small release that includes a number of bug fixes and minor enhancements.

Compatibility Notes:

This release is not backwards compatible with 2.16.1 due to changes in the cram code. We believe most users should be able to upgrade without issue.

htsjdk.samtools.cram package instability:

This release includes the beginning of work towards a rewrite of much of the cram code. This code is not structured like the rest of htsjdk and needs large compatibility-breaking changes in order to be brought in line with the rest of the codebase. We will be treating the code in the htsjdk.samtools.cram package as unstable and will not be providing deprecation warnings in releases before altering it, as we try to do with most of the codebase.

We believe that there are few downstream users who directly call into that code and most users only interact with cram files through the CramFileWriter and CramFileReader which will not be changing. Please get in contact if you make extensive use of classes in the htsjdk.samtools.cram package and are concerned about these changes.

The end result will be better cram code that should run faster, have fewer bugs, and include more features that are not yet implemented in htsjdk.

JAXB removed

In order to improve htsjdk compatibility with Java 9+ we have removed all uses of the javax.xml.bind package. This breaks the ability to marshal SAMFileHeader to XML, but since this was the only class that could be converted in this way we believe that no users will be affected by this change.

Complete change list

c484241 MergeSamFiles accept SO:UNKNOWN (#1069)
f00a754 Remove JAXB (#1206)
44baddf Adding support for 0-length B arrays in SAM files to conform to 1.6 spec (#1194)
334800e Add the PS FORMAT VCF standard header field (#1200)
bbc674f BetaIntegerCodecTest and bugfixes (#1199)
37069a3 Unit Tests and fixes for a few classes in samtools.cram.io (#1198)
d504256 fix CRAMIterator when next() is called without hasNext() (#1193)
1971f4e Adding Allele constants for simple SV types(#1192)
23f3223 Add 6 CRAM compliance tests from htslib (#1185)
ff3db93 Bug fix: BinaryCodec should not fail when both reading and no bytes are requested. (#1188)
a7214ca Add a reset() method to ProgressLoggerInterface. (#1184)
49c70e5 fixing bug in SeekableHttpStream.read (#1182)
a762262 cleanup bam order checking code (#770)
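
Commit d504256 above fixes CRAMIterator when next() is called without hasNext(). The release note does not describe the fix itself. A common way to make a look-ahead iterator robust to that calling pattern is to have next() call hasNext() internally, as in this sketch. `LookAheadSketch` is hypothetical and iterates over a simple integer range rather than CRAM records.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical look-ahead iterator over the integers [0, limit).
final class LookAheadSketch implements Iterator<Integer> {
    private final int limit;
    private int position = 0;
    private Integer buffered; // filled lazily by hasNext()

    LookAheadSketch(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        if (buffered == null && position < limit) {
            buffered = position++;
        }
        return buffered != null;
    }

    @Override
    public Integer next() {
        // Calling hasNext() here fills the buffer even when the caller
        // never called hasNext() itself.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer result = buffered;
        buffered = null;
        return result;
    }
}
```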

2.16.1

07 Sep 19:20
2.16.1
fb36a36

This is a relatively small release which includes bug fixes and some new features. Users should upgrade from 2.16.0.

New Feature: Write VCF to Paths

You can now write VCFs to NIO Paths, including GCS Paths.
4902ee9 VCFWriter accepts Path (#1134)

Compatibility:

This release is binary compatible with 2.16.0. There are some new overloads of methods which may require minor source changes where they were previously being called with untyped nulls.
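
The untyped-null issue is a general Java overload-resolution effect: once a method has both a File and a Path overload, a bare `null` argument matches both and the call no longer compiles. A cast resolves it. The sketch below uses a hypothetical `OverloadSketch.open` method to illustrate; it is not an htsjdk method.

```java
import java.io.File;
import java.nio.file.Path;

final class OverloadSketch {
    // Pre-existing overload.
    static String open(File file) {
        return "File overload";
    }

    // Newly added overload; its presence makes open(null) ambiguous.
    static String open(Path path) {
        return "Path overload";
    }

    static String demo() {
        // open(null); // would not compile: reference to open is ambiguous
        return open((File) null); // the cast selects the intended overload
    }
}
```

Already compiled callers keep working, since the overload was bound at compile time; only recompiling source that passes an untyped null needs the cast.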

Deprecations

Some poorly named methods are now deprecated.
e0bf651 locatable all the things (#1159)
c4dbf4a fixing spelling Indeces -> Indices (#1157)

Other updates and bugfixes:

fb36a36 adding back deleted method(#1178)
e37d8de extending use of IOUtil.toPath() (#967)
85580fc restore no-args VariantContextWriterBuilder.build() (#1177)
16704e6 Use quality header and SAMTag.CO in FastqEncoder (#912)
e637281 Expose VCFCodec version and header fields. (#1176)
57442bb Fix a bug where invalid bytes in BufferedInputStream's buffer were being accessed. (#1175)
5168c79 Fix CRAM container offset calculation (#1167)
e6d5c29 fix an copy-paste error in VCFHeader (#1160)
8767e64 adding position to SAMRecord.toString() (#1158)
c4912e9 update to gradle 4.8 (#1154)
c380053 Fixes for CachingBAMFileIndex (#1156, #1127)
38a24d5 Additional unit tests for IOUtil (#1149)

2.16.0

19 Jun 20:09
2.16.0
96ce8fb

This release includes significant new features and bug fixes. It is not technically backwards compatible with 2.15 releases, but most downstream users should not encounter major difficulties when updating.

New Features:

Block compressed indexed FASTA files:

Htsjdk can now read block compressed (.bgz / .gz) fasta files if they have a .gzi and a .fai index. No more need to unzip the reference before reading it!
Load reference files with ReferenceSequenceFileFactory.getReferenceSequenceFile() and it will automatically discover the correct indexes.

A gzi index can be generated with GZIIndex.createIndex() and related methods. Alternatively it can be created with the bgzip program. Note that gzip and bgzip are different formats and gzipped files cannot be indexed with a gzi. The fai index is used alongside the .gzi for random access to fasta and is unchanged for zipped vs unzipped files.
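
To see why plain gzip output cannot be used here: a BGZF file, as described in the SAM specification, is a series of gzip members that each carry a "BC" extra subfield recording the block size, and ordinary gzip output has no such subfield. The sketch below checks the first bytes of a buffer for that header. `BgzfSketch` is a hypothetical helper, it assumes the "BC" subfield is the first extra subfield, and it is not htsjdk's block-compression check.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

final class BgzfSketch {
    // True when the buffer starts with a gzip header whose first extra
    // subfield is the BGZF "BC" subfield.
    static boolean looksLikeBgzf(byte[] b) {
        return b.length >= 14
                && (b[0] & 0xFF) == 0x1F && (b[1] & 0xFF) == 0x8B // gzip magic
                && b[2] == 8                                       // deflate
                && (b[3] & 4) != 0                                 // FEXTRA flag
                && b[12] == 'B' && b[13] == 'C';                   // BGZF subfield id
    }

    // Compresses data with the JDK's ordinary gzip writer, for comparison.
    static byte[] plainGzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Output from `plainGzip` fails the check because the JDK's gzip writer sets no extra field, whereas the 28-byte BGZF end-of-file block from the SAM specification passes it.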

People directly instantiating the concrete ReferenceSequenceFile classes may want to move to using the factory in order to get access to this feature.
b2bfb32 Support for block-compressed indexed FASTA references (#1014)
c19f9f7 removing new finals in AbstractIndexedFastaSequenceFile (#1135)
#996, #864

SamLocusAndBaseIterator:

New iterator which traverses a bam file and reference together and returns pileup information along with the associated reference bases.
336fa5c Yf sam locus and reference base iterator (#1137)

Expanded support for non File inputs

Added new methods to allow using Path in more places.
ccc9259 Fix reading a CRAM with an index from a non-File Path. (#1125)
3a20218 Add SAMFileWriterFactory methods that take referenceFile as a Path(#1005)
96ce8fb deprecating overloads makeCRAMWriter with mixed Path/File (#1150)

Added additional methods to read/write directly from Streams in cases where a Path or File is not available.
72818a0 Allow ReferenceSequenceFileFactory to load from streams (#1123)
34dbda7 Make BAMFileWriter public to expose code to write a BAM header to a stream. (#1119)
71c5a69 Add method to load tribble index from a stream. (#1145)

Restore SRA support

Support for reading SRA was broken because of changes in the SRA API. A new version of their Java API adapter has been published and is now integrated into htsjdk. Support should be restored.
0aff433 restoring SRA support by updating the backing library (#1142)

Methods moved to IOUtils

Several methods for identifying if a file is block compressed moved to IOUtils. The original methods have been deprecated.
85d2b8d Add utilities for checking block-compressed files in IOUtils #1130 (#1132)
55debe2 adding test for IOUtil.isBlockCompressed (#1141)

Other improvements and bugfixes

9ea0a03 new method CRAMFileReader.createIndexIterator (#1120)
8b55de6 Add new class VCFHeaderReader #1122 (#1148)
52ec082 making public static fields in SamReader.Type final (#1146)
5eb8ee3 Make SamReader.Type methods public (#1144)
efefdd0 update FastqRecord.toString javadoc to warn users of future behavior change (#952)
0d8e1c6 Fixed bug where GL field overrides PL field #1097 (#1098)
497a0f3 made VariantContextWriterBuilder.determineOutputTypeFromFile public (#1066)
f007ed3 Add an optimized version of CramContainerIterator (#1129)
d3d7a6e SAMFileWriterImpl extensibility improvements (#913)
442029c seeking within SeekableBufferedStream's buffer reuses the existing buffer (#1121)