CRAM files have zero-length blocks with a compression method associated with them (so cannot be decompressed). #1633

Open
jkbonfield opened this issue Nov 8, 2022 · 1 comment

Comments

@jkbonfield

Description of the issue:

htsjdk produces CRAM files with zero-length blocks that have a compression codec listed. An empty block post-compression is valid. An empty block in RAW (uncompressed) is also valid, but if a block states that it is compressed by a specific codec, then its contents should be a valid byte stream for that codec (even if it decodes to zero bytes).

An example from SAMEA3302751.alt_bwamem_GRCh38DH.20200922.Finnish.simons.cram, viewed using cram_dump:

        Block 4/36
            Size:         0 comp / 0 uncomp
            Method:       RANS0 (4)
            Content type: EXTERNAL
            Content id:   3
            Keys:         RI 
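
As a rough illustration of the rule being described, the block in the dump above could be flagged by a check along these lines. This is only a sketch: `BlockHeaderCheck` and `isSuspiciousEmptyBlock` are hypothetical names, not part of htsjdk or cram_dump, and the only values taken from the CRAM format are the method ids (0 for RAW, 4 for rANS, matching `RANS0 (4)` in the dump).

```java
// Hypothetical, simplified check on a CRAM block header; not htsjdk's real classes.
final class BlockHeaderCheck {
    // Compression method ids as used in the CRAM block header.
    static final int RAW = 0;
    static final int RANS = 4; // shown as "RANS0 (4)" in the dump above

    /**
     * Returns true when a block names a codec other than RAW but carries
     * zero compressed bytes, so there is no codec stream to decode.
     */
    static boolean isSuspiciousEmptyBlock(int method, int compressedSize) {
        return compressedSize == 0 && method != RAW;
    }

    public static void main(String[] args) {
        // Block 4/36 from the dump: 0 comp / 0 uncomp, Method RANS0 (4).
        System.out.println(isSuspiciousEmptyBlock(RANS, 0)); // true
        // The same empty block marked RAW would be fine.
        System.out.println(isSuspiciousEmptyBlock(RAW, 0));  // false
    }
}
```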

Your environment:

This was tested using htsjdk 2.26.11 with build 11.0.16+8-post-Ubuntu-0ubuntu118.04.

Steps to reproduce

The easiest way to reproduce this is to convert the above file back to CRAM again using SamFormatConverter. I did this to validate that it still happens and isn't just a historic problem.

Expected behaviour

Zero-length blocks should ideally just not be stored, as they're not required, but if it's easier code-wise to keep them, then the method field should be RAW, so that no attempt is made to uncompress them.

See zaeleus/noodles#131 for a case where this triggered an error in a spec-compliant decoder.
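
The second option in the expected behaviour (keep the block, but mark it RAW) might look roughly like the sketch below on the writer side. Everything here is an assumption for illustration: `EmptyBlockPolicy`, `Encoded`, and `encode` are hypothetical names rather than htsjdk's actual API, and gzip (method id 1) stands in for rANS only because the JDK ships a gzip encoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical writer-side policy; not htsjdk's real block-writing code.
final class EmptyBlockPolicy {
    static final int RAW = 0;
    static final int GZIP = 1; // stand-in codec for this sketch

    /** The method id to record in the block header plus the bytes to store. */
    static final class Encoded {
        final int method;
        final byte[] data;

        Encoded(int method, byte[] data) {
            this.method = method;
            this.data = data;
        }
    }

    static Encoded encode(byte[] uncompressed) {
        // Nothing to compress: store the empty block as RAW so that a
        // decoder never tries to run a codec over zero bytes.
        if (uncompressed.length == 0) {
            return new Encoded(RAW, uncompressed);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(uncompressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Encoded(GZIP, out.toByteArray());
    }
}
```

Calling `EmptyBlockPolicy.encode(new byte[0])` yields method `RAW` with no data, while any non-empty input keeps its codec and a non-empty compressed stream. Skipping empty blocks entirely, the first option above, would instead be a decision for whatever code assembles the list of blocks.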

@lbergelson
Member

@jkbonfield Thanks for reporting this.
