No more header required for ribosomal interval files in CollectRnaSeqMetrics #1965

Closed

jourdren

With this pull request, CollectRnaSeqMetrics can now be used with a ribosomal interval list file that has no SAM header. In this case, CollectRnaSeqMetrics will use the header of the BAM input file.

Description

Creating a valid ribosomal interval list for CollectRnaSeqMetrics is tedious because the @SQ lines in the SAM header of this file must exactly match those in the BAM header of the input file (e.g., the chromosomes/sequences must appear in the same order and have the same lengths).
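To make the constraint concrete, here is a minimal Python sketch of the kind of comparison described above. It is not the Picard implementation; the function names `parse_sq_lines` and `dictionaries_match` are hypothetical, and the sketch only compares sequence names and lengths, in order.

```python
def parse_sq_lines(header_text):
    """Return [(name, length), ...] from the @SQ lines of a SAM-style header, in file order."""
    entries = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        entries.append((fields["SN"], int(fields["LN"])))
    return entries


def dictionaries_match(interval_header, bam_header):
    """True only if both headers list the same sequences, with the same lengths, in the same order."""
    return parse_sq_lines(interval_header) == parse_sq_lines(bam_header)
```

Swapping two @SQ lines, or changing a single LN value, is enough for this check to fail, which is why hand-built headers are easy to get wrong.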

With this patch, CollectRnaSeqMetrics behaves as before when the ribosomal interval list file contains a header. If the file has no header, the tool uses the header of the input BAM instead.
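The fallback can be illustrated with a short Python sketch. This is only an illustration of the rule stated above, not the Java code in the patch; `split_interval_list` and `resolve_header` are hypothetical names, and header lines are assumed to be those starting with `@`.

```python
def split_interval_list(text):
    """Split interval list text into header lines (starting with '@') and record lines."""
    header, records = [], []
    for line in text.splitlines():
        if line.startswith("@"):
            header.append(line)
        elif line.strip():
            records.append(line)
    return header, records


def resolve_header(interval_list_text, bam_header_lines):
    """Keep the interval list's own header if present; otherwise fall back to the BAM header."""
    header, records = split_interval_list(interval_list_text)
    if header:
        return header, records
    return list(bam_header_lines), records
```

A file with its own header is handled exactly as before, so existing interval lists are unaffected; only header-less files take the new path.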

This new behavior streamlines the creation of ribosomal interval list files: they can be created from a GTF file alone, without having to prepend the sequence lengths taken from the genome FASTA, GFF3, or input BAM header.
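As a rough sketch of that workflow, the Python function below turns GTF gene records into header-less interval list records (sequence, start, end, strand, name, tab-separated). The function name `gtf_to_interval_records` is hypothetical, and the filter assumes rRNA genes are annotated with a `gene_biotype "rRNA"` or `gene_type "rRNA"` attribute, which depends on the annotation source.

```python
import re

# Assumption: the biotype is stored under gene_biotype or gene_type in the GTF attributes.
BIOTYPE_RE = re.compile(r'(?:gene_biotype|gene_type) "([^"]+)"')
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_to_interval_records(gtf_lines, biotype="rRNA"):
    """Return interval list record lines (no header) for GTF gene features of the given biotype."""
    records = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        match = BIOTYPE_RE.search(cols[8])
        if match is None or match.group(1) != biotype:
            continue
        name_match = GENE_ID_RE.search(cols[8])
        name = name_match.group(1) if name_match else "."
        # GTF and interval lists both use 1-based inclusive coordinates, so no shift is needed.
        records.append("\t".join([cols[0], cols[3], cols[4], cols[6], name]))
    return records
```

Without this patch, the output would still need an @SQ header copied from a matching sequence dictionary before CollectRnaSeqMetrics would accept it.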


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on github actions

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

@yfarjoun
Contributor

Given that you can create an interval list using BedToIntervalList, I'm not sure what problem this is solving. It does, however, make the tool less discerning, and it will lead to folks hand-crafting interval lists without dictionaries rather than making sure that they used the correct dictionary for the file.

👎 from me.

@lbergelson
Member

@jourdren It's not clear to me why you can't create an interval list with the appropriate header. Is the existing tooling inadequate? I agree with @yfarjoun that this is an important safety check. Closing this but feel free to reopen a discussion.

@lbergelson closed this on Jul 9, 2024