Wrong sorting of ROSE chrom_sizes and bed #19

Open
LeonHafner opened this issue Sep 19, 2024 · 1 comment
Comments

@LeonHafner
Contributor

Description of the bug

In the ROSE workflow, the bed file is sorted using SORT_BED (which uses gnu_sort).
Because no version-sort flag is set, the chromosomes of the bed file end up in lexicographic order:

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrM
chrX
chrY
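
The order above is what plain string comparison produces. A minimal Python sketch (not the pipeline's code; it assumes byte-wise comparison, as GNU sort does in the C locale) reproduces it from the chromosome names in this issue:

```python
# Chromosome names as listed in the genome index in this issue.
chroms = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY", "chrM"]

# Plain string comparison: "chr10" sorts before "chr2" because '1' < '2',
# and digits sort before uppercase letters, so chrM/chrX/chrY come last.
lexicographic = sorted(chroms)

print("\n".join(lexicographic))
```
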

The chrom_sizes file (we use the genome index for that), however, looks like this:

chr1    195471971       8
chr2    182113224       198729854
chr3    160039680       383878307
chr4    156508116       546585323
chr5    151834684       705701916
chr6    149736546       860067187
chr7    145441459       1012299351
chr8    129401213       1160164843
chr9    124595110       1291722751
chr10   130694993       1418394457
chr11   122082543       1551267710
chr12   120129022       1675384973
chr13   120421639       1797516156
chr14   124902244       1919944833
chr15   104043685       2046928792
chr16   98207768        2152706549
chr17   94987271        2252551124
chr18   90702639        2349121527
chr19   61431566        2441335887
chrX    171031299       2503791321
chrY    91744698        2677673150
chrM    16299           2770946936

This causes the INVERT_TSS process to fail, since it expects both files to be sorted in the same chromosome order.
We did not catch this earlier because our tests only cover chr1.
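
To illustrate why a chr1-only test cannot expose the problem, here is a small hedged sketch. The helper names (`chrom_order`, `same_relative_order`) and the bed intervals are made up for illustration, and this is not the check that INVERT_TSS itself performs; it only compares the chromosome order of two files.

```python
def chrom_order(lines):
    """Return chromosome names (column 1) in order of first appearance."""
    order = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] not in order:
            order.append(fields[0])
    return order


def same_relative_order(bed_lines, sizes_lines):
    """True if the bed chromosomes appear in the same order as in chrom_sizes."""
    rank = {name: i for i, name in enumerate(chrom_order(sizes_lines))}
    bed_chroms = chrom_order(bed_lines)
    if any(name not in rank for name in bed_chroms):
        return False  # chromosome missing from chrom_sizes
    ranks = [rank[name] for name in bed_chroms]
    return ranks == sorted(ranks)


# Sizes taken from the issue; bed intervals are hypothetical.
sizes = ["chr1\t195471971", "chr2\t182113224", "chr10\t130694993"]
bed_chr1_only = ["chr1\t100\t200", "chr1\t300\t400"]
bed_lexicographic = ["chr1\t100\t200", "chr10\t100\t200", "chr2\t100\t200"]

print(same_relative_order(bed_chr1_only, sizes))      # True
print(same_relative_order(bed_lexicographic, sizes))  # False
```

With a single chromosome there is no ordering to get wrong, so the mismatch only shows up once a test dataset contains at least two chromosomes that sort differently under the two schemes (for example chr2 and chr10).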

One option would be to add the -V flag to the SORT_BED process, so that the bed file is sorted in natural (version) order.
This would put the numbered chromosomes in the right order, but we would still get an error for chrX, chrY and chrM, since the chrom_sizes file lists them as X, Y, M, which is not their natural sort order.
Therefore, I propose sorting the chrom_sizes file with GNU_SORT as well, so that both files always share the same order.
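
The idea behind the proposal can be sketched in Python: if both files are sorted with the same comparison on the chromosome column, their chromosome order agrees regardless of which scheme is used. This is only an analogy to running the same sort on both files, not the actual module command; the function names and the bed intervals are hypothetical, and the offset column of the index is omitted for brevity.

```python
def sort_chrom_sizes(lines):
    """Sort chrom_sizes records by chromosome name (plain string order)."""
    return sorted(lines, key=lambda line: line.split("\t")[0])


def sort_bed(lines):
    """Sort bed records by chromosome name, then numerically by start."""
    def key(line):
        chrom, start = line.split("\t")[:2]
        return (chrom, int(start))
    return sorted(lines, key=key)


# Sizes from the issue (name, length); bed intervals are made up.
sizes = [
    "chr1\t195471971", "chr2\t182113224", "chr10\t130694993",
    "chrX\t171031299", "chrY\t91744698", "chrM\t16299",
]
bed = [
    "chrX\t10\t20", "chr2\t5\t15", "chr10\t1\t9", "chr1\t50\t60",
    "chr1\t5\t10", "chrM\t1\t2", "chrY\t3\t4",
]

sorted_sizes = sort_chrom_sizes(sizes)
sorted_bed = sort_bed(bed)

# Chromosome order of each sorted file, duplicates removed.
sizes_order = [line.split("\t")[0] for line in sorted_sizes]
bed_order = list(dict.fromkeys(line.split("\t")[0] for line in sorted_bed))
print(sizes_order == bed_order)  # True
```

Because the same key is applied to both files, it no longer matters that chrX, chrY and chrM have no "natural" position; they land in the same place in both outputs.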

Command used and terminal output

No response

Relevant files

No response

System information

No response

@LeonHafner LeonHafner added the bug Something isn't working label Sep 19, 2024
@LeonHafner LeonHafner self-assigned this Sep 19, 2024
@nictru
Collaborator

nictru commented Sep 19, 2024

Sounds reasonable
