-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
indexing large genome bam file #334
Comments
We have a similar problem when working with long chromosomes, for example with wheat The offending line is Line 79 in 9dffdee
As the previous poster says, this problem could be alleviated using a That is, instead of using
this should be sufficient (added -c)
The csi index should otherwise be completely compatible with the bai, except for it's name of course. |
The wrappers.py is no longer used (in fact I don't know if it was ever used). In is the script mini_align where some change may be required in order to produce a |
I quick look at the code suggests I might be wrong on that account and the core code qill happily accept a |
Ok, thanks. I submitted a PR. For others, if ONT can't or won't accept the PR due to other breakages being caused, you can modify the
|
I'm working on a very large genome (4.5 G), when we run medaka, we gets the following error which seems to be related to bam indexing process because of the large genome, I know it would work with samtools -c which generates bam index csi and tried to modified the Wrappers.py and add -c option to the samtools command, yet getting same error, so I think I need to know where should I make this modification?
[M::worker_pipeline::14674.2757.95] mapped 126569 sequences
[M::worker_pipeline::15235.0637.95] mapped 127219 sequences
[M::worker_pipeline::15788.7047.95] mapped 127124 sequences
[M::worker_pipeline::16348.9997.95] mapped 129707 sequences
[M::worker_pipeline::16904.9507.96] mapped 128537 sequences
[M::worker_pipeline::17465.0347.96] mapped 127442 sequences
[M::worker_pipeline::18035.1987.96] mapped 127657 sequences
[M::worker_pipeline::18569.9777.96] mapped 130177 sequences
[M::worker_pipeline::18903.816*7.96] mapped 77228 sequences
[M::main] Version: 2.22-r1101
[M::main] CMD: minimap2 -x map-ont --MD -t 8 -a -A 2 -B 4 -O 4,24 -E 2,1 /home/dnalinux/efs/g34xl/Pisum_sativum_v1a.fa.mmi /home/dnalinux/efs/medaka/fastq/allfastq.gz
[M::main] Real time: 18904.149 sec; CPU: 150472.292 sec; Peak RSS: 14.238 GB
[bam_sort_core] merging from 40 files and 8 in-memory blocks...
[E::hts_idx_check_range] Region 536869540..536872741 cannot be stored in a bai index. Try using a csi index
[E::sam_index] Read '81baeba4-d17c-466a-ae19-73eeef0a548a' with ref_name='chr5LG3', ref_length=579269071, flags=0, pos=536869541 cannot be indexed
samtools index: failed to create index for "calls_to_draft.bam": Numerical result out of range
Failed to index alignment file.
Failed to run alignment of reads to draft.
Thanks
The text was updated successfully, but these errors were encountered: