terminate called after throwing an instance of 'std::runtime_error' what(): BUG: dead end in get_color_set_id #31

Open
dbu008 opened this issue Oct 21, 2023 · 13 comments

@dbu008

dbu008 commented Oct 21, 2023

Hi Themisto developers,
After running themisto build, I wanted to run pseudoalignment.
However, I am getting an error:
terminate called after throwing an instance of 'std::runtime_error'
what(): BUG: dead end in get_color_set_id

The command is:
themisto pseudoalign --query-file /cluster/work/users/.../Repair_4/1/abc_1.repair.fastq.gz --out-file /cluster/work/users/.../mSWEEP/1/abc_R1_Kleb_pseudo_alignment.aln --rc --index-prefix /cluster/work/users/.../mSWEEP/themisto_index/themisto --temp-dir /cluster/work/users/.../mSWEEP/tmp_sweep --n-threads 8 --sort-output-lines --gzip-output

Could you tell me where the problem is coming from?

I am using themisto_linux_v3.2.0.

Best,
Dorota

@jnalanko
Collaborator

I would like to try to reproduce the crash to debug it. Is the data used in the indexing available?

In the meantime, as a quick workaround, constructing the index with the flag -d 1 might solve your problem. This makes the index larger, but does not affect the pseudoalignment answers.

@dbu008
Author

dbu008 commented Oct 21, 2023

Thank you jnalanko for a fast answer.
The data are published; however, the assemblies are taken from two different papers and connected with other defaults. I can gladly send you the file, though. Could you tell me how I should send it? It is a relatively big file.

I will create the index with -d 1 and see how it goes.

Thx:)

@jnalanko
Collaborator

Could you upload it to Dropbox or Google drive and share the link?

@dbu008
Author

dbu008 commented Oct 21, 2023

I will share it via a Google Drive link.
Sorry it is taking so long.

[https://drive.google.com/file/d/1o9DkYThgIk3Vynj9ryQXPG3MwpTByayD/view?usp=share_link]

Please let me know if it works and whether I can then delete the file from Google Drive.

@jnalanko
Collaborator

Downloaded! Thank you. You can delete the file now. One more request: could you please give the command you used to build the index?

@dbu008
Author

dbu008 commented Oct 22, 2023

Thank you for all the help:)
I am still running the build with -d 1, as you suggested.
The command I used previously is:
themisto build --k 31 --input-file /cluster/all_sequences_extra.fasta.gz --sequence-colors --index-prefix /cluster/themisto --temp-dir /cluster/tmp_themisto --mem-gigas 2 --n-threads 4

Best,
Dorota

@dbu008
Author

dbu008 commented Oct 25, 2023

Hi again,
I finished building the Themisto index with -d 1, and I am still getting the same error in the pseudoalignment step.

terminate called after throwing an instance of 'std::runtime_error'
what(): BUG: dead end in get_color_set_id
caught signal: 6
Cleaning up temporary files
Aborting

@jnalanko
Collaborator

That is strange. We're still working on this. Hang tight.

@jnalanko
Collaborator

I think I found the problem! The following sequence in your input contains an empty line at the end, which messes up our FASTA parser:

AP007209.1_Bacillus_cereus_C7401_genomic_DA_complete_genome

So, to fix this, please remove ALL empty lines from the input fasta file.
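One possible way to do this is sketched below in Python, assuming the input is a gzip-compressed FASTA file like the one passed to themisto build. The function name strip_empty_lines and the file names in the usage note are hypothetical and are not part of Themisto.

```python
import gzip


def strip_empty_lines(in_path, out_path):
    """Copy a gzipped FASTA file, dropping empty or whitespace-only lines.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.strip():
                # Keep header and sequence lines unchanged.
                dst.write(line)
            else:
                # Skip lines that contain nothing but whitespace.
                dropped += 1
    return dropped
```

For example, strip_empty_lines("input.fasta.gz", "input.clean.fasta.gz") would write a cleaned copy, and the index would then be rebuilt from the cleaned file. A nonzero return value confirms that the original file did contain empty lines.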

For the future, I made it so that the parser now detects this situation and crashes with an error message: 782009d

This error message will be included in the next Themisto release, which will be out shortly.

@jnalanko
Collaborator

jnalanko commented Oct 27, 2023

We could also make it so that empty lines are simply ignored. However, the FASTA spec at NCBI, for example, says that no empty lines are allowed in the format, so it may be better to crash with an error message, because empty lines could indicate some problem with the data.

https://blast.ncbi.nlm.nih.gov/doc/blast-topics/
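The fail-fast option discussed above can be illustrated with a short check that can be run on a FASTA file before indexing. This is only a sketch in Python and is not the parser code from Themisto; the names find_empty_lines and check_fasta are hypothetical.

```python
def find_empty_lines(lines):
    """Return (line_number, last_header) for each empty or whitespace-only line.

    Line numbers are 1-based. last_header is the most recent '>' header seen
    (without the '>'), or None if the empty line comes before any header.
    """
    problems = []
    header = None
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            header = line[1:].strip()
        elif not line.strip():
            problems.append((number, header))
    return problems


def check_fasta(lines):
    """Raise ValueError for the first empty line found; otherwise return None."""
    problems = find_empty_lines(lines)
    if problems:
        number, header = problems[0]
        raise ValueError(
            f"empty line {number} in FASTA input (after header {header!r})"
        )
```

Reporting the header of the affected record makes it easier to locate the problem in a large multi-sequence file, which is the main advantage of rejecting the input over silently skipping the empty lines.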

@dbu008
Author

dbu008 commented Oct 27, 2023

Thank You!!!!
Thank You!!!

I will remove all blank lines. I did not know I had any in the input file at all!
I will rebuild the index soon.

Thank you, have a great day
Dorota

@jnalanko
Collaborator

Let me know if it works. In that case, I will close the issue.

@jnalanko
Collaborator

jnalanko commented Nov 9, 2023

Hi Dorota, it's been two weeks. Is your issue fixed?
