TreeSAPP Create Bug When Using Guarantee Flag #98

Open
janstett opened this issue Oct 1, 2024 · 0 comments
Labels
bug Unexpected error raised? Weird results? Use this label.

janstett commented Oct 1, 2024

I noticed that when creating reference packages that include guaranteed sequences from TIGRFAM, the sequence headers get truncated. As a result, when NCBI is queried, the sequences get misclassified as "r__Root".

For example, for NapA, here is the base treesapp create command:

treesapp create -c NapA -p 0.85 --min_taxonomic_rank c -n 16 -i RefPkgs/Nitrogen_metabolism/Denitrification/NapA/ENOG501NS3T.faa --guarantee RefPkgs/Nitrogen_metabolism/Denitrification/NapA/TIGR01706.faa --cluster --trim_align --outdet_align --headless --fast --overwrite -o TS_Make_Lin_Table_For_Eval/Base/NapA/ --profile RefPkgs/Nitrogen_metabolism/Denitrification/NapA/TIGR01706.HMM --deduplicate --min_seq_length 600

For the TIGRFAM file, here are the sequence headers:

SP|Q56350|NAPA_PARDT/2-831
SP|P39185|NAPA_ALCEU/2-831

In both the accession table and any trees generated for this package, these headers are truncated to:
SP| r__Root

When they should be:
Q56350 r__Root; d__Bacteria; p__Pseudomonadota; c__Alphaproteobacteria; o__Rhodobacterales; f__Paracoccaceae; g__Paracoccus; s__Paracoccus pantotrophus

P39185 r__Root; d__Bacteria; p__Pseudomonadota; c__Betaproteobacteria; o__Burkholderiales; f__Burkholderiaceae; g__Cupriavidus; s__Cupriavidus necator
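
To make the expected accessions concrete, here is a minimal Python sketch of pulling the accession out of headers shaped like the TIGRFAM ones above. The regex and the `parse_header` function are my own illustration and are not TreeSAPP's header-parsing code; I am only assuming the `DB|ACCESSION|ENTRY/START-END` layout seen in these two headers.

```python
import re

# Hypothetical pattern for headers such as "SP|Q56350|NAPA_PARDT/2-831".
# Illustration only; this is not how TreeSAPP parses headers internally.
HEADER_RE = re.compile(
    r"^(?P<db>[A-Za-z]+)\|(?P<accession>[^|]+)\|(?P<entry>[^/\s]+)"
    r"(?:/(?P<start>\d+)-(?P<end>\d+))?"
)


def parse_header(header: str) -> dict:
    """Split a DB|ACCESSION|ENTRY/START-END header into its parts."""
    match = HEADER_RE.match(header.lstrip(">"))
    if match is None:
        raise ValueError(f"Unrecognised header format: {header!r}")
    return match.groupdict()


for header in ("SP|Q56350|NAPA_PARDT/2-831", "SP|P39185|NAPA_ALCEU/2-831"):
    print(parse_header(header)["accession"])
```

For the two headers above this yields Q56350 and P39185, which are the accessions I would expect to see in the accession table instead of "SP|".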

Removing the prefix from the headers fixes the issue; however, I'm wondering whether the header truncation itself needs to be addressed.
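
For anyone hitting the same problem, the workaround can be sketched roughly as below. This is a minimal sketch, assuming the headers start with a literal "SP|" prefix as in my TIGRFAM file; `strip_header_prefix` is a hypothetical helper name, and the sequence line in the example is a placeholder, not the real NapA sequence.

```python
from typing import Iterable, Iterator


def strip_header_prefix(lines: Iterable[str], prefix: str = "SP|") -> Iterator[str]:
    """Yield FASTA lines with `prefix` removed from the start of each header."""
    for line in lines:
        if line.startswith(">" + prefix):
            # Keep the leading ">" and drop only the prefix that follows it.
            yield ">" + line[len(prefix) + 1:]
        else:
            # Sequence lines and headers without the prefix pass through unchanged.
            yield line


# Placeholder record: the header is from the issue, the residues are made up.
example = [">SP|Q56350|NAPA_PARDT/2-831\n", "ACDEFGHIK\n"]
print("".join(strip_header_prefix(example)), end="")
```

The cleaned lines can then be written to a new file and passed to --guarantee in place of the original TIGRFAM FASTA.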

Note that this doesn't seem to be an issue when these prefixes appear in the base FASTA input file (for example, for RadA), or when treesapp update is used and the final clustered sequences are then passed to treesapp create.

  • TreeSAPP Version [e.g. 0.11.4]
janstett added the bug label on Oct 1, 2024