Hello,
I'm using vsearch sintax to assign taxonomy to sequences. One of the reference databases I'm using does not contain the domain rank for its references, only the ranks from phylum down to species. I've been asked to add the domain rank manually, so I did, but I noticed that the confidence values in the sintax results changed slightly for some sequences.
I have attached example files for reproducibility here. When I assign taxonomy to example_sequence.fasta using reference_no_domain.fasta, the result is:
p:Ascomycota(1.00),c:Eurotiomycetes(1.00),o:Eurotiales(1.00),f:Trichocomaceae(1.00),g:Talaromyces(0.69),s:Talaromyces_marneffei(0.69)
However, when I use reference_with_domain.fasta, I get:
d:Eukaryota(1.00),p:Ascomycota(1.00),c:Eurotiomycetes(1.00),o:Eurotiales(1.00),f:Trichocomaceae(1.00),g:Talaromyces(0.70),s:Talaromyces_marneffei(0.70)
Note that the confidence value differs by 0.01 at the genus and species levels (0.69 without the domain rank, 0.70 with it). I have observed some cases in which the difference is even bigger. Note that all sequences in the references are from Eukaryota in this example.
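In case it helps with checking other sequences, here is a minimal Python sketch that compares two sintax prediction strings rank by rank. The function names `parse_sintax` and `confidence_changes` are hypothetical helpers written for this post, not part of vsearch, and the parsing assumes the `rank:name(confidence)` format shown in the two results above.

```python
import re

# Matches one "rank:name(confidence)" entry, e.g. "g:Talaromyces(0.69)".
ENTRY = re.compile(r"([a-z]):([^,()]+)\(([\d.]+)\)")


def parse_sintax(prediction):
    """Parse a sintax prediction string into {rank: (name, confidence)}."""
    return {
        rank: (name, float(conf))
        for rank, name, conf in ENTRY.findall(prediction)
    }


def confidence_changes(before, after):
    """Return {rank: after - before} for ranks present in both strings
    whose confidence values differ."""
    a, b = parse_sintax(before), parse_sintax(after)
    return {
        rank: round(b[rank][1] - a[rank][1], 2)
        for rank in a
        if rank in b and a[rank][1] != b[rank][1]
    }


# The two results quoted above.
no_domain = (
    "p:Ascomycota(1.00),c:Eurotiomycetes(1.00),o:Eurotiales(1.00),"
    "f:Trichocomaceae(1.00),g:Talaromyces(0.69),s:Talaromyces_marneffei(0.69)"
)
with_domain = (
    "d:Eukaryota(1.00),p:Ascomycota(1.00),c:Eurotiomycetes(1.00),"
    "o:Eurotiales(1.00),f:Trichocomaceae(1.00),g:Talaromyces(0.70),"
    "s:Talaromyces_marneffei(0.70)"
)

print(confidence_changes(no_domain, with_domain))  # {'g': 0.01, 's': 0.01}
```

Ranks that appear in only one of the two strings (here the domain rank) are skipped, so only the shared ranks are compared.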
I've tried to understand the algorithm via the usearch website and sintax paper but was unable to find anything that hinted at an explanation for this. Would you be able to explain to me how the addition of the domain rank impacts the confidence values?
Thank you!