Records dropped as duplicates but then 'added back' as types are absent from named_type_hits output #45

Open
dhoogest opened this issue Oct 12, 2021 · 2 comments

@dhoogest
Collaborator

Looks like there's a bit of a circular issue emerging from the interplay between the definition of the named .fasta set (which has duplicate records within a genome dropped: https://github.com/nhoffman/ya16sdb/blob/master/SConstruct#L496) and the logic that adds all type strain records back to the 'trusted' .fasta output (and BLAST db). As a result, the trusted BLAST db contains dropped duplicate alleles for some sequences within is_type genomes, and these records lack info about the nearest type strain, since the named .fasta is used as the target in https://github.com/nhoffman/ya16sdb/blob/master/SConstruct#L737.
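A toy model of the set logic described above may make the loop easier to see. This is purely illustrative: the tab-separated layout (id, genome, sequence, is_type) and the functions `toy_records`, `named_ids`, `trusted_ids` and `added_back_ids` are hypothetical and do not reflect how the SConstruct actually implements these steps.

```shell
#!/bin/sh
# Hypothetical toy data: id <TAB> genome <TAB> sequence <TAB> is_type
toy_records() {
  printf 'r1\tG1\tAAAA\tyes\n'
  printf 'r2\tG1\tAAAA\tyes\n'   # duplicate allele within type genome G1
  printf 'r3\tG2\tCCCC\tno\n'
  printf 'r4\tG2\tCCCC\tno\n'    # duplicate allele within non-type genome G2
}

# 'named' set: keep only the first record per (genome, sequence) pair.
named_ids() { awk -F '\t' '!seen[$2 FS $3]++ { print $1 }'; }

# 'trusted' set: the named records plus every is_type record added back.
trusted_ids() { awk -F '\t' '!seen[$2 FS $3]++ || $4 == "yes" { print $1 }'; }

# Records in trusted but not in named: duplicates that are is_type.
# In the toy model these are the ones with no nearest-type-strain hit,
# because the type-strain search uses the named set as its target.
added_back_ids() { awk -F '\t' 'seen[$2 FS $3]++ && $4 == "yes" { print $1 }'; }
```

Running `toy_records | named_ids` prints r1 and r3, `toy_records | trusted_ids` prints r1, r2 and r3, and `toy_records | added_back_ids` prints only r2: the duplicate allele from the type genome that ends up in the trusted BLAST db without closest-type info.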

Possible solutions:

The third option seems easiest implementation-wise.

@dhoogest
Collaborator Author

Example: accession NZ_CP056776_2309782_2311321, which a user saw in the NGS16S validation output as lacking 'closest type' info:

```
dhoogest@naga:$ grep NZ_CP056776_2309782_2311321 /molmicro/common/ncbi/16s/output/20211004/dedup/1200bp/named/seqs.fasta | wc -l
0
```

No records for this allele in the 'named' set; it's a duplicate of another allele in the NZ_CP056776 genome.

Examining the 'trusted' set confirms that the record is present in its seqs.fasta (which would also be the source for the BLAST db used in the pipeline output where the bug was detected):

```
dhoogest@naga:$ grep NZ_CP056776_2309782_2311321 /molmicro/common/ncbi/16s/output/20211004/dedup/1200bp/named/filtered/trusted/seqs.fasta | wc -l
1
```
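To find every record affected this way rather than checking one accession at a time, one option is to list the IDs present in the trusted seqs.fasta but absent from the named one. A minimal POSIX sh/awk sketch, assuming each FASTA header is `>` followed by the record ID up to the first whitespace; the function name `missing_from_named` is hypothetical and not part of ya16sdb.

```shell
#!/bin/sh
# Print IDs of records that appear in the trusted FASTA ($2) but not in the
# named FASTA ($1), in the order they occur in the trusted file.
missing_from_named() {
  awk '
    # First file (named): remember every record ID from header lines.
    FILENAME == ARGV[1] { if (/^>/) named[substr($1, 2)] = 1; next }
    # Second file (trusted): report header IDs never seen in the named file.
    /^>/ { id = substr($1, 2); if (!(id in named)) print id }
  ' "$1" "$2"
}
```

For the run above this would be called as:

```
base=/molmicro/common/ncbi/16s/output/20211004/dedup/1200bp/named
missing_from_named "$base/seqs.fasta" "$base/filtered/trusted/seqs.fasta"
```

Comparing on `FILENAME` rather than the common `FNR == NR` idiom keeps the result correct even when the named file is empty.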

@crosenth
Collaborator

I believe this is fixed, right?
