Preserve reads filtered out at dada2 filterAndTrim? #87

Open
marykstewart opened this issue Jul 10, 2024 · 2 comments

@marykstewart

I think we need to revisit our trimming parameters for ThruPlex NGS16S libraries (more an issue for the clampi pipeline than this one). If it's straightforward to do, it might be helpful to preserve the reads filtered out at filterAndTrim for troubleshooting. I thought there was a flag in dada2 that would do this, but I didn't see one on re-reading the docs, so this would not be nearly as simple as I was thinking. Depending on the effort involved, it might not be worth it (i.e., watching what remains may serve just as well). I talked with Chris and Noah about this last week and said I'd follow up on GitHub, but from my perspective there's no need to jump on this right away.

Chris, the lines I'd commented out to preserve the work/ directory are 742-746 of https://gitlab.labmed.uw.edu/molmicro/clampi-ngs/-/blob/dev/SConstruct. Doing so still preserves the directory, but it looks to be empty now.

The Background:
We're observing significant read loss at the dada2 filter with ThruPlex NGS16S library preps. Compare this ThruPlex/standard pair: https://share.labmed.uw.edu/molmicro/markergene/24N0316_SRS16S/report/
https://share.labmed.uw.edu/molmicro/markergene/24N0300_NGS16S/report/

This is likely due to the minimum length requirements we've set in the dada2 parameters. When I spoke with Chris and Noah last week, I was somewhat mystified because I'd done an analysis where I reduced minLen from 100 to 20, but the filtering numbers didn't budge (see https://gitlab.labmed.uw.edu/molmicro/ops/-/issues/1912#note_127852).

I read the dada2 docs again and realized that while we do have a minimum length of 100 set in the pipeline, it's not actually doing anything because it operates after truncLen. We have truncLen set to R1=240 and R2=220 (https://gitlab.labmed.uw.edu/molmicro/clampi-ngs/-/blob/dev/data/dada_params_16S.json), and reads shorter than truncLen-trimLeft(=20) will be discarded (https://www.bioconductor.org/packages/release/bioc/manuals/dada2/man/dada2.pdf). So heavy losses at filterAndTrim are not surprising given the adapter positions in reads of ThruPlex NGS16S libraries. See these examples:
https://share.labmed.uw.edu/molmicro/markergene/24N0316_SRS16S/24R215_fastqc/multiqc_report.html#fastqc_adapter_content
https://share.labmed.uw.edu/molmicro/markergene/24N0315_SRS16S/fastqc_24R213/multiqc_report.html#fastqc_adapter_content
https://share.labmed.uw.edu/molmicro/markergene/24N0329_SRS16S/fastqc_24R224/multiqc_report.html#fastqc_adapter_content
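
To illustrate why lowering minLen from 100 to 20 would not change the counts while truncLen is set, here is a minimal Python sketch of the length checks only. It is a simplified model, not dada2 code: it assumes, as the dada2 manual describes, that reads shorter than truncLen are discarded and that minLen is enforced only after truncation, and it ignores trimLeft and all quality filters. The function names and read lengths are made up for illustration.

```python
def passes_length_filters(read_len, trunc_len, min_len):
    """Simplified model of the length checks (no trimLeft, no quality filters)."""
    if trunc_len > 0:
        # Assumption: reads shorter than truncLen are discarded outright.
        if read_len < trunc_len:
            return False
        # Surviving reads are truncated to exactly truncLen bases.
        read_len = trunc_len
    # minLen is only checked after truncation.
    return read_len >= min_len


def count_kept(read_lengths, trunc_len, min_len):
    """Count how many reads survive the modeled length checks."""
    return sum(passes_length_filters(n, trunc_len, min_len) for n in read_lengths)


# Hypothetical R1 read lengths after adapter removal (illustrative only).
read_lengths = [60, 120, 180, 239, 240, 251]

kept_100 = count_kept(read_lengths, trunc_len=240, min_len=100)
kept_20 = count_kept(read_lengths, trunc_len=240, min_len=20)
print(kept_100, kept_20)  # same count for both minLen values
```

In this model every read that survives truncation is exactly truncLen long, so any minLen at or below truncLen can never remove anything. minLen only starts to matter when truncLen is 0 (no truncation).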

@crosenth
Collaborator

Hi @marykstewart, I have a PR to include the filter_and_trim filtered and dropped sequences in the dada2-nf output. I will coordinate with @dhoogest to make sure they are available for you to analyze in the 16S pipeline.
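
For anyone who wants to look at dropped reads before that output is available, one possible after-the-fact approach is to compare read IDs between the raw and filtered FASTQ files. The Python sketch below is not the dada2-nf implementation from the PR. The function names and file paths are hypothetical, and it assumes that read IDs are unchanged between the raw and filtered files and that each record is a standard 4-line FASTQ record.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz files transparently."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def read_fastq(path):
    """Yield (read_id, record_lines) for each 4-line FASTQ record."""
    with open_fastq(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                break  # end of file (or trailing blank line)
            # Read ID: header text after '@', up to the first whitespace.
            yield record[0][1:].split()[0], record


def write_dropped(raw_path, filtered_path, dropped_path):
    """Write raw reads whose IDs are absent from the filtered file.

    Returns the number of dropped reads written.
    """
    kept_ids = {read_id for read_id, _ in read_fastq(filtered_path)}
    n_dropped = 0
    with open(dropped_path, "w") as out:
        for read_id, record in read_fastq(raw_path):
            if read_id not in kept_ids:
                out.writelines(record)
                n_dropped += 1
    return n_dropped
```

A call such as `write_dropped("raw_R1.fastq.gz", "filtered_R1.fastq.gz", "dropped_R1.fastq")` (hypothetical file names) would write the untrimmed versions of the dropped reads, which makes it possible to check their lengths and adapter positions directly.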

@crosenth
Collaborator

New 2.0.3 release - 2.0.2...2.0.3
