Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate logic from chim_ints.R script into dada2-nf pipeline #17

Open
dhoogest opened this issue Nov 24, 2020 · 3 comments
Open

Incorporate logic from chim_ints.R script into dada2-nf pipeline #17

dhoogest opened this issue Nov 24, 2020 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@dhoogest
Copy link
Collaborator

There is a small R script in the current NGS16S pipeline which extracts from the dada2.rda file the sequences which were dropped as chimeras and outputs a csv with columns weight and sequence. This file is not used for any downstream classification pipeline dependencies, however is useful to have for troubleshooting when a larger fraction of reads is lost during the chimera removal phase. The script currently outputs a file for each sample in the run, but it is not necessary to separate per sample if that is less convenient for the nextflow pipeline

#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(argparse, quietly = TRUE))
suppressPackageStartupMessages(library(tidyverse, quietly = TRUE))

do_intermediates <- function(dada.path, id, out.path) {
    load(dada.path)
    pre <- as.data.frame(as.table(seqtab))
    post <- as.data.frame(as.table(seqtab.nochim))
    pre %>% anti_join(post, by=c('Var2')) %>%
        filter(Var1==id) %>% # ex: '624-27'
        arrange(-Freq) %>% rename(sequence=Var2) %>%
        rename(weight=Freq) %>% select(weight, sequence) %>%
        write_csv(out.path)
}

main <- function(arguments) {
    parser <- ArgumentParser()
    parser$add_argument('--rdata', default='dada2.rda')
    parser$add_argument('--id')
    parser$add_argument('--out')

    args <- parser$parse_args(arguments)
    
    do_intermediates(args$rdata, args$id, args$out)
}

main(commandArgs(trailingOnly=TRUE))


@dhoogest
Copy link
Collaborator Author

We'll need to bump this in response to https://gitlab.labmed.uw.edu/molmicro/NGS16S/-/issues/285

@dhoogest dhoogest assigned dhoogest and unassigned crosenth May 18, 2022
dhoogest pushed a commit that referenced this issue May 18, 2022
- mimics logic from old NGS16S pipeline script as described in #17
@dhoogest
Copy link
Collaborator Author

dhoogest commented May 18, 2022

I've got changes in #49 to complete this issue. Need to find a sample which demonstrates significant abundance of chimera and add that to the ngs16s test data set. /cc @nhoffman

@marykstewart
Copy link

@dhoogest You asked for test specimens, here's one from today where 10% reads were dropped at the chimeras removed filtering step https://share.labmed.uw.edu/molmicro/markergene/22N0238_NGS16S/report/22R192-11/ Seems as good as any for a validation run, we know our current version of dada2 is flagging svs as chimeric here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants