Incorporate logic from chim_ints.R script into dada2-nf pipeline #17

dhoogest · 2020-11-24T16:49:00Z

The current NGS16S pipeline includes a small R script that extracts from the dada2.rda file the sequences that were dropped as chimeras and writes a csv with the columns weight and sequence. No downstream classification step depends on this file, but it is useful for troubleshooting when a larger fraction of reads is lost during the chimera removal phase. The script currently outputs one file per sample in the run, but the output does not need to be separated per sample if that is less convenient for the nextflow pipeline.

#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(argparse, quietly = TRUE))
suppressPackageStartupMessages(library(tidyverse, quietly = TRUE))

do_intermediates <- function(dada.path, id, out.path) {
    # load() is expected to define seqtab and seqtab.nochim in this scope
    load(dada.path)
    pre <- as.data.frame(as.table(seqtab))
    post <- as.data.frame(as.table(seqtab.nochim))
    # keep sequences (Var2) present before chimera removal but absent after
    pre %>% anti_join(post, by=c('Var2')) %>%
        filter(Var1==id) %>% # ex: '624-27'
        arrange(-Freq) %>% rename(sequence=Var2) %>%
        rename(weight=Freq) %>% select(weight, sequence) %>%
        write_csv(out.path)
}

main <- function(arguments) {
    parser <- ArgumentParser()
    parser$add_argument('--rdata', default='dada2.rda')
    parser$add_argument('--id')
    parser$add_argument('--out')

    args <- parser$parse_args(arguments)
    
    do_intermediates(args$rdata, args$id, args$out)
}

main(commandArgs(trailingOnly=TRUE))
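
If per-sample files turn out to be inconvenient for nextflow, the same logic could emit a single table covering every sample. The following is only a sketch under assumptions: the function name `chimeras_all_samples` and the extra `sample` column are hypothetical, and the `Freq > 0` filter is an addition that the script above does not apply.

```r
suppressPackageStartupMessages(library(dplyr))

# Assumption: seqtab and seqtab.nochim are sample-by-sequence count matrices,
# as the script above expects to find in dada2.rda.
chimeras_all_samples <- function(seqtab, seqtab.nochim) {
    pre <- as.data.frame(as.table(seqtab), stringsAsFactors = FALSE)
    post <- as.data.frame(as.table(seqtab.nochim), stringsAsFactors = FALSE)
    pre %>%
        anti_join(post, by = "Var2") %>%  # sequences absent after chimera removal
        filter(Freq > 0) %>%              # addition: skip sample/sequence pairs with no reads
        arrange(Var1, desc(Freq)) %>%
        select(sample = Var1, weight = Freq, sequence = Var2)
}

# Toy data (made up for illustration): two samples, three sequences,
# of which only "AAAA" survives chimera removal.
seqtab <- matrix(c(10L, 0L, 3L,
                   4L, 7L, 0L),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("AAAA", "CCCC", "GGGG")))
seqtab.nochim <- seqtab[, "AAAA", drop = FALSE]

chims <- chimeras_all_samples(seqtab, seqtab.nochim)
print(chims)
```

The resulting data frame could be passed to write_csv exactly as in the script above, so a single process would produce one file per run instead of one per sample.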


dhoogest · 2022-05-18T21:41:03Z

We'll need to bump this in response to https://gitlab.labmed.uw.edu/molmicro/NGS16S/-/issues/285


dhoogest · 2022-05-18T23:05:51Z

I've got changes in #49 to complete this issue. We still need to find a sample with a significant abundance of chimeras and add it to the ngs16s test data set. /cc @nhoffman

marykstewart · 2022-05-24T22:45:49Z

@dhoogest You asked for test specimens; here's one from today (https://share.labmed.uw.edu/molmicro/markergene/22N0238_NGS16S/report/22R192-11/) where 10% of reads were dropped at the chimera removal filtering step. It seems as good as any for a validation run, since we know our current version of dada2 is flagging SVs as chimeric here.

dhoogest assigned crosenth Nov 24, 2020

dhoogest added the enhancement label Nov 24, 2020

dhoogest mentioned this issue Nov 24, 2020

Include unmerged 16S SVs in pipeline output #11

Closed

dhoogest assigned dhoogest and unassigned crosenth May 18, 2022

dhoogest pushed a commit that referenced this issue May 18, 2022

add script and process to extract unmerged seqs from the rds file

cc27b5b

- mimics logic from old NGS16S pipeline script as described in #17
