
Commit

Merge pull request #17 from CCBR/filter-blacklist-subwf
Create subworkflow to filter reads from blacklisted regions
kelly-sovacool authored Oct 23, 2023
2 parents 1773e91 + a7ef376 commit 9d45619
Showing 19 changed files with 206 additions and 7 deletions.
8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,6 @@
-## development version
+## Development version

-### new modules
+### New modules

- bwa/index
- bwa/mem
@@ -10,3 +10,7 @@
- khmer/uniquekmers (#7)
- picard/samtofastq (#21)
- samtools/filteraligned (#13,#20)

### New subworkflows

- custom/filter_blacklist (#17)
8 changes: 8 additions & 0 deletions README.md
@@ -24,6 +24,14 @@ nf-core modules \
update [module]
```

Use the `subworkflows` command in place of the `modules` command to install or update subworkflows.

```sh
nf-core subworkflows \
--git-remote https://github.com/CCBR/nf-modules \
update [subworkflow]
```

## Help & Contributing

Come across a **bug**? Open an [issue](https://github.com/CCBR/nf-modules/issues) and include a minimal reproducible example.
19 changes: 19 additions & 0 deletions data/genomics/sarscov2/illumina/fastq/README.md
@@ -0,0 +1,19 @@


Original files downloaded from https://github.com/nf-core/test-datasets/tree/modules/data/genomics/sarscov2/illumina/fastq

How subsets were created:

```sh
grep "^@" test_1.fastq | wc -l
grep -n "^@" test_1.fastq
head -n 40 test_1.fastq > test_1.subset.fastq
grep -n "^@" test_2.fastq
head -n 40 test_2.fastq > test_2.subset.fastq
```
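`head -n 40` keeps exactly 10 reads because every FASTQ record spans four lines (header, sequence, `+`, quality). Counting reads with `grep "^@"` can overcount, since a quality line may also begin with `@` (Phred+33 quality 31), so counting lines and dividing by four is safer. `tests/config/test_data_CCBR.config` references gzipped `*.subset.fastq.gz` files, so the minimal Python sketch below writes gzip output directly. It assumes standard, unwrapped four-line records; `subset_fastq` is a hypothetical helper, not part of this repository.

```python
import gzip
from itertools import islice
from pathlib import Path


def subset_fastq(src: Path, dest: Path, n_records: int = 10) -> int:
    """Copy the first `n_records` FASTQ records from src to dest (gzip-compressed).

    Records are taken four lines at a time, so the subset can never end in the
    middle of a read. Returns the number of records written.
    """
    # Accept either plain or gzipped input (hypothetical convenience).
    opener = gzip.open if src.suffix == ".gz" else open
    written = 0
    with opener(src, "rt") as infile, gzip.open(dest, "wt") as outfile:
        while written < n_records:
            record = list(islice(infile, 4))
            if not record:
                break  # the source holds fewer records than requested
            if len(record) != 4 or not record[0].startswith("@"):
                raise ValueError(f"truncated or malformed record in {src}")
            outfile.writelines(record)
            written += 1
    return written
```

For example, `subset_fastq(Path("test_1.fastq"), Path("test_1.subset.fastq.gz"))` would produce the same 10 reads as the `head -n 40` command, already compressed.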

Check tails of subset files to make sure they don't end with fastq headers:

```sh
tail -n 1 *subset.fastq
```
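`tail -n 1` only inspects the last line of each file. A stricter check walks every record and confirms the four-line structure. The Python sketch below assumes unwrapped FASTQ (one sequence line and one quality line per read); `check_fastq` is a hypothetical helper, not part of this repository.

```python
import gzip
from pathlib import Path


def check_fastq(path: Path) -> int:
    """Return the number of records in a FASTQ file.

    Raises ValueError unless the file is a sequence of complete four-line
    records: '@' header, sequence, '+' separator, and a quality string of the
    same length as the sequence.
    """
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as infile:
        lines = [line.rstrip("\n") for line in infile]
    if len(lines) % 4 != 0:
        raise ValueError(f"{path}: {len(lines)} lines is not a multiple of 4")
    for start in range(0, len(lines), 4):
        header, seq, plus, qual = lines[start:start + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"{path}: malformed record at line {start + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"{path}: sequence/quality length mismatch at line {start + 1}")
    return len(lines) // 4
```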
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
2 changes: 1 addition & 1 deletion modules/CCBR/samtools/filteraligned/main.nf
@@ -1,5 +1,5 @@

-process SAMTOOLS_FILTER_ALIGNED {
+process SAMTOOLS_FILTERALIGNED {
'''
Given a bam file, filter out reads that aligned.
'''
2 changes: 1 addition & 1 deletion modules/CCBR/samtools/filteraligned/meta.yml
@@ -1,4 +1,4 @@
-name: samtools_filter_aligned
+name: samtools_filteraligned
description: Filter out aligned reads from a BAM file.
keywords:
- bam
28 changes: 28 additions & 0 deletions subworkflows/CCBR/filter_blacklist/main.nf
@@ -0,0 +1,28 @@


include { BWA_MEM } from '../../../modules/CCBR/bwa/mem'
include { SAMTOOLS_FILTERALIGNED } from '../../../modules/CCBR/samtools/filteraligned'
include { PICARD_SAMTOFASTQ } from '../../../modules/CCBR/picard/samtofastq'

workflow FILTER_BLACKLIST {
    take:
    ch_fastq_input      // channel: [ val(meta), path(fastq) ]
    ch_blacklist_index  // channel: [ val(meta), path(bwa/*) ]

    main:
    ch_versions = Channel.empty()

    // align reads to the blacklist index
    BWA_MEM ( ch_fastq_input, ch_blacklist_index )
    // drop reads that aligned to the blacklist
    SAMTOOLS_FILTERALIGNED( BWA_MEM.out.bam )
    // convert the filtered BAM (not the raw alignment) back to fastq
    PICARD_SAMTOFASTQ( SAMTOOLS_FILTERALIGNED.out.bam )

    ch_versions = ch_versions.mix(
        BWA_MEM.out.versions,
        SAMTOOLS_FILTERALIGNED.out.versions,
        PICARD_SAMTOFASTQ.out.versions
    )

    emit:
    reads    = PICARD_SAMTOFASTQ.out.reads  // channel: [ val(meta), path(fastq) ]
    versions = ch_versions                  // channel: [ path(versions.yml) ]
}
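The subworkflow is meant to keep only reads that do not align to the blacklist index. This commit does not show how `SAMTOOLS_FILTERALIGNED` selects reads, so the Python sketch below assumes it works on SAM FLAG bits: 0x4 (segment unmapped) for single-end reads and, for paired reads, 0x4 plus 0x8 (next segment unmapped), the same conditions as `samtools view -f 4` and `samtools view -f 12`. The `(name, flag)` record shape and `keep_unaligned` are hypothetical, for illustration only.

```python
from typing import Iterable, List, Tuple

# SAM FLAG bits as defined in the SAM specification
FLAG_PAIRED = 0x1         # template has multiple segments
FLAG_UNMAPPED = 0x4       # this segment is unmapped
FLAG_MATE_UNMAPPED = 0x8  # next segment in the template is unmapped


def keep_unaligned(records: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Keep (read_name, flag) records that did not align to the blacklist.

    Single-end reads are kept when the unmapped bit is set. Paired reads are
    kept only when both mates are unmapped, so no orphan mates are produced.
    """
    kept = []
    for name, flag in records:
        if flag & FLAG_PAIRED:
            required = FLAG_UNMAPPED | FLAG_MATE_UNMAPPED
        else:
            required = FLAG_UNMAPPED
        if (flag & required) == required:
            kept.append((name, flag))
    return kept
```

Requiring both mates to be unmapped keeps pairs intact, which would be consistent with the paired-end test asserting that `test.unpaired.fastq.gz` is empty; filtering on 0x4 alone could leave one mate of a pair without its partner.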
32 changes: 32 additions & 0 deletions subworkflows/CCBR/filter_blacklist/meta.yml
@@ -0,0 +1,32 @@
name: filter_blacklist
description: Filter out reads that align to a BWA index of blacklisted regions
keywords:
  - bwa
  - samtools
  - fastq
  - bam
  - filter
  - blacklist
components:
  - bwa/mem
  - samtools/filteraligned
  - picard/samtofastq
input:
  - ch_fastq_input:
      description: |
        A channel of [ val(meta), path(fastq) ] tuples containing the fastq files to filter
  - ch_blacklist_index:
      description: |
        A BWA index created by running bwa/index on a fasta file of blacklisted regions.
output:
  - reads:
      description: |
        Reads from the fastq files that do not align to the blacklist
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
authors:
  - "@kelly-sovacool"
maintainers:
  - "@kelly-sovacool"
4 changes: 4 additions & 0 deletions tests/config/pytest_modules.yml
@@ -25,3 +25,7 @@ picard/samtofastq:
samtools/filteraligned:
- modules/CCBR/samtools/filteraligned/**
- tests/modules/CCBR/samtools/filteraligned/**

subworkflows/filter_blacklist:
- subworkflows/CCBR/filter_blacklist/**
- tests/subworkflows/CCBR/filter_blacklist/**
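Each key in `pytest_modules.yml` is a test tag mapped to the path globs that belong to it. Assuming the CI uses these globs to decide which tagged tests to run for a set of changed files (the workflow that reads this file is not part of this commit), the selection can be sketched in Python. `PATHS_BY_TAG` copies two entries from the file above; `tags_for_changes` is hypothetical.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Tag -> path globs, copied from two entries in tests/config/pytest_modules.yml
PATHS_BY_TAG: Dict[str, List[str]] = {
    "samtools/filteraligned": [
        "modules/CCBR/samtools/filteraligned/**",
        "tests/modules/CCBR/samtools/filteraligned/**",
    ],
    "subworkflows/filter_blacklist": [
        "subworkflows/CCBR/filter_blacklist/**",
        "tests/subworkflows/CCBR/filter_blacklist/**",
    ],
}


def tags_for_changes(changed_files: List[str]) -> List[str]:
    """Return the sorted tags whose globs match at least one changed file.

    fnmatch's '*' also matches '/', so 'dir/**' matches any file below 'dir'.
    """
    return sorted(
        tag
        for tag, globs in PATHS_BY_TAG.items()
        if any(fnmatch(path, glob) for path in changed_files for glob in globs)
    )
```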
10 changes: 10 additions & 0 deletions tests/config/test_data_CCBR.config
@@ -0,0 +1,10 @@
params {
test_data_base = 'https://raw.githubusercontent.com/CCBR/nf-modules/filter-blacklist-subwf/'

test_data {
test_1_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_1.fastq.gz"
test_2_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_2.fastq.gz"
test_1_subset_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_1.subset.fastq.gz"
test_2_subset_fastq_gz = "${params.test_data_base}/data/genomics/sarscov2/illumina/fastq/test_2.subset.fastq.gz"
}
}
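Each `test_data` entry is built by Groovy string interpolation: `${params.test_data_base}` followed by `/data/...`. Because `test_data_base` already ends in `/`, the interpolated URLs contain a doubled slash (`.../filter-blacklist-subwf//data/...`). The Python sketch below rebuilds the same key-to-URL map and strips any trailing slash from the base before joining; `build_test_data` is a hypothetical helper.

```python
from typing import Dict

# Values copied from tests/config/test_data_CCBR.config
TEST_DATA_BASE = "https://raw.githubusercontent.com/CCBR/nf-modules/filter-blacklist-subwf/"
FASTQ_DIR = "data/genomics/sarscov2/illumina/fastq"


def build_test_data(base: str) -> Dict[str, str]:
    """Mirror the `test_data` map: key -> URL of a fastq test file.

    Removing the base's trailing '/' avoids a doubled slash when the
    relative path is appended.
    """
    root = f"{base.rstrip('/')}/{FASTQ_DIR}"
    names = ["test_1", "test_2", "test_1.subset", "test_2.subset"]
    return {
        f"{name.replace('.', '_')}_fastq_gz": f"{root}/{name}.fastq.gz"
        for name in names
    }
```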
6 changes: 3 additions & 3 deletions tests/modules/CCBR/samtools/filteraligned/main.nf
@@ -4,7 +4,7 @@ nextflow.enable.dsl = 2

include { BWA_INDEX } from '../../../../../modules/CCBR/bwa/index/main.nf'
include { BWA_MEM } from '../../../../../modules/CCBR/bwa/mem/main.nf'
-include { SAMTOOLS_FILTER_ALIGNED } from '../../../../../modules/CCBR/samtools/filteraligned/main.nf'
+include { SAMTOOLS_FILTERALIGNED } from '../../../../../modules/CCBR/samtools/filteraligned/main.nf'

//
// Test with single-end data
@@ -23,7 +23,7 @@ workflow test_filter_aligned_single_end {

BWA_INDEX ( fasta )
BWA_MEM ( input, BWA_INDEX.out.index )
-SAMTOOLS_FILTER_ALIGNED( BWA_MEM.out.bam )
+SAMTOOLS_FILTERALIGNED( BWA_MEM.out.bam )
}

//
@@ -44,5 +44,5 @@ workflow test_filter_aligned_paired_end {

BWA_INDEX ( fasta )
BWA_MEM ( input, BWA_INDEX.out.index )
-SAMTOOLS_FILTER_ALIGNED( BWA_MEM.out.bam )
+SAMTOOLS_FILTERALIGNED( BWA_MEM.out.bam )
}
Empty file.
32 changes: 32 additions & 0 deletions tests/subworkflows/CCBR/filter_blacklist/main.nf
@@ -0,0 +1,32 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { BWA_INDEX } from "../../../../modules/CCBR/bwa/index/main"
include { FILTER_BLACKLIST } from "../../../../subworkflows/CCBR/filter_blacklist/main"


workflow test_filter_blacklist_single {
    input = [
        [ id:'test', single_end:true ], // meta map
        file(params.test_data['test_1_fastq_gz'], checkIfExists: true)
    ]
    blacklist_reads = [
        [ id:'test', single_end:true ], // meta map
        file(params.test_data['test_1_subset_fastq_gz'], checkIfExists: true)
    ]
    BWA_INDEX(blacklist_reads)
    FILTER_BLACKLIST(input, BWA_INDEX.out.index)
}

workflow test_filter_blacklist_paired {
    input = [
        [ id:'test', single_end:false ], // meta map
        [ file(params.test_data['test_1_fastq_gz'], checkIfExists: true),
          file(params.test_data['test_2_fastq_gz'], checkIfExists: true) ]
    ]
    blacklist_reads = [
        [ id:'test', single_end:false ], // meta map
        file(params.test_data['test_1_subset_fastq_gz'], checkIfExists: true)
    ]
    BWA_INDEX(blacklist_reads)
    FILTER_BLACKLIST(input, BWA_INDEX.out.index)
}
6 changes: 6 additions & 0 deletions tests/subworkflows/CCBR/filter_blacklist/nextflow.config
@@ -0,0 +1,6 @@
process {

publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }


}
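The `publishDir` closure derives each output folder from the process name: it keeps the last `:`-separated component of the fully qualified name (for example `PICARD_SAMTOFASTQ` when called inside `FILTER_BLACKLIST`), takes the text before the first `_`, and lower-cases it. Assuming `params.outdir` is `output`, as the paths in `test.yml` suggest, that is why the Picard FASTQ files are expected under `output/picard/`. A Python translation, where the hypothetical `tokenize` helper mimics Groovy's `String.tokenize` by dropping empty tokens:

```python
from typing import List


def tokenize(text: str, delimiter: str) -> List[str]:
    """Mimic Groovy's String.tokenize for a single-character delimiter:
    split on the delimiter and drop empty tokens."""
    return [token for token in text.split(delimiter) if token]


def publish_subdir(process_name: str) -> str:
    """Python translation of
    task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()"""
    simple_name = tokenize(process_name, ":")[-1]
    return tokenize(simple_name, "_")[0].lower()
```

For instance, `publish_subdir("FILTER_BLACKLIST:SAMTOOLS_FILTERALIGNED")` gives `"samtools"`, so BAM outputs from the filtering step would land in a separate `samtools` folder.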
43 changes: 43 additions & 0 deletions tests/subworkflows/CCBR/filter_blacklist/test.yml
@@ -0,0 +1,43 @@
- name: filter_blacklist test_filter_blacklist_single
  command: nextflow run ./tests/subworkflows/CCBR/filter_blacklist/main.nf -entry test_filter_blacklist_single -c tests/config/nextflow.config -c tests/config/test_data_CCBR.config
  tags:
    - subworkflows
    - subworkflows/filter_blacklist
    - picard
    - picard/samtofastq
    - samtools
    - samtools/filteraligned
    - bwa
    - bwa/mem
  files:
    - path: output/picard/test.fastq.gz

- name: filter_blacklist test_filter_blacklist_paired
  command: nextflow run ./tests/subworkflows/CCBR/filter_blacklist/main.nf -entry test_filter_blacklist_paired -c tests/config/nextflow.config -c tests/config/test_data_CCBR.config
  tags:
    - subworkflows
    - subworkflows/filter_blacklist
    - picard
    - picard/samtofastq
    - samtools
    - samtools/filteraligned
    - bwa
    - bwa/mem
  files:
    - path: output/picard/test_1.fastq.gz
    - path: output/picard/test_2.fastq.gz
    - path: output/picard/test.unpaired.fastq.gz

- name: filter_blacklist test_filter_blacklist_single stub
  command: nextflow run ./tests/subworkflows/CCBR/filter_blacklist/main.nf -entry test_filter_blacklist_single -c tests/config/nextflow.config -c tests/config/test_data_CCBR.config -stub
  tags:
    - subworkflows
    - subworkflows/filter_blacklist
    - picard
    - picard/samtofastq
    - samtools
    - samtools/filteraligned
    - bwa
    - bwa/mem
  files:
    - path: output/picard/test.fastq.gz
13 changes: 13 additions & 0 deletions tests/subworkflows/CCBR/filter_blacklist/test_filter_blacklist.py
@@ -0,0 +1,13 @@
import gzip
import pathlib
import pytest


@pytest.mark.workflow("filter_blacklist test_filter_blacklist_paired")
def test_unpaired_is_empty(workflow_dir):
    unpaired_fastq = pathlib.Path(
        workflow_dir, "output", "picard", "test.unpaired.fastq.gz"
    )
    with gzip.open(unpaired_fastq, "rt") as infile:
        lines = infile.readlines()
    assert len(lines) == 0
