Merge pull request #27 from phac-nml/modify-output-filenames
Resolves #26 Modify output files with `staramr`
sgsutcliffe authored Aug 14, 2024
2 parents 1e29738 + fa6f128 commit 1f7d602
Showing 6 changed files with 76 additions and 73 deletions.
28 changes: 14 additions & 14 deletions README.md
@@ -100,22 +100,22 @@ For More information see [staramr output description](https://github.com/phac-nm

- `staramr/`
- StarAMR search results for each sample:
- - `sample_detailed_summary.tsv` : A detailed summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one gene per line.
- - `sample_mlst.tsv` : A tabular file of each multi-locus sequence type (MLST) and it's corresponding locus/alleles, one genome per line.
- - `sample_plasmidfinder.tsv` :A tabular file of each AMR plasmid type and additional BLAST information from the PlasmidFinder database, one plasmid type per line.
- - `sample_pointfinder.tsv` : A tabular file of each AMR point mutation and additional BLAST information from the PointFinder database, one gene per line.(Pointfinder organisms)
- - `sample_resfinder.tsv` : A tabular file of each AMR gene and additional BLAST information from the ResFinder database, one gene per line.
- - `sample_results.xlsx` : An Excel spreadsheet containing the previous 6 files as separate worksheets.
- - `sample_settings.txt` :The command-line, database versions, and other settings used to run `staramr`.
- - `sample_summary.tsv` : A summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one genome per line. A series of descriptive statistics is also provided for each genome as well as feedback for whether or not the genome passes several quality metrics and if not, feedback on why the genome fails.
+ - `sample_detailed_summary.staramr.tsv` : A detailed summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one gene per line.
+ - `sample_mlst.staramr.tsv` : A tabular file of each multi-locus sequence type (MLST) and it's corresponding locus/alleles, one genome per line.
+ - `sample_plasmidfinder.staramr.tsv` :A tabular file of each AMR plasmid type and additional BLAST information from the PlasmidFinder database, one plasmid type per line.
+ - `sample_pointfinder.staramr.tsv` : A tabular file of each AMR point mutation and additional BLAST information from the PointFinder database, one gene per line.(Pointfinder organisms)
+ - `sample_resfinder.staramr.tsv` : A tabular file of each AMR gene and additional BLAST information from the ResFinder database, one gene per line.
+ - `sample_results.staramr.xlsx` : An Excel spreadsheet containing the previous 6 files as separate worksheets.
+ - `sample_settings.staramr.txt` :The command-line, database versions, and other settings used to run `staramr`.
+ - `sample_summary.staramr.tsv` : A summary of all detected AMR genes/mutations/plasmids/sequence type in each genome, one genome per line. A series of descriptive statistics is also provided for each genome as well as feedback for whether or not the genome passes several quality metrics and if not, feedback on why the genome fails.
- `csvtk/`
- Combine results from all samples into a single report
- - `merged_detailed_summary.tsv`
- - `merged_mlst.tsv`
- - `merged_plasmidfinder.tsv`
- - `merged_pointfinder.tsv` (Pointfinder organisms)
- - `merged_resfinder.tsv`
- - `merged_summary.tsv`
+ - `merged_detailed_summary.staramr.tsv`
+ - `merged_mlst.staramr.tsv`
+ - `merged_plasmidfinder.staramr.tsv`
+ - `merged_pointfinder.staramr.tsv` (Pointfinder organisms)
+ - `merged_resfinder.staramr.tsv`
+ - `merged_summary.staramr.tsv`

</details>

30 changes: 15 additions & 15 deletions conf/iridanext.config
@@ -5,21 +5,21 @@ iridanext {
overwrite = true
files {
global = [
- "**/csvtk/merged_detailed_summary.tsv",
- "**/csvtk/merged_mlst.tsv",
- "**/csvtk/merged_plasmidfinder.tsv",
- "**/csvtk/merged_resfinder.tsv",
- "**/csvtk/merged_pointfinder.tsv",
- "**/csvtk/merged_summary.tsv"]
+ "**/csvtk/merged_detailed_summary.staramr.tsv",
+ "**/csvtk/merged_mlst.staramr.tsv",
+ "**/csvtk/merged_plasmidfinder.staramr.tsv",
+ "**/csvtk/merged_resfinder.staramr.tsv",
+ "**/csvtk/merged_pointfinder.staramr.tsv",
+ "**/csvtk/merged_summary.staramr.tsv"]
samples = [
- "**/*_results/*detailed_summary.tsv",
- "**/*_results/*mlst.tsv",
- "**/*_results/*plasmidfinder.tsv",
- "**/*_results/*resfinder.tsv",
- "**/*_results/*pointfinder.tsv",
- "**/*_results.xlsx",
- "**/*_settings.txt",
- "**/*_results/*summary.tsv"]
+ "**/*_results/*detailed_summary.staramr.tsv",
+ "**/*_results/*mlst.staramr.tsv",
+ "**/*_results/*plasmidfinder.staramr.tsv",
+ "**/*_results/*resfinder.staramr.tsv",
+ "**/*_results/*pointfinder.staramr.tsv",
+ "**/*_results.staramr.xlsx",
+ "**/*_settings.staramr.txt",
+ "**/*_results/*summary.staramr.tsv"]
}
metadata {
samples {
@@ -35,7 +35,7 @@ iridanext {
"Genome Length": "StarAMR Genome Length"
]
csv {
- path = "**/merged_summary.tsv"
+ path = "**/merged_summary.staramr.tsv"
sep = "\t"
idcol = "Isolate ID"
}
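
The `samples` patterns above can be tried against a mock output tree. The sketch below uses `find -path` as a rough stand-in for the glob matcher; the sample ID `sampleA` and the directory layout are hypothetical, and IRIDA Next's own matching rules may differ from `find`'s (in particular, `find` lets `*` cross `/`).

```shell
set -eu

# Hypothetical pipeline output directory with one sample.
outdir="$(mktemp -d)"
mkdir -p "$outdir/staramr/sampleA_results"
touch "$outdir/staramr/sampleA_results/sampleA_summary.staramr.tsv" \
      "$outdir/staramr/sampleA_results/sampleA_detailed_summary.staramr.tsv" \
      "$outdir/staramr/sampleA_results/sampleA_mlst.staramr.tsv"

# Approximation of the "**/*_results/*summary.staramr.tsv" pattern.
matches="$(find "$outdir" -path '*/*_results/*summary.staramr.tsv' | sort)"
echo "$matches"

# Count the matched paths (one per line).
match_count="$(echo "$matches" | wc -l | tr -d ' ')"
echo "matched: $match_count"
```

In this approximation the leading `*` in `*summary.staramr.tsv` picks up both the summary and the detailed summary file, while the MLST file is left out.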
28 changes: 14 additions & 14 deletions docs/output.md
@@ -29,22 +29,22 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- `staramr/`
- StarAMR search results for each sample:
- - `sample_detailed_summary.tsv`
- - `sample_mlst.tsv`
- - `sample_plasmidfinder.tsv`
- - `sample_pointfinder.tsv` (Pointfinder organisms)
- - `sample_resfinder.tsv`
- - `sample_results.xlsx`
- - `sample_settings.txt`
- - `sample_summary.tsv`
+ - `sample_detailed_summary.staramr.tsv`
+ - `sample_mlst.staramr.tsv`
+ - `sample_plasmidfinder.staramr.tsv`
+ - `sample_pointfinder.staramr.tsv` (Pointfinder organisms)
+ - `sample_resfinder.staramr.tsv`
+ - `sample_results.staramr.xlsx`
+ - `sample_settings.staramr.txt`
+ - `sample_summary.staramr.tsv`
- `csvtk/`
- Combine results from all samples into a single report
- - `merged_detailed_summary.tsv`
- - `merged_mlst.tsv`
- - `merged_plasmidfinder.tsv`
- - `merged_pointfinder.tsv` (Pointfinder organisms)
- - `merged_resfinder.tsv`
- - `merged_summary.tsv`
+ - `merged_detailed_summary.staramr.tsv`
+ - `merged_mlst.staramr.tsv`
+ - `merged_plasmidfinder.staramr.tsv`
+ - `merged_pointfinder.staramr.tsv` (Pointfinder organisms)
+ - `merged_resfinder.staramr.tsv`
+ - `merged_summary.staramr.tsv`

</details>

27 changes: 15 additions & 12 deletions modules/local/staramr/main.nf
@@ -11,15 +11,15 @@ process STARAMR_SEARCH {
tuple val(meta), path(contigs)

output:
- tuple val(meta), path("*_results/${meta.id}_results.xlsx") , emit: results_xlsx
- tuple val(meta), path("*_results/${meta.id}_summary.tsv") , emit: summary_tsv
- tuple val(meta), path("*_results/${meta.id}_detailed_summary.tsv") , emit: detailed_summary_tsv
- tuple val(meta), path("*_results/${meta.id}_resfinder.tsv") , emit: resfinder_tsv
- tuple val(meta), path("*_results/${meta.id}_plasmidfinder.tsv") , emit: plasmidfinder_tsv
- tuple val(meta), path("*_results/${meta.id}_mlst.tsv") , emit: mlst_tsv
- tuple val(meta), path("*_results/${meta.id}_settings.txt") , emit: settings_txt
- tuple val(meta), path("*_results/${meta.id}_pointfinder.tsv") , emit: pointfinder_tsv, optional: true
- path "versions.yml" , emit: versions
+ tuple val(meta), path("*_results/${meta.id}_results.staramr.xlsx") , emit: results_xlsx
+ tuple val(meta), path("*_results/${meta.id}_summary.staramr.tsv") , emit: summary_tsv
+ tuple val(meta), path("*_results/${meta.id}_detailed_summary.staramr.tsv") , emit: detailed_summary_tsv
+ tuple val(meta), path("*_results/${meta.id}_resfinder.staramr.tsv") , emit: resfinder_tsv
+ tuple val(meta), path("*_results/${meta.id}_plasmidfinder.staramr.tsv") , emit: plasmidfinder_tsv
+ tuple val(meta), path("*_results/${meta.id}_mlst.staramr.tsv") , emit: mlst_tsv
+ tuple val(meta), path("*_results/${meta.id}_settings.staramr.txt") , emit: settings_txt
+ tuple val(meta), path("*_results/${meta.id}_pointfinder.staramr.tsv") , emit: pointfinder_tsv, optional: true
+ path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when
@@ -48,6 +48,9 @@ process STARAMR_SEARCH {
# Add prefix ($meta.id) to the names of output files (allows for CSVTK module to concatenate files downstream)
for f in ${prefix}_results/* ; do mv "\$f" \$(echo \$f | sed 's;/;/${prefix}_;'); done
+ # Add extension (.staramr) to the names of output files (to simplify output differntiation amongst other modules)
+ for f in ${prefix}_results/*{.tsv,.xlsx,.txt} ; do mv "\$f" \$(echo \$f | sed 's;\\.\\([^\\.]*\\)\$;\\.staramr\\.\\1;'); done
cat <<-END_VERSIONS > versions.yml
"${task.process}":
staramr : \$(echo \$(staramr --version 2>&1) | sed 's/^.*staramr //' )
@@ -59,9 +62,9 @@ process STARAMR_SEARCH {
def prefix = task.ext.prefix ?: "${meta.id}"
"""
mkdir ${prefix}_results
- touch ${prefix}_results/results.xlsx
- touch ${prefix}_results/{${prefix}_summary,${prefix}_detailed_summary,${prefix}_resfinder,${prefix}_pointfinder,${prefix}_plasmidfinder,mlst}.tsv
- touch ${prefix}_results/settings.txt
+ touch ${prefix}_results/results.staramr.xlsx
+ touch ${prefix}_results/{${prefix}_summary,${prefix}_detailed_summary,${prefix}_resfinder,${prefix}_pointfinder,${prefix}_plasmidfinder,mlst}.staramr.tsv
+ touch ${prefix}_results/settings.staramr.txt
cat <<-END_VERSIONS > versions.yml
"${task.process}":
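
The two renaming loops in the `STARAMR_SEARCH` script block (first the `${prefix}_` prefix, then the `.staramr` extension) can be tried outside Nextflow. The sketch below is a plain-shell version with the Nextflow escaping removed and the brace expansion written out as three globs; the sample ID `sampleA`, the scratch directory, and the empty placeholder files are all hypothetical and only mimic the names `staramr` writes.

```shell
set -eu

prefix="sampleA"                 # hypothetical sample ID, standing in for meta.id
workdir="$(mktemp -d)"
cd "$workdir"
mkdir "${prefix}_results"

# Empty placeholders mimicking the raw staramr output names.
touch "${prefix}_results/summary.tsv" \
      "${prefix}_results/results.xlsx" \
      "${prefix}_results/settings.txt"

# Step 1: add the sample prefix to each file name.
# The path is relative with a single "/", so sed rewrites only that separator.
for f in "${prefix}_results"/*; do
    mv "$f" "$(echo "$f" | sed "s;/;/${prefix}_;")"
done

# Step 2: insert ".staramr" before the final extension.
for f in "${prefix}_results"/*.tsv "${prefix}_results"/*.xlsx "${prefix}_results"/*.txt; do
    mv "$f" "$(echo "$f" | sed 's;\.\([^.]*\)$;.staramr.\1;')"
done

ls "${prefix}_results"
```

After both loops the directory should hold `sampleA_summary.staramr.tsv`, `sampleA_results.staramr.xlsx`, and `sampleA_settings.staramr.txt`, which is the naming the `output:` block expects.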
24 changes: 12 additions & 12 deletions tests/main.nf.test
@@ -62,52 +62,52 @@ nextflow_pipeline {

// Check the commandline parameters
// Salmonella
- assert path("$baseDir/tests/results/staramr/GCA_000008105_results/GCA_000008105_settings.txt").exists()
- def salmonella_settings = new File("$baseDir/tests/results/staramr/GCA_000008105_results/GCA_000008105_settings.txt")
+ assert path("$baseDir/tests/results/staramr/GCA_000008105_results/GCA_000008105_settings.staramr.txt").exists()
+ def salmonella_settings = new File("$baseDir/tests/results/staramr/GCA_000008105_results/GCA_000008105_settings.staramr.txt")
def salmonella_cmd = salmonella_settings.readLines().get(0)
assert salmonella_cmd == "command_line = /usr/local/bin/staramr search --pointfinder-organism salmonella --minimum-contig-length 300 --genome-size-lower-bound 4000000 --genome-size-upper-bound 6000000 --minimum-N50-value 10000 --minimum-contig-length 300 --unacceptable-number-contigs 1000 --pid-threshold 98 --percent-length-overlap-plasmidfinder 60 --percent-length-overlap-resfinder 60 --percent-length-overlap-pointfinder 95 --nprocs 1 -o GCA_000008105_results GCA_000008105.fasta"

// Ecoli
- assert path("$baseDir/tests/results/staramr/GCA_000947975_results/GCA_000947975_settings.txt").exists()
- def ecoli_settings = new File("$baseDir/tests/results/staramr/GCA_000947975_results/GCA_000947975_settings.txt")
+ assert path("$baseDir/tests/results/staramr/GCA_000947975_results/GCA_000947975_settings.staramr.txt").exists()
+ def ecoli_settings = new File("$baseDir/tests/results/staramr/GCA_000947975_results/GCA_000947975_settings.staramr.txt")
def ecoli_cmd = ecoli_settings.readLines().get(0)
assert ecoli_cmd == "command_line = /usr/local/bin/staramr search --pointfinder-organism escherichia_coli --minimum-contig-length 300 --genome-size-lower-bound 4000000 --genome-size-upper-bound 6000000 --minimum-N50-value 10000 --minimum-contig-length 300 --unacceptable-number-contigs 1000 --pid-threshold 98 --percent-length-overlap-plasmidfinder 60 --percent-length-overlap-resfinder 60 --percent-length-overlap-pointfinder 95 --nprocs 1 -o GCA_000947975_results GCA_000947975.fasta"

// Listeria
- assert path("$baseDir/tests/results/staramr/GCF_000196035_results/GCF_000196035_settings.txt").exists()
- def listeria_settings = new File("$baseDir/tests/results/staramr/GCF_000196035_results/GCF_000196035_settings.txt")
+ assert path("$baseDir/tests/results/staramr/GCF_000196035_results/GCF_000196035_settings.staramr.txt").exists()
+ def listeria_settings = new File("$baseDir/tests/results/staramr/GCF_000196035_results/GCF_000196035_settings.staramr.txt")
def listeria_cmd = listeria_settings.readLines().get(0)
assert listeria_cmd == "command_line = /usr/local/bin/staramr search --minimum-contig-length 300 --genome-size-lower-bound 4000000 --genome-size-upper-bound 6000000 --minimum-N50-value 10000 --minimum-contig-length 300 --unacceptable-number-contigs 1000 --pid-threshold 98 --percent-length-overlap-plasmidfinder 60 --percent-length-overlap-resfinder 60 --percent-length-overlap-pointfinder 95 --nprocs 1 -o GCF_000196035_results GCF_000196035.fasta"

// Check CSVTK_concat output (merged_*) files

// merged_detailed_summary.tsv
- def actual_detailed_summary_tsv = path("$baseDir/tests/results/csvtk/merged_detailed_summary.tsv")
+ def actual_detailed_summary_tsv = path("$baseDir/tests/results/csvtk/merged_detailed_summary.staramr.tsv")
def expected_detailed_summary_tsv = path("$baseDir/tests/data/merged_detailed_summary.tsv")
assert actual_detailed_summary_tsv.readLines().sort() == expected_detailed_summary_tsv.readLines().sort()

// merged_mlst.tsv
- def actual_mlst_tsv = path("$baseDir/tests/results/csvtk/merged_mlst.tsv")
+ def actual_mlst_tsv = path("$baseDir/tests/results/csvtk/merged_mlst.staramr.tsv")
def expected_mlst_tsv = path("$baseDir/tests/data/merged_mlst.tsv")
assert actual_mlst_tsv.readLines().sort() == expected_mlst_tsv.readLines().sort()

// merged_plasmidfinder.tsv
- def actual_plasmidfinder_tsv = path("$baseDir/tests/results/csvtk/merged_plasmidfinder.tsv")
+ def actual_plasmidfinder_tsv = path("$baseDir/tests/results/csvtk/merged_plasmidfinder.staramr.tsv")
def expected_plasmidfinder_tsv = path("$baseDir/tests/data/merged_plasmidfinder.tsv")
assert actual_plasmidfinder_tsv.readLines().sort() == expected_plasmidfinder_tsv.readLines().sort()

// merged_pointfinder.tsv
- def actual_pointfinder_tsv = path("$baseDir/tests/results/csvtk/merged_pointfinder.tsv")
+ def actual_pointfinder_tsv = path("$baseDir/tests/results/csvtk/merged_pointfinder.staramr.tsv")
def expected_pointfinder_tsv = path("$baseDir/tests/data/merged_pointfinder.tsv")
assert actual_pointfinder_tsv.readLines().sort() == expected_pointfinder_tsv.readLines().sort()

// merged_resfinder.tsv
- def actual_resfinder_tsv = path("$baseDir/tests/results/csvtk/merged_resfinder.tsv")
+ def actual_resfinder_tsv = path("$baseDir/tests/results/csvtk/merged_resfinder.staramr.tsv")
def expected_resfinder_tsv = path("$baseDir/tests/data/merged_resfinder.tsv")
assert actual_resfinder_tsv.readLines().sort() == expected_resfinder_tsv.readLines().sort()

// merged_summary.tsv
- def actual_summary_tsv = path("$baseDir/tests/results/csvtk/merged_summary.tsv")
+ def actual_summary_tsv = path("$baseDir/tests/results/csvtk/merged_summary.staramr.tsv")
def expected_summary_tsv = path("$baseDir/tests/data/merged_summary.tsv")
assert actual_summary_tsv.readLines().sort() == expected_summary_tsv.readLines().sort()
}
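
The merged-file assertions above compare sorted lines, so row order in the concatenated output does not affect the result. A minimal shell sketch of the same order-insensitive check follows; the two TSV files and their contents are made up for illustration and are not the repository's test data.

```shell
set -eu

tmp="$(mktemp -d)"

# Same rows, different order (hypothetical contents).
printf 'Isolate ID\tGenotype\nA\tx\nB\ty\n' > "$tmp/expected.tsv"
printf 'Isolate ID\tGenotype\nB\ty\nA\tx\n' > "$tmp/actual.tsv"

# Sort both files (header included, as readLines().sort() also does),
# then compare the sorted copies byte for byte.
sort "$tmp/expected.tsv" > "$tmp/expected.sorted"
sort "$tmp/actual.tsv"   > "$tmp/actual.sorted"

if cmp -s "$tmp/actual.sorted" "$tmp/expected.sorted"; then
    result="match"
else
    result="differ"
fi
echo "$result"
```

A check like this catches missing or altered rows but, by design, says nothing about the order in which samples were concatenated.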
12 changes: 6 additions & 6 deletions workflows/staramr.nf
@@ -73,7 +73,7 @@ workflow STARAMR {
ch_tsvs_1 = tsv_files_1.map{
meta, summary_tsv -> summary_tsv
}.collect().map{
- summary_tsv -> [ [id:"merged_summary"], summary_tsv]
+ summary_tsv -> [ [id:"merged_summary.staramr"], summary_tsv]
}

// 2) detailed_summary.tsv file
@@ -82,7 +82,7 @@ workflow STARAMR {
ch_tsvs_2 = tsv_files_2.map{
meta, detailed_summary_tsv -> detailed_summary_tsv
}.collect().map{
- detailed_summary_tsv -> [ [id:"merged_detailed_summary"], detailed_summary_tsv]
+ detailed_summary_tsv -> [ [id:"merged_detailed_summary.staramr"], detailed_summary_tsv]
}

// 3) resfinder.tsv file
@@ -91,7 +91,7 @@ workflow STARAMR {
ch_tsvs_3 = tsv_files_3.map{
meta, resfinder_tsv -> resfinder_tsv
}.collect().map{
- resfinder_tsv -> [ [id:"merged_resfinder"], resfinder_tsv]
+ resfinder_tsv -> [ [id:"merged_resfinder.staramr"], resfinder_tsv]
}

// 4) plasmidfinder.tsv file
@@ -100,7 +100,7 @@ workflow STARAMR {
ch_tsvs_4 = tsv_files_4.map{
meta, plasmidfinder_tsv -> plasmidfinder_tsv
}.collect().map{
- plasmidfinder_tsv -> [ [id:"merged_plasmidfinder"], plasmidfinder_tsv]
+ plasmidfinder_tsv -> [ [id:"merged_plasmidfinder.staramr"], plasmidfinder_tsv]
}

// 5) mlst.tsv file
@@ -109,7 +109,7 @@ workflow STARAMR {
ch_tsvs_5 = tsv_files_5.map{
meta, mlst_tsv -> mlst_tsv
}.collect().map{
- mlst_tsv -> [ [id:"merged_mlst"], mlst_tsv]
+ mlst_tsv -> [ [id:"merged_mlst.staramr"], mlst_tsv]
}

// 6) pointfinder.tsv file
@@ -118,7 +118,7 @@ workflow STARAMR {
ch_tsvs_6 = tsv_files_6.map{
meta, pointfinder_tsv -> pointfinder_tsv
}.collect().map{
- pointfinder_tsv -> [ [id:"merged_pointfinder"], pointfinder_tsv]
+ pointfinder_tsv -> [ [id:"merged_pointfinder.staramr"], pointfinder_tsv]
}.mix(ch_tsvs_1,ch_tsvs_2,ch_tsvs_3,ch_tsvs_4,ch_tsvs_5)

CSVTK_CONCAT(
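
The `id` values changed above appear to drive the merged file names: the README and test changes in this commit list `csvtk/merged_summary.staramr.tsv`, which suggests the concat step writes `<id>.tsv`. The `CSVTK_CONCAT` module itself is not part of this diff, so that naming rule is an assumption. The sketch below imitates the step with `awk` instead of `csvtk` (a simplification: it keeps the first header and does no column matching); the per-sample files and their contents are hypothetical.

```shell
set -eu

tmp="$(mktemp -d)"
cd "$tmp"

# Hypothetical per-sample summaries with identical headers.
printf 'Isolate ID\tGenotype\nsampleA\tx\n' > sampleA_summary.staramr.tsv
printf 'Isolate ID\tGenotype\nsampleB\ty\n' > sampleB_summary.staramr.tsv

# The id set in the workflow; assumed to become the output base name.
id="merged_summary.staramr"

# Stand-in for the concat step: keep the header of the first file only.
awk 'FNR == 1 && NR != 1 { next } { print }' \
    sampleA_summary.staramr.tsv sampleB_summary.staramr.tsv > "${id}.tsv"

line_count="$(wc -l < "${id}.tsv" | tr -d ' ')"
echo "${id}.tsv has ${line_count} lines"
```

Under that assumption, putting `.staramr` into the `id` is what yields `merged_summary.staramr.tsv` without touching the concat module.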
