From f558bcf415de35bf3210c0cfd773c99db9b88207 Mon Sep 17 00:00:00 2001
From: Rike
Date: Thu, 16 Nov 2023 11:27:44 +0100
Subject: [PATCH 01/11] add docs for bcftools annotate params

---
 nextflow_schema.json | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/nextflow_schema.json b/nextflow_schema.json
index f585103b28..5be42f1ab0 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -84,7 +84,7 @@
 "description": "Path to target bed file in case of whole exome or targeted sequencing or intervals file."
 },
 "nucleotides_per_second": {
- "type": "number",
+ "type": "integer",
 "fa_icon": "fas fa-clock",
 "description": "Estimate interval size.",
 "help_text": "Intervals are parts of the chopped up genome used to speed up preprocessing and variant calling. See `--intervals` for more info. \n\nChanging this parameter, changes the number of intervals that are grouped and processed together. Bed files from target sequencing can contain thousands or small intervals. Spinning up a new process for each can be quite resource intensive. Instead it can be desired to process small intervals together on larger nodes. \nIn order to make use of this parameter, no runtime estimate can be present in the bed file (column 5). 
", @@ -100,8 +100,7 @@ "type": "string", "fa_icon": "fas fa-toolbox", "description": "Tools to use for duplicate marking, variant calling and/or for annotation.", - "help_text": "Multiple tools separated with commas.\n\n**Variant Calling:**\n\nGermline variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: DeepVariant, FreeBayes, GATK HaplotypeCaller, mpileup, Sentieon Haplotyper, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit\n\nTumor-only somatic variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, mpileup, Mutect2, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit, ControlFREEC\n\nSomatic variant calling can currently only be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, Mutect2, Strelka2\n- Structural variants: Manta, TIDDIT\n- Copy-Number: ASCAT, CNVKit, Control-FREEC\n- Microsatellite Instability: MSIsensorpro\n\n> **NB** Mutect2 for somatic variant calling cannot be combined with `--no_intervals`\n\n**Annotation:**\n \n- snpEff, VEP, merge (both consecutively).\n\n> **NB** As Sarek will use bgzip and tabix to compress and index VCF files annotated, it expects VCF files to be sorted when starting from `--step annotate`.", - + "help_text": "Multiple tools separated with commas.\n\n**Variant Calling:**\n\nGermline variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: DeepVariant, FreeBayes, GATK HaplotypeCaller, mpileup, Sentieon Haplotyper, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit\n\nTumor-only somatic variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, mpileup, Mutect2, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit, ControlFREEC\n\nSomatic variant calling can currently only be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, Mutect2, 
Strelka2\n- Structural variants: Manta, TIDDIT\n- Copy-Number: ASCAT, CNVKit, Control-FREEC\n- Microsatellite Instability: MSIsensorpro\n\n> **NB** Mutect2 for somatic variant calling cannot be combined with `--no_intervals`\n\n**Annotation:**\n \n- snpEff, VEP, merge (both consecutively), and bcftools annotate (needs `--bcftools_annotation`).\n\n> **NB** As Sarek will use bgzip and tabix to compress and index VCF files annotated, it expects VCF files to be sorted when starting from `--step annotate`.", "pattern": "^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|ngscheckmate|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(? Date: Thu, 16 Nov 2023 11:34:26 +0100 Subject: [PATCH 02/11] add input validation --- CHANGELOG.md | 2 +- subworkflows/local/samplesheet_to_channel/main.nf | 10 ++++++++-- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 97e4486f64..b29c9b110c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,7 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed - [#1334](https://github.com/nf-core/sarek/pull/1334) - Remove extra v, when reporting tower runs on slack - +- [#1335](https://github.com/nf-core/sarek/pull/1335) - Add docs and validation for bcftools annotation parameters ### Removed ### Dependencies diff --git a/subworkflows/local/samplesheet_to_channel/main.nf b/subworkflows/local/samplesheet_to_channel/main.nf index b3f9b2311a..a15f9c6e2a 100644 --- a/subworkflows/local/samplesheet_to_channel/main.nf +++ b/subworkflows/local/samplesheet_to_channel/main.nf @@ -112,6 +112,7 @@ workflow SAMPLESHEET_TO_CHANNEL{ } } } + input_sample.filter{ it[0].status == 0 }.ifEmpty{ // In this case, the sample-sheet contains no normal/germline-samples def tools_requiring_normal_samples = ['ascat', 'deepvariant', 'haplotypecaller', 'msisensorpro'] def 
requested_tools_requiring_normal_samples = []
@@ -126,8 +127,8 @@ workflow SAMPLESHEET_TO_CHANNEL{
 // Fails when wrongfull extension for intervals file
 if (params.wes && !params.step == 'annotate') {
- if (params.intervals && !params.intervals.endsWith("bed")) error("Target file specified with `--intervals` must be in BED format for targeted data")
- else log.warn("Intervals file was provided without parameter `--wes`: Pipeline will assume this is Whole-Genome-Sequencing data.")
+ if (params.intervals && !params.intervals.endsWith("bed")) error("Target file specified with `--intervals` must be in BED format for targeted data")
+ else log.warn("Intervals file was provided without parameter `--wes`: Pipeline will assume this is Whole-Genome-Sequencing data.")
 } else if (params.intervals && !params.intervals.endsWith("bed") && !params.intervals.endsWith("list")) error("Intervals file must end with .bed, .list, or .interval_list")
 if (params.step == 'mapping' && params.aligner.contains("dragmap") && !(params.skip_tools && params.skip_tools.split(',').contains("baserecalibrator"))) {
@@ -250,6 +251,11 @@ workflow SAMPLESHEET_TO_CHANNEL{
 }
 }
+ // Fails when bcftools annotate is used but no files are supplied
+ if (params.tools && (params.tools.split(',').contains('bcfann') && !(params.bcftools_annotations && params.bcftools_annotations_index && params.bcftools_header_lines)) {
+ error("Please specify --bcftools_annotations, --bcftools_annotations_index, and --bcftools_header_lines, when using BCFTools annotations")
+ }
+
 emit:
 input_sample
 }

From a815fdaf1fd6dc4d0fc8e22bf6cd691cfb851f2d Mon Sep 17 00:00:00 2001
From: nf-core-bot
Date: Thu, 16 Nov 2023 11:16:47 +0000
Subject: [PATCH 03/11] [automated] Fix linting with Prettier

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index b29c9b110c..abdd2166ef 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,6 +17,7 @@ and this project adheres to [Semantic 
Versioning](https://semver.org/spec/v2.0.0
 - [#1334](https://github.com/nf-core/sarek/pull/1334) - Remove extra v, when reporting tower runs on slack
 - [#1335](https://github.com/nf-core/sarek/pull/1335) - Add docs and validation for bcftools annotation parameters
+
 ### Removed
 ### Dependencies

From 70d52c8b0cc918265131e12b3ee0399dc8428e9c Mon Sep 17 00:00:00 2001
From: Rike
Date: Thu, 16 Nov 2023 12:17:12 +0100
Subject: [PATCH 04/11] fix brackets

---
 subworkflows/local/samplesheet_to_channel/main.nf | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/subworkflows/local/samplesheet_to_channel/main.nf b/subworkflows/local/samplesheet_to_channel/main.nf
index a15f9c6e2a..d674fa1dcd 100644
--- a/subworkflows/local/samplesheet_to_channel/main.nf
+++ b/subworkflows/local/samplesheet_to_channel/main.nf
@@ -252,7 +252,7 @@ workflow SAMPLESHEET_TO_CHANNEL{
 }
 // Fails when bcftools annotate is used but no files are supplied
- if (params.tools && (params.tools.split(',').contains('bcfann') && !(params.bcftools_annotations && params.bcftools_annotations_index && params.bcftools_header_lines)) {
+ if (params.tools && params.tools.split(',').contains('bcfann') && !(params.bcftools_annotations && params.bcftools_annotations_index && params.bcftools_header_lines)) {
 error("Please specify --bcftools_annotations, --bcftools_annotations_index, and --bcftools_header_lines, when using BCFTools annotations")
 }

From 8f34778bc946780b386b097b672914093614c034 Mon Sep 17 00:00:00 2001
From: Friederike Hanssen
Date: Thu, 16 Nov 2023 15:44:46 +0100
Subject: [PATCH 05/11] Update CHANGELOG.md

Co-authored-by: Maxime U Garcia
---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index abdd2166ef..fff0e67cce 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,6 +18,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#1334](https://github.com/nf-core/sarek/pull/1334) - Remove extra v, when reporting tower 
runs on slack
 - [#1335](https://github.com/nf-core/sarek/pull/1335) - Add docs and validation for bcftools annotation parameters
+
 ### Removed
 ### Dependencies

From 7061e432bbefcd353e19ce3ca3134bef2f23fde2 Mon Sep 17 00:00:00 2001
From: nf-core-bot
Date: Thu, 16 Nov 2023 14:50:14 +0000
Subject: [PATCH 06/11] [automated] Fix linting with Prettier

---
 CHANGELOG.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index fff0e67cce..abdd2166ef 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,7 +18,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#1334](https://github.com/nf-core/sarek/pull/1334) - Remove extra v, when reporting tower runs on slack
 - [#1335](https://github.com/nf-core/sarek/pull/1335) - Add docs and validation for bcftools annotation parameters
-
 ### Removed
 ### Dependencies

From 4eff47ad91d1c481ef7043ee032ae685ea300dc0 Mon Sep 17 00:00:00 2001
From: Rike
Date: Fri, 17 Nov 2023 16:21:48 +0100
Subject: [PATCH 07/11] compute index if missing and rename to tbi

---
 nextflow.config | 40 ++++-----
 nextflow_schema.json | 4 +-
 subworkflows/local/prepare_genome/main.nf | 83 ++++++++++---------
 .../local/samplesheet_to_channel/main.nf | 4 +-
 workflows/sarek.nf | 40 ++++-----
 5 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/nextflow.config b/nextflow.config
index 092662fb2c..d309acfc21 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -77,26 +77,26 @@ params {
 wes = false // Set to true, if data is exome/targeted sequencing data. 
Used to use correct models in various variant callers // Annotation - bcftools_annotations = null // No extra annotation file - bcftools_annotations_index = null // No extra annotation file index - bcftools_header_lines = null // No header lines to be added to the VCF file - dbnsfp = null // No dbnsfp processed file - dbnsfp_consequence = null // No default consequence for dbnsfp plugin - dbnsfp_fields = "rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF" // Default fields for dbnsfp plugin - dbnsfp_tbi = null // No dbnsfp processed file index - outdir_cache = null // No default outdir cache - spliceai_indel = null // No spliceai_indel file - spliceai_indel_tbi = null // No spliceai_indel file index - spliceai_snv = null // No spliceai_snv file - spliceai_snv_tbi = null // No spliceai_snv file index - vep_custom_args = "--everything --filter_common --per_gene --total_length --offline --format vcf" // Default arguments for VEP - vep_dbnsfp = null // dbnsfp plugin disabled within VEP - vep_include_fasta = false // Don't use fasta file for annotation with VEP - vep_loftee = null // loftee plugin disabled within VEP - vep_out_format = "vcf" - vep_spliceai = null // spliceai plugin disabled within VEP - vep_spliceregion = null // spliceregion plugin disabled within VEP - vep_version = "110.0-0" // Should be updated when we update VEP, needs this to get full path to some plugins + bcftools_annotations = null // No extra annotation file + bcftools_annotations_tbi = null // No extra annotation file index + bcftools_header_lines = null // No header lines to be added to the VCF file + dbnsfp = null // No dbnsfp processed file + dbnsfp_consequence = null // No default consequence for dbnsfp plugin + dbnsfp_fields = "rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF" // Default fields for dbnsfp plugin + dbnsfp_tbi = null // No dbnsfp processed file index + outdir_cache = null // No 
default outdir cache + spliceai_indel = null // No spliceai_indel file + spliceai_indel_tbi = null // No spliceai_indel file index + spliceai_snv = null // No spliceai_snv file + spliceai_snv_tbi = null // No spliceai_snv file index + vep_custom_args = "--everything --filter_common --per_gene --total_length --offline --format vcf" // Default arguments for VEP + vep_dbnsfp = null // dbnsfp plugin disabled within VEP + vep_include_fasta = false // Don't use fasta file for annotation with VEP + vep_loftee = null // loftee plugin disabled within VEP + vep_out_format = "vcf" + vep_spliceai = null // spliceai plugin disabled within VEP + vep_spliceregion = null // spliceregion plugin disabled within VEP + vep_version = "110.0-0" // Should be updated when we update VEP, needs this to get full path to some plugins // MultiQC options multiqc_config = null diff --git a/nextflow_schema.json b/nextflow_schema.json index 5be42f1ab0..3f3f21313f 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -533,9 +533,9 @@ "bcftools_annotations": { "type": "string", "fa_icon": "fas fa-file", - "description": "A vcf file containing custom annotations to be used with bcftools annotate" + "description": "A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped." 
}, - "bcftools_annotations_index": { + "bcftools_annotations_tbi": { "type": "string", "fa_icon": "fas fa-file", "description": "Index file for `bcftools_annotations`" diff --git a/subworkflows/local/prepare_genome/main.nf b/subworkflows/local/prepare_genome/main.nf index 4945282d98..4582c6dd74 100644 --- a/subworkflows/local/prepare_genome/main.nf +++ b/subworkflows/local/prepare_genome/main.nf @@ -8,37 +8,39 @@ // Condition is based on params.step and params.tools // If and extra condition exists, it's specified in comments -include { BWA_INDEX as BWAMEM1_INDEX } from '../../../modules/nf-core/bwa/index/main' -include { BWAMEM2_INDEX } from '../../../modules/nf-core/bwamem2/index/main' -include { DRAGMAP_HASHTABLE } from '../../../modules/nf-core/dragmap/hashtable/main' -include { GATK4_CREATESEQUENCEDICTIONARY } from '../../../modules/nf-core/gatk4/createsequencedictionary/main' -include { MSISENSORPRO_SCAN } from '../../../modules/nf-core/msisensorpro/scan/main' -include { SAMTOOLS_FAIDX } from '../../../modules/nf-core/samtools/faidx/main' -include { TABIX_TABIX as TABIX_DBSNP } from '../../../modules/nf-core/tabix/tabix/main' -include { TABIX_TABIX as TABIX_GERMLINE_RESOURCE } from '../../../modules/nf-core/tabix/tabix/main' -include { TABIX_TABIX as TABIX_KNOWN_INDELS } from '../../../modules/nf-core/tabix/tabix/main' -include { TABIX_TABIX as TABIX_KNOWN_SNPS } from '../../../modules/nf-core/tabix/tabix/main' -include { TABIX_TABIX as TABIX_PON } from '../../../modules/nf-core/tabix/tabix/main' -include { UNTAR as UNTAR_CHR_DIR } from '../../../modules/nf-core/untar/main' -include { UNZIP as UNZIP_ALLELES } from '../../../modules/nf-core/unzip/main' -include { UNZIP as UNZIP_GC } from '../../../modules/nf-core/unzip/main' -include { UNZIP as UNZIP_LOCI } from '../../../modules/nf-core/unzip/main' -include { UNZIP as UNZIP_RT } from '../../../modules/nf-core/unzip/main' +include { BWA_INDEX as BWAMEM1_INDEX } from '../../../modules/nf-core/bwa/index/main' 
+include { BWAMEM2_INDEX } from '../../../modules/nf-core/bwamem2/index/main' +include { DRAGMAP_HASHTABLE } from '../../../modules/nf-core/dragmap/hashtable/main' +include { GATK4_CREATESEQUENCEDICTIONARY } from '../../../modules/nf-core/gatk4/createsequencedictionary/main' +include { MSISENSORPRO_SCAN } from '../../../modules/nf-core/msisensorpro/scan/main' +include { SAMTOOLS_FAIDX } from '../../../modules/nf-core/samtools/faidx/main' +include { TABIX_TABIX as TABIX_BCFTOOLS_ANNOTATIONS } from '../../../modules/nf-core/tabix/tabix/main' +include { TABIX_TABIX as TABIX_DBSNP } from '../../../modules/nf-core/tabix/tabix/main' +include { TABIX_TABIX as TABIX_GERMLINE_RESOURCE } from '../../../modules/nf-core/tabix/tabix/main' +include { TABIX_TABIX as TABIX_KNOWN_INDELS } from '../../../modules/nf-core/tabix/tabix/main' +include { TABIX_TABIX as TABIX_KNOWN_SNPS } from '../../../modules/nf-core/tabix/tabix/main' +include { TABIX_TABIX as TABIX_PON } from '../../../modules/nf-core/tabix/tabix/main' +include { UNTAR as UNTAR_CHR_DIR } from '../../../modules/nf-core/untar/main' +include { UNZIP as UNZIP_ALLELES } from '../../../modules/nf-core/unzip/main' +include { UNZIP as UNZIP_GC } from '../../../modules/nf-core/unzip/main' +include { UNZIP as UNZIP_LOCI } from '../../../modules/nf-core/unzip/main' +include { UNZIP as UNZIP_RT } from '../../../modules/nf-core/unzip/main' workflow PREPARE_GENOME { take: - ascat_alleles // channel: [optional] ascat allele files - ascat_loci // channel: [optional] ascat loci files - ascat_loci_gc // channel: [optional] ascat gc content file - ascat_loci_rt // channel: [optional] ascat replictiming file - chr_dir // channel: [optional] chromosome files - dbsnp // channel: [optional] dbsnp - fasta // channel: [mandatory] fasta - fasta_fai // channel: [optional] fasta_fai - germline_resource // channel: [optional] germline_resource - known_indels // channel: [optional] known_indels - known_snps // channel: [optional] known_snps - pon // 
channel: [optional] pon + ascat_alleles // channel: [optional] ascat allele files + ascat_loci // channel: [optional] ascat loci files + ascat_loci_gc // channel: [optional] ascat gc content file + ascat_loci_rt // channel: [optional] ascat replictiming file + bcftools_annotations // channel: [optional] bcftools annotations file + chr_dir // channel: [optional] chromosome files + dbsnp // channel: [optional] dbsnp + fasta // channel: [mandatory] fasta + fasta_fai // channel: [optional] fasta_fai + germline_resource // channel: [optional] germline_resource + known_indels // channel: [optional] known_indels + known_snps // channel: [optional] known_snps + pon // channel: [optional] pon main: @@ -57,6 +59,7 @@ workflow PREPARE_GENOME { // written for KNOWN_INDELS, but preemptively applied to the rest // [ file1, file2 ] becomes [ [ meta1, file1 ], [ meta2, file2 ] ] // outputs are collected to maintain a single channel for relevant TBI files + TABIX_BCFTOOLS_ANNOTATIONS(bcftools_annotations.flatten().map{ it -> [ [ id:it.baseName ], it ] }) TABIX_DBSNP(dbsnp.flatten().map{ it -> [ [ id:it.baseName ], it ] }) TABIX_GERMLINE_RESOURCE(germline_resource.flatten().map{ it -> [ [ id:it.baseName ], it ] }) TABIX_KNOWN_SNPS(known_snps.flatten().map{ it -> [ [ id:it.baseName ], it ] } ) @@ -105,6 +108,7 @@ workflow PREPARE_GENOME { versions = versions.mix(DRAGMAP_HASHTABLE.out.versions) versions = versions.mix(GATK4_CREATESEQUENCEDICTIONARY.out.versions) versions = versions.mix(MSISENSORPRO_SCAN.out.versions) + versions = versions.mix(TABIX_BCFTOOLS_ANNOTATIONS.out.versions) versions = versions.mix(TABIX_DBSNP.out.versions) versions = versions.mix(TABIX_GERMLINE_RESOURCE.out.versions) versions = versions.mix(TABIX_KNOWN_SNPS.out.versions) @@ -112,17 +116,18 @@ workflow PREPARE_GENOME { versions = versions.mix(TABIX_PON.out.versions) emit: - bwa = BWAMEM1_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwa/* - bwamem2 = BWAMEM2_INDEX.out.index.map{ meta, index 
-> [index] }.collect() // path: bwamem2/* - hashtable = DRAGMAP_HASHTABLE.out.hashmap.map{ meta, index -> [index] }.collect() // path: dragmap/* - dbsnp_tbi = TABIX_DBSNP.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: dbsnb.vcf.gz.tbi - dict = GATK4_CREATESEQUENCEDICTIONARY.out.dict // path: genome.fasta.dict - fasta_fai = SAMTOOLS_FAIDX.out.fai.map{ meta, fai -> [fai] } // path: genome.fasta.fai - germline_resource_tbi = TABIX_GERMLINE_RESOURCE.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: germline_resource.vcf.gz.tbi - known_snps_tbi = TABIX_KNOWN_SNPS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi - known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi - msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list - pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi + bcftools_annotations = TABIX_BCFTOOLS_ANNOTATIONS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // bcftools_annotations.vcf.gz.tbi + bwa = BWAMEM1_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwa/* + bwamem2 = BWAMEM2_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwamem2/* + hashtable = DRAGMAP_HASHTABLE.out.hashmap.map{ meta, index -> [index] }.collect() // path: dragmap/* + dbsnp_tbi = TABIX_DBSNP.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: dbsnb.vcf.gz.tbi + dict = GATK4_CREATESEQUENCEDICTIONARY.out.dict // path: genome.fasta.dict + fasta_fai = SAMTOOLS_FAIDX.out.fai.map{ meta, fai -> [fai] } // path: genome.fasta.fai + germline_resource_tbi = TABIX_GERMLINE_RESOURCE.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: germline_resource.vcf.gz.tbi + known_snps_tbi = TABIX_KNOWN_SNPS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi + known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: 
{known_indels*}.vcf.gz.tbi + msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list + pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi allele_files chr_files gc_file diff --git a/subworkflows/local/samplesheet_to_channel/main.nf b/subworkflows/local/samplesheet_to_channel/main.nf index d674fa1dcd..6784b4616b 100644 --- a/subworkflows/local/samplesheet_to_channel/main.nf +++ b/subworkflows/local/samplesheet_to_channel/main.nf @@ -252,8 +252,8 @@ workflow SAMPLESHEET_TO_CHANNEL{ } // Fails when bcftools annotate is used but no files are supplied - if (params.tools && params.tools.split(',').contains('bcfann') && !(params.bcftools_annotations && params.bcftools_annotations_index && params.bcftools_header_lines)) { - error("Please specify --bcftools_annotations, --bcftools_annotations_index, and --bcftools_header_lines, when using BCFTools annotations") + if (params.tools && params.tools.split(',').contains('bcfann') && !(params.bcftools_annotations && params.bcftools_annotations_tbi && params.bcftools_header_lines)) { + error("Please specify --bcftools_annotations, --bcftools_annotations_tbi, and --bcftools_header_lines, when using BCFTools annotations") } emit: diff --git a/workflows/sarek.nf b/workflows/sarek.nf index 99ecd733fc..5b0a9dcde9 100644 --- a/workflows/sarek.nf +++ b/workflows/sarek.nf @@ -28,7 +28,7 @@ def checkPathParamList = [ params.bwa, params.bwamem2, params.bcftools_annotations, - params.bcftools_annotations_index, + params.bcftools_annotations_tbi, params.bcftools_header_lines, params.cf_chrom_len, params.chr_dir, @@ -87,6 +87,8 @@ ascat_alleles = params.ascat_alleles ? Channel.fromPath(para ascat_loci = params.ascat_loci ? Channel.fromPath(params.ascat_loci).collect() : Channel.empty() ascat_loci_gc = params.ascat_loci_gc ? Channel.fromPath(params.ascat_loci_gc).collect() : Channel.value([]) ascat_loci_rt = params.ascat_loci_rt ? 
Channel.fromPath(params.ascat_loci_rt).collect() : Channel.value([]) +bcftools_annotations = params.bcftools_annotations ? Channel.fromPath(params.bcftools_annotations).collect() : Channel.empty() +bcftools_header_lines = params.bcftools_header_lines ? Channel.fromPath(params.bcftools_header_lines).collect() : Channel.empty() cf_chrom_len = params.cf_chrom_len ? Channel.fromPath(params.cf_chrom_len).collect() : [] chr_dir = params.chr_dir ? Channel.fromPath(params.chr_dir).collect() : Channel.value([]) dbsnp = params.dbsnp ? Channel.fromPath(params.dbsnp).collect() : Channel.value([]) @@ -100,18 +102,16 @@ pon = params.pon ? Channel.fromPath(para sentieon_dnascope_model = params.sentieon_dnascope_model ? Channel.fromPath(params.sentieon_dnascope_model).collect() : Channel.value([]) // Initialize value channels based on params, defined in the params.genomes[params.genome] scope -ascat_genome = params.ascat_genome ?: Channel.empty() -dbsnp_vqsr = params.dbsnp_vqsr ? Channel.value(params.dbsnp_vqsr) : Channel.empty() -known_indels_vqsr = params.known_indels_vqsr ? Channel.value(params.known_indels_vqsr) : Channel.empty() -known_snps_vqsr = params.known_snps_vqsr ? Channel.value(params.known_snps_vqsr) : Channel.empty() -ngscheckmate_bed = params.ngscheckmate_bed ? Channel.value(params.ngscheckmate_bed) : Channel.empty() -snpeff_db = params.snpeff_db ?: Channel.empty() -vep_cache_version = params.vep_cache_version ?: Channel.empty() -vep_genome = params.vep_genome ?: Channel.empty() -vep_species = params.vep_species ?: Channel.empty() -bcftools_annotations = params.bcftools_annotations ?: Channel.empty() -bcftools_annotations_index = params.bcftools_annotations_index ?: Channel.empty() -bcftools_header_lines = params.bcftools_header_lines ?: Channel.empty() +ascat_genome = params.ascat_genome ?: Channel.empty() +dbsnp_vqsr = params.dbsnp_vqsr ? Channel.value(params.dbsnp_vqsr) : Channel.empty() +known_indels_vqsr = params.known_indels_vqsr ? 
Channel.value(params.known_indels_vqsr) : Channel.empty() +known_snps_vqsr = params.known_snps_vqsr ? Channel.value(params.known_snps_vqsr) : Channel.empty() +ngscheckmate_bed = params.ngscheckmate_bed ? Channel.value(params.ngscheckmate_bed) : Channel.empty() +snpeff_db = params.snpeff_db ?: Channel.empty() +vep_cache_version = params.vep_cache_version ?: Channel.empty() +vep_genome = params.vep_genome ?: Channel.empty() +vep_species = params.vep_species ?: Channel.empty() + vep_extra_files = [] @@ -287,6 +287,7 @@ workflow SAREK { ascat_loci, ascat_loci_gc, ascat_loci_rt, + bcftools_annotations, chr_dir, dbsnp, fasta, @@ -325,11 +326,12 @@ workflow SAREK { rt_file = PREPARE_GENOME.out.rt_file // Tabix indexed vcf files: - dbsnp_tbi = params.dbsnp ? params.dbsnp_tbi ? Channel.fromPath(params.dbsnp_tbi).collect() : PREPARE_GENOME.out.dbsnp_tbi : Channel.value([]) - germline_resource_tbi = params.germline_resource ? params.germline_resource_tbi ? Channel.fromPath(params.germline_resource_tbi).collect() : PREPARE_GENOME.out.germline_resource_tbi : [] //do not change to Channel.value([]), the check for its existence then fails for Getpileupsumamries - known_indels_tbi = params.known_indels ? params.known_indels_tbi ? Channel.fromPath(params.known_indels_tbi).collect() : PREPARE_GENOME.out.known_indels_tbi : Channel.value([]) - known_snps_tbi = params.known_snps ? params.known_snps_tbi ? Channel.fromPath(params.known_snps_tbi).collect() : PREPARE_GENOME.out.known_snps_tbi : Channel.value([]) - pon_tbi = params.pon ? params.pon_tbi ? Channel.fromPath(params.pon_tbi).collect() : PREPARE_GENOME.out.pon_tbi : Channel.value([]) + bcftools_annotations_tbi = params.bcftools_annotations ? params.bcftools_annotations_tbi ? Channel.fromPath(params.bcftools_annotations_tbi).collect() : PREPARE_GENOME.out.bcftools_annotations_tbi : Channel.empty([]) + dbsnp_tbi = params.dbsnp ? params.dbsnp_tbi ? 
Channel.fromPath(params.dbsnp_tbi).collect() : PREPARE_GENOME.out.dbsnp_tbi : Channel.value([]) + germline_resource_tbi = params.germline_resource ? params.germline_resource_tbi ? Channel.fromPath(params.germline_resource_tbi).collect() : PREPARE_GENOME.out.germline_resource_tbi : [] //do not change to Channel.value([]), the check for its existence then fails for Getpileupsumamries + known_indels_tbi = params.known_indels ? params.known_indels_tbi ? Channel.fromPath(params.known_indels_tbi).collect() : PREPARE_GENOME.out.known_indels_tbi : Channel.value([]) + known_snps_tbi = params.known_snps ? params.known_snps_tbi ? Channel.fromPath(params.known_snps_tbi).collect() : PREPARE_GENOME.out.known_snps_tbi : Channel.value([]) + pon_tbi = params.pon ? params.pon_tbi ? Channel.fromPath(params.pon_tbi).collect() : PREPARE_GENOME.out.pon_tbi : Channel.value([]) // known_sites is made by grouping both the dbsnp and the known snps/indels resources // Which can either or both be optional @@ -1057,7 +1059,7 @@ workflow SAREK { vep_cache, vep_extra_files, bcftools_annotations, - bcftools_annotations_index, + bcftools_annotations_tbi, bcftools_header_lines) // Gather used softwares versions From 9771cfb9cffbae37e1121347b10af6a710ec22f3 Mon Sep 17 00:00:00 2001 From: Rike Date: Fri, 17 Nov 2023 16:26:36 +0100 Subject: [PATCH 08/11] fix naming, add config --- conf/modules/prepare_genome.config | 10 ++++++++++ subworkflows/local/prepare_genome/main.nf | 24 +++++++++++------------ 2 files changed, 22 insertions(+), 12 deletions(-) diff --git a/conf/modules/prepare_genome.config b/conf/modules/prepare_genome.config index 241c164e7e..e948e1eea5 100644 --- a/conf/modules/prepare_genome.config +++ b/conf/modules/prepare_genome.config @@ -96,6 +96,16 @@ process { ] } + withName: 'TABIX_BCFTOOLS_ANNOTATIONS' { + ext.when = { !params.bcftools_annotations_tbi && params.bcftools_annotations && params.tools && params.tools.split(',').contains('bcfann') } + publishDir = [ + mode: 
params.publish_dir_mode, + path: { "${params.outdir}/reference/bcfann" }, + pattern: "*vcf.gz.tbi", + saveAs: { params.save_reference || params.build_only_index ? it : null } + ] + } + withName: 'TABIX_DBSNP' { ext.when = { !params.dbsnp_tbi && params.dbsnp && ((params.step == "mapping" || params.step == "markduplicates" || params.step == "prepare_recalibration") || params.tools && (params.tools.split(',').contains('controlfreec') || params.tools.split(',').contains('haplotypecaller') || params.tools.split(',').contains('sentieon_haplotyper') || params.tools.split(',').contains('sentieon_dnascope') || params.tools.split(',').contains('mutect2'))) } publishDir = [ diff --git a/subworkflows/local/prepare_genome/main.nf b/subworkflows/local/prepare_genome/main.nf index 4582c6dd74..f9b9e62c95 100644 --- a/subworkflows/local/prepare_genome/main.nf +++ b/subworkflows/local/prepare_genome/main.nf @@ -116,18 +116,18 @@ workflow PREPARE_GENOME { versions = versions.mix(TABIX_PON.out.versions) emit: - bcftools_annotations = TABIX_BCFTOOLS_ANNOTATIONS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // bcftools_annotations.vcf.gz.tbi - bwa = BWAMEM1_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwa/* - bwamem2 = BWAMEM2_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwamem2/* - hashtable = DRAGMAP_HASHTABLE.out.hashmap.map{ meta, index -> [index] }.collect() // path: dragmap/* - dbsnp_tbi = TABIX_DBSNP.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: dbsnb.vcf.gz.tbi - dict = GATK4_CREATESEQUENCEDICTIONARY.out.dict // path: genome.fasta.dict - fasta_fai = SAMTOOLS_FAIDX.out.fai.map{ meta, fai -> [fai] } // path: genome.fasta.fai - germline_resource_tbi = TABIX_GERMLINE_RESOURCE.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: germline_resource.vcf.gz.tbi - known_snps_tbi = TABIX_KNOWN_SNPS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi - known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> 
[tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi - msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list - pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi + bcftools_annotations_tbi = TABIX_BCFTOOLS_ANNOTATIONS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // bcftools_annotations.vcf.gz.tbi + bwa = BWAMEM1_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwa/* + bwamem2 = BWAMEM2_INDEX.out.index.map{ meta, index -> [index] }.collect() // path: bwamem2/* + hashtable = DRAGMAP_HASHTABLE.out.hashmap.map{ meta, index -> [index] }.collect() // path: dragmap/* + dbsnp_tbi = TABIX_DBSNP.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: dbsnb.vcf.gz.tbi + dict = GATK4_CREATESEQUENCEDICTIONARY.out.dict // path: genome.fasta.dict + fasta_fai = SAMTOOLS_FAIDX.out.fai.map{ meta, fai -> [fai] } // path: genome.fasta.fai + germline_resource_tbi = TABIX_GERMLINE_RESOURCE.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: germline_resource.vcf.gz.tbi + known_snps_tbi = TABIX_KNOWN_SNPS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi + known_indels_tbi = TABIX_KNOWN_INDELS.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: {known_indels*}.vcf.gz.tbi + msisensorpro_scan = MSISENSORPRO_SCAN.out.list.map{ meta, list -> [list] } // path: genome_msi.list + pon_tbi = TABIX_PON.out.tbi.map{ meta, tbi -> [tbi] }.collect() // path: pon.vcf.gz.tbi allele_files chr_files gc_file From d978945b701d7270de85f08fdd12429da3d861c7 Mon Sep 17 00:00:00 2001 From: Rike Date: Mon, 20 Nov 2023 08:39:27 +0100 Subject: [PATCH 09/11] add parameter change to CHANGELOG --- CHANGELOG.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index abdd2166ef..2eaf71de06 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -30,6 +30,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 | script | Old name | New 
name |
 | ------ | -------- | -------- |
+### Parameter
+
+| Old name                   | New name                |
+| -------------------------- | ----------------------- |
+| bcftools_annotations_index | bcftools_annotations_tbi|
+
 ## [3.4.0](https://github.com/nf-core/sarek/releases/tag/3.4.0) - Pårtetjåkko
 Pårtetjåkko is a mountain in the south of the park.

From c3b4e3be9854b3379260657b5413b1ff1313ea65 Mon Sep 17 00:00:00 2001
From: Rike
Date: Mon, 20 Nov 2023 08:40:22 +0100
Subject: [PATCH 10/11] Describe changes in more detail

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2eaf71de06..16ee03603a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - [#1333](https://github.com/nf-core/sarek/pull/1333) - Back to dev
+- [#1335](https://github.com/nf-core/sarek/pull/1335) - Add index computation of `bcftools_annotations`, if not provided
 ### Changed

From fa45aaba33bae398e6d66670fd90150c92718d02 Mon Sep 17 00:00:00 2001
From: nf-core-bot
Date: Mon, 20 Nov 2023 07:44:54 +0000
Subject: [PATCH 11/11] [automated] Fix linting with Prettier

---
 CHANGELOG.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 16ee03603a..a2918e702c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -33,9 +33,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Parameter
-| Old name                   | New name                |
-| -------------------------- | ----------------------- |
-| bcftools_annotations_index | bcftools_annotations_tbi|
+| Old name                   | New name                 |
+| -------------------------- | ------------------------ |
+| bcftools_annotations_index | bcftools_annotations_tbi |
 ## [3.4.0](https://github.com/nf-core/sarek/releases/tag/3.4.0) - Pårtetjåkko
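
The `bcfann` check that this series adds to `SAMPLESHEET_TO_CHANNEL` can be restated as a small standalone sketch. It is written in Python so that it runs without Nextflow, and it is not part of the pipeline: the function names `missing_bcfann_params` and `validate` are hypothetical, and only the parameter names and the rule itself come from the diff. As written in patch 07, the check still requires `bcftools_annotations_tbi` even though that patch also adds index computation; the sketch mirrors that behaviour as it appears in the diff.

```python
# Illustrative restatement of the bcfann parameter check from the diff.
# Assumption: `params` is a plain dict standing in for Nextflow's `params`.

REQUIRED_BCFANN_PARAMS = (
    "bcftools_annotations",
    "bcftools_annotations_tbi",
    "bcftools_header_lines",
)


def missing_bcfann_params(params):
    """Return the required bcftools annotate parameters that are unset.

    An empty list means either `bcfann` was not requested in the
    comma-separated `tools` string, or all three parameters are set.
    """
    tools = (params.get("tools") or "").split(",")
    if "bcfann" not in tools:
        return []
    return [name for name in REQUIRED_BCFANN_PARAMS if not params.get(name)]


def validate(params):
    """Raise ValueError when bcfann is requested with incomplete inputs."""
    missing = missing_bcfann_params(params)
    if missing:
        flags = ", ".join("--" + name for name in missing)
        raise ValueError("Please specify " + flags + " when using bcftools annotate")
```

Unlike the single error message in the diff, this sketch names only the parameters that are actually missing, which is a design choice of the example and not something the patch does.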