ASSIGN_REFERENCES:SOURMASH_COMPARE error when inputting newly formatted test metadata file #46

Closed
masudermann opened this issue Feb 23, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@masudermann
Contributor

Description of the bug

While running some of the test datasets in preparation for a high-complexity dataset, I encountered an error in PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE.

The data input was metadata_PRJNA523365_small.csv

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE (1)'

Caused by:
Process PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE input file name collision -- There are multiple input files for each of the following file names: GCF_017189435_1.sig

The metadata file has some new columns compared to the other test datasets, and I wonder if this contributed.
Everything else leading up to this ran as expected.
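
For context: Nextflow stages each input file of a task into a single working directory under its base name, so two inputs with the same base name cannot coexist, which is the condition the message above reports. Below is a minimal sketch of that kind of check, written in Python for illustration and not taken from Nextflow or the pipeline. The function name and the paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_collisions(paths):
    """Group input paths by base name and return the names used more than once."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: group for name, group in by_name.items() if len(group) > 1}


# Hypothetical staged inputs: two different source directories, same base name.
staged = [
    "work/aa/111111/GCF_017189435_1.sig",
    "work/bb/222222/GCF_017189435_1.sig",
    "work/bb/222222/GCF_001456355_1.sig",
]
collisions = find_name_collisions(staged)
for name, group in collisions.items():
    print(f"{name}: {len(group)} inputs share this name")
```

Any name reported here would need to be deduplicated or renamed before the files can be passed to a single task.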

Command used and terminal output

(nf-core) marthasudermann@pop-os:~/pathogensurveillance$ nextflow run main.nf --input 'https://raw.githubusercontent.com/grunwaldlab/pathogensurveillance/master/test/data/metadata_PRJNA523365_small.csv' --outdir test_out4 --bakta_db /home/marthasudermann/Software/bakta_db_02_2024/db/ -profile docker -resume
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [confident_engelbart] DSL2 - revision: cc83aa0c27


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/plantpathsurveil v1.0dev
------------------------------------------------------
Core Nextflow options
  runName        : confident_engelbart
  containerEngine: docker
  launchDir      : /home/marthasudermann/pathogensurveillance
  workDir        : /home/marthasudermann/pathogensurveillance/work
  projectDir     : /home/marthasudermann/pathogensurveillance
  userName       : marthasudermann
  profile        : docker
  configFiles    : /home/marthasudermann/pathogensurveillance/nextflow.config

Input/output options
  input          : https://raw.githubusercontent.com/grunwaldlab/pathogensurveillance/master/test/data/metadata_PRJNA523365_small.csv
  outdir         : test_out4
  bakta_db       : /home/marthasudermann/Software/bakta_db_02_2024/db/

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/plantpathsurveil for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/plantpathsurveil/blob/master/CITATIONS.md
------------------------------------------------------
[-        ] process > PATHOGENSURVEILLANCE:INPUT_CHECK:SAMPLESHEET_CHECK                                    -
[-        ] process > PATHOGENSURVEILLANCE:SRATOOLS_FASTERQDUMP                                             -
[-        ] process > PATHOGENSURVEILLANCE:DOWNLOAD_ASSEMBLIES                                              -
[-        ] process > PATHOGENSURVEILLANCE:FASTQC                                                           -
[-        ] process > PATHOGENSURVEILLANCE:COARSE_SAMPLE_TAXONOMY:BBMAP_SENDSKETCH                          -
[-        ] process > PATHOGENSURVEILLANCE:COARSE_SAMPLE_TAXONOMY:INITIAL_CLASSIFICATION                    -
[-        ] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:FIND_ASSEMBLIES                              -
[-        ] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:PICK_ASSEMBLIES                              -
[-        ] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES                          -
[-        ] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:MAKE_GFF_WITH_FASTA                          -
[-        ] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:SOURMASH_SKETCH_GENOME                       -
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SUBSET_READS                                   -
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:KHMER_TRIMLOWABUND                             -
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_SKETCH_READS                          -
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_SKETCH_GENOME                         -
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE                               -
[f5/6125a3] process > PATHOGENSURVEILLANCE:INPUT_CHECK:SAMPLESHEET_CHECK (metadata_PRJNA523365_small.csv)   [100%] 1 of 1, cached: 1 ✔
[21/b6b96f] process > PATHOGENSURVEILLANCE:SRATOOLS_FASTERQDUMP (SRR12574846)                               [100%] 3 of 3, cached: 3 ✔
[42/f6d4e0] process > PATHOGENSURVEILLANCE:DOWNLOAD_ASSEMBLIES (GCF_017189435_1)                            [100%] 1 of 1, cached: 1 ✔
[63/a974dd] process > PATHOGENSURVEILLANCE:FASTQC (SRR12574847)                                             [100%] 3 of 3, cached: 3 ✔
[c2/8679ec] process > PATHOGENSURVEILLANCE:COARSE_SAMPLE_TAXONOMY:BBMAP_SENDSKETCH (SRR12574848)            [100%] 3 of 3, cached: 3 ✔
[a4/18385c] process > PATHOGENSURVEILLANCE:COARSE_SAMPLE_TAXONOMY:INITIAL_CLASSIFICATION (SRR12574846)      [100%] 3 of 3, cached: 3 ✔
[24/1a3d6e] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:FIND_ASSEMBLIES (Mycobacteriaceae)           [100%] 1 of 1, cached: 1 ✔
[6a/673614] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:PICK_ASSEMBLIES (SRR12574846)                [100%] 3 of 3, cached: 3 ✔
[1d/f003e6] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES (GCF_001677215_1)        [100%] 9 of 9, cached: 9 ✔
[65/b881e2] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:MAKE_GFF_WITH_FASTA (GCF_001456355_1)        [100%] 9 of 9, cached: 9 ✔
[db/0526c0] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:SOURMASH_SKETCH_GENOME (GCF_001456355_1)     [100%] 9 of 9, cached: 9 ✔
[6f/fce62b] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SUBSET_READS (SRR12574846)                     [100%] 3 of 3, cached: 3 ✔
[cc/a2f30c] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:KHMER_TRIMLOWABUND (SRR12574847)               [100%] 3 of 3, cached: 3 ✔
[9e/09a4cb] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_SKETCH_READS (SRR12574847)            [100%] 3 of 3, cached: 3 ✔
[d3/a6b247] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_SKETCH_GENOME (GCF_017189435_1)       [100%] 3 of 3, cached: 3 ✔
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE                               -
[-        ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:ASSIGN_GROUP_REFERENCES                        -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:REFERENCE_INDEX:PICARD_CREATESEQUENCEDICTIONARY -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:REFERENCE_INDEX:SAMTOOLS_FAIDX                  -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:REFERENCE_INDEX:BWA_INDEX                       -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:CALCULATE_DEPTH                     -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:SUBSET_READS                        -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM                             -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_ADDORREPLACEREADGROUPS       -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_SORTSAM_1                    -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_MARKDUPLICATES               -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_SORTSAM_2                    -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:SAMTOOLS_INDEX                      -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:MAKE_REGION_FILE                  -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:GRAPHTYPER_GENOTYPE               -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:GRAPHTYPER_VCFCONCATENATE         -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:TABIX_TABIX                       -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:BGZIP_MAKE_GZIP                   -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:GATK4_VARIANTFILTRATION           -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:VCFLIB_VCFFILTER                  -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:VCF_TO_TAB                                      -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:VCF_TO_SNPALN                                   -
[-        ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP                                     -
[-        ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:SUBSET_READS                                     -
[-        ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FASTP                                            -
[-        ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:SPADES                                           -
[-        ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FILTER_ASSEMBLY                                  -
[-        ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:QUAST                                            -
[-        ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:BAKTA_BAKTA                                      -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:PIRATE                                     -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:REFORMAT_PIRATE_RESULTS                    -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:ALIGN_FEATURE_SEQUENCES                    -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:RENAME_CORE_GENE_HEADERS                   -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:SUBSET_CORE_GENES                          -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:MAFFT_SMALL                                -
[-        ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:IQTREE2_CORE                               -
[-        ] process > PATHOGENSURVEILLANCE:CUSTOM_DUMPSOFTWAREVERSIONS                                      -
[-        ] process > PATHOGENSURVEILLANCE:MULTIQC                                                          -
[-        ] process > PATHOGENSURVEILLANCE:RECORD_MESSAGES                                                  -
[-        ] process > PATHOGENSURVEILLANCE:MAIN_REPORT                                                      -
ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE (1)'

Caused by:
  Process `PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE` input file name collision -- There are multiple input files for each of the following file names: GCF_017189435_1.sig


Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

nextflow.log

System information

Nextflow 23.10.1.5891
Desktop
local
Docker
Linux

@masudermann masudermann added the bug Something isn't working label Feb 23, 2024
@zachary-foster
Contributor

I'm guessing that the pipeline tried to select and download the same reference that the user specified.
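
If that guess is right, one way to avoid the collision would be to drop repeated accessions before the references are sketched. Below is a rough sketch of the idea in Python. It is not the pipeline's code or the fix that was eventually applied; the function name is made up, and the list contents are illustrative, not a record of what the pipeline actually selected.

```python
def merge_references(user_refs, auto_refs):
    """Combine two lists of accession IDs, keeping the first occurrence of each."""
    seen = set()
    merged = []
    for acc in [*user_refs, *auto_refs]:
        if acc not in seen:
            seen.add(acc)
            merged.append(acc)
    return merged


# Illustrative inputs: the automatically picked list repeats the user's reference.
user_refs = ["GCF_017189435_1"]
auto_refs = ["GCF_001677215_1", "GCF_017189435_1", "GCF_001456355_1"]
merged = merge_references(user_refs, auto_refs)
print(merged)
```

Putting the user-specified list first means that, when an accession appears in both, the user's entry is the one that is kept.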

@masudermann
Contributor Author

masudermann commented Feb 27, 2024

As a quick follow-up: with a separate test dataset (high_complexity_kpneumoniae), I encountered a second error at the PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE step. When I go into the output directory sourmash_sketch_genome, I see signatures for several assemblies and then a final null.sig. The remaining signature files look fine. I didn't specify any user-defined references; my input metadata sheet has only a 'sample_id' column and an 'sra' column.

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE (1)'

Caused by:
Process PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE input file name collision -- There are multiple input files for each of the following file names: null.sig

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details

My main command was as follows (with the test data config and metadata files in the appropriate locations):
nextflow run main.nf --input /home/marthasudermann/pathogensurveillance/test/data/metadata_high_complexity_kpneumoniae.csv --outdir test_highcomplexity2 --bakta_db /home/marthasudermann/Software/bakta_db_02_2024/db/ -profile docker -resume
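
To narrow down which signature files are involved, a small script can scan a directory tree for `.sig` files whose names repeat or look like placeholders such as `null.sig`. This is a minimal diagnostic sketch in Python, not part of the pipeline. The function name is hypothetical, and the demo builds a throwaway directory with made-up file names.

```python
import tempfile
from collections import Counter
from pathlib import Path


def audit_signatures(root, placeholders=("null.sig",)):
    """Return (duplicated names, placeholder-named paths) for *.sig files under root."""
    sig_files = sorted(Path(root).rglob("*.sig"))
    counts = Counter(p.name for p in sig_files)
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    suspicious = [p for p in sig_files if p.name in placeholders]
    return duplicated, suspicious


# Demo on a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("a", "null.sig"), ("b", "null.sig"), ("c", "GCF_000000000_1.sig")]
    for sub, name in layout:
        task_dir = Path(tmp, sub)
        task_dir.mkdir()
        (task_dir / name).touch()
    duplicated, suspicious = audit_signatures(tmp)
    print("duplicated names:", duplicated)
    print("placeholder files:", len(suspicious))
```

Pointing `audit_signatures` at the run's work directory or at the sourmash_sketch_genome output directory would show whether `null.sig` appears more than once and where each copy lives.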

@zachary-foster
Contributor

Thanks, I will try it

@zachary-foster
Contributor

This should be fixed
