Glitch at the download genome step #13

Closed
cahuparo opened this issue Aug 10, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@cahuparo
Contributor

cahuparo commented Aug 10, 2023

Description of the bug

This may be nothing at all, but it may be worth mentioning in the documentation that this process can require a restart.
At the download genome assemblies step, I got this error:

ERROR ~ Error executing process > 'NFCORE_PLANTPATHSURVEIL:PLANTPATHSURVEIL:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES (GCA_016864655.1)'

Caused by:
  Process `NFCORE_PLANTPATHSURVEIL:PLANTPATHSURVEIL:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES (GCA_016864655.1)` terminated with an error exit status (2)

Command executed:

  # Download assemblies as zip archives
  datasets download genome accession GCA_016864655.1 --include gff3,rna,cds,protein,genome,seq-report --filename GCA_016864655.1.zip

  # Unzip
  unzip GCA_016864655.1.zip

  # Rename files with assembly name
  if [ -f ncbi_dataset/data/GCA_016864655.1/genomic.gff ]; then
      mv ncbi_dataset/data/GCA_016864655.1/genomic.gff ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1.gff
  fi
  if [ -f ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna ]; then
      mv ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1_cds.fna
  fi
  if [ -f ncbi_dataset/data/GCA_016864655.1/protein.faa ]; then
      mv ncbi_dataset/data/GCA_016864655.1/protein.faa ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1.faa
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_PLANTPATHSURVEIL:PLANTPATHSURVEIL:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES":
      datasets: $(datasets --version | sed -e "s/datasets version: //")
  END_VERSIONS

Command exit status:
  2

Command output:
  Archive:  GCA_016864655.1.zip
    inflating: README.md
    inflating: ncbi_dataset/data/assembly_data_report.jsonl
    inflating: ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1_ASM1686465v1_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/genomic.gff    inflating: ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/protein.faa
    inflating: ncbi_dataset/data/GCA_016864655.1/sequence_report.jsonl
    inflating: ncbi_dataset/data/dataset_catalog.json
  Collecting 1  records [================================================] 100% 1/1
  Downloading: GCA_016864655.1.zip    41MB done
  Archive:  GCA_016864655.1.zip
    inflating: README.md
    inflating: ncbi_dataset/data/assembly_data_report.jsonl
    inflating: ncbi_dataset/data/GCA_016864655.1/GCA_016864655.1_ASM1686465v1_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/genomic.gff
    error:  invalid compressed data to inflate
    inflating: ncbi_dataset/data/GCA_016864655.1/cds_from_genomic.fna
    inflating: ncbi_dataset/data/GCA_016864655.1/protein.faa
    inflating: ncbi_dataset/data/GCA_016864655.1/sequence_report.jsonl
    inflating: ncbi_dataset/data/dataset_catalog.json

Work dir:
  /nfs7/BPP/Chang_Lab/paradarc/nf_brady_N120/scripts/nf-core-plantpathsurveil/work/00/7f5d8df6693888570725145aa13835

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

It could be a glitch in the download or unzip step. I ran it again (-resume) and the download worked just fine.

Command used and terminal output

No response

Relevant files

No response

System information

No response

@cahuparo cahuparo added the bug Something isn't working label Aug 10, 2023
@cahuparo cahuparo changed the title from "Glitch at download step" to "Glitch at the download genome step" Aug 10, 2023
@zachary-foster
Contributor

I added code so that this step is retried up to a set number of times; if it still fails after that, the reference is left out and the pipeline continues. I think it helped with a lot of those random internet-connection-related errors.
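
For readers hitting the same error, here is a minimal shell sketch of the retry-then-skip idea described above. It is not the pipeline's actual code: the `retry` and `fetch_assembly` helpers and the `RETRY_DELAY` variable are hypothetical names made up for illustration. The only commands taken from the log are `datasets download genome accession ... --filename` and `unzip`; the `unzip -t` integrity check is an added assumption about how a corrupt archive could be detected before extraction.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
    local max_attempts=$1
    shift
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying: $*" >&2
        attempt=$((attempt + 1))
        # RETRY_DELAY is a hypothetical knob; defaults to 5 seconds.
        sleep "${RETRY_DELAY:-5}"
    done
}

# Hypothetical download-and-verify step. `unzip -t` tests the archive
# without extracting it, so a corrupt download fails here and can be retried.
fetch_assembly() {
    local acc=$1
    rm -f "$acc.zip"
    datasets download genome accession "$acc" --filename "$acc.zip" &&
        unzip -tq "$acc.zip"
}

# Example (not run here): skip the reference instead of aborting.
# retry 3 fetch_assembly GCA_016864655.1 || echo "skipping GCA_016864655.1" >&2
```

Inside a Nextflow pipeline the same effect is usually reached with the `errorStrategy` and `maxRetries` process directives rather than a hand-written loop; the sketch only shows the logic in plain shell.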

@zachary-foster
Contributor

Are you still seeing such errors stop the pipeline from running?

@zachary-foster
Contributor

This should be fixed now.
