AssertionError #372

Closed
JosephAgim opened this issue Sep 27, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@JosephAgim

Description of the bug

The pipeline reports that the filename pattern is incorrect, but the entries in the design.csv file match the protocol guide, and the filenames were changed to match the names in design.csv.

We have tried adding R1 and R2 at the end of the file names, and we have also changed the numbering in the FASTQ file names from 1 to 001. The only thing we have not tried yet is adding _S1_L, because the error output does not mention it. We therefore focused only on R1/R2 and their position. For example, both 001_R1 / 001_R2 and R1_001 / R2_001 return the same AssertionError.

Command used and terminal output

Command used:
#!/bin/bash -l
#SBATCH -A naiss2023-22-922       # Project ID (adjust if needed)
#SBATCH --time=50:00:00           # Time limit in HH:MM:SS
#SBATCH -n 10                      # Number of cores
#SBATCH -J scRNAseq_analysis       # Job name
#SBATCH -e error_log.txt           # Error log
#SBATCH -o output_log.txt          # Output log
#SBATCH --mail-type=ALL            # Send email notifications
#SBATCH [email protected]  # Email to notify

# Load required modules
module load bioinfo-tools
module load Nextflow/

export NXF_SINGULARITY_CACHEDIR=/proj/naiss2024-23-81/private/vathsa/sc/PD/singularity-images
export NXF_OPTS='-Xms1g -Xmx4g'
export NXF_HOME=/proj/naiss2024-23-81/private/vathsa/sc/PD/nxf-home
export NXF_TEMP=${SNIC_TMP:-$PATH}

# Run nf-core/scrnaseq pipeline
nextflow run nf-core/scrnaseq \
  --protocol 10XV2 \
  --input /proj/naiss2024-23-81/private/vathsa/sc/PD/scSample_sheet2.csv \
  --fasta /sw/data/iGenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
  --gtf /sw/data/iGenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes.gencode/genes.gtf \
  --outdir /proj/naiss2024-23-81/private/vathsa/sc/PD/results \
  -profile uppmax \
  --project naiss2023-22-922 \
  --aligner cellranger \
  -resume
  --multiqc_title "snRNA-seq Analysis"


Terminal output:
# Match R1 in the filename, but only if it is followed by a non-digit or non-character
  # match "file_R1.fastq.gz", "file.R1_000.fastq.gz", etc. but
  # do not match "SRR12345", "file_INFIXR12", etc
  filename_pattern = r"([^a-zA-Z0-9])R1([^a-zA-Z0-9])"

  for i, (r1, r2) in enumerate(chunk_iter(fastqs, 2), start=1):
      # double escapes are required because nextflow processes this python 'template'
      if re.sub(filename_pattern, r"\1R2\2", r1.name) != r2.name:
          raise AssertionError(
              dedent(
                  f"""\
                  We expect R1 and R2 of the same sample to have the same filename except for R1/R2.
                  This has been checked by replacing "R1" with "R2" in the first filename and comparing it to the second filename.
                  If you believe this check shouldn't have failed on your filenames, please report an issue on GitHub!

                  Files involved:
                      - {r1}
                      - {r2}
                  """
              )
          )
      r1.rename(fastq_all / f"{sample_id}_S1_L{i:03d}_R1_001.fastq.gz")
      r2.rename(fastq_all / f"{sample_id}_S1_L{i:03d}_R2_001.fastq.gz")

Relevant files

design.csv content:
sample,fastq_1,fastq_2
SRR26129865_1,/proj/naiss2024-23-81/private/vathsa/sc/PD/fastq/SRX21843315_SRR26129865_R1_001.fastq.gz,/proj/naiss2024-23-81/private/vathsa/sc/PD/fastq/SRX21843315_SRR26129865_R2_001.fastq.gz

the file names:
SRX21843315_SRR26129865_R1_001.fastq.gz SRX21843315_SRR26129865_R2_001.fastq.gz
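
As a quick sanity check, the pattern from the snippet above can be applied to these two file names outside the pipeline. This is only a minimal sketch: `filename_pattern` and the substitution are copied from the quoted code, while the helper name `expected_r2_name` is made up for illustration and is not part of nf-core/scrnaseq.

```python
import re

# Pattern copied from the pipeline snippet quoted above.
filename_pattern = r"([^a-zA-Z0-9])R1([^a-zA-Z0-9])"


def expected_r2_name(r1_name: str) -> str:
    """Hypothetical helper: the R2 filename the check expects for a given R1 filename."""
    return re.sub(filename_pattern, r"\1R2\2", r1_name)


r1 = "SRX21843315_SRR26129865_R1_001.fastq.gz"
r2 = "SRX21843315_SRR26129865_R2_001.fastq.gz"

# True means the filename check itself would not raise for this pair.
names_consistent = expected_r2_name(r1) == r2
print(names_consistent)  # prints True
```

For this pair the substitution yields exactly the R2 name, so the naming check in the snippet would not be the part that fails.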

System information

Nextflow version 2.7.1
Hardware: Uppmax cluster
Executor: Slurm
Container engine: Singularity
Version of nf-core/scrnaseq: 2.7.1

@JosephAgim JosephAgim added the bug Something isn't working label Sep 27, 2024
@grst
Member

grst commented Oct 2, 2024

Hi,

thanks for reporting! Could you dig out the full .command.log file for the failed process and share it here?
I'm a bit confused, because the "terminal output" is just a code snippet from the script, not an actual error message.

In theory, it shouldn't even be necessary to change the filenames. We implemented this feature in the first place precisely so that input filenames don't need to follow the 10x conventions.

@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@grst
Member

grst commented Oct 2, 2024

Hmm, that doesn't contain any error message at all (it seems truncated). Not sure if it was chopped off by GitHub?

I was actually hoping to see the .command.log file of the process that failed. It's located in the work directory of that process, e.g.
/crex/proj/naiss2024-23-81/private/vathsa/sc/PD/work/f2/71f5c948cbd4bce16286fbc597e43c/.command.log

@grst grst closed this as completed Oct 2, 2024
@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@grst
Member

grst commented Oct 2, 2024

Ok, that reveals the actual error message:

Log message:
R1 and R2 reads identical in sample "SRR26129865_1" at "/scratch/50513097/nxf.vtwL0u2PTy/fastq_all"

Can you please first verify that the files as specified in the samplesheet are really not identical? Maybe this was a mistake during download or renaming.

If they are correct, then please check the fastq_all folder in that process work directory that contains the files as they were renamed by nextflow. If the issue is only there, then it could indeed be a bug in the autorename feature.

@grst grst reopened this Oct 2, 2024
@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@grst
Member

grst commented Oct 2, 2024

yeah, the filenames are different, but if you look at the actual files (e.g. using less, or just calculating md5sums), are they any different?
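
A scripted version of "looking at the actual files" could compare the first reads of both files after decompression. The sketch below is only an illustration: the function names `first_lines` and `same_leading_reads` and the default of 1000 reads are assumptions, not anything the pipeline provides. Comparing decompressed content also sidesteps the fact that two gzip files with identical content can differ at the byte level, because the gzip header may store a timestamp and the original filename.

```python
import gzip
from itertools import islice, zip_longest


def first_lines(path, n_reads=1000):
    """Yield the lines of the first n_reads FASTQ records (4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        yield from islice(handle, 4 * n_reads)


def same_leading_reads(path_a, path_b, n_reads=1000):
    """Return True if the first n_reads records of both files match line for line."""
    pairs = zip_longest(first_lines(path_a, n_reads), first_lines(path_b, n_reads))
    return all(a == b for a, b in pairs)
```

Calling `same_leading_reads(r1_path, r2_path)` on the two paths from the samplesheet and getting `True` would strongly suggest that R1 and R2 hold the same reads, since genuine paired-end files should differ in their sequence lines.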

@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@grst
Member

grst commented Oct 2, 2024

hmm... but let's first check the input files.
What's the result of

md5sum /proj/naiss2024-23-81/private/vathsa/sc/PD/fastq/SRX21843315_SRR26129865_R1_001.fastq.gz
md5sum /proj/naiss2024-23-81/private/vathsa/sc/PD/fastq/SRX21843315_SRR26129865_R2_001.fastq.gz
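
If `md5sum` is not at hand, roughly the same check can be done with Python's standard library. This is a minimal sketch, and `md5_of_file` and `files_identical` are hypothetical helper names chosen for illustration. The file is read in chunks so that large FASTQ files do not need to fit into memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Return True if both files have the same MD5 digest."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

Identical digests for the R1 and R2 paths would mean the two input files are byte-for-byte copies of each other.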

@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@grst
Member

grst commented Oct 2, 2024

OK, so SRX21843315_SRR26129865_R1_001.fastq.gz and SRX21843315_SRR26129865_R2_001.fastq.gz are actually the same file. So this seems to be a problem with the input data rather than the pipeline.

@JosephAgim
Author

JosephAgim commented Oct 2, 2024 via email

@grst
Member

grst commented Oct 2, 2024

Sometimes SRA is weird and it doesn't properly return paired end reads as separate files. I had a similar case once and ended up contacting SRA support.

You could also try asking in #fetchngs on the nf-core slack if they have any insight or if it could be a bug in fetchngs.

I'm pretty sure now, though, that the problem is not within scrnaseq, so I'm closing the issue here.

@grst grst closed this as completed Oct 2, 2024
@JosephAgim
Author

JosephAgim commented Oct 3, 2024 via email
