I have been interested in using your pipeline with ONT long-read data with UMIs that are generated by colleagues in my lab.
To begin with, I tried to execute the pipeline with the example data that are provided in the repository. I followed the instructions provided in the README, that is:
Clone the repository: git clone git@github.com:camcl/pipeline-umi-amplicon.git
Navigate to the cloned repository and finish the configuration and installation. I used the latest miniconda3:
cd pipeline-umi-amplicon
conda env create -f environment.yml
conda activate pipeline-umi-amplicon
cd lib && pip install . && cd ..
This ran without error and I have the following components in the conda environment:
Testing the installation with snakemake -j 1 -pr --configfile config.yml does not produce any error:
Targets: EGFR_917
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                   count
------------------  -------
copy_bed                  1
reads                     1
seqkit_bam_acc_tsv        1
total                     3
Select jobs to execute...
[Tue Sep 17 16:20:28 2024]
rule copy_bed:
input: data/example_egfr_amplicon.bed
output: example_egfr_single_read_run/targets.bed
jobid: 1
reason: Missing output files: example_egfr_single_read_run/targets.bed
wildcards: name=example_egfr_single_read_run
resources: tmpdir=/tmp
cp data/example_egfr_amplicon.bed example_egfr_single_read_run/targets.bed
[Tue Sep 17 16:20:28 2024]
Finished job 1.
1 of 3 steps (33%) done
Select jobs to execute...
[Tue Sep 17 16:20:28 2024]
rule seqkit_bam_acc_tsv:
input: example_egfr_single_read_run/align/EGFR_917_consensus.bam
output: example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
jobid: 13
reason: Missing output files: example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
wildcards: name=example_egfr_single_read_run, target=EGFR_917, stage=consensus
resources: tmpdir=/tmp
echo -e "Read Cluster_size Ref MapQual Acc ReadLen RefLen RefAln RefCov ReadAln ReadCov Strand MeanQual LeftClip RightClip Flags IsSec IsSup" > example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv && seqkit bam example_egfr_single_read_run/align/EGFR_917_consensus.bam 2>&1 | sed 's/_/ /' | tail -n +2 >> example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
[Tue Sep 17 16:20:29 2024]
Finished job 13.
2 of 3 steps (67%) done
Select jobs to execute...
[Tue Sep 17 16:20:29 2024]
localrule reads:
input: example_egfr_single_read_run/targets.bed, example_egfr_single_read_run/align/EGFR_917_final.bam.bai, example_egfr_single_read_run/stats/EGFR_917_vsearch_cluster_stats.tsv, example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv
jobid: 0
reason: Input files updated by another job: example_egfr_single_read_run/stats/EGFR_917_consensus_size_vs_acc.tsv, example_egfr_single_read_run/targets.bed
resources: tmpdir=/tmp
[Tue Sep 17 16:20:29 2024]
Finished job 0.
3 of 3 steps (100%) done
Complete log: .snakemake/log/2024-09-17T162028.348925.snakemake.log
Without editing anything in config.yml, I ran the command snakemake -j 30 reads --configfile config.yml. All steps before the rule polish_clusters complete, but the execution terminates at the polishing step with the following output:
[Tue Sep 17 16:42:44 2024]
Error in rule polish_clusters:
jobid: 6
input: example_egfr_single_read_run/clustering/EGFR_917/clusters_fa, example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa
output: example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp, example_egfr_single_read_run/fasta/EGFR_917_consensus.bam, example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
shell:
rm -rf example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp
medaka smolecule --threads 30 --length 50 --depth 2 --model r941_min_high_g360 --method spoa example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp 2> example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log
cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/consensus.fasta example_egfr_single_read_run/fasta/EGFR_917_consensus.fasta
cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam example_egfr_single_read_run/fasta/EGFR_917_consensus.bam && cp example_egfr_single_read_run/fasta/EGFR_917_consensus_tmp/subreads_to_spoa.bam.bai example_egfr_single_read_run/fasta/EGFR_917_consensus.bam.bai
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-09-17T164240.824646.snakemake.log
The contents of the file example_egfr_single_read_run/fasta/EGFR_917_consensus.bam_smolecule.log provide more information about the error:
Traceback (most recent call last):
File "~/miniconda3/envs/pipeline-umi-amplicon/bin/medaka", line 11, in <module>
sys.exit(main())
File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/site-packages/medaka/medaka.py", line 814, in main
args.func(args)
File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/site-packages/medaka/smolecule.py", line 429, in main
medaka.common.mkdir_p(args.output, info='Results will be overwritten.')
File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/site-packages/medaka/common.py", line 763, in mkdir_p
os.makedirs(path)
File "~/miniconda3/envs/pipeline-umi-amplicon/lib/python3.8/os.py", line 223, in makedirs
mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'example_egfr_single_read_run/clustering/EGFR_917/smolecule_clusters.fa'
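The traceback shows os.makedirs being called on the path of the input FASTA file, which suggests that the installed medaka reads its first positional argument as the output directory. That reading is an assumption I have not checked against the medaka documentation. The exception itself is easy to reproduce outside the pipeline. The sketch below is a minimal illustration, where reproduce_error is a hypothetical helper and not part of medaka or the pipeline:

```python
import os
import tempfile


def reproduce_error() -> str:
    """Call os.makedirs on a path that already exists as a regular file.

    Hypothetical helper for illustration only; it mimics the directory
    creation step seen in the traceback, not medaka's actual code.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for the FASTA file that the clustering step produced.
        fasta = os.path.join(workdir, "smolecule_clusters.fa")
        with open(fasta, "w") as handle:
            handle.write(">read_1\nACGT\n")
        try:
            # Same call as in the traceback, applied to the existing file.
            os.makedirs(fasta)
        except FileExistsError:
            return "FileExistsError"
        return "created"


if __name__ == "__main__":
    print(reproduce_error())
```

If this is indeed what happens, the error would come from the order of the two positional arguments in the medaka smolecule command rather than from the example data.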
What have I done wrong?
Regards,
Camille C.
camcl changed the title from "FileExistError in medaka smolecule upon execution of the pipeline with the example data" to "FileExistsError in medaka smolecule upon execution of the pipeline with the example data" on Sep 17, 2024