Fix logs #3

Open · wants to merge 29 commits into base: main

Commits (29)
a6404cc  Add submit script and combine oBC and mBC scripts (maurermaggie, Apr 15, 2024)
10d008e  Merge pull request #1 from maurermaggie/maurermaggie-bashworkflow (maurermaggie, Apr 15, 2024)
082d9fa  Snakemake Pipeline for MPRA Preprocessing (maurermaggie, Apr 19, 2024)
478b537  Merge pull request #2 from maurermaggie/SnakeMake_First_Go (maurermaggie, Apr 19, 2024)
c7a11eb  Rename scripts/environment.yaml to scripts/scripts/MPRA_Snakemake_Pip… (maurermaggie, Apr 19, 2024)
211d7b6  Rename scripts/run_snakemake.sh to scripts/MPRA_Snakemake_Pipeline/wo… (maurermaggie, Apr 19, 2024)
464cea9  Update run_snakemake.sh (maurermaggie, Apr 19, 2024)
ecdb533  Rename scripts/scripts/MPRA_Snakemake_Pipeline/workflow/environment.y… (maurermaggie, Apr 19, 2024)
fb6dc16  Update run_snakemake.sh (maurermaggie, Apr 19, 2024)
d4af282  Add config files (maurermaggie, Apr 22, 2024)
d252162  Update config.yaml (maurermaggie, Apr 22, 2024)
53773ef  Update environment.yaml (maurermaggie, Apr 23, 2024)
34e1067  Update run_snakemake.sh (maurermaggie, Apr 23, 2024)
5703ceb  Update Snakefile (maurermaggie, Apr 29, 2024)
ff9527c  Update run_snakemake.sh (maurermaggie, Apr 29, 2024)
1fd41bc  Update environment.yaml (maurermaggie, Apr 29, 2024)
3bc946e  Merge pull request #3 from maurermaggie/updates_broadly (maurermaggie, Apr 29, 2024)
5e0f224  Update cellranger.smk (maurermaggie, Apr 29, 2024)
37d0f69  Update clean_umis.smk (maurermaggie, Apr 29, 2024)
694c3ab  Update get_barcodes.smk (maurermaggie, Apr 29, 2024)
2648bdc  Merge pull request #4 from maurermaggie/update_rules (maurermaggie, Apr 29, 2024)
be1bff7  Update clean_up_UMI_counts_v3_20220126.R (maurermaggie, Apr 29, 2024)
22b2819  Update get_barcode_v2_fixed_pos_w_seq_check_20220201.py (maurermaggie, Apr 29, 2024)
8f91208  Merge pull request #5 from maurermaggie/update_scripts (maurermaggie, Apr 29, 2024)
dc56c9a  Update config.yaml (maurermaggie, Apr 29, 2024)
db208e4  Update config.yaml (maurermaggie, Apr 29, 2024)
b0c54c6  Merge pull request #6 from maurermaggie/update_configs (maurermaggie, Apr 29, 2024)
1d45b86  Update config.yaml (maurermaggie, Apr 29, 2024)
ba85546  Update config.yaml (maurermaggie, Apr 29, 2024)
4 changes: 4 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/config/config.yaml
@@ -0,0 +1,4 @@
input_directory: "/your/input/directory"
output_directory: "/your/output/directory"
reference_data: "/directory/with/reference/data"
filepaths_df: "/csv/with/your/filepaths.csv"
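
The four paths above are placeholders that each user points at their own data; the Snakefile below reads them back out of config[...]. A minimal sketch of how this file might be handed to Snakemake (the real call presumably lives in run_snakemake.sh, which is not part of this diff, so treat paths and working directory as assumptions):

# Hypothetical invocation from scripts/MPRA_Snakemake_Pipeline/; run_snakemake.sh is not shown in this diff.
snakemake \
  --snakefile workflow/Snakefile \
  --configfile config/config.yaml \
  --cores 1
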
3 changes: 3 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/config/filepaths.csv
@@ -0,0 +1,3 @@
ID,Type
SRR22253236down,o
SRR22253239down,m
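
Each row pairs a sample ID with a barcode type; the Snakefile below turns this table into a dictionary keyed on ID, and rule all expects each sample's reads at <input_directory>/<ID>/<ID>_S1_L001_R1_001.fastq. Reading "o" and "m" as oBC and mBC (suggested by the first commit message) is an assumption, not something stated in the diff. Adding a sample is just another row; a sketch with a made-up ID:

# Hypothetical example; SRR00000000down is a placeholder sample ID.
echo "SRR00000000down,o" >> config/filepaths.csv
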
35 changes: 35 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/config/slurm_scg/config.yaml
@@ -0,0 +1,35 @@
# helpful to give path e.g. if you write to scratch/not in current dir
cluster:
  mkdir -p ../logs/{rule} &&
  sbatch
    --partition={resources.partition}
    --account={resources.account}
    --time={resources.time}
    --job-name={rule}.{wildcards}
    --output=../logs/{rule}/%j.{wildcards}.out
    --error=../logs/{rule}/%j.{wildcards}.err
    --mem-per-cpu={resources.mem}
    --nodes={resources.nodes}
    --cpus-per-task={resources.threads}
    --parsable
default-resources:
  - partition=batch
  - account=smontgom
  - time="02:00:00"
  - mem="64G"
  - nodes=1
  - threads=1
latency-wait: 120
# restart-times: 3
jobs: 50
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: True
# Singularity args (binds to oak)
#use-singularity: True
#singularity-args: "-B /oak:/oak"
#cluster-status: "/oak/stanford/groups/smontgom/maurertm/MPRA/MPRA_snakemake_pipeline/config/slurm_scg/status-sacct.sh"
max-status-checks-per-second: 10
#cluster-cancel: scancel
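
This profile follows the smk-simple-slurm layout (the status script below prints smk-simple-slurm messages): the cluster key is the sbatch template, default-resources fills in the {resources.*} placeholders, and the remaining keys are ordinary Snakemake options. A sketch of launching through it; paths and working directory are assumptions, since run_snakemake.sh is not shown here:

# Hypothetical launch with the SCG profile; --profile expects the directory containing this config.yaml.
snakemake \
  --snakefile workflow/Snakefile \
  --configfile config/config.yaml \
  --profile config/slurm_scg
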
20 changes: 20 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/config/slurm_scg/status-sacct.sh
@@ -0,0 +1,20 @@
#!/usr/bin/env bash

# Check status of slurm jobs
jobid="$1"
if [[ "$jobid" == Submitted ]]
then
echo smk-simple-slurm: Invalid job ID: "$jobid" >&2
echo smk-simple-slurm: Did you remember to add the flag --parsable to your sbatch call? >&2
exit 1
fi
output=`sacct -j "$jobid" --format State --noheader | head -n 1 | awk '{print $1}'`
if [[ $output =~ ^(COMPLETED).* ]]
then
echo success
elif [[ $output =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]
then
echo running
else
echo failed
fi
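
When cluster-status points at this script, Snakemake invokes it with a Slurm job ID and reads one word back: success, running, or failed. In the SCG profile above the cluster-status line is still commented out, so the script only takes effect once that line is re-enabled. It can also be exercised by hand; a sketch with a made-up job ID:

# Hypothetical manual check; 12345678 stands in for a real Slurm job ID.
chmod +x config/slurm_scg/status-sacct.sh
config/slurm_scg/status-sacct.sh 12345678    # prints success, running, or failed
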
36 changes: 36 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/config/slurm_sherlock/config.yaml
@@ -0,0 +1,36 @@
cluster:
  mkdir -p ../logs/{rule} &&
  sbatch
    --partition={resources.partition}
    --time={resources.time}
    --job-name={rule}.{wildcards}
    --output=../logs/{rule}/%j.out
    --error=../logs/{rule}/%j.err
    --parsable
    --mem={resources.mem}
    --gpus-per-task={resources.gpus}
    --nodes={resources.nodes}
    --ntasks-per-node={resources.tasks}
    --cpus-per-task={resources.threads}
default-resources:
  - partition=normal,owners
  - time="00:10:00"
  - mem=4000
  - nodes=1
  - threads=1
  - tasks=1
  - gpus=0
latency-wait: 120
# restart-times: 3
jobs: 50
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: True
# Singularity args (binds to oak)
#use-singularity: True
#singularity-args: "-B /oak:/oak"
cluster-status: "config/slurm_sherlock/status-sacct.sh"
max-status-checks-per-second: 10
cluster-cancel: scancel
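
Unlike the SCG profile, this one has cluster-status and cluster-cancel active, so Snakemake polls sacct through status-sacct.sh and cancels outstanding jobs with scancel when the workflow is interrupted. Since the cluster-status path is relative, it is assumed the workflow is launched from the directory containing config/. Launching looks the same apart from the profile directory; a sketch:

# Hypothetical launch with the Sherlock profile.
snakemake \
  --snakefile workflow/Snakefile \
  --configfile config/config.yaml \
  --profile config/slurm_sherlock
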
20 changes: 20 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/config/slurm_sherlock/status-sacct.sh
@@ -0,0 +1,20 @@
#!/usr/bin/env bash

# Check status of slurm jobs
jobid="$1"
if [[ "$jobid" == Submitted ]]
then
echo smk-simple-slurm: Invalid job ID: "$jobid" >&2
echo smk-simple-slurm: Did you remember to add the flag --parsable to your sbatch call? >&2
exit 1
fi
output=`sacct -j "$jobid" --format State --noheader | head -n 1 | awk '{print $1}'`
if [[ $output =~ ^(COMPLETED).* ]]
then
echo success
elif [[ $output =~ ^(RUNNING|PENDING|COMPLETING|CONFIGURING|SUSPENDED).* ]]
then
echo running
else
echo failed
fi
34 changes: 34 additions & 0 deletions scripts/MPRA_Snakemake_Pipeline/workflow/Snakefile
@@ -0,0 +1,34 @@
import os
import sys
import pandas as pd

###################################---Step 1: Define all Config Variables---######################################################
input_directory=config["input_directory"]
output_directory=config["output_directory"]
reference_data=config["reference_data"]
filepaths_df=config["filepaths_df"]

#########################################---Step 2: Define Wildcard---############################################################
filepaths_dataframe = pd.read_csv(filepaths_df)

IDs = list(filepaths_dataframe['ID'])
print('Running on samples: {}'.format(IDs))

########################################---Step 3: Define Dictionary---###########################################################
# this dictionary maps the sample IDs in filepaths.csv to their corresponding barcode type
# these values are passed on to get_barcodes.smk and clean_umis.smk to select further parameters
filepaths_dictionary = filepaths_dataframe.set_index("ID")['Type'].to_dict()
barcode_types = [*filepaths_dictionary.values()]

###########################################---Step 4: Define Rules---#############################################################
include: 'rules/cellranger.smk'
include: 'rules/get_barcodes.smk'
include: 'rules/clean_umis.smk'

rule all:
    input:
        expand(config["input_directory"] + "/" + "{ID}" + "/" + "{ID}" + "_S1_L001_R1_001.fastq", ID = IDs),
        expand(config["output_directory"] + "/" + "{ID}" + "/outs/possorted_genome_bam.bam", ID = IDs),
        expand(config["output_directory"] + "/" + "{ID}" + "/outs/" + "{ID}" + "_get_bc_v3.txt", ID = IDs),
        expand(config["output_directory"] + "/" + "{ID}" + "/outs/" + "{ID}" + "_get_bc_v3_no_G.txt", ID = IDs),
        expand(config["output_directory"] + "/" + "{ID}" + "/outs/" + "{ID}" + "_get_bc_v3_no_G_cleaned_UMI.txt", ID = IDs)
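
rule all ties the three included rule files together by requesting, for every ID in filepaths.csv, the input FASTQ, the Cell Ranger BAM (possorted_genome_bam.bam), the extracted barcode tables, and the final cleaned-UMI table. A quick way to confirm that the wildcards and paths resolve before submitting anything to Slurm is a dry run; a sketch, assuming invocation from the pipeline root:

# Hypothetical dry run from scripts/MPRA_Snakemake_Pipeline/; lists the jobs from
# cellranger.smk, get_barcodes.smk, and clean_umis.smk that would be scheduled per sample.
snakemake \
  --snakefile workflow/Snakefile \
  --configfile config/config.yaml \
  --dry-run --printshellcmds
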