perf: datavzrd wrapper v3.12.1, offer-excel configurable, free disk space for CI, dynamic sleuth_init mem_mb, pure download rules as localrules #92

Merged 38 commits on Jun 13, 2024

Commits
994ec02
perf: update datavzrd to wrapper `v3.10.2` datavzrd.smk
dlaehnemann May 13, 2024
5770b83
perf: switch diffexp_datavzrd to new wrapper structure
dlaehnemann May 22, 2024
a26cbff
remove comments that make snakefmt choke
dlaehnemann May 22, 2024
cf1256e
turn yte template rendering on in template
dlaehnemann May 22, 2024
50671a6
perf: make offer-excel in diffexp-template.yaml configurable
dlaehnemann May 23, 2024
0eb2a0e
perf: make offer-excel in diffexp-template.yaml configurable
dlaehnemann May 23, 2024
888706a
perf: make offer-excel in diffexp-template.yaml configurable
dlaehnemann May 23, 2024
accaaa7
perf: make offer-excel in diffexp-template.yaml configurable
dlaehnemann May 23, 2024
7065042
fix: chaining in config.get() statements
dlaehnemann May 23, 2024
cab060b
fix: datavzrd template python code
dlaehnemann May 27, 2024
d5e7efa
perf: dynamic threads-dependent mem_mb for sleuth_init
dlaehnemann May 28, 2024
082b03f
fix: put back accidental deletion
dlaehnemann May 28, 2024
a75ce40
snakefmt
dlaehnemann May 28, 2024
8378a38
perf: increase per-thread memory for sleuth_init, because it failed o…
dlaehnemann Jun 4, 2024
4fe9edb
fix: remove disfunctional is_activated helper
dlaehnemann Jun 5, 2024
66798de
fix: typo
dlaehnemann Jun 5, 2024
940a4d4
fix: clean up space in GitHub Actions CI runner containers, to avoid …
dlaehnemann Jun 5, 2024
4b30f88
perf: switch to very latest version of datavzrd (wrapper and tool)
dlaehnemann Jun 5, 2024
cc2d4e2
perf: move spia and go term enrichment to new datavzrd wrapper (one r…
dlaehnemann Jun 5, 2024
905d8b6
docs: update configs to warn of excel performance penalty
dlaehnemann Jun 5, 2024
c6907ac
chore: snakefmt
dlaehnemann Jun 5, 2024
7d4e851
Merge branch 'main' into perf/update-datavzrd-wrapper-to-3-10-2
dlaehnemann Jun 5, 2024
847784f
fix: try updating isoformsiwtchanalyzer to get bug fix (https://githu…
dlaehnemann Jun 6, 2024
0004c62
Merge branch 'perf/update-datavzrd-wrapper-to-3-10-2' of github.com:s…
dlaehnemann Jun 6, 2024
6d7d8f8
fix: also update tidyverse in isoformswitchanalyzer env
dlaehnemann Jun 6, 2024
16c43a3
fix: remove r-base pinning in isoform-switch-analyzer.yaml env, as th…
dlaehnemann Jun 6, 2024
bb78ca3
fix: parse the config["report"]["offer_excel"] entry in all datavzrd …
dlaehnemann Jun 6, 2024
49e080b
chore: switch to lookup syntax for offer_excel, requires snakemake 8.13
dlaehnemann Jun 6, 2024
194e34c
chore: snakefmt
dlaehnemann Jun 6, 2024
0b5dddf
fix: avoid regular reruns of rules that use `scripts/common.R`, by lo…
dlaehnemann Jun 12, 2024
5a0b878
perf: dynamic mem_mb request for `rule init_isoform_switch`
dlaehnemann Jun 12, 2024
8fbe1ab
perf: make pure download rules `localrule: true`
dlaehnemann Jun 12, 2024
5efab1a
patch: turn off diffsplice / isoformswitchanalyzer testing to merge t…
dlaehnemann Jun 12, 2024
1552a4a
patch: flip forgotten diffsplice testing siwtch to false
dlaehnemann Jun 12, 2024
93443f8
patch: Ensembl release 111 is currently down (jan2024.archive.ensembl…
dlaehnemann Jun 12, 2024
41b9b3f
fix: enforce boolean type for config["diffsplice"]["acitvate"] and co…
dlaehnemann Jun 13, 2024
be89257
perf: update datavzrd wrapper to v3.12.1
dlaehnemann Jun 13, 2024
a3b40ec
Update workflow/rules/datavzrd.smk
dlaehnemann Jun 13, 2024
32 changes: 32 additions & 0 deletions .github/workflows/main.yml
@@ -69,6 +69,22 @@ jobs:
- formatting
steps:

+ - name: Free Disk Space (Ubuntu)
+ uses: jlumbroso/[email protected]
+ with:
+ # this might remove tools that are actually needed,
+ # if set to "true" but frees about 6 GB
+ tool-cache: false
+
+ # all of these default to true, but feel free to set to
+ # "false" if necessary for your workflow
+ android: true
+ dotnet: true
+ haskell: true
+ large-packages: true
+ docker-images: false
+ swap-storage: true
+
- name: Checkout repository
uses: actions/checkout@v3
with:
@@ -88,6 +104,22 @@ jobs:
- formatting
steps:

+ - name: Free Disk Space (Ubuntu)
+ uses: jlumbroso/[email protected]
+ with:
+ # this might remove tools that are actually needed,
+ # if set to "true" but frees about 6 GB
+ tool-cache: false
+
+ # all of these default to true, but feel free to set to
+ # "false" if necessary for your workflow
+ android: true
+ dotnet: true
+ haskell: true
+ large-packages: true
+ docker-images: false
+ swap-storage: true
+
- name: Checkout repository
uses: actions/checkout@v3
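
Whether the cleanup step frees enough space can be checked from any later step. A minimal Python sketch, assuming the runner's root filesystem is the one that fills up (the `/` path is an assumption and is not taken from this PR):

```python
import shutil

# Query total/used/free bytes for the filesystem that contains "/".
usage = shutil.disk_usage("/")
free_gib = usage.free / 1024**3
total_gib = usage.total / 1024**3
print(f"free: {free_gib:.1f} GiB of {total_gib:.1f} GiB")
```

Running this once before and once after the cleanup step would show how much space the chosen options actually reclaim.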

10 changes: 8 additions & 2 deletions .test/config/config.yaml
@@ -61,7 +61,7 @@ diffexp:
genelist: "resources/gene_list.tsv"

diffsplice:
- activate: true
+ activate: false
# codingCutoff parameter of isoformSwitchAnalyzer, see
# https://rdrr.io/bioc/IsoformSwitchAnalyzeR/man/analyzeCPAT.html
coding_cutoff: 0.725
@@ -93,6 +93,12 @@ enrichment:
# the species specified by resources -> ref -> species above
pathway_database: "panther"

+ report:
+ # make this `true`, to get excel files for download in the snakemake
+ # report, BUT: this can drastically increase the runtime of datavzrd report
+ # generation, especially on larger cohorts
+ offer_excel: true
+
bootstrap_plots:
# desired false discovery rate for bootstrap plots, i.e. a lower FDR will result in fewer boxplots generated
FDR: 0.01
@@ -133,4 +139,4 @@ params:
# of expected adapter matches by chance
cutadapt-pe:
adapters: "-a ACGGATCGATCGATCGATCGAT -g GGATCGATCGATCGATCGAT -A ACGGATCGATCGATCGATCGAT -G GGATCGATCGATCGATCGAT"
- extra: "--minimum-length 33 -e 0.005 --overlap 7"
+ extra: "--minimum-length 33 -e 0.005 --overlap 7"
4 changes: 2 additions & 2 deletions .test/three_prime/config/config.yaml
@@ -18,7 +18,7 @@ resources:
# ensembl species name
species: homo_sapiens
# ensembl release version
- release: "111"
+ release: "112"
# genome build
build: GRCh38
# pfam release to use for annotation of domains in differential splicing analysis
@@ -63,7 +63,7 @@ diffexp:
genelist: "resources/gene_list.tsv"

diffsplice:
- activate: true
+ activate: false
# codingCutoff parameter of isoformSwitchAnalyzer, see
# https://rdrr.io/bioc/IsoformSwitchAnalyzeR/man/analyzeCPAT.html
coding_cutoff: 0.725
10 changes: 8 additions & 2 deletions config/config.yaml
@@ -18,7 +18,7 @@ resources:
# ensembl species name
species: homo_sapiens
# ensembl release version
- release: "104"
+ release: "112"
# genome build
build: GRCh38
# pfam release to use for annotation of domains in differential splicing analysis
@@ -71,7 +71,7 @@ diffexp:
genelist: "resources/gene_list.tsv"

diffsplice:
- activate: true
+ activate: false
# codingCutoff parameter of isoformSwitchAnalyzer, see
# https://rdrr.io/bioc/IsoformSwitchAnalyzeR/man/analyzeCPAT.html
coding_cutoff: 0.725
@@ -117,6 +117,12 @@ plot_vars:
# significance level used for plot_vars() plots
sig_level: 0.1

+ report:
+ # make this `true`, to get excel files for download in the snakemake
+ # report, BUT: this can drastically increase the runtime of datavzrd report
+ # generation, especially on larger cohorts
+ offer_excel: false
+
params:
#For reads that are produced by 3’-end sequencing, the --single-overhang option does not discard
#reads where the expected fragment size goes beyond the transcript start
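
The datavzrd rules in this PR read the `report` entry via `lookup(within=config, dpath="report/offer_excel", default=False)`, so a config without a `report` section falls back to `False`. The sketch below is a simplified pure-Python stand-in for that behaviour, not Snakemake's implementation; `lookup_dpath` and the two example configs are hypothetical and exist only for illustration.

```python
def lookup_dpath(within, dpath, default=None):
    """Walk a nested dict along a slash-separated path; return `default` if a key is missing."""
    node = within
    for key in dpath.split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical configs: one with the new `report` section, one without it.
config_with_report = {"report": {"offer_excel": True}}
config_without_report = {"diffexp": {"genelist": "resources/gene_list.tsv"}}

enabled = lookup_dpath(config_with_report, "report/offer_excel", default=False)
fallback = lookup_dpath(config_without_report, "report/offer_excel", default=False)
print(enabled, fallback)
```

The default keeps older config files working: they simply get reports without Excel downloads.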
2 changes: 1 addition & 1 deletion workflow/Snakefile
@@ -1,6 +1,6 @@
from snakemake.utils import min_version

- min_version("7.17.0")
+ min_version("8.13.0")


configfile: "config/config.yaml"
7 changes: 3 additions & 4 deletions workflow/envs/isoform-switch-analyzer.yaml
@@ -3,7 +3,6 @@ channels:
- bioconda
- nodefaults
dependencies:
- - bioconductor-isoformswitchanalyzer =1.14
- - bioconductor-rhdf5 =2.36
- - r-tidyverse =1.3
- - r-base =4.1
+ - bioconductor-isoformswitchanalyzer =2.2.0
+ - bioconductor-rhdf5 =2.46.1
+ - r-tidyverse =2.0
12 changes: 7 additions & 5 deletions workflow/resources/datavzrd/diffexp-template.yaml
@@ -1,8 +1,10 @@
+ __use_yte__: true
+
name: ?f"Differential expression analysis for model {wildcards.model}"
datasets:
genes_representative:
path: ?input.genes_representative
- offer-excel: true
+ offer-excel: ?params.offer_excel
links:
link to transcripts:
column: ens_gene
@@ -19,19 +21,19 @@ datasets:
separator: "\t"
transcripts:
path: ?input.transcripts
- offer-excel: true
+ offer-excel: ?params.offer_excel
links:
link to genes representative:
column: ens_gene
table-row: genes_representative/ens_gene
separator: "\t"
genes_aggregated:
path: ?input.genes_aggregated
- offer-excel: true
+ offer-excel: ?params.offer_excel
separator: "\t"
logcount_matrix:
path: ?input.logcount_matrix
- offer-excel: true
+ offer-excel: ?params.offer_excel
links:
link to transcripts:
column: transcript
@@ -292,4 +294,4 @@ views:
domain:
- -1
- 0
- - 1
+ - 1
6 changes: 4 additions & 2 deletions workflow/resources/datavzrd/go-enrichment-template.yaml
@@ -1,12 +1,14 @@
+ __use_yte__: true
+
name: ?f"Gene ontology (GO) term enrichment analysis performed by goatools {wildcards.model}"
datasets:
significant_terms:
path: ?input.significant_terms
- offer-excel: true
+ offer-excel: ?params.offer_excel
separator: "\t"
go_enrichment:
path: ?input.enrichment
- offer-excel: true
+ offer-excel: ?params.offer_excel
separator: "\t"
default-view: significant_terms
views:
4 changes: 3 additions & 1 deletion workflow/resources/datavzrd/spia-template.yaml
@@ -1,8 +1,10 @@
+ __use_yte__: true
+
name: ?f"spia pathway impact analysis for model {wildcards.model}"
datasets:
spia_table:
path: ?input.spia_table
- offer-excel: true
+ offer-excel: ?params.offer_excel
separator: "\t"
default-view: spia_table
views:
6 changes: 1 addition & 5 deletions workflow/rules/common.smk
@@ -80,10 +80,6 @@ def check_config():
check_config()


- def is_activated(config_element):
- return config_element["activate"] in {"true", "True"}
-
-
def get_model(wildcards):
if wildcards.model == "all":
return {"full": None}
@@ -360,7 +356,7 @@ def all_input(wildcards):
)
)

- if is_activated(config["diffsplice"]):
+ if config["diffsplice"]["activate"]:
# diffsplice analysis
wanted_input.extend(
expand(
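
Why the removed helper never fired for plain YAML booleans: `activate: true` is parsed into the Python bool `True`, which is not a member of the string set `{"true", "True"}`. A minimal sketch, with the config dicts written out by hand as an assumption of what the YAML parser yields:

```python
def is_activated_old(config_element):
    # The removed helper, reproduced from the diff: it only matches strings.
    return config_element["activate"] in {"true", "True"}


parsed_bool = {"activate": True}      # assumed result of parsing `activate: true`
quoted_string = {"activate": "true"}  # assumed result of parsing `activate: "true"`

old_on_bool = is_activated_old(parsed_bool)      # a bool is not in a set of strings
old_on_string = is_activated_old(quoted_string)  # only the quoted form matched
new_on_bool = bool(parsed_bool["activate"])      # the plain truthiness check from the diff

# Caveat of the truthiness check: any non-empty string is truthy, even "false".
string_false_is_truthy = bool("false")

print(old_on_bool, old_on_string, new_on_bool, string_false_is_truthy)
```

The caveat in the last assignment fits with commit 41b9b3f in the list above, whose (truncated) message mentions enforcing a boolean type for the `activate` entry.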
78 changes: 19 additions & 59 deletions workflow/rules/datavzrd.smk
@@ -1,56 +1,6 @@
- rule render_datavzrd_config_spia:
- input:
- template=workflow.source_path("../resources/datavzrd/spia-template.yaml"),
- spia_table="results/tables/pathways/{model}.pathways.tsv",
- output:
- "results/datavzrd/spia/{model}.yaml",
- log:
- "logs/yte/render-datavzrd-config-spia/{model}.log",
- params:
- pathway_db=config["enrichment"]["spia"]["pathway_database"],
- template_engine:
- "yte"
-
-
- rule render_datavzrd_config_diffexp:
- input:
- template=workflow.source_path("../resources/datavzrd/diffexp-template.yaml"),
- logcount_matrix="results/tables/logcount-matrix/{model}.logcount-matrix.tsv",
- transcripts="results/tables/diffexp/{model}.transcripts.diffexp.tsv",
- genes_aggregated="results/tables/diffexp/{model}.genes-aggregated.diffexp.tsv",
- genes_representative="results/tables/diffexp/{model}.genes-representative.diffexp.tsv",
- volcano_plots="results/plots/interactive/volcano/{model}.vl.json",
- output:
- "results/datavzrd/diffexp/{model}.yaml",
- params:
- samples=get_model_samples,
- log:
- "logs/yte/render-datavzrd-config-diffexp/{model}.log",
- template_engine:
- "yte"
-
-
- rule render_datavzrd_config_go_enrichment:
- input:
- template=workflow.source_path(
- "../resources/datavzrd/go-enrichment-template.yaml"
- ),
- enrichment="results/tables/go_terms/{model}.go_term_enrichment.gene_fdr_{gene_fdr}.go_term_fdr_{go_term_fdr}.tsv",
- significant_terms="results/tables/go_terms/{model}.go_term_enrichment.gene_fdr_{gene_fdr}.go_term_fdr_{go_term_fdr}.sig_terms.tsv",
- genes_representative="results/tables/diffexp/{model}.genes-representative.diffexp.tsv",
- output:
- "results/datavzrd/go_terms/{model}_{gene_fdr}.go_term_fdr_{go_term_fdr}.yaml",
- params:
- samples=get_model_samples,
- log:
- "logs/yte/render-datavzrd-config-go_terms/{model}_{gene_fdr}.go_term_fdr_{go_term_fdr}.log",
- template_engine:
- "yte"
-
-
rule spia_datavzrd:
input:
- config="results/datavzrd/spia/{model}.yaml",
+ config=workflow.source_path("../resources/datavzrd/spia-template.yaml"),
# files required for rendering the given configs
spia_table="results/tables/pathways/{model}.pathways.tsv",
output:
@@ -64,13 +14,17 @@ rule spia_datavzrd:
),
log:
"logs/datavzrd-report/spia-{model}/spia-{model}.log",
+ params:
+ offer_excel=lookup(within=config, dpath="report/offer_excel", default=False),
+ pathway_db=config["enrichment"]["spia"]["pathway_database"],
wrapper:
- "v3.5.2/utils/datavzrd"
+ "v3.12.1/utils/datavzrd"


rule diffexp_datavzrd:
input:
- config="results/datavzrd/diffexp/{model}.yaml",
+ config=workflow.source_path("../resources/datavzrd/diffexp-template.yaml"),
+ # optional files required for rendering the given config
logcount_matrix="results/tables/logcount-matrix/{model}.logcount-matrix.tsv",
transcripts="results/tables/diffexp/{model}.transcripts.diffexp.tsv",
genes_aggregated="results/tables/diffexp/{model}.genes-aggregated.diffexp.tsv",
@@ -85,19 +39,22 @@ rule diffexp_datavzrd:
patterns=["index.html"],
labels={"model": "{model}"},
),
- params:
- model=get_model,
log:
"logs/datavzrd-report/diffexp.{model}/diffexp.{model}.log",
+ params:
+ extra="",
+ model=get_model,
+ offer_excel=lookup(within=config, dpath="report/offer_excel", default=False),
+ samples=get_model_samples,
wrapper:
- "v3.5.2/utils/datavzrd"
+ "v3.12.1/utils/datavzrd"


rule go_enrichment_datavzrd:
input:
- config="results/datavzrd/go_terms/{model}_{gene_fdr}.go_term_fdr_{go_term_fdr}.yaml",
+ config=workflow.source_path("../resources/datavzrd/go-enrichment-template.yaml"),
+ significant_terms="results/tables/go_terms/{model}.go_term_enrichment.gene_fdr_{gene_fdr}.go_term_fdr_{go_term_fdr}.sig_terms.tsv",
enrichment="results/tables/go_terms/{model}.go_term_enrichment.gene_fdr_{gene_fdr}.go_term_fdr_{go_term_fdr}.tsv",
- sig_go="results/tables/go_terms/{model}.go_term_enrichment.gene_fdr_{gene_fdr}.go_term_fdr_{go_term_fdr}.sig_terms.tsv",
output:
report(
directory(
@@ -116,5 +73,8 @@ rule go_enrichment_datavzrd:
),
log:
"logs/datavzrd-report/go_enrichment-{model}/go_enrichment-{model}_{gene_fdr}.go_term_fdr_{go_term_fdr}.log",
+ params:
+ offer_excel=lookup(within=config, dpath="report/offer_excel", default=False),
+ samples=get_model_samples,
wrapper:
- "v3.5.2/utils/datavzrd"
+ "v3.12.1/utils/datavzrd"
3 changes: 3 additions & 0 deletions workflow/rules/diffexp.smk
@@ -45,6 +45,9 @@ rule sleuth_init:
group:
"sleuth-init"
threads: 6
+ resources:
+ # based on: https://github.com/pachterlab/sleuth/issues/139#issuecomment-331157007
+ mem_mb=lambda wc, threads: threads * 8000,
script:
"../scripts/sleuth-init.R"

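
Snakemake calls a `resources` callable with the wildcards plus any of the named arguments it declares, here `threads`. A small sketch of how the `sleuth_init` request scales, using a hypothetical named function in place of the lambda:

```python
def sleuth_init_mem_mb(wildcards, threads):
    # Same arithmetic as the lambda in the rule: 8000 MB per thread.
    return threads * 8000


# The callable ignores wildcards, so None stands in for them here.
for threads in (1, 2, 6):
    print(threads, sleuth_init_mem_mb(None, threads))
```

With the rule's `threads: 6` this requests 48000 MB. If Snakemake is given fewer cores and scales the rule's threads down, the memory request should shrink with it.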
2 changes: 2 additions & 0 deletions workflow/rules/diffsplice.smk
@@ -23,6 +23,8 @@ rule init_isoform_switch:
seq_dir=lambda _, output: os.path.dirname(output.seqs[0]),
min_effect_size=config["diffsplice"]["min_effect_size"],
fdr=config["diffsplice"]["fdr"],
+ resources:
+ mem_mb=lambda wc, input: 3 * input.size_mb,
script:
"../scripts/isoform-switch-analysis-init.R"

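
The `input.size_mb` used in `init_isoform_switch` is the combined size of the rule's input files in megabytes, so the request is three times the input size. The sketch below imitates it with `os.path.getsize` on a temporary file; `total_size_mb` and `init_isoform_switch_mem_mb` are hypothetical helpers, and dividing by 1024 twice is an assumption about how Snakemake converts bytes to megabytes.

```python
import os
import tempfile


def total_size_mb(paths):
    # Stand-in for Snakemake's `input.size_mb`: summed file sizes in MB.
    return sum(os.path.getsize(p) for p in paths) / 1024 / 1024


def init_isoform_switch_mem_mb(wildcards, input_paths):
    # Same arithmetic as the lambda in the rule: three times the input size.
    return 3 * total_size_mb(input_paths)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example-input.bin")
    with open(path, "wb") as handle:
        handle.write(b"\0" * (2 * 1024 * 1024))  # a 2 MB dummy input file
    requested = init_isoform_switch_mem_mb(None, [path])

print(requested)
```

Tying the request to input size lets small test datasets run with little memory while larger cohorts ask for proportionally more.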