feat: Improve datavzrd tables #93

Merged
merged 109 commits into from
Aug 15, 2024
Changes from 92 commits
Commits
2f5322e
Add initial changes
Apr 24, 2024
675126d
First finished draft
Apr 24, 2024
35aefde
Improve datavzrd tables
May 24, 2024
1ff92de
fix wrong paths
May 24, 2024
e96e3c8
fix wrong paths another time
May 24, 2024
d150643
Add meta comparisons and tidy up
Jun 5, 2024
f87548b
Repair volcano plots
Jun 6, 2024
406af87
Remove _se from confidence interval
Jun 6, 2024
b4ba7dd
Update spia columns if no results produced
Jun 7, 2024
50806cb
Compute gene_ratio before selecting all columns
Jun 7, 2024
ddcf4db
Merge branch 'main' into locotact_p18
dlaehnemann Jun 24, 2024
c505482
fix: remove accidental empty line in datavzrd.smk
dlaehnemann Jun 24, 2024
fad56f3
fix: use lambda expression to get method wildcard into function argum…
dlaehnemann Jun 24, 2024
0fb709a
chore: make the formatter happy
dlaehnemann Jun 24, 2024
b2f9ae8
chore: clean up unneeded workflow/scripts/postprocess_spia.py
dlaehnemann Jun 24, 2024
a9dda2f
Use yte in datavzrd
Jun 26, 2024
cfb2b85
Merge branch 'main' into locotact_p18
Addimator Jun 26, 2024
cc97b93
Merge branch 'main' into locotact_p18
johanneskoester Jun 27, 2024
b3333f8
Change display of q/pvals, remove NaN rows in diffexp
Jun 27, 2024
c9ddbbe
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Jun 27, 2024
69085eb
Update workflow/scripts/postprocess_diffexp.py
Addimator Jun 27, 2024
e3574c1
Small fixes
Jun 28, 2024
76fc2bf
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Jun 28, 2024
d41b94a
fix: postprocess for all diffexp levels, use proper table for constra…
johanneskoester Jul 4, 2024
354aaac
bump datavzrd
johanneskoester Jul 4, 2024
e0d4b76
do not create pi-value sorting for genes-aggregated (no beta-values a…
johanneskoester Jul 5, 2024
baac24e
fix arg
johanneskoester Jul 5, 2024
f6cc352
remove and put debugs
Jul 8, 2024
0e56286
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Jul 8, 2024
2162866
Ise pd.join instead of self-implementing it
Jul 8, 2024
9ebfe31
Fix merge of dfs
Jul 8, 2024
fe53c1f
sort diffexp bei signd_pi_val prefix only
Jul 8, 2024
afd1222
Change meta comparison description
Addimator Jul 9, 2024
53a8083
Chanke links between datavzrd views
Jul 9, 2024
a115fcb
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Jul 9, 2024
95a0c43
Add meta comparisons
Jul 10, 2024
1570504
bug fix for meta comps
Jul 10, 2024
09e8115
Improve description of meta comps in config
Jul 10, 2024
bb02ad9
Add meta comparisons to README
Jul 10, 2024
5763ca8
Add meta_comparisons to CI config
Jul 10, 2024
4116490
Add meta comparisons to three prime test config
Jul 11, 2024
be2a426
Make sort more intuitive
Jul 12, 2024
e04ee95
Merge branch 'main' into locotact_p18
johanneskoester Jul 18, 2024
32a1a06
Cahnge path of meta_comp in config
Jul 18, 2024
1ae3869
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Jul 18, 2024
b8ea3dc
Do own pca in order to allow scaling
Jul 22, 2024
c10d797
change import of pca script
Jul 23, 2024
d6aa33e
make config setting for excluding nas in pca
Jul 23, 2024
5eac767
Sort tables and show sorting in description
Jul 23, 2024
bb59405
show scale in both plots in go meta comp
Jul 23, 2024
970293e
Insert dealing with meta comp description
Jul 23, 2024
304a87c
Add samples.tsv in report
Jul 23, 2024
72ac058
Add linkouts to tables in meta_comp
Jul 23, 2024
cb1ffea
fix meta comp description
Jul 23, 2024
a7adf8e
Show inputs in report
Jul 23, 2024
6e1e2ed
Show inputs in report 2
Jul 23, 2024
f2d8be0
Show signed_pi_val for diffexp
Jul 23, 2024
5f94cd0
Show input data in report
Jul 24, 2024
52e1e6e
Fix error in report
Jul 24, 2024
b2847e1
Link genes and pathways ind meta comps
Jul 24, 2024
f73fa54
Show scale on right plot of meta comp
Jul 24, 2024
e7922e7
Merging on go enrichment wrong (quick fix)
Jul 24, 2024
fd9e801
Add gene analysis for pathways
Addimator Jul 31, 2024
5af5638
snakefmt does complains
Addimator Aug 1, 2024
2d9754d
Add pca_exclude_NA to test config
Addimator Aug 1, 2024
43f3292
Merge branch 'locotact_p18' of https://github.com/snakemake-workflows…
Addimator Aug 1, 2024
6bc7936
Add orgDb to test config
Addimator Aug 1, 2024
419705b
Add orgDB and pca_NA to other test config
Addimator Aug 1, 2024
a9fbd8f
remove units and samples datavzrd config because its too specific
Addimator Aug 1, 2024
fa7fd12
Improve docu and typos
Addimator Aug 1, 2024
23c1478
improve go enrichment postprocessing
Addimator Aug 1, 2024
abba1ae
Improve meta comparision config structure
Addimator Aug 1, 2024
dbc2aaa
minor go enrichment postprocessing bug fixes
Addimator Aug 1, 2024
313ba25
improve readme
Addimator Aug 1, 2024
1843888
oops, redo samples and units
Addimator Aug 1, 2024
f312681
Make snakefmt happy
Addimator Aug 1, 2024
f56d431
Change datavzrd colum to study items
Addimator Aug 2, 2024
15802bb
Merge branch 'locotact_p18' of https://github.com/snakemake-workflows…
Addimator Aug 2, 2024
b53f96a
Revert overwrite changes in three prime config
Addimator Aug 2, 2024
284132b
Improve config comments
Addimator Aug 2, 2024
a4b357f
Resolve problems with configs
Addimator Aug 2, 2024
c6fe66c
Bug fixes
Aug 2, 2024
1a1feab
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Aug 2, 2024
5fbb3d8
Delete unnecessary file
Aug 5, 2024
ba7fcfb
Typo
Addimator Aug 5, 2024
896bb3d
Small fixes in config
Addimator Aug 5, 2024
6451dbd
Improve descriptions
Addimator Aug 5, 2024
0c50f34
Refactor a lot of PR remarks
Addimator Aug 9, 2024
c7a41ff
Give correct config path
Addimator Aug 9, 2024
a5f1317
Fix pca_exclude_NAs path
Addimator Aug 9, 2024
d3478de
Merge branch 'locotact_p18' of https://github.com/snakemake-workflows…
Addimator Aug 9, 2024
92d770a
fix: datavzrd_input path
Addimator Aug 9, 2024
76131ce
Change heading level of meta comparision in readme
Addimator Aug 12, 2024
5c613c8
Make string easier to read
Addimator Aug 12, 2024
ebbf539
fix species tin linkout
Aug 12, 2024
004d2ee
remove a lot of unnecessary dependencies
Addimator Aug 12, 2024
17f9711
Convert jupyter notebooks to python files
Addimator Aug 12, 2024
69e7e07
Merge branch 'locotact_p18' of https://github.com/snakemake-workflows…
Addimator Aug 12, 2024
301204e
Include samples in workflow.rst
Aug 12, 2024
b317173
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Aug 12, 2024
244c861
update for linting
Addimator Aug 12, 2024
1eea5b6
Remove outcommented caption for linting
Aug 12, 2024
055c4ee
Merge branch 'locotact_p18' of github.com:snakemake-workflows/rna-seq…
Aug 12, 2024
f60bf62
Apply suggestions from code review
Addimator Aug 13, 2024
c4a63b3
fix get exclude_nas
Addimator Aug 13, 2024
c827a9d
remove unnecessary line
Addimator Aug 13, 2024
9704f6a
Merge branch 'main' into locotact_p18
Addimator Aug 15, 2024
dcb1d70
fix typo
Addimator Aug 15, 2024
78f3ae2
improve descriptions
Addimator Aug 15, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
!config/units.tsv
!LICENSE
!README.md
local/*
resources
resources/*
results
27 changes: 25 additions & 2 deletions .test/config/config.yaml
@@ -16,17 +16,25 @@ resources:
# ensembl species name
species: homo_sapiens
# ensembl release version
- release: "104"
+ release: "112"
# genome build
build: GRCh38
# pfam release to use for annotation of domains in differential splicing analysis
pfam: "33.0"
# Choose strategy for selecting representative transcripts for each gene.
# Possible values:
# - canonical (use the canonical transcript from ensembl, only works for human at the moment)
# - mostsignificant (use the most significant transcript)
# - path/to/any/file.txt (a path to a file with ensembl transcript IDs to use;
# the user has to ensure that there is only one ID per gene given)
representative_transcripts: canonical
ontology:
# gene ontology to download, used e.g. in goatools
gene_ontology: "http://current.geneontology.org/ontology/go-basic.obo"

pca:
# If set to true, samples with NA values in the specified covariate column will be removed for PCA computation.
pca_exclude_NAs: false
labels:
# columns of sample sheet to use for PCA
- condition
@@ -96,11 +104,26 @@ enrichment:
# the species specified by resources -> ref -> species above
pathway_database: "panther"

meta_comparisons:
# comparison is only run if set to `true`
activate: false
# Define the comparisons of interest here
comparisons:
# Choose any name for the comparison. You can add as many comparisons as you want
model_X_vs_model_Y:
items:
# Define the two underlying models for the comparison. The models must be defined in the diffexp/models in the config
# items must be of form <arbitrary label>: <existing diffexp model from config>
X: model_X
Y: model_Y
# Define label for datavzrd report
label: model X vs. model Y

report:
# make this `true`, to get excel files for download in the snakemake
# report, BUT: this can drastically increase the runtime of datavzrd report
# generation, especially on larger cohorts
- offer_excel: true
+ offer_excel: false

bootstrap_plots:
# desired false discovery rate for bootstrap plots, i.e. a lower FDR will result in fewer boxplots generated
28 changes: 26 additions & 2 deletions .test/three_prime/config/config.yaml
@@ -11,8 +11,6 @@ experiment:
vendor: lexogen
plot-qc: all



resources:
ref:
# ensembl species name
@@ -23,12 +21,20 @@ resources:
build: GRCh38
# pfam release to use for annotation of domains in differential splicing analysis
pfam: "33.0"
# Choose strategy for selecting representative transcripts for each gene.
# Possible values:
# - canonical (use the canonical transcript from ensembl, only works for human at the moment)
# - mostsignificant (use the most significant transcript)
# - path/to/any/file.txt (a path to a file with ensembl transcript IDs to use;
# the user has to ensure that there is only one ID per gene given)
representative_transcripts: canonical
ontology:
# gene ontology to download, used e.g. in goatools
gene_ontology: "http://current.geneontology.org/ontology/go-basic.obo"

pca:
# If set to true, samples with NA values in the specified covariate column will be removed for PCA computation.
pca_exclude_NAs: false
labels:
# columns of sample sheet to use for PCA
- condition
@@ -97,6 +103,24 @@ enrichment:
# pathway database to use in SPIA, needs to be available for
# the species specified by resources -> ref -> species above
pathway_database: "panther"
# OrgDB Genome wide annotation package (https://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb) for the species under consideration.
# Only required if you want to have a gene analysis for your pathways. Else NA
orgDb: org.Hs.eg.db

meta_comparisons:
# comparison is only run if set to `true`
activate: false
# Define the comparisons of interest here
comparisons:
# Choose any name for the comparison. You can add as many comparisons as you want
model_X_vs_model_Y:
items:
# Define the two underlying models for the comparison. The models must be defined in the diffexp/models in the config
# items must be of form <arbitrary label>: <existing diffexp model from config>
X: model_X
Y: model_Y
# Define label for datavzrd report
label: model X vs. model Y

bootstrap_plots:
# desired false discovery rate for bootstrap plots, i.e. a lower FDR will result in fewer boxplots generated
6 changes: 6 additions & 0 deletions config/README.md
@@ -80,3 +80,9 @@ Changes to the recommendations are motivated as follows:
* `-a "r1adapter=A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=3;max_error_rate=0.100000"`: We remove A{18}, as this is handled by `--poly-a`. We increase `min_overlap` to 7 and set the `max_error_rate` to the Illumina error rate of about 0.005, both to avoid spurious adapter matches being removed.
* `-g "r1adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=20"`: This is not needed any more, as `-a` option will lead to complete removal of read sequence if adapter is found at the start of the read, see: https://cutadapt.readthedocs.io/en/stable/guide.html#rightmost
* `--discard-trimmed`: We omit this, as the `-a` with the adapter sequence will lead to complete read sequence removal if adapter is found at start, and the `--minimum-length` will then discard such empty reads.

#### meta comparisons
Meta comparisons allow for comparing two full models against each other.
The axes represent the log2-fold changes (beta-scores) for the two models, with each point representing a gene.
Points on the diagonal indicate no difference between the comparisons, while deviations from the diagonal suggest differences in gene expression between the treatments.
For more details, see the comments in `config.yaml`.
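
To make the reading of the plot concrete, here is a minimal Python sketch. It is not the workflow's implementation: it assumes each model's beta-scores are available as a plain gene-to-value mapping, and the function name `meta_compare` and the gene names are hypothetical.

```python
def meta_compare(betas_x, betas_y):
    """Pair up the beta-scores of genes that occur in both models.

    Returns (gene, beta_x, beta_y, difference) tuples, sorted so that the
    genes furthest from the diagonal (largest absolute difference) come first.
    """
    shared_genes = betas_x.keys() & betas_y.keys()
    rows = [
        (gene, betas_x[gene], betas_y[gene], betas_x[gene] - betas_y[gene])
        for gene in shared_genes
    ]
    # Sort by distance from the diagonal, then by gene name for stable output.
    return sorted(rows, key=lambda row: (-abs(row[3]), row[0]))


# Hypothetical beta-scores for two models.
model_x = {"GENE_A": 1.5, "GENE_B": -0.5, "GENE_C": 2.0}
model_y = {"GENE_A": 1.5, "GENE_B": 1.0, "GENE_D": 0.3}

result = meta_compare(model_x, model_y)
for gene, beta_x, beta_y, difference in result:
    print(gene, beta_x, beta_y, difference)
```

In this toy input, `GENE_A` has the same beta-score in both models and so lies on the diagonal, while `GENE_B` deviates from it. Genes present in only one model are dropped here, which is a simplifying assumption of the sketch.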
25 changes: 23 additions & 2 deletions config/config.yaml
@@ -11,8 +11,6 @@ experiment:
# this allows to plot QC of aligned read position for specific transcripts (or 'all' transcripts)
plot-qc: all



resources:
ref:
# ensembl species name
@@ -35,6 +33,8 @@ resources:
gene_ontology: "http://current.geneontology.org/ontology/go-basic.obo"

pca:
# If set to true, samples with NA values in the specified covariate column will be removed for PCA computation.
pca_exclude_NAs: false
labels:
# columns of sample sheet to use for PCA
- condition
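
The effect of `pca_exclude_NAs` described in the comment above can be sketched as follows. This is an illustration under assumptions, not the workflow's PCA script: `samples_for_pca` is a hypothetical helper, the sample sheet is represented as a list of dicts, and which values count as missing (`None`, empty string, `"NA"`, `"NaN"`) is an assumption.

```python
# Values treated as missing in this sketch (an assumption, not taken from the workflow).
MISSING_VALUES = {None, "", "NA", "NaN"}


def samples_for_pca(samples, covariate, exclude_nas):
    """Return the samples that would enter the PCA for one covariate column.

    With exclude_nas=False all samples are kept; with exclude_nas=True,
    samples whose covariate value is missing are dropped.
    """
    if not exclude_nas:
        return list(samples)
    return [s for s in samples if s.get(covariate) not in MISSING_VALUES]


# Hypothetical sample sheet rows.
samples = [
    {"sample": "A", "condition": "treated"},
    {"sample": "B", "condition": "NA"},
    {"sample": "C", "condition": None},
]

kept = samples_for_pca(samples, "condition", exclude_nas=True)
print([s["sample"] for s in kept])
```

Filtering per covariate column, rather than once for the whole sheet, keeps a sample available for the PCA of every column in which it does have a value.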
@@ -105,6 +105,27 @@ enrichment:
# the species specified by resources -> ref -> species above
pathway_database: "reactome"

meta_comparisons:
# comparison is only run if set to `true`
activate: false
# Define the comparisons of interest here
comparisons:
# Choose any name for the comparison. You can add as many comparisons as you want
model_X_vs_model_Y:
items:
# Define the two underlying models for the comparison. The models must be defined in the diffexp/models in the config
# items must be of form <arbitrary label for plot-axis>: <existing diffexp model from config>
X: model_X
Y: model_Y
# Define label for datavzrd report
label: model X vs. model Y
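
The constraints stated in the comments above (two items per comparison, each referring to an existing diffexp model) could be checked along these lines. This is a hedged sketch: `validate_meta_comparisons` is a hypothetical helper that is not part of this PR, and it assumes that `items` parses as a mapping and that models live under `diffexp` -> `models` in the loaded config.

```python
def validate_meta_comparisons(config):
    """Return a list of human-readable problems in the meta_comparisons section."""
    problems = []
    meta = config.get("meta_comparisons", {})
    if not meta.get("activate", False):
        return problems  # comparisons are only run if activated, so nothing to check

    known_models = set(config.get("diffexp", {}).get("models", {}))
    for name, comparison in meta.get("comparisons", {}).items():
        items = comparison.get("items", {})
        if len(items) != 2:
            problems.append(f"{name}: expected exactly 2 items, got {len(items)}")
        for label, model in items.items():
            if model not in known_models:
                problems.append(
                    f"{name}: item '{label}' refers to unknown model '{model}'"
                )
    return problems


# Hypothetical config in which only model_X has been defined.
config = {
    "diffexp": {"models": {"model_X": {}}},
    "meta_comparisons": {
        "activate": True,
        "comparisons": {
            "model_X_vs_model_Y": {
                "items": {"X": "model_X", "Y": "model_Y"},
                "label": "model X vs. model Y",
            }
        },
    },
}

problems = validate_meta_comparisons(config)
print(problems)
```

Collecting all problems instead of raising on the first one lets a user fix the whole section in one pass.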

report:
# make this `true`, to get excel files for download in the snakemake
# report, BUT: this can drastically increase the runtime of datavzrd report
# generation, especially on larger cohorts
offer_excel: false

bootstrap_plots:
# desired false discovery rate for bootstrap plots, i.e. a lower FDR will result in fewer boxplots generated
FDR: 0.01
2 changes: 1 addition & 1 deletion config/samples.tsv
@@ -2,4 +2,4 @@ sample condition batch_effect
A treated batch1
B untreated batch1
C treated batch2
D untreated batch2
D untreated batch2
2 changes: 1 addition & 1 deletion config/units.tsv
@@ -3,4 +3,4 @@ A 1 raw/a.chr21.1.fq raw/a.chr21.2.fq
B 1 raw/b.chr21.1.fq raw/b.chr21.2.fq
B 2 300 14 raw/b.chr21.1.fq
C 1 raw/a.chr21.1.fq raw/a.chr21.2.fq
D 1 raw/b.chr21.1.fq raw/b.chr21.2.fq
D 1 raw/b.chr21.1.fq raw/b.chr21.2.fq
1 change: 1 addition & 0 deletions workflow/Snakefile
@@ -26,6 +26,7 @@ include: "rules/diffsplice.smk"
include: "rules/enrichment.smk"
include: "rules/datavzrd.smk"
include: "rules/bam.smk"
include: "rules/meta_comparisons.smk"


rule all:
4 changes: 4 additions & 0 deletions workflow/envs/pandas.yaml
@@ -0,0 +1,4 @@
channels:
- conda-forge
dependencies:
- pandas =2.2.1
4 changes: 4 additions & 0 deletions workflow/envs/polars.yaml
@@ -0,0 +1,4 @@
channels:
- conda-forge
dependencies:
- polars =1.2.1
16 changes: 16 additions & 0 deletions workflow/envs/pystats.yaml
@@ -0,0 +1,16 @@
channels:
- conda-forge
- nodefaults
dependencies:
- polars =0.20.28
- pyreadr =0.5
- altair =5.2
- pyarrow =16.1
- vegafusion =1.6
- vegafusion-python-embed =1.6
- vl-convert-python =1.2
- jupyter_core =5.7
- ipykernel =6.29
- nbconvert =7.14
- notebook =7.0
- jupyterlab_code_formatter =1.4
6 changes: 6 additions & 0 deletions workflow/report/meta_compare.rst
@@ -0,0 +1,6 @@
Meta comparisons for {{ snakemake.wildcards.meta_comp }}.
The axes represent the log2-fold changes (beta-scores) for the two models, with each point representing a gene.
Points on the diagonal indicate no difference between the comparisons, while deviations from the diagonal suggest differences in gene expression between the treatments.
The color encodes the corresponding q-value.
By clicking on points, their label can be displayed.
Holding the Shift key allows selecting or deselecting labels for multiple genes.
1 change: 1 addition & 0 deletions workflow/report/units.rst
@@ -0,0 +1 @@
Unit sheet containing all considered units. A single sample can have multiple units (for example, when the same biological sample was sequenced across multiple lanes and demultiplexed into separate lane-specific FASTQ files). The annotations in this file determine how the workflow internally handles units.
1 change: 0 additions & 1 deletion workflow/report/workflow.rst
@@ -1,3 +1,2 @@
After adapter removal with `Cutadapt <http://cutadapt.readthedocs.io>`_, transcripts were quantified with `Kallisto <https://pachterlab.github.io/kallisto/>`_.
Integrated normalization and differential expression analysis was conducted with `Sleuth <https://pachterlab.github.io/sleuth>`_ following standard procedure as outlined in the manual.
For sample metadata, see {{ snakemake.config["samples"] }}_.
99 changes: 99 additions & 0 deletions workflow/resources/custom_vega_plots/circle_diagram_genes.json
@@ -0,0 +1,99 @@
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"width": 35,
"height": 35,
"data": {
"values": []
},
"layer": [
{
"mark": "arc",
"encoding": {
"theta": {
"field": "amount",
"type": "quantitative"
},
"color": {
"field": "category",
"type": "nominal",
"scale": {
"domain": [
"DE_genes",
"genes"
],
"range": [
"#f2e34c",
"#31a354"
]
},
"legend": null
},
"tooltip": [
{
"field": "category",
"type": "nominal"
},
{
"field": "amount",
"type": "quantitative"
}
]
}
},
{
"mark": {
"type": "text",
"baseline": "middle",
"align": "center",
"dx": 2,
"fontSize": 9,
"color": "white"
},
"encoding": {
"text": {
"field": "percentage",
"type": "quantitative",
"format": "0.2%"
}
}
},
{
"transform": [
{
"pivot": "category",
"value": "amount",
"groupby": [
"percentage"
]
}
],
"mark": "rule",
"encoding": {
"tooltip": [
{
"field": "genes",
"type": "nominal"
},
{
"field": "DE_genes",
"type": "quantitative"
}
]
},
"params": [
{
"name": "hover",
"select": {
"type": "point",
"fields": [
"percentage"
],
"nearest": true,
"on": "mouseover",
"clear": "mouseout"
}
}
]
}
]
}
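
The spec above reads three fields from its data rows: `category` (with the domain `DE_genes` and `genes`), `amount`, and `percentage`. Because the third layer pivots on `category` grouped by `percentage`, both rows need to carry the same `percentage` value to collapse into a single record. The following Python sketch shows one way such rows might be built. It is an illustration only: `circle_diagram_rows` is a hypothetical helper, and it assumes that `genes` holds the genes that are not differentially expressed and that `percentage` is the differentially expressed fraction of all genes in the pathway.

```python
def circle_diagram_rows(n_de_genes, n_genes):
    """Build the two data rows for one circle diagram.

    Assumption: n_genes is the total number of genes in the pathway and
    the "genes" arc shows the remainder that is not differentially expressed.
    """
    if n_genes <= 0 or not 0 <= n_de_genes <= n_genes:
        raise ValueError("need 0 <= n_de_genes <= n_genes and n_genes > 0")
    percentage = n_de_genes / n_genes
    return [
        # Both rows share the same percentage so the pivot yields one record.
        {"category": "DE_genes", "amount": n_de_genes, "percentage": percentage},
        {"category": "genes", "amount": n_genes - n_de_genes, "percentage": percentage},
    ]


rows = circle_diagram_rows(5, 20)
for row in rows:
    print(row)
```

The resulting list would take the place of the empty `"values": []` array in the spec's `data` block.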