Add two distance metrics, three-way comparison and bootstrapping #608

wxicu · 2024-05-26T19:41:04Z

PR Checklist

Referenced issue is linked: Distance Metrics Enhancement: New metrics, 3-way-comparisons, Bootstrapping #603
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes
Add distance metrics: mean_var_distn and mahalanobis

Technical details

Additional context

There is a question of whether to store/return/accept the expensive inverse of the covariance matrix for the mahalanobis metric in the draft implementations on the hackathon branch. Do you think it makes sense to save the inverse in a multidimensional array? I didn't come up with a more efficient solution.
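
To make the trade-off concrete, here is a minimal sketch of a Mahalanobis distance that accepts an optional precomputed inverse and returns the one it used, so a caller could cache it. The function name `mahalanobis_between_groups`, the `inv_cov` parameter, and the choice to estimate the covariance from the second group only are assumptions for illustration, not the implementation in this PR.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis


def mahalanobis_between_groups(X, Y, inv_cov=None):
    """Mahalanobis distance between the mean vectors of two groups.

    X, Y: arrays of shape (n_cells, n_features), with n_features >= 2.
    inv_cov: optional precomputed inverse covariance, shape (n_features, n_features).
    Returns the distance and the inverse that was used, so it can be reused.
    """
    if inv_cov is None:
        # Assumption: the covariance is estimated from Y (the reference group) only.
        # pinv stays defined even when the covariance matrix is singular.
        inv_cov = np.linalg.pinv(np.cov(Y, rowvar=False))
    dist = mahalanobis(X.mean(axis=0), Y.mean(axis=0), inv_cov)
    return dist, inv_cov


rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 5))
perturbed = rng.normal(loc=1.0, size=(150, 5))

# The first call pays for the inversion; the second call reuses the cached inverse.
d_first, cached_inv = mahalanobis_between_groups(perturbed, reference)
d_second, _ = mahalanobis_between_groups(perturbed, reference, inv_cov=cached_inv)
```

Whether caching pays off likely depends on how many comparisons share the same reference group: the inverse is one `(n_features, n_features)` array per reference, so memory grows with the number of cached groups.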

Zethson

Thank you very much!

Left some comments.

We might be able to speed it up with numba but it's not that important for now.
Can we use the aggregation functions for other distances? As it is, you can pass them generally, but they are only used in one distance. This is super weird UX and needs improvement.

pertpy/tools/_distances/_distances.py

tests/tools/_distances/test_distances.py

wxicu · 2024-06-03T12:59:29Z

It takes about 5 seconds to calculate the KDE of a group pair for the metric mean_var_distn. There are 32 groups in the test dataset, so I subset to only 5 of them to speed up the tests.
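
The effect of the subsetting can be estimated with a short calculation. This is a rough sketch that assumes every unordered pair of groups is evaluated exactly once and that each pair costs the quoted 5 seconds; the helper name and the group labels are hypothetical.

```python
from math import comb

SECONDS_PER_PAIR = 5  # rough per-pair cost quoted in the comment above


def estimated_runtime_seconds(n_groups, seconds_per_pair=SECONDS_PER_PAIR):
    # Assumption: each unordered pair of groups is compared exactly once.
    return comb(n_groups, 2) * seconds_per_pair


all_groups = [f"group_{i}" for i in range(32)]  # hypothetical group labels
subset = all_groups[:5]

print(estimated_runtime_seconds(len(all_groups)))  # 496 pairs -> 2480 seconds
print(estimated_runtime_seconds(len(subset)))      # 10 pairs -> 50 seconds
```

Under these assumptions the full pairwise run would take roughly 41 minutes, while the five-group subset takes under a minute.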

Zethson

I am honestly not sure whether it makes sense to use the aggregation functions for all of them^^

I think I might have gotten it wrong. It says "pseudobulk", but I think that how you're using them in the distance now is something different? Am I getting it wrong?

pertpy/tools/_distances/_distances.py

tests/tools/_distances/test_distances.py

wxicu · 2024-06-06T19:37:35Z

> I am honestly not sure whether it makes sense to use the aggregation functions for all of them^^
>
> I think I might have gotten it wrong. It says "pseudobulk", but I think that how you're using them in the distance now is something different? Am I getting it wrong?

To be honest, I copied the aggregation part from the hackathon branch, where the aggregation function accepts np.mean/variance/median etc.
From my understanding: currently the majority of the distance metrics calculate the distance between two pseudobulk vectors. For those metrics, the aggregation function just provides other ways to aggregate the counts than the mean expression. For metrics that do not really calculate the distance between two pseudobulk vectors, e.g. mean_var_distribution, we don't use any aggregation function.

However, the problem is that I am not sure whether it makes sense to generate a pseudobulk vector from the variance. So for now I would like to keep only mean and median, and add sum, as aggregation functions in the distance metrics. As for medoid, I don't think it is really an aggregation method, but rather a clustering method? So I also removed it. What do you think?
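
As an illustration of what the aggregation function does for a pseudobulk-based metric, here is a minimal sketch using a Euclidean distance. The function name `pseudobulk_euclidean` and the `aggregation_func` parameter are hypothetical and only mirror the idea discussed above; they are not taken from the pertpy code.

```python
import numpy as np


def pseudobulk_euclidean(X, Y, aggregation_func=np.mean):
    """Euclidean distance between the pseudobulk vectors of two groups.

    X, Y: arrays of shape (n_cells, n_genes).
    aggregation_func: callable that accepts an `axis` argument,
        e.g. np.mean, np.median, or np.sum.
    """
    # Collapse each group into one vector of length n_genes.
    x_bulk = aggregation_func(X, axis=0)
    y_bulk = aggregation_func(Y, axis=0)
    return float(np.linalg.norm(x_bulk - y_bulk))


group_a = np.array([[1.0, 2.0], [3.0, 4.0]])
group_b = np.zeros((2, 2))

d_mean = pseudobulk_euclidean(group_a, group_b)            # pseudobulk [2, 3]
d_median = pseudobulk_euclidean(group_a, group_b, np.median)  # pseudobulk [2, 3]
d_sum = pseudobulk_euclidean(group_a, group_b, np.sum)     # pseudobulk [4, 6]
```

A metric that compares whole distributions, such as the mean-variance one, would have no use for `aggregation_func`, which is the UX mismatch raised in the review.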

Zethson · 2024-06-06T19:41:37Z

@wxicu all of this sounds reasonable to me.

tests/tools/test_metrics_3g.py

wxicu · 2024-06-12T21:41:35Z

But... I couldn't come up with more descriptive names for the parameters.

Zethson

Good stuff!

pertpy/tools/_distances/_distances.py

tests/tools/test_metrics_3g.py

pertpy/tools/_differential_gene_expression/_dge_comparison.py

pertpy/tools/_distances/_distances.py

wxicu

mergeeeeeee

All requested changes have been applied.

Zethson

Not in the docs yet because the API needs an overhaul. But good enough for the figure for the manuscript.

The tests are getting slower and slower, but we need a separate take on that.

wxicu and others added 3 commits May 26, 2024 13:42

add two distance metrics

fd21dc1

add obsm_key param to distance test

3af7d89

[pre-commit.ci] auto fixes from pre-commit.com hooks

3fe911b

for more information, see https://pre-commit.ci

wxicu requested a review from Zethson May 26, 2024 21:45

Zethson previously requested changes May 29, 2024

wxicu and others added 4 commits June 3, 2024 00:08

add agg fct

6d419c3

speed up tests

ca86025

Merge branch 'main' into distance

0830535

add type

9fd4c2b

add description

fc71eae

wxicu requested a review from Zethson June 3, 2024 18:42

Zethson reviewed Jun 3, 2024

Zethson requested a review from eroell June 3, 2024 19:44

eroell suggested changes Jun 4, 2024

wxicu and others added 10 commits June 5, 2024 09:32

Update pertpy/tools/_distances/_distances.py

09e5fea

Co-authored-by: Lukas Heumos

[pre-commit.ci] auto fixes from pre-commit.com hooks

ad23ca6

for more information, see https://pre-commit.ci

Update pertpy/tools/_distances/_distances.py

dc74884

Co-authored-by: Lukas Heumos

Update pertpy/tools/_distances/_distances.py

c774cd2

Co-authored-by: Lukas Heumos

Update pertpy/tools/_distances/_distances.py

d413d67

Co-authored-by: Eljas Roellin

Update pertpy/tools/_distances/_distances.py

b7f2cf7

Co-authored-by: Eljas Roellin

Update pertpy/tools/_distances/_distances.py

e71f81c

Co-authored-by: Eljas Roellin

Update pertpy/tools/_distances/_distances.py

edaa6e6

Co-authored-by: Eljas Roellin

update code

317cfd5

[pre-commit.ci] auto fixes from pre-commit.com hooks

47b4134

for more information, see https://pre-commit.ci

wxicu and others added 3 commits June 6, 2024 21:43

fix drug

d261410

[pre-commit.ci] auto fixes from pre-commit.com hooks

4fd29a0

for more information, see https://pre-commit.ci

add bootstrapping and metrics_3g

30baefe

wxicu mentioned this pull request Jun 7, 2024

test_drug_dgidb - KeyError: 'drug_claim_name' #624

Closed

wxicu changed the title ~~Add two distance metrics~~ Add two distance metrics, three-way comparison and bootstrapping Jun 7, 2024

Zethson reviewed Jun 10, 2024

tests/tools/test_metrics_3g.py (outdated, resolved)

wxicu added 3 commits June 10, 2024 11:36

remove test classes

57b14c3

drop test classes

78d00fa

update compare_de

052fd00

correct the comments

3a8eac6

Zethson reviewed Jun 13, 2024

wxicu added 6 commits June 13, 2024 14:02

speed tests

63ed17a

speed up tests

f9e0d36

split metrics_3g

2e65f9d

fix pre-commit

2e7acf3

pin numpy <2

69163ff

unpin numpy

67c54be

wxicu requested review from Zethson and eroell June 20, 2024 19:10

Zethson reviewed Jun 20, 2024

pertpy/tools/_differential_gene_expression/_dge_comparison.py (outdated, resolved)

pertpy/tools/_differential_gene_expression/_dge_comparison.py (outdated, resolved)

pertpy/tools/_distances/_distances.py (outdated, resolved)

wxicu and others added 4 commits June 20, 2024 22:02

speed up mahalanobis distance

6e32f37

use scipy to calculate mahalanobis distance

620e645

rename DGE to DGEEVAL

10d3483

Merge branch 'main' into distance

4a07252

wxicu enabled auto-merge (squash) June 24, 2024 09:41

wxicu commented Jun 24, 2024

wxicu disabled auto-merge June 24, 2024 09:46

wxicu enabled auto-merge (squash) June 24, 2024 09:49

wxicu requested a review from Zethson June 24, 2024 09:50

Zethson approved these changes Jun 24, 2024

wxicu merged commit a22aaab into main Jun 24, 2024
3 of 7 checks passed
