-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add two distance metrics, three-way comparison and bootstrapping #608
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much!
Left some comments.
- We might be able to speed it up with numba but it's not that important for now.
- Can we use the aggregation functions for other distances? As it is, you can pass them generally, but then it's only used in one distance? This is super weird UX. This needs improvement.
It takes about 5 seconds to calculate KDE of a group pair for the metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am honestly not sure whether it makes to use the aggregation functions for all of them^^
I think I might have gotten it wrong. It says "pseudobulk" but I think that how you're using them in the distance now is something different? Do I get it wrong?
Co-authored-by: Lukas Heumos <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Eljas Roellin <[email protected]>
Co-authored-by: Eljas Roellin <[email protected]>
Co-authored-by: Eljas Roellin <[email protected]>
Co-authored-by: Eljas Roellin <[email protected]>
for more information, see https://pre-commit.ci
To be honest I copied "aggregation part" from the hackathon branch where aggregation function accepts np.mean/variance/median etc. However the problem is that I am not sure whether it makes sense to generate pseudobulk vector with variance. So for now I would like to keep only mean, median and add sum for aggregation functions in distance metrics. For mediod, I dont think it is really an aggregation method, but a clustering method? So i also removed it. What do you think? |
@wxicu all of this sounds reasonable to me. |
But.....I didn't come up with more descriptive names for the parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mergeeeeeee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in the docs yet because the API needs an overhaul. But good enough for the figure for the manuscript.
The tests are getting slower and slower but well we need another take on that.
PR Checklist
docs
is updatedDescription of changes
Add distance metrics: mean_var_distn and mahalanobis
Technical details
Additional context
There is a question whether to store/return/accept the expensive inverse of the covariance matrix for mahalanobis metrix in the draft implementations on the hackathon branch. Do you think it make sense to save the invert in a multidimensional array? I didnt come up with a more efficient solution.