Use Trees for data structures #37

Open
maxcw opened this issue Apr 30, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

maxcw commented Apr 30, 2020

It looks like all distances are currently being calculated, which is expensive. Borrowing from sklearn, BallTree and KDTree could be used to speed up nearest neighbor calculations.

Owner

vc1492a commented May 1, 2020

Thanks @maxcw! There is a parallel effort for integrating Local Outlier Probabilities (LoOP) into scikit-learn, see this pull request.

It was something I was working on some time ago but haven't updated due to lack of time and interest from others. PyNomaly aims to be a standalone library that minimizes the number of required dependencies. There are ways to improve the speed of the current code for nearest neighbor calculations (like the parallelism issue you opened), but integrating scikit-learn capability into PyNomaly is not currently in scope. It may be a better idea to update the PR with refreshed code that enables LoOP in scikit-learn (and thus the fast nearest neighbor calculations). If you want to contribute to that PR, I would be willing to jump back in and do it together. I don't think adding a dependency on scikit-learn is the right approach. However, I am open to other suggestions on how to speed up the nearest neighbor calculations using trees as the data structure.
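For the sake of discussion, here is a rough, dependency-free sketch of what a tree-based neighbor search could look like using only the standard library. This is not PyNomaly code and not scikit-learn's KDTree implementation; the names `build_kdtree` and `knn` are hypothetical, and it assumes Euclidean distance and Python 3.8+ (for `math.dist`).

```python
import heapq
import math


def build_kdtree(points, indices=None, depth=0):
    """Build a k-d tree as nested tuples: (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    # Cycle through the coordinate axes and split at the median point.
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )


def knn(tree, points, query, k, exclude=None):
    """Return the k nearest (distance, index) pairs to `query`, closest first.

    `exclude` is an optional point index to skip (e.g. the query point itself).
    """
    heap = []  # max-heap via negated distances; holds the best k seen so far

    def visit(node):
        if node is None:
            return
        idx, axis, left, right = node
        if idx != exclude:
            d = math.dist(query, points[idx])
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, idx))
        diff = query[axis] - points[idx][axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-neg_d, i) for neg_d, i in heap)
```

The tree is built once and then queried per sample, e.g. `knn(tree, points, points[i], k, exclude=i)`, so whole subtrees are skipped whenever the splitting plane is farther away than the current k-th best distance. How much that saves depends on the data: pruning tends to work well in low dimensions and degrades as dimensionality grows.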

Not sure if you are aware or if this helps your work, but PyNomaly always lets you bring your own distance and neighbor matrices, as shown in the examples in the readme. That means you can use external libraries to calculate the distance and neighbor matrices and simply provide them to PyNomaly afterwards.
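A minimal sketch of that hand-off, with some assumptions: `knn_matrices` and `score_with_pynomaly` are hypothetical helpers, and the brute-force search is only a standard-library stand-in for whatever tree-based library you prefer. The `distance_matrix`, `neighbor_matrix` and `n_neighbors` arguments follow my reading of the readme examples, so please check them against the readme for your installed version.

```python
import heapq
import math


def knn_matrices(data, k):
    """Return (distances, neighbors): two n-by-k lists of lists.

    Row i holds the k nearest neighbors of sample i (excluding i itself),
    closest first. Replace the brute-force search with any faster method.
    """
    distances, neighbors = [], []
    for i, p in enumerate(data):
        candidates = ((math.dist(p, q), j) for j, q in enumerate(data) if j != i)
        nearest = heapq.nsmallest(k, candidates)
        distances.append([d for d, _ in nearest])
        neighbors.append([j for _, j in nearest])
    return distances, neighbors


def score_with_pynomaly(distances, neighbors, k):
    """Hand precomputed matrices to PyNomaly (assumed API, see the readme)."""
    # Lazy imports so knn_matrices works without these optional packages.
    import numpy as np
    from PyNomaly import loop

    model = loop.LocalOutlierProbability(
        distance_matrix=np.array(distances),
        neighbor_matrix=np.array(neighbors),
        n_neighbors=k,
    ).fit()
    return model.local_outlier_probabilities
```

Both matrices have one row per sample and `k` columns, and `k` should match the `n_neighbors` value passed to PyNomaly.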

vc1492a added the enhancement label May 5, 2020
vc1492a added this to the Lightning Speed milestone Aug 19, 2024