
add cuda backend for KLU #35

Open · wants to merge 9 commits into main
Conversation

@joamatab (Contributor) commented May 22, 2024

@flaport (Owner) commented May 22, 2024

Is there a CUDA-compatible sparse matrix solver we can use instead of KLU here?

@joamatab (Contributor, Author)
Yes, @bdice used CuPy for that.

Where can we find some benchmark code for it?

It would be great to compare CPU to GPU.

@bdice commented May 23, 2024

Hi @joamatab and @flaport -- first, thank you for your time. I met with @joamatab at PyCon as part of the Accelerated Python sprint. We discussed using a CUDA-based backend for this library. CuPy seemed like the easiest choice.

Here's a brief rundown of what this PR contains:

The new "cuda" backend requires cupy but it is not set as the default, because it is not compatible with JAX JIT and thus cannot be used for optimization. I don't know how crucial JAX JIT / differentiable backends are for the problems you typically use here.

The "cuda" backend uses CuPy, which calls into cuSolver. The cupyx.scipy.sparse.linalg.spsolve function does not support batched sparse solves. It seems like this is a common use case in sax -- but the only solution I have for now is to use a raw for loop, which may not be ideal for performance. There may be future CUDA libraries that serve this use case with a fully-batched solver, which would be able to provide further acceleration on batches of smaller sparse matrices.
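The for-loop approach described above can be sketched as follows. This is a CPU sketch using SciPy's API, which `cupyx.scipy.sparse` deliberately mirrors; swapping `scipy` for `cupyx.scipy` and `numpy` for `cupy` gives the GPU version (the function name `batched_spsolve` is a hypothetical illustration, not the PR's actual code):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

# Sketch of the batched-solve loop. cupyx.scipy.sparse mirrors
# scipy.sparse, so replacing `scipy` with `cupyx.scipy` and `numpy`
# with `cupy` yields the GPU version discussed in the PR.

def batched_spsolve(matrices, rhs):
    """Solve A_i x_i = b_i for each pair with a plain Python loop,
    since spsolve does not support batched sparse solves."""
    return np.stack([spsolve(A, b) for A, b in zip(matrices, rhs)])

# Small batch of 2x2 sparse systems as a smoke test.
As = [csr_matrix(np.array([[2.0, 0.0], [0.0, 4.0]])) for _ in range(3)]
bs = [np.array([2.0, 8.0]) for _ in range(3)]
xs = batched_spsolve(As, bs)
print(xs)  # each row is [1., 2.]
```

The loop launches one solver call per matrix, so on GPU each iteration pays its own kernel-launch overhead; this is why a fully batched solver would help most for batches of small matrices.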

The main use case I would see for this backend is to enable sparse solves when you have a small number of very large sparse matrices. For a performance evaluation, I would try this with a very large sparse matrix. I wasn't able to find an example for benchmarking this in the repository, so I haven't pursued that any further.
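Since no benchmark exists in the repository, here is a minimal timing harness one could adapt (an assumed sketch, not code from this repo), again on CPU with SciPy; for the GPU variant, substitute `cupyx.scipy.sparse`/`cupy` and synchronize the device before reading the clock:

```python
import time
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Minimal benchmark sketch (hypothetical harness): time one solve of a
# large random sparse system. For GPU timing with CuPy, synchronize
# first, e.g. cupy.cuda.Stream.null.synchronize(), or the measurement
# only captures kernel launch, not execution.

n = 2000  # matrix size; increase to stress the solver
rng = np.random.default_rng(0)
A = sp.random(n, n, density=1e-3, random_state=rng, format="csr")
A += sp.identity(n, format="csr") * n  # make it well-conditioned

b = rng.standard_normal(n)

t0 = time.perf_counter()
x = spsolve(A, b)
elapsed = time.perf_counter() - t0
residual = np.linalg.norm(A @ x - b)
print(f"n={n}  solve time: {elapsed:.4f}s  residual: {residual:.2e}")
```

Sweeping `n` and `density` for both backends would give the CPU-vs-GPU comparison @joamatab asked about.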

I also ported some of the test/example notebooks into proper tests. Running the quick start notebook caught some errors in the CUDA backend, which were easily fixable but were not covered by the existing tests. I also expanded the tests to compare the CUDA and KLU backends for the sample data provided in a test notebook.
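The backend-comparison tests described above follow a simple pattern: run the same system through both backends and assert the results agree. A hedged stand-in sketch (the real tests would call sax's "klu" and "cuda" backends; the solver functions here are placeholders using dense NumPy solves):

```python
import numpy as np

# Hypothetical sketch of a backend-agreement test. In the actual test
# suite these stand-ins would be replaced by sax's "klu" and "cuda"
# backend solves on the sample data from the test notebook.

def solve_klu(A, b):   # placeholder for the KLU backend
    return np.linalg.solve(A, b)

def solve_cuda(A, b):  # placeholder for the CuPy/cuSolver backend
    return np.linalg.solve(A, b)

def test_backends_agree():
    rng = np.random.default_rng(42)
    A = rng.standard_normal((8, 8)) + 8 * np.eye(8)  # well-conditioned
    b = rng.standard_normal(8)
    np.testing.assert_allclose(solve_klu(A, b), solve_cuda(A, b), rtol=1e-10)

test_backends_agree()
print("backends agree")
```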

It was great to meet @joamatab, and I hope this is helpful -- I won't be able to commit significantly more time here, except to address PR reviews. If you try it out and see good (or bad) performance, please let me know; I'm especially interested in how it performs on large sparse matrices. If you find it's not worth adding for any reason, I won't be offended if you close the PR. It was fun to learn about this solver and the problems you're using it for!

Best wishes to you, and thanks for maintaining this as an open-source project!

@flaport (Owner) commented May 24, 2024

Hi @bdice, thank you so much for your contribution. Adding a CUDA backend has been something I've wanted for a long time! I'm currently in the middle of a big move, so I won't have much time to review this week, but rest assured this is one of the first things on my to-do list for next week! I'll also add a benchmarking suite for future reference. I'm interested to see where it lands :)
