BUG: GPU calling failure when starting first clustering #743
Comments
I'm sorry, I'm having a hard time understanding the issue. What do you mean it "automatically chooses the cpu?" How are you determining that? Are you getting an error during sorting? What version of Kilosort4 is this? |
@jacobpennington Hi, I'm sorry about my unclear description. The Kilosort version is 4.0.13. What I mean is that I selected the GPU as the PyTorch device in the GUI, but during the clustering step the GPU shows no activity in Task Manager, while the CPU runs at 100%. |
The task manager is not an accurate way to check GPU usage for python processes. If you want to check usage while sorting is running, you can use the |
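As a rough sketch of checking GPU usage outside Task Manager, the snippet below polls NVIDIA's `nvidia-smi` command-line tool from Python. It assumes `nvidia-smi` is on the PATH (it ships with the NVIDIA driver) and is not necessarily the tool the comment above refers to. The helper names `parse_gpu_line` and `sample_gpus` are made up for this illustration.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "utilization.gpu,memory.used,memory.total"
KEYS = ("util_percent", "mem_used_mib", "mem_total_mib")


def _to_number(field):
    """Convert a CSV field to float; return None for values like '[N/A]'."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_gpu_line(line):
    """Parse one CSV row printed by nvidia-smi into a dict (hypothetical helper)."""
    return dict(zip(KEYS, (_to_number(f) for f in line.split(","))))


def sample_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi cannot be run."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]


if __name__ == "__main__":
    # Run this in a second terminal while sorting is in progress.
    for index, gpu in enumerate(sample_gpus()):
        print(index, gpu)
```

Running this a few times during spike extraction and again during the first clustering step gives a simple before/after comparison of utilization and memory use.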
@jacobpennington Hi, here is detailed information about GPU usage during the spike extraction step and the clustering step. |
Okay. Can you please clarify if you get an error during sorting, or think it's taking too long, or some other problem? Not all steps of the sorting process use the GPU. |
Yes, the first clustering step takes too long. On my own computer with an NVIDIA 1650S 4GB GPU, the entire sort finishes in about 3 hours, but on this Windows server the first clustering step alone has taken 2 hours without making any progress. |
Are you sure your pytorch installation is set up correctly? You said you're using CUDA toolkit version 11.8, but your screenshots show version 12.4. If you are using 12.4, I would recommend you try setting up a new environment using toolkit version 11.8 and then see if the issue persists. Some other users have reported difficulties using 12.4. |
@jacobpennington Hi, the figure above shows the highest CUDA version that my server supports; the version I actually installed, CUDA 11.8, is shown below it. |
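The two version numbers discussed above are different things: `nvidia-smi` reports the highest CUDA version the driver supports, while PyTorch reports the CUDA version it was built against. A minimal sketch of reading the PyTorch side from the same Python environment Kilosort runs in is shown below. The function name `describe_torch_cuda` is made up for this example, and the import is guarded so the script still runs where PyTorch is missing.

```python
def describe_torch_cuda():
    """Summarize what the installed PyTorch build knows about CUDA (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if a usable CUDA device and driver are found at runtime.
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None` or `cuda_available` is `False`, the environment has a CPU-only or misconfigured PyTorch build, which would be consistent with GPU-dependent steps running slowly.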
Ah, I see. Unfortunately, I'm not sure what else I can try to debug for you if the sorting works fine on a different machine. The next thing I would try is the following, just to make sure nothing got messed up with the installation and there aren't any conflicts with other packages:
If you still get the same error after trying that, please upload |
Hi Jacob, here is my log. The first clustering step was taking too long, so I interrupted it. |
I have exactly the same problem: "first clustering" is extremely slow (>200 s/it), probably because the GPU is not being used, while the first 2 steps run at 2 it/s as expected. In Task Manager I can isolate CUDA usage, and it drops at the "first clustering" step. I am calling Kilosort directly, not via SpikeInterface. I reinstalled the CUDA toolkit and PyTorch, but saw no improvement. |
Here is a video of the issue. |
Same as yours. |
I was able to solve this by using a completely fresh environment rather than just uninstalling and reinstalling. However, now I get a CUDA out of memory error 😰 |
Describe the issue:
Spike extraction runs normally, but when sorting reaches the first clustering step, Kilosort automatically uses the CPU for computation rather than the GPU.
Reproduce the bug:
Error message:
No response
Version information:
Kilosort: latest version; CUDA toolkit: 11.8; NVIDIA driver: Windows Server 2022 Standard; GPU: NVIDIA RTX A2000 12GB