BUG：GPU calling failure when starting first clustering #743

Alchemist-Y · 2024-07-23T10:47:41Z

Describe the issue:

Spike extraction runs normally, but when sorting reaches the first clustering step, Kilosort appears to switch to the CPU for computation instead of the GPU.

Reproduce the bug:

The GPU is not used during the first clustering step.

Error message:

No response

Version information:

kilosort: latest version; CUDA toolkit: 11.8; NVIDIA driver: Windows Server 2022 Standard; GPU: NVIDIA RTX A2000 12GB

jacobpennington · 2024-07-23T20:43:23Z

I'm sorry, I'm having a hard time understanding the issue. What do you mean it "automatically chooses the cpu?" How are you determining that? Are you getting an error during sorting? What version of Kilosort4 is this?

Alchemist-Y · 2024-07-24T00:41:54Z

@jacobpennington Hi, I'm sorry about my unclear description. The Kilosort version is 4.0.13. What I mean is that I selected the GPU as the PyTorch device in the GUI, but during the clustering step the GPU shows no activity in Task Manager, while the CPU runs at 100%.

jacobpennington · 2024-07-24T01:04:38Z

The task manager is not an accurate way to check GPU usage for python processes. If you want to check usage while sorting is running, you can use the nvidia-smi command in a terminal or powershell.
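For repeated checks while sorting runs, the `nvidia-smi` query can be wrapped in a small script. This is only a sketch: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format` options. The helper names `parse_gpu_line` and `sample_gpus` are made up for illustration and are not part of Kilosort.

```python
import shutil
import subprocess

# Ask nvidia-smi for utilization and memory figures as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Convert a CSV field to int; return None for values like '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_line(line):
    """Parse one CSV line such as '87, 3051, 12282' into a dict."""
    util, used, total = (_to_int(field) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def sample_gpus():
    """Return one reading per GPU, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, reading in enumerate(sample_gpus()):
        print(index, reading)
```

Running this a few times during each sorting step gives a rough picture of whether the GPU is busy. A low reading during a single step does not by itself indicate a fault.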

Alchemist-Y · 2024-07-24T06:59:18Z

@jacobpennington Hi, here is more detailed information about GPU usage during the spike extraction step and the clustering step.

jacobpennington · 2024-07-25T02:25:18Z

Okay. Can you please clarify if you get an error during sorting, or think it's taking too long, or some other problem? Not all steps of the sorting process use the GPU.

Alchemist-Y · 2024-07-25T05:20:33Z

> Okay. Can you please clarify if you get an error during sorting, or think it's taking too long, or some other problem? Not all steps of the sorting process use the GPU.

Yes, the first clustering step takes too long. On my own computer, which has an NVIDIA 1650S 4GB GPU, the entire sort finishes in about 3 hours. On this Windows server, the first clustering step alone runs for 2 hours without making any progress.

jacobpennington · 2024-07-27T00:31:13Z

Are you sure your pytorch installation is set up correctly? You said you're using CUDA toolkit version 11.8, but your screenshots show version 12.4. If you are using 12.4, I would recommend you try setting up a new environment using toolkit version 11.8 and then see if the issue persists. Some other users have reported difficulties using 12.4.
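One way to check the PyTorch side of the installation is to ask PyTorch directly which CUDA version it was built with and whether it can see a GPU. The sketch below uses only standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`). The function name `describe_torch_cuda` is hypothetical, and the script should be run inside the same environment used for Kilosort.

```python
def describe_torch_cuda():
    """Summarize what the active environment's PyTorch build reports about CUDA."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None` or `cuda_available` is `False`, the environment probably has a CPU-only PyTorch build, which would need to be reinstalled with CUDA support.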

Alchemist-Y · 2024-08-09T15:10:56Z

@jacobpennington Hi, the figure shown above reports the highest CUDA version that my server supports. The version I actually installed is CUDA 11.8, shown below.

jacobpennington · 2024-08-09T17:04:50Z

Ah, I see. Unfortunately, I'm not sure what else I can try to debug for you if the sorting works fine on a different machine. The next thing I would try is the following, just to make sure nothing got messed up with the installation and there aren't any conflicts with other packages:

1. Restart the machine.
2. Create a new conda environment to use only for Kilosort (it looks like you're using the base environment right now).
3. Follow the steps in the readme again to install Kilosort and pytorch.
4. Retry the sorting.

If you still get the same error after trying that, please upload kilosort4.log from the results directory from the new sorting attempt. Screenshots of the Kilosort4 GUI with that recording loaded might also be helpful, if you're using the GUI.

Alchemist-Y · 2024-08-27T03:03:01Z

Hi Jacob, here is my log. The first clustering step was taking too long, so I interrupted it.
kilosort4.log

RobertoDF · 2024-08-29T00:17:20Z

I have exactly the same problem: "first clustering" is extremely slow (>200 s/it), probably because the GPU is not being used, while the first two steps run at 2 it/s as expected. In Task Manager I can isolate CUDA usage, and it drops at the "first clustering" step. I am calling Kilosort directly, not via SpikeInterface. I reinstalled the CUDA toolkit and PyTorch, but saw no improvement.

RobertoDF · 2024-08-29T23:36:35Z

Here is a video of the issue:
https://www.dropbox.com/scl/fi/vaipnzqoxsyrrvbkuy3o8/Untitled.m4v?rlkey=watrlnib7q6a30a3sbzqovpoc&dl=0

Alchemist-Y · 2024-08-30T02:38:21Z

> Here a video of the issue https://www.dropbox.com/scl/fi/vaipnzqoxsyrrvbkuy3o8/Untitled.m4v?rlkey=watrlnib7q6a30a3sbzqovpoc&dl=0

I see the same behavior as in your video.

RobertoDF · 2024-08-31T08:30:34Z

I was able to solve this by using a completely fresh environment rather than just uninstalling and reinstalling. However, now I get a CUDA out-of-memory error 😰
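When chasing an out-of-memory error, it can help to compare how much GPU memory PyTorch has actually allocated with how much it has reserved from the driver. The following is a rough sketch using `torch.cuda.memory_allocated()`, `torch.cuda.memory_reserved()`, and `torch.cuda.get_device_properties()`. The function name `cuda_memory_report` is hypothetical, and it only reports on the process that calls it, so it would need to run inside the sorting script itself.

```python
def cuda_memory_report(device=0):
    """Return allocated, reserved, and total GPU memory in MiB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    mib = 1024 ** 2
    total_bytes = torch.cuda.get_device_properties(device).total_memory
    return {
        # Memory currently occupied by live tensors in this process.
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # Memory held by PyTorch's caching allocator, including unused cache.
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
        "total_mib": total_bytes / mib,
    }


if __name__ == "__main__":
    print(cuda_memory_report())
```

A reserved figure far above the allocated figure suggests memory is being held in the allocator's cache rather than by live tensors.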

Alchemist-Y changed the title ~~BUG：GPU calling failure when starting first clusting~~ BUG：GPU calling failure when starting first clustering Jul 23, 2024

Lathomas42 mentioned this issue Aug 7, 2024

Fixed bug where cuda reserved memory climbs throughout process while allocated memory stays low #758

Closed
