Fixed bug where cuda reserved memory climbs throughout process while allocated memory stays low #758
This seems to be a bug in torch: when a fragment of a large torch tensor is copied or referenced, the whole tensor remains in memory as reserved memory, even after the original variable (Xg in cluster) goes out of scope. You can force torch to release this memory by calling empty_cache (a minimal sketch of the behaviour follows the specs below). I am not sure whether this is specific to my setup; my system specs are:
GPU: 1080 Ti
OS: ubuntu 20.04
Cuda: 11.8
Torch: 2.3.1+cu118
Kilosort: 0.1.dev1248+gc664741
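As a minimal sketch of what I mean (not the actual Kilosort code; the tensor name and sizes are made up for illustration), keeping only a small copy of a large CUDA tensor leaves the full allocation in torch's reserved pool until empty_cache() is called:

```python
import torch

def report(label):
    # Allocated memory is what live tensors use; reserved memory is what
    # torch's caching allocator is still holding from CUDA.
    alloc = torch.cuda.memory_allocated() / 1e9
    reserved = torch.cuda.memory_reserved() / 1e9
    print(f"{label}: allocated {alloc:.2f} GB, reserved {reserved:.2f} GB")

def make_fragment():
    # Large tensor, analogous to Xg in cluster (size is illustrative only).
    big = torch.zeros(4000, 100_000, device="cuda")
    # Keep only a small copy; clone() so no view of `big` survives.
    return big[:10].clone()

frag = make_fragment()            # `big` has gone out of scope here
report("after slicing")           # allocated is small, reserved stays high
torch.cuda.empty_cache()          # return the cached blocks to CUDA
report("after empty_cache()")     # reserved drops back down
```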
The impact of this change is easy to see by logging GPU memory usage after the cluster call, for example:
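Something along these lines is what I used; the exact call site after cluster will differ in your copy of the pipeline:

```python
import torch

# ... immediately after the cluster(...) call in the sorting pipeline ...
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB, "
      f"reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
```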
I think this is related to bugs:
#746
#670
#743
With this change I can sort a file that would fail 100% of the time without it; when I revert the change, sorting fails again. My GPU memory consumption is also drastically lower with this change.