CUDA memory usage continuously increases #77
Comments
Hi! I am facing the same issue. I tried replacing the CustomCascadeROIHeads with the StandardROIHeads to check whether that was the source of the problem, but the issue persists. I have the feeling that the problem is in CenterNet, but I have not been able to pinpoint where.
I've encountered this issue as well. It seems to happen with the two-stage CenterNet2 models. The workaround I've found is running the model with the following versions: detectron2=v0.6, pytorch=1.8.1, python=3.6, and cuda=11.1.
Thank you! 👍 That seems to have solved the problem here as well!
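For anyone who wants to confirm that their environment matches the version combination reported to work above, a quick check along these lines can help (the print layout is purely illustrative):

```python
# Print installed versions to compare against the combination reported to work
# (detectron2 0.6, PyTorch 1.8.1, Python 3.6, CUDA 11.1).
import sys
import torch
import detectron2

print("python    :", sys.version.split()[0])
print("torch     :", torch.__version__)
print("cuda      :", torch.version.cuda)        # CUDA version PyTorch was built with
print("detectron2:", detectron2.__version__)
```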
Dear authors,
Thank you for the great work and clean code.
I am using the CenterNet2 default configuration (from Base-CenterNet2.yaml); however, during training I observe that the memory reserved by CUDA keeps increasing until training fails with a CUDA OOM error. When I replace CenterNet2 with the default RPN, the issue disappears.
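One way to quantify this growth (not from the original report; a minimal sketch assuming a detectron2 DefaultTrainer-style setup) is to log the CUDA memory counters periodically with a custom hook:

```python
# Sketch: log allocated vs. reserved CUDA memory every N iterations.
# The hook class name and interval are illustrative.
import torch
from detectron2.engine import HookBase

class GPUMemoryLogger(HookBase):
    def __init__(self, log_every=100):
        self.log_every = log_every

    def after_step(self):
        it = self.trainer.iter
        if it % self.log_every == 0:
            alloc = torch.cuda.memory_allocated() / 2**20     # MiB held by live tensors
            reserved = torch.cuda.memory_reserved() / 2**20   # MiB reserved by the caching allocator
            print(f"iter {it}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

# usage (with a DefaultTrainer-style trainer):
# trainer.register_hooks([GPUMemoryLogger(log_every=100)])
```

If `allocated` grows alongside `reserved`, live tensors are being retained somewhere; if only `reserved` grows, the caching allocator is fragmenting.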
I tried adding gc.collect() and torch.cuda.empty_cache() to the training loop, with no success. Have you noticed such behavior in the past, or could you please provide some hints on what the issue could be? Below I also provide some reference screenshots.
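For reference, here is a hedged sketch of how that cleanup attempt could be wired in via detectron2's hook system instead of being edited into the loop itself (the class name and interval are made up for illustration). Note that torch.cuda.empty_cache() only releases unused cached blocks back to the driver; it cannot free memory still referenced by live tensors, so a genuine leak will keep growing:

```python
# Illustrative hook (names are hypothetical) that runs gc.collect() and
# torch.cuda.empty_cache() periodically during training.
import gc
import torch
from detectron2.engine import HookBase

class CacheFlushHook(HookBase):
    def __init__(self, every=500):
        self.every = every

    def after_step(self):
        if self.trainer.iter % self.every == 0:
            gc.collect()               # collect unreachable Python objects
            torch.cuda.empty_cache()   # release unused cached blocks to the driver

# usage: trainer.register_hooks([CacheFlushHook(every=500)])
```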
Note: in my project, a few things differ from the configuration above: I train on 50% of the COCO dataset and I use LazyConfig to initialize the model. However, I reimplemented the configuration twice and both implementations run into the same issue, so it is unlikely that there is a bug in my code.
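For context, the LazyConfig-based initialization mentioned above would look roughly like this; the config path is a hypothetical placeholder, not from the report:

```python
# Sketch of a LazyConfig-style model initialization.
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load("configs/centernet2_lazy.py")  # hypothetical config file
model = instantiate(cfg.model)
```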
(Screenshots: observe that memory allocation keeps increasing in both images.)