Timeout during finetuning #26
Comments
I haven't tested with a large video dataset, so I haven't encountered the problem you describe. It doesn't happen with a large image dataset, so it looks like some kind of video preprocessing problem. I'll look into it and let you know when I find something. Thanks for the issue. Also, does the memory run out when training is in the middle of the process? Does it look like a memory leak?
Thank you for the reply! Yes, the memory only runs out in the middle of training; at the beginning it is always fine. I set bs=8 per GPU, grad accum=1 or 2. I use the Valley dataset, containing 702K videos. Training one epoch, it times out around 50%--80% of the training iterations, with steadily increasing GPU memory usage. I use DeepSpeed ZeRO-3.
Can you check whether the resolution of each video is different?
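To act on this suggestion, you could scan the dataset and group clips by resolution so mismatches stand out. A minimal sketch: the `(path, width, height)` triples are taken as plain data here to keep it self-contained; in practice you would probe each file, e.g. with OpenCV's `cv2.VideoCapture` (an assumption, not part of this repo).

```python
from collections import defaultdict

def group_by_resolution(video_sizes):
    """Group video paths by (width, height) so mixed resolutions stand out.

    video_sizes: iterable of (path, width, height) tuples. In practice the
    sizes would be probed per file (e.g. via cv2.VideoCapture, an
    assumption); here they are plain data so the sketch is self-contained.
    """
    groups = defaultdict(list)
    for path, w, h in video_sizes:
        groups[(w, h)].append(path)
    return dict(groups)

# More than one key in the result means the dataset mixes resolutions.
sizes = [("a.mp4", 1280, 720), ("b.mp4", 1280, 720), ("c.mp4", 1920, 1080)]
groups = group_by_resolution(sizes)
```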
Let me have a try! I will get back to you later, thanks!
I tried
You could decrease num_frames. Also, num_crops=4 is the best hyperparameter for multi-image/video.
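Lowering num_frames only changes which frame indices get decoded per clip; fewer frames means a smaller vision-token count and less memory per sample. A minimal sketch of even temporal sampling (`sample_frame_indices` is a hypothetical helper, not from this repo):

```python
def sample_frame_indices(total_frames, num_frames):
    """Pick num_frames evenly spaced frame indices from a clip.

    Returns all frames when the clip is shorter than num_frames;
    otherwise takes the midpoint of each of num_frames equal segments.
    """
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]

# A 100-frame clip sampled down to 8 frames:
indices = sample_frame_indices(100, 8)
```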
Hi,
Thanks for sharing the code. I'm using it to fine-tune on videos by freezing the visual encoder and projector, and tuning the LLM. Initially, everything works well, but as training progresses, I notice that GPU memory usage keeps increasing. I'm using 8 H100s, but eventually, the process times out due to running out of memory. Have you encountered this issue before? Any insights you might have would be greatly appreciated. Thank you!
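One way to confirm a leak rather than a one-off allocation spike is to sample `torch.cuda.memory_allocated()` (a real PyTorch API) at a fixed step interval and check whether later readings stay above earlier peaks. A minimal sketch of that check on the sampled numbers themselves; `looks_like_leak` is a hypothetical helper, not part of this repo, and the sampling loop is left out:

```python
def looks_like_leak(readings, growth_factor=1.25):
    """Heuristic: does the second half of the run use clearly more memory?

    readings: memory values sampled at a fixed step interval, e.g. from
    torch.cuda.memory_allocated() every N steps. A transient spike will
    not trip this check, but steady growth will: the lowest reading in
    the later half must exceed the highest reading in the earlier half.
    """
    if len(readings) < 4:
        return False
    half = len(readings) // 2
    early_peak = max(readings[:half])
    late_floor = min(readings[half:])
    return late_floor > growth_factor * early_peak
```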