Minimal training setup #146
Hi! Could you share the smallest/minimal setup you trained to get signs of life before the full 7B run? (Maybe something with OPT 1.3B, like the README suggests?)

Yep, we did do initial runs using OPT 1.3B and ViT-L-14. We used the following hyperparameters (exactly the same as in the README apart from the batch size):

We see 'signs of life' (relevant predictions for images, etc.) after ~10k steps (which is around 5M samples seen of LAION 2B). You should also be able to get signs of life by training only on LAION 2B (there is a PR out to do so, although I haven't tested it yet). I am not sure of the exact number of steps you should train for if you are training only on LAION. Let me know if there are any other details I can provide!

Thank you very much for the amazing work. I am wondering if it is possible to make the trained OPT-1.3B model available?

@ccliu2 Thanks for your interest! Yes, we can release a 1.3B model, but it will probably be in our next release, as we want to make sure performance is on par with DeepMind's version.

We've released a 3B model using MPT-1B; we ended up finding that MPT-1B had stronger performance than an OPT-1.3B backbone. Closing this for now. Thanks for your interest!
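As a rough sanity check (an illustrative sketch, not stated in the thread): the "~10k steps ≈ 5M samples seen" figure from the reply implies an effective global batch size of about 500, which you can use to project how many samples a given step budget covers.

```python
# Illustrative arithmetic only; the steps/samples numbers come from the comment
# above, and the effective batch size is inferred from them, not confirmed.
samples_seen = 5_000_000  # ~5M samples of LAION 2B
steps = 10_000            # ~10k steps to first 'signs of life'

effective_batch_size = samples_seen // steps
print(effective_batch_size)  # 500

def samples_for(num_steps: int, batch_size: int = effective_batch_size) -> int:
    """Project total samples seen after a given number of training steps."""
    return num_steps * batch_size

print(samples_for(20_000))  # 10000000
```

This is only back-of-the-envelope bookkeeping; the actual batch size used may differ (the thread says only that it differed from the README value).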