
Minimal training setup #146

Closed
sachit-menon opened this issue Apr 1, 2023 · 4 comments

Comments

@sachit-menon

Hi! Could you share the smallest/minimal setup you trained to get signs of life before the full 7B run? (Maybe something with OPT 1.3B, as the README suggests?)

@anas-awadalla
Collaborator

anas-awadalla commented Apr 1, 2023

Yep, we did initial runs using OPT 1.3B and ViT-L-14.

We used the following hyperparameters (which are exactly the same as in the README apart from the batch sizes):

  • Batch size (LAION 2B): 512
  • Batch size (MMC4): 256
  • loss_multiplier_laion: 0.2
  • lr_scheduler: constant
  • warmup_steps: 5000
  • mmc4_textsim_threshold: 30
  • use_media_placement_augmentation

We see 'signs of life' (relevant predictions for images, etc.) after ~10k steps, which at a LAION 2B batch size of 512 is around 5M samples seen. You should also be able to get signs of life by training only on LAION 2B (there is a PR out to add that option, although I haven't tested it yet). I am not sure exactly how many steps you would need if you are only training on LAION. Let me know if there are any other details I can provide!
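For concreteness, below is a small sketch (not from the original thread) that collects the hyperparameters listed above into a config and prints a possible launch command. The flag spellings, the `lm_path`/`vision_encoder_path` values, the `torchrun` invocation, and the GPU count are assumptions for illustration; the real argument names, plus the required dataset-shard and checkpointing paths omitted here, should be taken from the repo's training script (e.g. `open_flamingo/train/train.py`) and the training README.

```python
# Hedged sketch: the small-scale (OPT 1.3B + ViT-L-14) run described above,
# expressed as a config dict and turned into CLI flags.
# NOTE: flag names mirror the hyperparameter names in the comment; verify the
# exact spellings and the other required arguments (dataset shards, checkpoint
# dir, etc.) against the repo's training script before running anything.
small_run = {
    "lm_path": "facebook/opt-1.3b",       # assumed HF identifier for the LM backbone
    "vision_encoder_path": "ViT-L-14",    # assumed name for the CLIP vision encoder
    "batch_size_laion": 512,              # "Batch size (LAION 2B)" above
    "batch_size_mmc4": 256,               # "Batch size (MMC4)" above
    "loss_multiplier_laion": 0.2,
    "lr_scheduler": "constant",
    "warmup_steps": 5000,
    "mmc4_textsim_threshold": 30,
    "use_media_placement_augmentation": True,
}

# Convert the dict into CLI flags (boolean True becomes a bare flag).
flags = []
for key, value in small_run.items():
    if isinstance(value, bool):
        if value:
            flags.append(f"--{key}")
    else:
        flags.append(f"--{key} {value}")

# Print a hypothetical launch command; adjust script path and GPU count to your setup.
print(
    "torchrun --nproc_per_node=8 open_flamingo/train/train.py \\\n  "
    + " \\\n  ".join(flags)
)
```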

@ccliu2

ccliu2 commented Apr 4, 2023

Thank you very much for the amazing work! I am wondering if it would be possible to make the trained OPT-1.3B model available?

@anas-awadalla
Collaborator

anas-awadalla commented Apr 4, 2023

@ccliu2 Thanks for your interest! Yes, we can release a 1.3B model, but it will probably be in our next release, as we want to make sure its performance is on par with DeepMind's version.

@i-gao
Collaborator

i-gao commented Jun 30, 2023

We've released a 3B model built on MPT-1B; we ended up finding that MPT-1B gave stronger performance than an OPT-1.3B backbone. Closing this for now. Thanks for your interest!

@i-gao i-gao closed this as completed Jun 30, 2023