Model release #1
Comments
Yep! Please respond!
...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner. The size of the 1.5B GPT-2 model was about 6GB on disk, which would imply that the 175B model is at least 700GB!
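A quick back-of-the-envelope check of that extrapolation, assuming on-disk size scales linearly with parameter count at roughly 4 bytes per FP32 parameter (this just restates the estimate in the comment above):

```python
# Rough scaling check: if 1.5B parameters take ~6 GB on disk,
# how big would 175B parameters be at the same bytes-per-parameter?
gpt2_params = 1.5e9
gpt2_disk_gb = 6.0

bytes_per_param = gpt2_disk_gb * 1e9 / gpt2_params   # ~4 bytes, i.e. FP32
gpt3_params = 175e9
gpt3_disk_gb = gpt3_params * bytes_per_param / 1e9

print(f"~{bytes_per_param:.1f} bytes/param -> ~{gpt3_disk_gb:.0f} GB for 175B params")
# ~4.0 bytes/param -> ~700 GB for 175B params
```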
I think it’s safe to say I won’t be replicating this one anytime soon
Sure it is. Artifacts larger than 700GB are distributed all the time. I distribute Danbooru2019 via BitTorrent & rsync and that's like 3300GB! I would not advise distributing GPT-3 via GCP/AWS buckets, to say the least, but it would be easy and cheap ($30/month) to use a dedicated server to seed a GPT-3 torrent, for example.
Not to detract from the difficulties of distributing the model, but the paper notes that training is performed in full half-precision, which would put the size of the weights at around 350GB.
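The same arithmetic for the half-precision case, assuming 2 bytes per FP16 parameter and ignoring any file-format or optimizer-state overhead:

```python
# FP16 stores each parameter in 2 bytes, so the raw weights alone would be:
gpt3_params = 175e9
fp16_bytes_per_param = 2
weights_gb = gpt3_params * fp16_bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of FP16 weights")   # ~350 GB
```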
We need distilGPT-3!
By comparison, the Nvidia Megatron-11B model, trained by Facebook AI in fairseq, is provided as a 19GB tar.gz file hosted on their server farm: https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz
Dang it. It is here, finally.
Maybe we need evaporation-GPT-3
The FP16 point is good; that would mean the smaller models noted above would be even smaller than usual, which is good for everyone! That may limit the supported hardware unless a way to cast up to FP32 is added (likely something PyTorch can do).
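Upcasting an FP16 checkpoint to FP32 at load time is indeed something PyTorch can do; a minimal sketch, where the checkpoint path and model class are hypothetical placeholders rather than anything that has been released:

```python
import torch

# Hypothetical checkpoint file, purely to illustrate the cast.
state_dict = torch.load("gpt3_small_fp16.pt", map_location="cpu")

# Upcast every floating-point tensor to FP32 so the model can run on
# hardware without fast half-precision support.
state_dict_fp32 = {
    name: t.float() if t.is_floating_point() else t
    for name, t in state_dict.items()
}

# model = SomeGPTModel(config)            # placeholder model definition
# model.load_state_dict(state_dict_fp32)
```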
Fine-tuning for normal people is out of the question due to model size. Shouldn't inference still be possible if weights are loaded and applied incrementally, especially if system rather than GPU memory is used for intermediate computations?
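One way to picture that incremental approach: keep every transformer block in system RAM and stream one block at a time onto the GPU for its slice of the forward pass. A rough sketch assuming a generic `nn.ModuleList` of blocks, not any real GPT-3 implementation:

```python
import torch

def offloaded_forward(blocks, hidden, device="cuda"):
    """Forward pass that keeps weights on the CPU and moves one block
    at a time to the GPU, trading speed for GPU memory."""
    hidden = hidden.to(device)
    with torch.no_grad():
        for block in blocks:          # blocks: an nn.ModuleList kept on CPU
            block.to(device)          # stream this block's weights in
            hidden = block(hidden)    # run it
            block.to("cpu")           # free GPU memory for the next block
    return hidden.cpu()
```

As the reply below works out, the weight transfers dominate, so this mostly trades GPU memory for a very low token rate.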
Big gap between 13B and 175B; there are probably some sweet spots for a few folks in there if something could be made available.
Technically you could do that, but it would be impractically slow. You'd still need at least 350 GB of RAM (some cloud instances have this), or you'd be waiting for disk -> RAM transfers of 350 GB for each token generated. For a 600 MB/s SSD that would take about 10 minutes and cap the output speed at roughly 6 tokens per hour. With at least 350 GB of RAM, the bottleneck would be RAM -> GPU transfers. If the speed is 2.3 GB/s, that would take about 2.5 minutes, which caps the possible inference speed at around 24 tokens per hour, or somewhere around 50 characters. Edit: It might be faster to run fully on CPUs using >350 GB of RAM than to transfer to the GPU for every token.
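Those bounds follow directly from the stated bandwidths; a simple check that ignores compute time entirely:

```python
# Upper bound on token rate if every generated token requires re-streaming
# all 350 GB of FP16 weights over the given link.
weights_gb = 350

for label, gb_per_s in [("SSD -> RAM at 0.6 GB/s", 0.6),
                        ("RAM -> GPU at 2.3 GB/s", 2.3)]:
    seconds_per_token = weights_gb / gb_per_s
    tokens_per_hour = 3600 / seconds_per_token
    print(f"{label}: {seconds_per_token / 60:.1f} min/token, "
          f"~{tokens_per_hour:.0f} tokens/hour")
# SSD -> RAM at 0.6 GB/s: 9.7 min/token, ~6 tokens/hour
# RAM -> GPU at 2.3 GB/s: 2.5 min/token, ~24 tokens/hour
```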
Still lower than recent Call of Duty games, so.
Gosh, I would really like to see something put together here to give people more access to this and tool around with it like GPT-2. If OpenAI could release a cloud platform, I would gladly pay to play, as I have disagreed with devs in the past on the GPT release format. I think building a container system for language models could be the key to OpenAI making money they can put back into research while also being fair to developers. I really don't think there is any danger in language models.
Great work by the OpenAI team! The paper does not discuss it, so I'll be the first to ask:
What's the release plan for the model definition & weights? Will it be tiered by size, like GPT-2?