Model release #1
Comments
Yep! Please respond!
...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner. The size of the 1.5B GPT-2 model was about 6GB on disk, which would imply that the 175B model is at least 700GB!
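A quick back-of-the-envelope check of that extrapolation, assuming on-disk size scales linearly with parameter count at roughly 4 bytes per FP32 parameter (this just restates the estimate in the comment above):

```python
# Rough scaling check: if 1.5B parameters take ~6 GB on disk,
# how big would 175B parameters be at the same bytes-per-parameter?
gpt2_params = 1.5e9
gpt2_disk_gb = 6.0

bytes_per_param = gpt2_disk_gb * 1e9 / gpt2_params   # ~4 bytes, i.e. FP32
gpt3_params = 175e9
gpt3_disk_gb = gpt3_params * bytes_per_param / 1e9

print(f"~{bytes_per_param:.1f} bytes/param -> ~{gpt3_disk_gb:.0f} GB for 175B params")
# ~4.0 bytes/param -> ~700 GB for 175B params
```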
I think it’s safe to say I won’t be replicating this one anytime soon
Sure it is. Artifacts larger than 700GB are distributed all the time. I distribute Danbooru2019 via BitTorrent & rsync and that's like 3300GB! I would not advise distributing GPT-3 via GCP/AWS buckets, to say the least, but it would be easy and cheap ($30/month) to use a dedicated server to seed a GPT-3 torrent, for example.
Not to detract from the difficulties of distributing the model, but the paper notes that training is performed in full half-precision, which would put the size of the weights at around 350GB.
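The same arithmetic for the half-precision case, assuming 2 bytes per FP16 parameter and ignoring any file-format or optimizer-state overhead:

```python
# FP16 stores each parameter in 2 bytes, so the raw weights alone would be:
gpt3_params = 175e9
fp16_bytes_per_param = 2
weights_gb = gpt3_params * fp16_bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of FP16 weights")   # ~350 GB
```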
We need distilGPT-3!
By comparison, the Nvidia Megatron-11B model, trained by Facebook AI in fairseq, is provided as a 19GB tar.gz file hosted on their server farm: https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz
Dang it. It is here, finally.
Maybe we need evaporation-GPT-3
The FP16 point is good; that would mean the smaller models noted above would be even smaller than usual, which is good for everyone! That may limit the supported hardware unless a way to cast up to FP32 is added (likely something PyTorch can do).
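Upcasting an FP16 checkpoint to FP32 at load time is indeed something PyTorch can do; a minimal sketch, where the checkpoint path and model class are hypothetical placeholders rather than anything that has been released:

```python
import torch

# Hypothetical checkpoint file, purely to illustrate the cast.
state_dict = torch.load("gpt3_small_fp16.pt", map_location="cpu")

# Upcast every floating-point tensor to FP32 so the model can run on
# hardware without fast half-precision support.
state_dict_fp32 = {
    name: t.float() if t.is_floating_point() else t
    for name, t in state_dict.items()
}

# model = SomeGPTModel(config)            # placeholder model definition
# model.load_state_dict(state_dict_fp32)
```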
Fine-tuning for normal people is out of the question due to model size. Shouldn't inference still be possible if weights are loaded and applied incrementally, especially if system rather than GPU memory is used for intermediate computations?
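One way to picture that incremental approach: keep every transformer block in system RAM and stream one block at a time onto the GPU for its slice of the forward pass. A rough sketch assuming a generic `nn.ModuleList` of blocks, not any real GPT-3 implementation:

```python
import torch

def offloaded_forward(blocks, hidden, device="cuda"):
    """Forward pass that keeps weights on the CPU and moves one block
    at a time to the GPU, trading speed for GPU memory."""
    hidden = hidden.to(device)
    with torch.no_grad():
        for block in blocks:          # blocks: an nn.ModuleList kept on CPU
            block.to(device)          # stream this block's weights in
            hidden = block(hidden)    # run it
            block.to("cpu")           # free GPU memory for the next block
    return hidden.cpu()
```

As the reply below works out, the weight transfers dominate, so this mostly trades GPU memory for a very low token rate.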
Big gap between 13B and 175B; there are probably some sweet spots for a few folks in there if something could be made available.
Technically you could do that, but it would be impractically slow. You'd still need at least 350 GB of RAM (some cloud instances have this), or you'd be waiting for disk -> RAM transfers of 350 GB for each token generated. For a 600 MB/s SSD that would take about 10 minutes and cap the output speed at roughly 6 tokens per hour. With at least 350 GB of RAM, the bottleneck would be RAM -> GPU transfers. If the speed is 2.3 GB/s, that would take about 2.5 minutes, which caps the possible inference speed at around 24 tokens per hour, or somewhere around 50 characters. Edit: It might be faster to run fully on CPUs using >350 GB of RAM than to transfer to the GPU for every token.
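Those bounds follow directly from the stated bandwidths; a simple check that ignores compute time entirely:

```python
# Upper bound on token rate if every generated token requires re-streaming
# all 350 GB of FP16 weights over the given link.
weights_gb = 350

for label, gb_per_s in [("SSD -> RAM at 0.6 GB/s", 0.6),
                        ("RAM -> GPU at 2.3 GB/s", 2.3)]:
    seconds_per_token = weights_gb / gb_per_s
    tokens_per_hour = 3600 / seconds_per_token
    print(f"{label}: {seconds_per_token / 60:.1f} min/token, "
          f"~{tokens_per_hour:.0f} tokens/hour")
# SSD -> RAM at 0.6 GB/s: 9.7 min/token, ~6 tokens/hour
# RAM -> GPU at 2.3 GB/s: 2.5 min/token, ~24 tokens/hour
```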
Still lower than recent Call of Duty games, so.
Gosh, I would really like to see something put together here to give people more access to this and tool around with it like GPT-2. If OpenAI could release a cloud platform, I would gladly pay to play, as I have disagreed with devs in the past on the GPT release format. I think building a container system for language models could be the key to OpenAI making money they can put back into research while also being fair to developers. I really don't think there is any danger in language models.
Great work by the OpenAI team! The paper does not discuss it, so I'll be the first to ask:
What's the release plan for the model definition & weights? Will it be tiered by size, like GPT-2?