This repository has been archived by the owner on Sep 19, 2020. It is now read-only.

Model release #1

Open
JulianSlzr opened this issue May 29, 2020 · 16 comments

Comments

@JulianSlzr

Great work by the OpenAI team! The paper does not discuss it, so I'll be the first to ask:

What's the release plan for the model definition & weights? Will it be tiered by size, like GPT-2?

@Devetec

Devetec commented May 29, 2020

Yep! Please respond!

@minimaxir

minimaxir commented May 29, 2020

...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner.

The size of the 1.5B GPT-2 model was about 6GB on disk, which would imply that the 175B model is at least 700GB!
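A back-of-envelope sketch of that estimate in Python; the ~4 bytes/parameter figure is inferred from GPT-2's on-disk size, not from any released GPT-3 checkpoint:

```python
# Rough arithmetic behind the estimate above: GPT-2 1.5B was ~6 GB on disk,
# i.e. about 4 bytes per parameter (an FP32 checkpoint), so scaling linearly
# to 175B parameters gives roughly 700 GB. Numbers are illustrative only.
GPT2_PARAMS = 1.5e9
GPT2_DISK_GB = 6.0
GPT3_PARAMS = 175e9

bytes_per_param = GPT2_DISK_GB * 1e9 / GPT2_PARAMS   # ~4 bytes/param
gpt3_disk_gb = GPT3_PARAMS * bytes_per_param / 1e9   # ~700 GB

print(f"~{bytes_per_param:.0f} bytes/param -> ~{gpt3_disk_gb:.0f} GB for 175B parameters")
```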

@vanyacohen

I think it’s safe to say I won’t be replicating this one anytime soon

@gwern

gwern commented May 29, 2020

> ...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner.

Sure it is. Artifacts larger than 700GB are distributed all the time. I distribute Danbooru2019 via BitTorrent & rsync and that's like 3300GB! I would not advise distributing GPT-3 via GCP/AWS buckets, to say the least, but it would be easy and cheap ($30/month) to use a dedicated server to seed a GPT-3 torrent, for example.

@parasj

parasj commented May 29, 2020

Not to detract from the difficulties of distributing the model, but the paper notes that training is performed fully in half precision, which would put the weights at around 350 GB.

@Grandiferr

We need distilGPT-3!

@loretoparisi

By comparison, Megatron-11B (an NVIDIA Megatron-style model trained by Facebook AI in fairseq) is provided as a 19 GB tar.gz file hosted on their servers:

https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz

@theneuronprogrammer

Dang it. It's finally here.

@nlp4whp

nlp4whp commented May 29, 2020

> We need distilGPT-3!

Maybe we need evaporation-GPT-3.

@AdamDanielKing

Most of us can hardly dream of using the full model. You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Training with the Adam optimizer (as they mention) would require at least 3 times as many (~66 GPUs), plus extra space for the activations. There are more memory-efficient optimizers though.

But there are 8 models in the paper, 4 of which are smaller than GPT-2, so some of those will probably be useful if OpenAI chooses to release them. 🙂
[image: table of the eight GPT-3 model sizes from the paper]
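A rough sketch of the memory arithmetic above, assuming 2 bytes per parameter (FP16) and 16 GB GPUs; the factor of 3 for Adam's extra state follows the comment rather than any measured number:

```python
# Rough memory math behind the GPU counts above, assuming FP16 weights
# (2 bytes/parameter) and 16 GB of memory per GPU. Adam keeps two extra
# state tensors per parameter, hence the ~3x factor. Illustrative only.
PARAMS = 175e9
BYTES_PER_PARAM = 2           # FP16
GPU_MEM_GB = 16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9      # ~350 GB
gpus_inference = weights_gb / GPU_MEM_GB         # ~22 GPUs for the weights alone
gpus_training = 3 * weights_gb / GPU_MEM_GB      # weights + Adam m and v states

print(f"weights: ~{weights_gb:.0f} GB")
print(f"inference: >{gpus_inference:.0f} GPUs, Adam training: >{gpus_training:.0f} GPUs "
      "(before activations)")
```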

@minimaxir

The FP16 point is good; that would mean the smaller models noted above would be even smaller than usual, which is good for everyone!

That may limit the supported hardware unless a way to cast up to FP32 is added (likely something PyTorch can do).
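A minimal sketch of what that cast might look like in PyTorch, assuming a hypothetical FP16 state_dict file (no such checkpoint has been released; the filenames here are placeholders):

```python
# Cast a half-precision checkpoint up to FP32 so it can run on hardware
# without fast FP16 support. Filenames are hypothetical.
import torch

state_dict = torch.load("gpt3_fp16.pt", map_location="cpu")
state_dict_fp32 = {
    k: v.float() if torch.is_tensor(v) and v.is_floating_point() else v
    for k, v in state_dict.items()
}
torch.save(state_dict_fp32, "gpt3_fp32.pt")
```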

@poset

poset commented May 29, 2020

Fine-tuning for normal people is out of the question due to model size. Shouldn't inference still be possible if weights are loaded and applied incrementally? Especially if system rather than GPU memory is used for intermediate computations.

@fredbuhl

Big gap between 13B and 175B; there are probably some sweet spots in there for a few folks if something could be made available.

@AdamDanielKing

AdamDanielKing commented May 29, 2020

> Fine-tuning for normal people is out of the question due to model size. Shouldn't inference still be possible if weights are loaded and applied incrementally? Especially if system rather than GPU memory is used for intermediate computations.

Technically you could do that, but it would be impractically slow. You'd still need at least 350 GB of RAM (some cloud instances have this) or you'd be waiting for disk -> RAM transfers of 350 GB for each token generated. For a 600 MB/s SSD that would take 10 minutes and cap the output speed at 6 tokens per hour.

With at least 350 GB of RAM the bottleneck would be RAM -> GPU transfers. If the speed is 2.3 GB/s that would take 2.5 minutes. So that caps the possible inference speed at 24 tokens per hour, or somewhere around 50 characters per hour.

Edit: It might be faster to run fully on CPUs using >350 GB RAM than to transfer to the GPU for every token.
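A quick sketch of the bandwidth-bound throughput estimate above; the 0.6 GB/s and 2.3 GB/s figures are the ones assumed in the comment, not benchmarks:

```python
# Rough throughput bound if all 350 GB of weights must be streamed once per
# generated token, at the bandwidths mentioned above. Illustrative only.
WEIGHTS_GB = 350

def tokens_per_hour(bandwidth_gb_per_s: float) -> float:
    seconds_per_token = WEIGHTS_GB / bandwidth_gb_per_s
    return 3600 / seconds_per_token

print(f"disk -> RAM at 0.6 GB/s: ~{tokens_per_hour(0.6):.0f} tokens/hour")
print(f"RAM -> GPU at 2.3 GB/s: ~{tokens_per_hour(2.3):.0f} tokens/hour")
```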

@ugurkanates

> ...I'm not sure if it's even possible for the 175B model to be distributed in a reasonable manner.
>
> The size of the 1.5B GPT-2 model was about 6GB on disk, which would imply that the 175B model is at least 700GB!

Still smaller than recent Call of Duty games, so there's that.

@4R7I5T

4R7I5T commented Jun 2, 2020

Gosh, I would really like to see something put together here that gives people more access to this, so they can tool around with it like GPT-2.

If OpenAI released a cloud platform, I would gladly pay to play, even though I have disagreed with the devs in the past on the GPT release format. I think building a container system for language models could be the key to OpenAI making money it can put back into research while also being fair to developers.

I really don't think there is any danger in language models.
