
Add upcoming GPT-3 model #4658

Closed
1 of 4 tasks
stefan-it opened this issue May 29, 2020 · 40 comments

@stefan-it
Collaborator

stefan-it commented May 29, 2020

🌟 New model addition

Model description

The GPT-3 paper just landed on ArXiv: https://arxiv.org/abs/2005.14165.

Would be great to integrate it into Transformers, whenever models are available.

Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Open source status

  • GitHub repository is available: here
  • the model implementation is available: (give details)
  • the model weights are available: (give details)
  • who are the authors: (mention them, if possible by @gh-username)
@LouisCastricato

LouisCastricato commented May 29, 2020

My god, the paper hasn't even been up for a day...

That being said, +1

@moinnadeem

So who can run a 175B-parameter model, and what do I have to do to call in a favor?

@AdamDanielKing

The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

[Image: table of the eight GPT-3 model sizes from the paper]
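For anyone who wants to play with the arithmetic, here is a minimal back-of-the-envelope sketch of the estimate above, assuming 2 bytes per FP16 parameter and 16 GB of memory per GPU (weights only, ignoring activations and optimizer state):

```python
# Back-of-the-envelope estimate: weights only, no activations or optimizer state.
params = 175e9          # 175B parameters, as reported in the GPT-3 paper
bytes_per_param = 2     # FP16 stores each parameter in 2 bytes

weights_gb = params * bytes_per_param / 1e9   # ~350 GB of raw weights
gpus_needed = weights_gb / 16                 # assuming 16 GB of memory per GPU

print(f"~{weights_gb:.0f} GB of weights, split across at least {gpus_needed:.0f} GPUs")
# ~350 GB of weights, split across at least 22 GPUs
```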

@GraphGrailAi

Is there any Colab to test at least GPT-3 XL?

@AdamDanielKing

Is there any Colab to test at least GPT-3 XL?

They haven't released any code or pretrained models yet. See the issue on the official repo: openai/gpt-3#1

@minimaxir

Note that the released models may be FP16, which may require forcing FP16 for use/finetuning (and therefore hardware-limited), or casting up to FP32.
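As a rough illustration of those two options, here is a minimal PyTorch/Transformers sketch using GPT-2 as a stand-in checkpoint (no GPT-3 checkpoint exists in Transformers, so the model name is purely a placeholder):

```python
from transformers import AutoModelForCausalLM

# GPT-2 stands in for the hypothetical GPT-3 checkpoint discussed above.
model = AutoModelForCausalLM.from_pretrained("gpt2")

model.half()   # option 1: force FP16 (half the memory, needs FP16-capable hardware)
model.float()  # option 2: cast the weights back up to FP32 for full-precision finetuning

print(next(model.parameters()).dtype)  # torch.float32
```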

@thomwolf thomwolf mentioned this issue May 29, 2020
@andifunke

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

One of the main benefits of the smaller GPT-3 models compared to their GPT-2 counterparts could be the increased context length of 2048 tokens.
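As a purely illustrative sketch of that difference (GPT2Config is reused here only to show the context-window setting; this is not an official GPT-3 configuration):

```python
from transformers import GPT2Config

gpt2_cfg = GPT2Config()                       # default GPT-2 context window: 1024 tokens
gpt3_like_cfg = GPT2Config(n_positions=2048)  # GPT-3 models use a 2048-token context

print(gpt2_cfg.n_positions, gpt3_like_cfg.n_positions)  # 1024 2048
```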

@enzoampil
Contributor

enzoampil commented May 30, 2020

Yeah, personally I wouldn't be able to use any of the models that don't fit on a Tesla P100.

@AdamDanielKing

The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

@ljlueloff

The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

That is a crying shame, because my system could run it... :(

@flarn2006

Hopefully they have a better excuse than last time.

@ljlueloff

Hopefully they have a better excuse than last time.

@flarn2006 You mean the "ooohhhh, we created something scary and have soggy diapers" excuse, this time with GPT-3?

@ljlueloff

@flarn2006 If they don't make excuses or drag their feet, and I finish my system build in a similar time frame, hopefully I can help...

@fen0s

fen0s commented Jun 20, 2020

A little update: OpenAI is now running its own API with GPT-3 behind it: https://beta.openai.com
You can apply for access, but it seems like they're aiming mostly at big companies, not researchers. Sad, way too sad.

@stefan-it
Collaborator Author

But who put the "Open" in OpenAI then? 🤔

@yassineAlouini

I guess we will need to "fundraise" enough GPU-compute to run the GPT3 model. 😄

@fen0s

fen0s commented Jul 20, 2020

It should be possible to run the smaller models, like the ~1B-parameter one, on regular GPUs. But we don't have the weights, and it seems that OpenAI is against releasing them and would rather commercialize the model :(

@sagarreddypatil

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific)

@yassineAlouini

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific)

Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

@sagarreddypatil

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific)

Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

No, just a hunch. Even if I did know how to do this, it's not like OpenAI would publicly release the model weights...

@shashankMadan-designEsthetics

Guys, when is this gonna be integrated!?

@fen0s

fen0s commented Aug 17, 2020

When OpenAI decides to release GPT-3 as open source, but it seems that won't happen; they just want to sell access to big corporations.

@shashankMadan-designEsthetics

shashankMadan-designEsthetics commented Aug 18, 2020

https://bdtechtalks.com/2020/08/17/openai-gpt-3-commercial-ai/amp/

Here it goes...

@bhack
Contributor

bhack commented Sep 21, 2020

https://arxiv.org/abs/2009.07118
https://github.com/timoschick/pet

@OverlordQ

Hopefully they have a better excuse than last time.

Because Microsoft gave us money.

@Clickative

GPT-3 is not coming out anytime soon :(

@shashankMadan-designEsthetics

This thread signifies capitalism's pros and cons at the same time... 😅

@albusdemens

albusdemens commented Oct 21, 2020

The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂


@AdamDanielKing, is there a way to estimate the size of the GPT-3 XL model?

@stale

stale bot commented Dec 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 24, 2020
@ItsIgnacioPortal

We're still waiting... :(

@stale stale bot removed the wontfix label Dec 25, 2020
@srulikbd

srulikbd commented Jan 4, 2021

It seems that a replication of GPT-3 might be open-sourced soon:
https://www.eleuther.ai/
https://github.com/EleutherAI

@flarn2006

flarn2006 commented Jan 5, 2021 via email

@aolko aolko mentioned this issue Mar 22, 2021
@NielsRogge
Contributor

Closing this as GPT-3 won't be open-sourced, unfortunately.

Have a look at an open-source effort to replicate it (a 176-billion-parameter multilingual language model called BLOOM) here:

Besides that, EleutherAI and other groups (such as this one) have been working on several open-source variants of GPT-3.

@Yusuf-YENICERI

Don't worry: if they made it, some other people are going to make it, inshaAllah.

There are already replications, so wait for those.

@sarahwang93

Can anyone tell me whether GPT-3 is available? The official one.

@Yusuf-YENICERI

@sarahwang93 No. It's not open-sourced, and they probably won't open-source it, because they are able to make money from it.

Replying to myself: yes, you are right. Other people have made millions off of it, Alhamdulillah.

@sarahwang93

@Yusuf-YENICERI I hope they open-source it after they've made enough money; my PhD dissertation is waiting for it.

@Yusuf-YENICERI

@sarahwang93 Why do you need it? You can't simply run it; it's a really huge model, with maybe 700 GB of VRAM required to run it. If you want to know how it's made, you can check the paper.
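As a rough sanity check of where a number like 700 GB can come from (weights only, an estimate rather than a measured requirement):

```python
# 175B parameters at a few precisions; real serving needs extra memory on top of
# this for activations and attention caches.
params = 175e9
for precision, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    print(f"{precision}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
# FP32: ~700 GB of weights
# FP16: ~350 GB of weights
# INT8: ~175 GB of weights
```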

There are other open-source models; you may want to check those out.

@NielsRogge
Contributor

There's also the Open LLM Leaderboard, which benchmarks all openly available LLMs on 4 benchmarks: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.

Of course this is not perfect, as it only includes 4 benchmarks, but it still gives a nice overview of the best open-source LLMs out there.

@Yusuf-YENICERI

@NielsRogge
https://chat.lmsys.org/?arena
This one is simpler to use, InshaAllah.
