
Add upcoming GPT-3 model #4658

Closed
1 of 4 tasks
stefan-it opened this issue May 29, 2020 · 40 comments

@stefan-it
Collaborator

stefan-it commented May 29, 2020

🌟 New model addition

Model description

The GPT-3 paper just landed on ArXiv: https://arxiv.org/abs/2005.14165.

Would be great to integrate it into Transformers, whenever models are available.

Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Open source status

  • GitHub repository is available: here
  • the model implementation is available: (give details)
  • the model weights are available: (give details)
  • who are the authors: (mention them, if possible by @gh-username)
@LouisCastricato

LouisCastricato commented May 29, 2020

My god, the paper hasn't even been up for a day...

That being said, +1

@moinnadeem

So who can run a 175B-parameter model, and what do I have to do to call in a favor?

@AdamDanielKing

The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

[Image: table of the eight GPT-3 model sizes from the paper]
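For anyone who wants to play with the arithmetic, here is a minimal back-of-the-envelope sketch of the estimate above, assuming 2 bytes per FP16 parameter and 16 GB of memory per GPU (weights only, ignoring activations and optimizer state):

```python
# Back-of-the-envelope estimate: weights only, no activations or optimizer state.
params = 175e9          # 175B parameters, as reported in the GPT-3 paper
bytes_per_param = 2     # FP16 stores each parameter in 2 bytes

weights_gb = params * bytes_per_param / 1e9   # ~350 GB of raw weights
gpus_needed = weights_gb / 16                 # assuming 16 GB of memory per GPU

print(f"~{weights_gb:.0f} GB of weights, split across at least {gpus_needed:.0f} GPUs")
# ~350 GB of weights, split across at least 22 GPUs
```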

@GraphGrailAi

Is there any Colab to test at least GPT-3 XL?

@AdamDanielKing

Is there any Colab to test at least GPT-3 XL?

They haven't released any code or pretrained models yet. See the issue on the official repo: openai/gpt-3#1

@minimaxir

Note that the released models may be FP16, which may require forcing FP16 for use/finetuning (and therefore hardware-limited), or casting up to FP32.
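As a rough illustration of those two options, here is a minimal PyTorch/Transformers sketch using GPT-2 as a stand-in checkpoint (no GPT-3 checkpoint exists in Transformers, so the model name is purely a placeholder):

```python
from transformers import AutoModelForCausalLM

# GPT-2 stands in for the hypothetical GPT-3 checkpoint discussed above.
model = AutoModelForCausalLM.from_pretrained("gpt2")

model.half()   # option 1: force FP16 (half the memory, needs FP16-capable hardware)
model.float()  # option 2: cast the weights back up to FP32 for full-precision finetuning

print(next(model.parameters()).dtype)  # torch.float32
```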

@thomwolf thomwolf mentioned this issue May 29, 2020
@andifunke

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

One of the main benefits of the smaller GPT-3 models compared to their GPT-2 counterparts could be the increased context length of 2048 tokens.
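As a purely illustrative sketch of that difference (GPT2Config is reused here only to show the context-window setting; this is not an official GPT-3 configuration):

```python
from transformers import GPT2Config

gpt2_cfg = GPT2Config()                       # default GPT-2 context window: 1024 tokens
gpt3_like_cfg = GPT2Config(n_positions=2048)  # GPT-3 models use a 2048-token context

print(gpt2_cfg.n_positions, gpt3_like_cfg.n_positions)  # 1024 2048
```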

@enzoampil
Contributor

enzoampil commented May 30, 2020

Yeah, personally I wouldn't be able to use any of the models that don't fit on a Tesla P100.

@AdamDanielKing

The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

@ljlueloff

The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

That is a crying shame, because my system could run it... :(

@flarn2006

Hopefully they have a better excuse than last time.

@ljlueloff

Hopefully they have a better excuse than last time.

@flarn2006 You mean the "ooohhhh, we created something scary and have soggy diapers" excuse, this time with GPT-3?

@ljlueloff

@flarn2006 If they don't make excuses or drag their feet, and I finish my system build in a similar time frame, hopefully I can help...

@fen0s

fen0s commented Jun 20, 2020

A little update: OpenAI is now running its own API with GPT-3 behind it: https://beta.openai.com
You can apply for access, but it seems like they're aiming mostly at big companies, not researchers. Sad, way too sad.

@stefan-it
Collaborator Author

But who put the "Open" in OpenAI then? 🤔

@yassineAlouini

I guess we will need to "fundraise" enough GPU-compute to run the GPT3 model. 😄

@fen0s

fen0s commented Jul 20, 2020

It should be possible to run the smaller models, like the ~1B-parameter one, on regular GPUs. But we don't have the weights, and it seems that OpenAI is against releasing them and would rather commercialize the model :(

@sagarreddypatil

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific)

@yassineAlouini

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific)

Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

@sagarreddypatil

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific)

Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

No, just a hunch. Even if I did know how to do this, it's not like OpenAI would publicly release the model weights...

@shashankMadan-designEsthetics

Guys, when is this gonna be integrated!?

@fen0s

fen0s commented Aug 17, 2020

When OpenAI decides to release GPT-3 as open source, but it seems that won't happen; they just want to sell access to big corporations.

@shashankMadan-designEsthetics

shashankMadan-designEsthetics commented Aug 18, 2020

https://bdtechtalks.com/2020/08/17/openai-gpt-3-commercial-ai/amp/

Here it goes...

@bhack
Contributor

bhack commented Sep 21, 2020

https://arxiv.org/abs/2009.07118
https://github.com/timoschick/pet

@OverlordQ

Hopefully they have a better excuse than last time.

Because Microsoft gave us money.

@Clickative

GPT-3 is not coming out anytime soon :(

@shashankMadan-designEsthetics

This thread signifies capitalism's pros and cons at the same time... 😅

@albusdemens

albusdemens commented Oct 21, 2020

The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂


@AdamDanielKing, is there a way to estimate the size of the GPT-3 XL model?

@stale

stale bot commented Dec 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 24, 2020
@ItsIgnacioPortal

We're still waiting... :(

@stale stale bot removed the wontfix label Dec 25, 2020
@srulikbd

srulikbd commented Jan 4, 2021

It seems that a replication of GPT-3 might be open-sourced soon:
https://www.eleuther.ai/
https://github.com/EleutherAI

@flarn2006

flarn2006 commented Jan 5, 2021 via email

@aolko aolko mentioned this issue Mar 22, 2021
@NielsRogge
Contributor

Closing this as GPT-3 won't be open-sourced, unfortunately.

Have a look at an open-source effort to replicate it (a 176-billion-parameter multilingual language model called BLOOM) here:

Besides that, EleutherAI and other groups (such as this one) have been working on several open-source variants of GPT-3.

@Yusuf-YENICERI

Don't worry: if they made it, some other people are going to make it, inshaAllah.

There are already replications, so wait for those.

@sarahwang93

Can anyone tell me whether GPT-3 is available? The official one.

@Yusuf-YENICERI

@sarahwang93 No. It's not open-sourced, and they probably won't open-source it, because they are able to make money from it.

Replying to myself: yes, you are right. Other people have made millions off of it, Alhamdulillah.

@sarahwang93

@Yusuf-YENICERI I hope they open-source it after they've made enough money; my PhD dissertation is waiting for it.

@Yusuf-YENICERI

@sarahwang93 Why do you need it? You can't simply run it; it's a really huge model, with maybe 700 GB of VRAM required to run it. If you want to know how it's made, you can check the paper.
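As a rough sanity check of where a number like 700 GB can come from (weights only, an estimate rather than a measured requirement):

```python
# 175B parameters at a few precisions; real serving needs extra memory on top of
# this for activations and attention caches.
params = 175e9
for precision, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    print(f"{precision}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
# FP32: ~700 GB of weights
# FP16: ~350 GB of weights
# INT8: ~175 GB of weights
```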

There are other open-source models; you may want to check those out.

@NielsRogge
Contributor

There's also the Open LLM Leaderboard, which benchmarks all openly available LLMs on 4 benchmarks: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard.

Of course this is not perfect, as it only includes 4 benchmarks, but it still gives a nice overview of the best open-source LLMs out there.

@Yusuf-YENICERI

@NielsRogge
https://chat.lmsys.org/?arena
This one is simpler to use, InshaAllah.
