
Add Phi-3-mini-4k-instruct checkpoint #1341

Merged: 36 commits from phi-3-checkpoint into main on Jul 1, 2024

Conversation

rasbt (Collaborator) commented Apr 23, 2024

  • Verify Phi-3-mini-4k-instruct configs
  • Add prompt style (see the sketch after this list)
  • Add other config files (will be added in a separate PR)
  • Add test_model.py
  • Add support in convert_hf_checkpoint.py
  • Add support in convert_lit_checkpoint.py
  • Add tests for conversion scripts
  • Add to test_prompts.py
  • Update 2 tables in README
  • Update download_model_weights.md
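For context, a minimal sketch of the Phi-3 prompt style, following the pattern of the PromptStyle classes in litgpt/prompts.py (the class name and method body here are illustrative, not necessarily the final implementation):

```python
# Illustrative sketch only; the real class belongs in litgpt/prompts.py.
class Phi3:
    def apply(self, prompt: str) -> str:
        # Phi-3-mini-4k-instruct wraps each user turn in <|user|> ... <|end|>
        # and cues the model's answer with <|assistant|>.
        return f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n"
```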

rasbt marked this pull request as draft on April 23, 2024 15:14
Andrei-Aksionov (Collaborator) commented

There is a modeling_*.py file.
Good luck 🙂.

rasbt (Collaborator, Author) commented Apr 23, 2024

> There is a modeling_*.py file.
> Good luck 🙂.

Haha, I finally got the weights loaded, but of course it's never easy ... it's generating gibberish:

⚡ phi-3-checkpoint ~/litgpt litgpt chat --checkpoint_dir checkpoints/microsoft/Phi-3-mini-4k-instruct
Now chatting with Phi-3-mini-4k-instruct.
To exit, press 'Enter' on an empty prompt.

Seed set to 1234
>> Prompt: What do llamas eat?
>> Reply: epsonniformes }).selves }).SSIONunicívo }). EverythingFormsћassaiejalphutureievediennesenticaciónicaciónMilMinigh ninassaselvesselves exhaustselvesonnselvesktionΗracheracheionedΗ Avenoted Bij_+versionsmastevosepsselvesmobileselvesilleryassaucealphasseestoreselvesférFormsiej Mu Kaiser oppienngnatteversionsionedionedversionsSSIONectionaccoossFormassaselves_+uminatesonoSSIONológissancecenteecause_+ienn选uraleʋ Stepalphigosionaliilonverte }).ienn }).ativo Sternsonoiejuralassawnkademselves│uraleativaionedvos_+utschversionsponiej_+icacióniejiewerológvoasonverte shoutioned位ionedIdentmobi

Let the easter egg hunt begin 😭

rasbt (Collaborator, Author) commented Apr 24, 2024

Some more tidbits via Daniel Han:

Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to @UnslothAI:

  1. Sliding window of 2047? Mistral v1 4096. So does Phi mini have SWA? (And odd num?) Max RoPE position is 4096?
  2. Upcasted RoPE? Like Gemma?
  3. Dynamic RoPE for 128K context lengths
  4. Fused MLP & QKV - need to unfuse (see the sketch after this list)
  5. MMLU evals are very different between the Phi team and the Llama-3 team - why?
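Point 4 is the one that matters most for the conversion scripts in this PR: the HF Phi-3 checkpoint stores the attention projections as one fused qkv_proj matrix (and the MLP as a fused gate_up_proj), which has to be split apart when mapping the weights. A minimal sketch of the unfusing, assuming the Phi-3-mini shapes (hidden size 3072, no grouped-query attention, so Q, K, and V are equal-sized):

```python
import torch

hidden_size = 3072  # Phi-3-mini embedding dimension

# Stand-in for qkv_proj.weight from the HF checkpoint: (3 * hidden, hidden).
qkv = torch.randn(3 * hidden_size, hidden_size)

# Split the fused projection into separate Q, K, and V weight matrices.
q, k, v = torch.split(qkv, hidden_size, dim=0)
assert q.shape == k.shape == v.shape == (hidden_size, hidden_size)
```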

Andrei-Aksionov (Collaborator) commented

OK, it's becoming more interesting.
This is somewhat what I expected from Llama 3, but it didn't deliver.

rasbt (Collaborator, Author) commented Apr 24, 2024

Andrei-Aksionov (Collaborator) commented

The current code is in an ugly state, but at least the model produces the same output as the HF one.
The most notable change is that the Phi-3 model doesn't use parallel_residual, in contrast to Phi-1.5 and Phi-2 (see the sketch below).

The missing piece is the tokenizer: it has a smaller vocab size (32k vs. 50k) that was extended with 64 special tokens.
If I'm not mistaken, the current code doesn't add these tokens.
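For illustration, the two residual layouts side by side; this is a schematic sketch, not litgpt's actual block code (norm_1, norm_2, attn, and mlp stand in for the block's submodules):

```python
def block_parallel(x, norm_1, norm_2, attn, mlp):
    # Phi-1.5 / Phi-2: attention and MLP branches both read from x and are
    # summed into the residual in a single step.
    return x + attn(norm_1(x)) + mlp(norm_2(x))

def block_sequential(x, norm_1, norm_2, attn, mlp):
    # Phi-3: the MLP consumes the attention output (non-parallel residual).
    x = x + attn(norm_1(x))
    return x + mlp(norm_2(x))
```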

rasbt (Collaborator, Author) commented Apr 25, 2024

> The missing piece is the tokenizer: it has a smaller vocab size (32k vs. 50k) that was extended with 64 special tokens.
> If I'm not mistaken, the current code doesn't add these tokens.

Yeah, that sounds about right based on the Phi-3 paper:

> To best benefit the open source community, phi-3-mini is built upon a similar block structure as Llama-2 [TLI+23] and uses the same tokenizer with vocabulary size of 32064.
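A quick way to inspect the extended vocabulary with the Hugging Face tokenizer (an illustrative snippet; the tokenizer may report fewer than 32064 entries in total, since the model's embedding table is padded beyond the actual token count):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

print(tok.vocab_size)         # base SentencePiece vocab (32000, same as Llama-2)
print(len(tok))               # base vocab plus the added special tokens
print(tok.get_added_vocab())  # <|user|>, <|assistant|>, <|end|>, ...
```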

Andrei-Aksionov changed the title from "Add phi-3 checkpoint" to "Add Phi-3-mini-4k-instruct checkpoint" on Jun 28, 2024
Andrei-Aksionov marked this pull request as ready for review on June 28, 2024 11:26
Andrei-Aksionov (Collaborator) commented


Required a fair number of changes, but it works.
Also tried a quick LoRA finetune — no issues there.

@rasbt Could you check the changes in READMEs? Not 100% sure that I've done them correctly.

rasbt (Collaborator, Author) commented Jun 28, 2024

Thanks so much! I'm currently moving and will be offline until the weekend/Monday. I'll take a look when I'm back!

Andrei-Aksionov mentioned this pull request on Jun 29, 2024
rasbt (Collaborator, Author) left a review:

This is a tremendous PR. Thanks so much @Andrei-Aksionov. Just tried the model and it works great!

rasbt (Collaborator, Author) left a review:

Looks good, thanks!

rasbt (Collaborator, Author) commented Jul 1, 2024

I think the failing tests are because of the new Eval Harness release: https://pypi.org/project/lm-eval/#history

I can look into it in a separate PR.

Andrei-Aksionov (Collaborator) commented

Yep, this is the reason.
I "love" when bug-fix releases break code.

rasbt (Collaborator, Author) commented Jul 1, 2024

All good now. Big thanks again @Andrei-Aksionov!!

rasbt merged commit 0663b47 into main on Jul 1, 2024
9 checks passed
rasbt deleted the phi-3-checkpoint branch on July 1, 2024 18:55