
Add Typing to Llama Training #9

Open · wants to merge 8 commits into main

Conversation

lennart-finke (Collaborator)

Description

Added jaxtyping and regular Python type hints to train_llama.py.

Related Issue

Should close #4.

How Has This Been Tested?

Executing the script in a Colab instance.

lennart-finke (Collaborator, Author)

Alright, I'll have to work on this some more, as the checks indicate.

lennart-finke (Collaborator, Author)

Now it might go through. If someone has time, though, I recommend double-checking the jaxtyping hints, as I was not entirely sure about them everywhere.
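(For anyone double-checking: a minimal, hypothetical sketch of the style of jaxtyping hint in question. The function, tensor names, and shape strings below are illustrative, not taken from the PR.)

import torch
from jaxtyping import Float, Int

def embed(
    tokens: Int[torch.Tensor, "batch seq"],
    wte: Float[torch.Tensor, "vocab d_model"],
) -> Float[torch.Tensor, "batch seq d_model"]:
    # Indexing the embedding table with token ids yields one d_model vector
    # per (batch, seq) position, which is what the return annotation asserts.
    return wte[tokens]

The shape strings are only enforced at runtime if a checker such as beartype or typeguard is hooked in; otherwise they act as documentation, which is why a manual review of the axis names is still worthwhile.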

danbraunai (Owner) left a comment

Thanks for this. A bunch of comments. By the way, I think you may need to merge main into this branch.

Comment on lines 18 to 41

This implementation is based on
- llm.c, licensed under MIT ((c) 2024 Andrei Karpathy) and
- TransformerLens, licensed under MIT ((c) 2022 TransformerLensOrg).


MIT License:
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
danbraunai (Owner)

I think this was accidentally deleted?

simple_stories_train/train_llama.py (outdated review comment, resolved)
rotary_dim: int = 768 // 12 # i.e. same as d_head
rotary_base: int = 10000
n_ctx: int = 1024
n_key_value_heads: int = (
12 // 4
) # Note that llama 3.1 n_key_value_heads does not scale with n_heads
use_grouped_query_attention: bool = True
danbraunai (Owner)

And these
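(For context on these fields, a hedged sketch of how the grouped-query-attention sizes fit together, assuming n_embd = 768 and n_heads = 12, consistent with the rotary_dim = 768 // 12 line above; the variable names are illustrative.)

n_embd = 768
n_heads = 12
d_head = n_embd // n_heads                      # 64, i.e. the same as rotary_dim
n_key_value_heads = 12 // 4                     # 3, fixed rather than scaled with n_heads
repeat_kv_heads = n_heads // n_key_value_heads  # 4 query heads share each key/value head
kv_proj_out = 2 * n_embd // repeat_kv_heads     # 384 = 2 * n_key_value_heads * d_head

This is also where the 2 * config.n_embd // self.repeat_kv_heads factor in the kv_attn projection quoted below comes from.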

Comment on lines 77 to 78
self.kv_attn = nn.Linear(config.n_embd, 2 * config.n_embd // self.repeat_kv_heads)
self.q_attn = nn.Linear(config.n_embd, config.n_embd)
danbraunai (Owner)

The bias argument has disappeared here, and from other attributes further down as well.
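(A minimal sketch of the restored calls, assuming the same config.attn_bias flag used by the original c_proj line quoted below; not necessarily the exact intended fix.)

self.kv_attn = nn.Linear(
    config.n_embd, 2 * config.n_embd // self.repeat_kv_heads, bias=config.attn_bias
)
self.q_attn = nn.Linear(config.n_embd, config.n_embd, bias=config.attn_bias)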

self.c_proj = nn.Linear(config.n_embd, config.n_embd, bias=config.attn_bias)
self.c_proj.LLMC_RESIDUAL_SCALE_FLAG = 1
self.c_proj = nn.Linear(config.n_embd, config.n_embd)
self.LLMC_RESIDUAL_SCALE_FLAG = 1
danbraunai (Owner)

Why this change? Seems like you want to keep it on the c_proj
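(That is, presumably restoring the original form quoted above, with the bias argument kept and the flag attached to the c_proj module rather than to the attention block itself:)

self.c_proj = nn.Linear(config.n_embd, config.n_embd, bias=config.attn_bias)
self.c_proj.LLMC_RESIDUAL_SCALE_FLAG = 1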

simple_stories_train/train_llama.py (review comment, resolved)
@@ -576,7 +587,7 @@ def __init__(self, filename_pattern, B, T, process_rank, num_processes):
print0(f"DataLoader: total number of tokens: {ntok_total:,} across {len(self.files)} files")

# kick things off
self.current_shard = None
self.current_shard = -1
danbraunai (Owner)

How come this change was made?
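(One plausible motivation, sketched here as a guess rather than taken from the PR: once the attribute is annotated as an int, None no longer satisfies the type checker unless the annotation is widened, so -1 acts as a "no shard loaded yet" sentinel.)

# Sentinel as in the diff above; the annotation is added here for illustration.
self.current_shard: int = -1
# Alternative that keeps the original None while still type-checking:
self.current_shard: int | None = None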

Comment on lines 621 to 626
# -----------------------------------------------------------------------------
# Python -> C bridge utilities for saving params/grads/activations to .bin files


def write_fp32(tensor: torch.Tensor, file: BufferedWriter):
t = tensor.detach().cpu().to(torch.float32)
danbraunai (Owner)

I think we deleted all of the below from main. I'm guessing you hadn't pulled the latest main onto your branch?

lennart-finke (Collaborator, Author)

Yes, looks like it.

Comment on lines 811 to 812
import argparse
import time
danbraunai (Owner)

Best to import at the top of the file.

Comment on lines 1060 to 1063
# -------------------------------------------------------------------------
# PyTorch -> C bridge: save some weights and state for C to load later as reference

# do one forward pass to generate ground truth for our C tests
danbraunai (Owner)

I think we deleted all this too


Successfully merging this pull request may close these issues.

Add typing to train_llama.py