YAMLS for MPT runs inherit global max_seq_len in model config (#409)
* mpt configs inherit global max_seq_len in YAML

* update hf_eval yaml with max_seq_len override

---------

Co-authored-by: Vitaliy Chiley <[email protected]>
alextrott16 and vchiley committed Jul 1, 2023
1 parent 37bf6f5 commit 5c14661
Showing 4 changed files with 5 additions and 0 deletions.
1 change: 1 addition & 0 deletions TUTORIAL.md
@@ -217,6 +217,7 @@ Now that we have our data ready, we can slightly modify `scripts/train/yamls/finetune/mpt-7b_domain_adapt.yaml`
```bash
composer scripts/train/train.py scripts/train/yamls/finetune/mpt-7b_domain_adapt.yaml max_seq_len=4096 ...
```
+> Note that this override, where we set `max_seq_len=4096` in the above command, works because of how the whole YAML is set up. Importantly, the YAML is configured with `model.config_overrides.max_seq_len: ${max_seq_len}`, which tells the MPT model to override its default max sequence length with the value set for `max_seq_len`.

You will see some info logs including your configs, and then training will start.

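The note added to TUTORIAL.md above relies on YAML variable interpolation. A minimal sketch of the pattern, assuming the layout used by the finetune YAMLs changed below (the `2048` default is illustrative):

```yaml
max_seq_len: 2048  # top-level variable; a CLI override like max_seq_len=4096 replaces it

model:
  pretrained_model_name_or_path: mosaicml/mpt-7b
  pretrained: true
  config_overrides:
    # ${max_seq_len} interpolates the top-level value, so a CLI override
    # propagates into the MPT model config as well
    max_seq_len: ${max_seq_len}
```

Without the `config_overrides` entry, a command-line override of `max_seq_len` would change the rest of the run config but leave the model's default max sequence length untouched; pointing both at the same variable keeps them in sync.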
2 changes: 2 additions & 0 deletions scripts/eval/yamls/hf_eval.yaml
@@ -26,6 +26,8 @@ models:
#     pretrained_model_name_or_path: mosaicml/mpt-7b
#     init_device: cpu
#     pretrained: true
+#     config_overrides:
+#       max_seq_len: ${max_seq_len}
#   tokenizer:
#     name: mosaicml/mpt-7b
#     kwargs:
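For reference, uncommenting that example entry would give roughly the following (a sketch reconstructed only from the commented lines visible in the diff above; the list nesting is an assumption, and the `kwargs` block is truncated at the hunk boundary):

```yaml
models:
- model:
    pretrained_model_name_or_path: mosaicml/mpt-7b
    init_device: cpu
    pretrained: true
    config_overrides:
      max_seq_len: ${max_seq_len}  # the override added by this commit
  tokenizer:
    name: mosaicml/mpt-7b
    kwargs:
      # (further tokenizer kwargs, truncated in the diff above)
```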
1 change: 1 addition & 0 deletions scripts/train/finetune_example/mpt-7b-arc-easy--gpu.yaml
@@ -10,6 +10,7 @@ model:
  pretrained_model_name_or_path: mosaicml/mpt-7b
  pretrained: true  # false: only use the architecture; true: initialize with pretrained weights
  config_overrides:
+    max_seq_len: ${max_seq_len}
    attn_config:
      attn_impl: triton
      # Set this to `true` if using `train_loader.dataset.packing_ratio` below
1 change: 1 addition & 0 deletions scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml
@@ -9,6 +9,7 @@ model:
  pretrained: true
  pretrained_model_name_or_path: mosaicml/mpt-7b
  config_overrides:
+    max_seq_len: ${max_seq_len}
    attn_config:
      attn_impl: triton
      # Set this to `true` if using `train_loader.dataset.packing_ratio` below
