Update README.md
jacobfulano committed Jun 29, 2023
1 parent 1930a09 commit a935d87
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion scripts/train/README.md
@@ -17,7 +17,10 @@ You can read more about the benefits of using mosaicml-streaming [here](https://
NOTE: If you only want to profile these LLMs, we recommend that you **download and prepare the `train_small` and `val_small` splits**,
and skip the full `train` and `val` splits. You'll just need to replace `split: train` with `split: train_small`
and `split: val` with `split: val_small` in your run YAML's dataloader config.
-You can also accomplish this in your CLI command like so: `composer train.py ... train_loader.dataset.split=train_small eval_loader.dataset.split=val_small`
+You can also accomplish this in your CLI command like so:
+```bash
+composer train.py ... train_loader.dataset.split=train_small eval_loader.dataset.split=val_small
+```
Alternatively, feel free to substitute our dataloader with one of your own in `train.py`.

### Converting C4 to streaming dataset `.mds` format