Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
jacobfulano committed Jun 29, 2023
1 parent b6d450b commit 343c34a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion scripts/data_prep/README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Data preparation

This folder contains scripts for converting text data from original sources (HF, JSON) to the Mosaic [StreamingDataset](https://github.com/mosaicml/streaming) format for consumption by our training scripts.
This folder contains scripts for converting text data from original sources (HF, JSON) to the Mosaic [StreamingDataset](https://github.com/mosaicml/streaming) format for consumption by our training scripts. StreamingDataset is designed to make training on large datasets from cloud storage as fast, cheap, and scalable as possible. In particular, it is custom built for multi-node, distributed training for large models while maximizing correctness guarantees, performance, and ease of use.


## Converting a pretraining dataset
Expand Down

0 comments on commit 343c34a

Please sign in to comment.