
# t5_11

Houses our model example of fine-tuning an 11B T5 with FSDP to create a world-class grammar checker.

To get going:

```shell
pip install -r requirements.txt
```

A large and a small dataset are already present in the project (`grammar_train.csv` = small, `gtrain_150K.csv` = large).
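For reference, loading one of these CSVs into (input, target) pairs might look like the sketch below. The column names are an assumption, not taken from the repo — check the actual header of `grammar_train.csv` before relying on them:

```python
import csv

def load_pairs(path):
    """Read (input, target) grammar-correction pairs from a CSV.

    The "input" and "target" column names are hypothetical; inspect the
    real CSV header before using this.
    """
    with open(path, newline="") as f:
        return [(row["input"], row["target"]) for row in csv.DictReader(f)]
```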

To baseline your environment or this model (adjust `--nproc_per_node` to equal your GPU count):

```shell
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=101 --rdzv_endpoint="localhost:5679" main_benchmark.py
```

On an A100 (p4d.24xlarge) you should expect to see:

*(benchmark_t5 results image)*

To train with `mp.spawn`:

```shell
python main.py
```
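The spawn-based launch means `main.py` itself forks one worker process per GPU, each identified by its rank. Stripped of torch specifics, the pattern looks roughly like this stdlib stand-in (it uses a process pool and the fork start method to stay self-contained; `torch.multiprocessing.spawn` creates the processes itself and passes each worker its rank the same way):

```python
import multiprocessing as mp

def worker(rank, world_size):
    # In the real trainer each rank would call dist.init_process_group,
    # wrap the model in FSDP, and run its shard of the training loop.
    return f"rank {rank}/{world_size}"

def run_workers(world_size=2):
    # stdlib stand-in for torch.multiprocessing.spawn: one process per rank
    ctx = mp.get_context("fork")
    with ctx.Pool(world_size) as pool:
        return pool.starmap(worker, [(r, world_size) for r in range(world_size)])
```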

Or, better, with torchrun:

```shell
torchrun --nnodes=1 --nproc_per_node=8 --rdzv_id=101 --rdzv_endpoint="localhost:5679" main_elastic.py
```
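With the torchrun launch, each worker process finds its place in the job through the environment variables torchrun exports (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`). A script can read them like this; the single-process fallback values are a guess at how a script might also support plain `python main.py`:

```python
import os

def dist_env():
    """Read the rendezvous info torchrun exports for each worker.

    The defaults (rank 0, world size 1) let the same script run
    single-process when launched without torchrun.
    """
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
    }
```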

You can control the model size, dataset size, batch size, etc., in `config/defaults.py`.
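As a rough illustration, such a defaults file often boils down to a dataclass of knobs like the one below. Every field name here is hypothetical — open `config/defaults.py` for the options the repo actually exposes:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Hypothetical knobs; see config/defaults.py for the real ones.
    model_name: str = "t5-11b"               # which T5 checkpoint to fine-tune
    dataset_file: str = "grammar_train.csv"  # small set; swap for gtrain_150K.csv
    batch_size: int = 4
    epochs: int = 1
```

Overriding a default is then just a constructor argument, e.g. `TrainConfig(dataset_file="gtrain_150K.csv", batch_size=8)`.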