add llama example (#1382)
* add llama example

* lint

* more lint

* introduce use_peft flag

* update readme

* address comments

---------

Co-authored-by: Prathik Rao <[email protected]@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
prathikr and Prathik Rao authored Sep 19, 2023
1 parent 89d08c4 commit 7fc27f6
Showing 3 changed files with 841 additions and 0 deletions.
54 changes: 54 additions & 0 deletions examples/onnxruntime/training/text-classification/README.md
@@ -16,6 +16,60 @@ limitations under the License.

# Text classification

By running the script [`run_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_classification.py),
you can leverage the [`ONNX Runtime`](https://github.com/microsoft/onnxruntime) accelerator to fine-tune models from the
[Hugging Face Hub](https://huggingface.co/models) for text classification tasks.


__The following example uses the training acceleration features powered by ONNX Runtime.__


### ONNX Runtime Training

The following example fine-tunes [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the [Amazon Reviews Dataset](https://huggingface.co/datasets/amazon_reviews_multi).

```bash
torchrun --nproc_per_node=NUM_GPUS_YOU_HAVE run_classification.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name amazon_reviews_multi \
--dataset_config_name en \
--shuffle_train_dataset \
--metric_name accuracy \
--text_column_name 'review_title,review_body,product_category' \
--text_column_delimiter ' ' \
--label_column_name stars \
--do_train \
--do_eval \
--fp16 \
--max_seq_length 128 \
--per_device_train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--deepspeed zero_stage_2.json \
--use_peft \
--output_dir /tmp/ort-llama-2/
```

### Performance

We obtain the following results for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) with mixed-precision training, LoRA, and ZeRO Stage 2 under the PyTorch and ONNX Runtime backends. The experiment was run
for 10 epochs on 8 NVIDIA V100 GPUs:

| Model                    | Backend      | Runtime (s) | Train samples/s |
|--------------------------|--------------|-------------|-----------------|
| meta-llama/Llama-2-7b-hf | PyTorch      | 17035.9055  | 117.399         |
| meta-llama/Llama-2-7b-hf | ONNX Runtime | 15532.2403  | 128.764         |

The gains of ONNX Runtime over PyTorch are as follows:

| Model                    | Latency gain | Throughput gain |
|--------------------------|--------------|-----------------|
| meta-llama/Llama-2-7b-hf | 8.83%        | 9.68%           |
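The percentages above follow directly from the runtime and throughput table: latency gain is the relative reduction in total runtime, and throughput gain is the relative increase in train samples/s. A quick sketch of the arithmetic, with the values copied from the table:

```shell
# latency gain    = (pt_runtime - ort_runtime) / pt_runtime
# throughput gain = (ort_samples - pt_samples) / pt_samples
awk 'BEGIN {
  pt_runtime = 17035.9055; ort_runtime = 15532.2403   # total runtime, seconds
  pt_samples = 117.399;    ort_samples = 128.764      # train samples/s
  printf "latency gain: %.2f%%\n", (pt_runtime - ort_runtime) / pt_runtime * 100
  printf "throughput gain: %.2f%%\n", (ort_samples - pt_samples) / pt_samples * 100
}'
# latency gain: 8.83%
# throughput gain: 9.68%
```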

#### DeepSpeed

[zero_stage_2.json](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/zero_stage_2.json) is an example DeepSpeed config file that enables ZeRO Stage 2 optimization (partitioning of optimizer states and gradients across data-parallel workers) for training meta-llama/Llama-2-7b-hf. More information can be found in [DeepSpeed's official repo](https://github.com/microsoft/DeepSpeed).
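For illustration only (this is not the contents of the repository's `zero_stage_2.json`), a minimal ZeRO Stage 2 config typically looks like the sketch below; the `"auto"` values let the `Trainer` fill in settings from its own command-line arguments:

```json
{
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```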

## GLUE Tasks

By running the script [`run_glue.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_glue.py),
