Update doc
michaelbenayoun committed May 3, 2024
1 parent 77c667a commit 1f7dec4
Showing 7 changed files with 389 additions and 117 deletions.
6 changes: 2 additions & 4 deletions docs/source/_toctree.yml
@@ -12,8 +12,8 @@
title: Notebooks
- local: training_tutorials/fine_tune_bert
title: Fine-tune BERT for Text Classification on AWS Trainium
- local: training_tutorials/fine_tune_llama_7b
title: Fine-tune Llama 2 7B on AWS Trainium
- local: training_tutorials/finetune_llm
title: Fine-tune Llama 3 8B on AWS Trainium
title: Training Tutorials
- sections:
- local: inference_tutorials/notebooks
@@ -26,8 +26,6 @@
title: Generate images with Stable Diffusion models on AWS Inferentia
title: Inference Tutorials
- sections:
- local: guides/overview
title: Overview
- local: guides/setup_aws_instance
title: Set up AWS Trainium instance
- local: guides/sagemaker
4 changes: 2 additions & 2 deletions docs/source/guides/distributed_training.mdx
@@ -18,8 +18,8 @@ But there is a caveat: each Neuron core is an independent data-parallel worker b
To alleviate that, `optimum-neuron` supports parallelism features enabling you to harness the full power of your Trainium instance:

1. [ZeRO-1](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html): It is an optimization of data-parallelism which consists in sharding the optimizer state (which usually represents half of the memory needed on the device) over the data-parallel ranks.
2. [Tensor Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html): It is a technique which consists in sharding each of your model parameters along a given dimension on multiple devices. The number of devices to shard your parameters on is called the `tensor_parallel_size`.
3. [Pipeline Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html): **coming soon!**
2. [Tensor Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html): It is a technique which consists in sharding each of your model parameters along a given dimension on multiple devices. It is also known as intra-layer model parallelism. The number of devices to shard your parameters on is called the `tensor_parallel_size`.
3. [Pipeline Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html): It is a technique which consists in sharding the model block layers on multiple devices. It is also known as inter-layer model parallelism. The number of devices to shard your layers on is called the `pipeline_parallel_size`.


The good news is that it is possible to combine those techniques, and `optimum-neuron` makes it very easy!
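As an illustration, here is a minimal sketch of what a combined launch could look like on a `trn1.32xlarge` instance (32 Neuron cores). The script name `train.py` is a placeholder for a training script built on `NeuronTrainer`/`NeuronTrainingArguments`, and the argument names simply mirror the parallelism settings described above; verify the exact names against the API reference of the `optimum-neuron` version you are using.

```bash
# Sketch only: `train.py` is a hypothetical training script that parses
# `NeuronTrainingArguments`; check the exact argument names against your
# `optimum-neuron` version before running it.
torchrun --nproc_per_node=32 train.py \
    --zero_1 True \
    --tensor_parallel_size 8 \
    --pipeline_parallel_size 4 \
    --bf16 True \
    --per_device_train_batch_size 1 \
    --output_dir output
```

With 32 Neuron cores, `tensor_parallel_size=8` and `pipeline_parallel_size=4`, the remaining data-parallel degree is 32 / (8 × 4) = 1, so all cores work on a single model replica.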
21 changes: 21 additions & 0 deletions docs/source/guides/setup_aws_instance.mdx
@@ -16,6 +16,13 @@ limitations under the License.

# Set up AWS Trainium instance

In this guide, we will show you:

1. How to create an AWS Trainium instance
2. How to use and run Jupyter Notebooks on your instance

## Create an AWS Trainium Instance

The simplest way to work with AWS Trainium and Hugging Face Transformers is the [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI). The DLAMI comes with all required libraries pre-packaged for you, including the Neuron Drivers, Transformers, Datasets, and Accelerate.

To create an EC2 Trainium instance, you can start from the console or the Marketplace. This guide will start from the [EC2 console](https://console.aws.amazon.com/ec2sp/v2/).
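If you prefer the command line over the console, an instance can also be launched with the AWS CLI. The snippet below is only a sketch: every value is a placeholder that you need to replace with your own (the AMI ID of the Hugging Face Neuron DLAMI for your region is listed on its Marketplace page, and the key pair must already exist in your account).

```bash
# Sketch only -- replace the placeholder values below with your own:
#   ami-0123456789abcdef0 : AMI ID of the Hugging Face Neuron DLAMI in your region
#   my-key-pair           : name of an existing EC2 key pair
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type trn1.2xlarge \
    --key-name my-key-pair \
    --count 1
```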
@@ -96,4 +103,18 @@ instance-id: i-0570615e41700a481
+--------+--------+--------+---------+
```

## Configuring `Jupyter Notebook` on your AWS Trainium Instance

With the instance up and running, we can ssh into it.
But instead of developing inside a terminal, it is also possible to use a `Jupyter Notebook` environment. We can use it to prepare our dataset and launch the training (at least when working on a single node).

For this, we need to add port forwarding to the `ssh` command, which will tunnel our localhost traffic to the Trainium instance.

```bash
PUBLIC_DNS="" # public DNS name of the instance, e.g. ec2-3-80-....
KEY_PATH="" # local path to the key pair file, e.g. ssh/trn.pem

ssh -L 8080:localhost:8080 -i $KEY_PATH ubuntu@$PUBLIC_DNS
```
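Once the tunnel is in place and you are logged into the instance, you can start the notebook server on the forwarded port. This is a minimal sketch; it assumes Jupyter is available in the Python environment you use on the instance (install it first otherwise):

```bash
# On the Trainium instance, inside the environment used for training:
pip install notebook   # only needed if Jupyter is not already installed

# start the server on the port forwarded above; it prints a URL with a login token
jupyter notebook --port 8080 --no-browser
```

Open the URL printed by Jupyter in your local browser; thanks to the tunnel, `localhost:8080` on your machine reaches the server running on the instance.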

You are done! You can now start using the Trainium accelerators with Hugging Face Transformers. Check out the [Fine-tune Transformers with AWS Trainium](./fine_tune) guide to get started.