Update doc
michaelbenayoun committed May 3, 2024
1 parent 77c667a commit 1f7dec4
Showing 7 changed files with 389 additions and 117 deletions.
6 changes: 2 additions & 4 deletions docs/source/_toctree.yml
@@ -12,8 +12,8 @@
title: Notebooks
- local: training_tutorials/fine_tune_bert
title: Fine-tune BERT for Text Classification on AWS Trainium
- local: training_tutorials/fine_tune_llama_7b
title: Fine-tune Llama 2 7B on AWS Trainium
- local: training_tutorials/finetune_llm
title: Fine-tune Llama 3 8B on AWS Trainium
title: Training Tutorials
- sections:
- local: inference_tutorials/notebooks
@@ -26,8 +26,6 @@
title: Generate images with Stable Diffusion models on AWS Inferentia
title: Inference Tutorials
- sections:
- local: guides/overview
title: Overview
- local: guides/setup_aws_instance
title: Set up AWS Trainium instance
- local: guides/sagemaker
4 changes: 2 additions & 2 deletions docs/source/guides/distributed_training.mdx
@@ -18,8 +18,8 @@ But there is a caveat: each Neuron core is an independent data-parallel worker b
To alleviate that, `optimum-neuron` supports parallelism features enabling you to harness the full power of your Trainium instance:

1. [ZeRO-1](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html): It is an optimization of data-parallelism which consists in sharding the optimizer state (which usually represents half of the memory needed on the device) over the data-parallel ranks.
2. [Tensor Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html): It is a technique which consists in sharding each of your model parameters along a given dimension on multiple devices. The number of devices to shard your parameters on is called the `tensor_parallel_size`.
3. [Pipeline Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html): **coming soon!**
2. [Tensor Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html): It is a technique which consists in sharding each of your model parameters along a given dimension on multiple devices. It is also known as intra-layer model parallelism. The number of devices to shard your parameters on is called the `tensor_parallel_size`.
3. [Pipeline Parallelism](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html): It is a technique which consists in sharding the model block layers on multiple devices. It is also known as inter-layer model parallelism. The number of devices to shard your layers on is called the `pipeline_parallel_size`.


The good news is that it is possible to combine those techniques, and `optimum-neuron` makes it very easy!
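As an illustration, here is a minimal sketch of what a combined launch could look like on a `trn1.32xlarge` instance (32 Neuron cores). The script name `train.py` is a placeholder for a training script built on `NeuronTrainer`/`NeuronTrainingArguments`, and the argument names simply mirror the parallelism settings described above; verify the exact names against the API reference of the `optimum-neuron` version you are using.

```bash
# Sketch only: `train.py` is a hypothetical training script that parses
# `NeuronTrainingArguments`; check the exact argument names against your
# `optimum-neuron` version before running it.
torchrun --nproc_per_node=32 train.py \
    --zero_1 True \
    --tensor_parallel_size 8 \
    --pipeline_parallel_size 4 \
    --bf16 True \
    --per_device_train_batch_size 1 \
    --output_dir output
```

With 32 Neuron cores, `tensor_parallel_size=8` and `pipeline_parallel_size=4`, the remaining data-parallel degree is 32 / (8 × 4) = 1, so all cores work on a single model replica.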
21 changes: 21 additions & 0 deletions docs/source/guides/setup_aws_instance.mdx
@@ -16,6 +16,13 @@ limitations under the License.

# Set up AWS Trainium instance

In this guide, we will show you:

1. How to create an AWS Trainium instance
2. How to use and run Jupyter Notebooks on your instance

## Create an AWS Trainium Instance

The simplest way to work with AWS Trainium and Hugging Face Transformers is the [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI). The DLAMI comes with all required libraries pre-packaged for you, including the Neuron Drivers, Transformers, Datasets, and Accelerate.

To create an EC2 Trainium instance, you can start from the console or the Marketplace. This guide will start from the [EC2 console](https://console.aws.amazon.com/ec2sp/v2/).
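If you prefer the command line over the console, an instance can also be launched with the AWS CLI. The snippet below is only a sketch: every value is a placeholder that you need to replace with your own (the AMI ID of the Hugging Face Neuron DLAMI for your region is listed on its Marketplace page, and the key pair must already exist in your account).

```bash
# Sketch only -- replace the placeholder values below with your own:
#   ami-0123456789abcdef0 : AMI ID of the Hugging Face Neuron DLAMI in your region
#   my-key-pair           : name of an existing EC2 key pair
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type trn1.2xlarge \
    --key-name my-key-pair \
    --count 1
```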
@@ -96,4 +103,18 @@ instance-id: i-0570615e41700a481
+--------+--------+--------+---------+
```

## Configuring `Jupyter Notebook` on your AWS Trainium Instance

With the instance up and running, we can ssh into it.
But instead of developing inside a terminal, it is also possible to use a `Jupyter Notebook` environment. We can use it to prepare our dataset and launch the training (at least when working on a single node).

For this, we need to add port forwarding to the `ssh` command, which will tunnel our localhost traffic to the Trainium instance.

```bash
PUBLIC_DNS="" # public DNS name of the instance, e.g. ec2-3-80-....
KEY_PATH="" # local path to the key pair file, e.g. ssh/trn.pem

ssh -L 8080:localhost:8080 -i $KEY_PATH ubuntu@$PUBLIC_DNS
```
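Once the tunnel is in place and you are logged into the instance, you can start the notebook server on the forwarded port. This is a minimal sketch; it assumes Jupyter is available in the Python environment you use on the instance (install it first otherwise):

```bash
# On the Trainium instance, inside the environment used for training:
pip install notebook   # only needed if Jupyter is not already installed

# start the server on the port forwarded above; it prints a URL with a login token
jupyter notebook --port 8080 --no-browser
```

Open the URL printed by Jupyter in your local browser; thanks to the tunnel, `localhost:8080` on your machine reaches the server running on the instance.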

You are done! You can now start using the Trainium accelerators with Hugging Face Transformers. Check out the [Fine-tune Transformers with AWS Trainium](./fine_tune) guide to get started.