Skip to content

Commit

Permalink
Merge branch 'huggingface:main' into feature_general_sampling
Browse files Browse the repository at this point in the history
  • Loading branch information
bocchris-aws authored Jul 31, 2023
2 parents 0608680 + 0630fd0 commit a5cdcdf
Show file tree
Hide file tree
Showing 102 changed files with 7,374 additions and 597 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/doc-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ jobs:
- name: Setup environment
run: |
pip install ".[quality]"
pip install ".[quality, diffusers]"
- name: Make documentation
shell: bash
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/doc-pr-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ jobs:

- name: Setup environment
run: |
pip install ".[quality]"
pip install ".[quality, diffusers]"
- name: Make documentation
shell: bash
Expand Down
105 changes: 105 additions & 0 deletions .github/workflows/test_inf2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
name: Optimum neuron / Test INF2

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

jobs:
start-runner:
name: Start self-hosted EC2 runner
runs-on: ubuntu-latest
env:
AWS_REGION: us-east-1
EC2_AMI_ID: ami-03f5b2e86a2a937e7
EC2_INSTANCE_TYPE: inf2.8xlarge
EC2_SUBNET_ID: subnet-859322b4,subnet-b7533b96,subnet-47cfad21,subnet-a396b2ad,subnet-06576a4b,subnet-df0f6180
EC2_SECURITY_GROUP: sg-0bb210cd3ec725a13
EC2_IAM_ROLE: optimum-ec2-github-actions-role
outputs:
label: ${{ steps.start-ec2-runner.outputs.label }}
ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Start EC2 runner
id: start-ec2-runner
uses: philschmid/philschmid-ec2-github-runner@main
with:
mode: start
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
ec2-image-id: ${{ env.EC2_AMI_ID }}
ec2-instance-type: ${{ env.EC2_INSTANCE_TYPE }}
subnet-id: ${{ env.EC2_SUBNET_ID }}
security-group-id: ${{ env.EC2_SECURITY_GROUP }}
iam-role-name: ${{ env.EC2_IAM_ROLE }}
aws-resource-tags: > # optional, requires additional permissions
[
{"Key": "Name", "Value": "ec2-optimum-github-runner"},
{"Key": "GitHubRepository", "Value": "${{ github.repository }}"}
]
do-the-job:
name: Run INF2 tests
needs: start-runner # required to start the main job when the runner is ready
runs-on: ${{ needs.start-runner.outputs.label }} # run the job on the newly created runner
env:
AWS_REGION: us-east-1
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Re-install neuronx driver
run: |
sudo apt remove aws-neuronx-dkms -y
sudo apt remove aws-neuronx-tools -y
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
sudo apt-get update -y
sudo apt-get install linux-headers-$(uname -r) -y
sudo apt-get install aws-neuronx-dkms=2.* -y
sudo apt-get install aws-neuronx-tools=2.* -y
sudo apt-get install aws-neuronx-collectives=2.* -y
sudo apt-get install aws-neuronx-runtime-lib=2.* -y
export PATH=/opt/aws/neuron/bin:$PATH
- name: Install python dependencies
run: |
sudo apt install python3.8-venv -y
python3 -m venv aws_neuron_venv_pytorch
source aws_neuron_venv_pytorch/bin/activate
python -m pip install -U pip
python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
python -m pip install .[neuronx,tests]
- name: Run tests
run: |
source aws_neuron_venv_pytorch/bin/activate
pytest -m is_inferentia_test tests
stop-runner:
name: Stop self-hosted EC2 runner
needs:
- start-runner # required to get output from the start-runner job
- do-the-job # required to wait when the main job is done
runs-on: ubuntu-latest
env:
AWS_REGION: us-east-1
if: ${{ always() }} # required to stop the runner even if the error happened in the previous jobs
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ env.AWS_REGION }}
- name: Stop EC2 runner
uses: philschmid/philschmid-ec2-github-runner@main
with:
mode: stop
github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
label: ${{ needs.start-runner.outputs.label }}
ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
10 changes: 6 additions & 4 deletions .github/workflows/test_on_trainium.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,13 @@ name: Optimum Neuron - Trainium-dependent Tests

on:
push:
branches:
- main
branches: [ main ]
paths:
- "optimum/**.py"
pull_request:
branches:
- main
branches: [ main ]
paths:
- "optimum/**.py"

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ limitations under the License.

# Optimum Neuron

🤗 Optimum Neuron is the interface between the 🤗 Transformers library and AWS Accelerators including [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/?nc1=h_ls) and [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/?nc1=h_ls).
🤗 Optimum Neuron is the interface between the 🤗 Transformers library and AWS Accelerators including [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/?nc1=h_ls) and [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/?nc1=h_ls).
It provides a set of tools enabling easy model loading, training and inference on single- and multi-Accelerator settings for different downstream tasks.
The list of officially validated models and tasks is available [here](TODO:). Users can try other models and tasks with only few changes.

Expand Down Expand Up @@ -58,14 +58,14 @@ pip install -r requirements.txt
There are two main classes one needs to know:
- TrainiumArgumentParser: inherits the original [HfArgumentParser](https://huggingface.co/docs/transformers/main/en/internal/trainer_utils#transformers.HfArgumentParser) in Transformers with additional checks on the argument values to make sure that they will work well with AWS Trainium instances.
- [TrainiumTrainer](https://huggingface.co/docs/optimum/neuron/package_reference/trainer): this version trainer takes care of doing the proper checks and changes to the supported models to make them trainable on AWS Trainium instances.
- [NeuronTrainer](https://huggingface.co/docs/optimum/neuron/package_reference/trainer): this version trainer takes care of doing the proper checks and changes to the supported models to make them trainable on AWS Trainium instances.
The [TrainiumTrainer](https://huggingface.co/docs/optimum/neuron/package_reference/trainer) is very similar to the [🤗 Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), and adapting a script using the Trainer to make it work with Trainium will mostly consist in simply swapping the Trainer class for the TrainiumTrainer one.
The [NeuronTrainer](https://huggingface.co/docs/optimum/neuron/package_reference/trainer) is very similar to the [🤗 Transformers Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), and adapting a script using the Trainer to make it work with Trainium will mostly consist in simply swapping the Trainer class for the NeuronTrainer one.
That's how most of the [example scripts](https://github.com/huggingface/optimum-neuron/tree/main/examples) were adapted from their [original counterparts](https://github.com/huggingface/transformers/tree/main/examples/pytorch).
```diff
from transformers import TrainingArguments
+from optimum.neuron import TrainiumTrainer as Trainer
+from optimum.neuron import NeuronTrainer as Trainer
training_args = TrainingArguments(
# training arguments...
Expand Down
Binary file added docs/assets/guides/models/01-sd-image.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 4 additions & 2 deletions docs/source/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,17 +17,19 @@
- local: guides/setup_aws_instance
title: Set up AWS Trainium instance
- local: guides/cache_system
title: Trainium model cache
title: Neuron model cache
- local: guides/fine_tune
title: Fine-tune Transformers with AWS Trainium
- local: guides/export_model
title: Export a model to Inferentia
- local: guides/models
title: Neuron models for inference
- local: guides/pipelines
title: Inference pipelines with AWS Neuron
title: How-To Guides
- sections:
- local: package_reference/trainer
title: Trainium Trainer
title: Neuron Trainer
- local: package_reference/export
title: Inferentia Exporter
- local: package_reference/configuration
Expand Down
Loading

0 comments on commit a5cdcdf

Please sign in to comment.