<img width="1253" alt="Screen Shot 2023-06-05 at 4 35 41 PM" src="https://github.com/anyscale/aviary/assets/20109646/9e71db45-dd3b-4fb8-88f8-2ec28a78ae6e"> Follow up: * Autoscaler: The current image is missing some dependencies for KubeRay autoscaling. * Frontend: The frontend cannot be launched directly due to some dependency issues (e.g. `gradio`, `pymongo`, `boto3`...). --------- Co-authored-by: Antoni Baum <[email protected]>
1 parent 4c6b7ce · commit 5084d32 · showing 3 changed files with 244 additions and 1 deletion.

# Deploy Aviary on Amazon EKS using KubeRay

* Note that this document will be extended to include Ray autoscaling and the deployment of multiple models in the near future.

## Step 1: Create a Kubernetes cluster on Amazon EKS

Follow the first two steps in this [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#)
to (1) create your Amazon EKS cluster and (2) configure your computer to communicate with your cluster.

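The AWS guide above walks through the console workflow. If you prefer the CLI, the cluster can also be created with `eksctl`; the sketch below is illustrative only, and the cluster name and region are placeholders rather than values prescribed by this guide.

```sh
# Minimal sketch: create an EKS cluster with no node groups (node groups are added in Step 2).
eksctl create cluster --name ${YOUR_EKS_NAME} --region us-west-2 --without-nodegroup
```
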
## Step 2: Create node groups for the Amazon EKS cluster

You can follow "Step 3: Create nodes" in this [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#) to create node groups. The following sections provide more detail; an illustrative `eksctl` config covering both node groups is sketched at the end of this step.

### Create a CPU node group

Create a CPU node group for all Pods except Ray GPU workers, such as the KubeRay operator, Ray head, and CoreDNS Pods.

* Create a CPU node group
  * Instance type: [**m5.xlarge**](https://aws.amazon.com/ec2/instance-types/m5/) (4 vCPU; 16 GB RAM)
  * Disk size: 256 GB
  * Desired size: 1, Min size: 0, Max size: 1

### Create a GPU node group

Create a GPU node group for the Ray GPU workers.

* Create a GPU node group
  * Add a Kubernetes taint to prevent CPU Pods from being scheduled on this GPU node group
    * Key: ray.io/node-type, Value: worker, Effect: NoSchedule
  * AMI type: Bottlerocket NVIDIA (BOTTLEROCKET_x86_64_NVIDIA)
  * Instance type: [**g5.12xlarge**](https://aws.amazon.com/ec2/instance-types/g5/) (4 GPUs; 96 GB GPU memory; 48 vCPUs; 192 GB RAM)
  * Disk size: 1024 GB
  * Desired size: 1, Min size: 0, Max size: 1

Because this tutorial deploys a single LLM, the maximum size of this GPU node group is 1.
If you want to deploy multiple LLMs in this cluster, you may need to increase the max size.

**Warning: GPU nodes are extremely expensive. Please remember to delete the cluster if you no longer need it.**

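For reference, both node groups could also be described declaratively in an `eksctl` cluster config instead of the console. The sketch below is an assumption: the metadata values are placeholders, and how your `eksctl` version resolves the Bottlerocket NVIDIA AMI for GPU instance types should be verified against the eksctl documentation.

```yaml
# Hypothetical eksctl ClusterConfig mirroring the node group settings above.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: your-eks-cluster   # placeholder
  region: us-west-2        # placeholder
managedNodeGroups:
  - name: cpu-node-group
    instanceType: m5.xlarge
    volumeSize: 256
    desiredCapacity: 1
    minSize: 0
    maxSize: 1
  - name: gpu-node-group
    instanceType: g5.12xlarge
    amiFamily: Bottlerocket  # assumption: verify NVIDIA variant selection for your eksctl version
    volumeSize: 1024
    desiredCapacity: 1
    minSize: 0
    maxSize: 1
    taints:
      - key: ray.io/node-type
        value: worker
        effect: NoSchedule
```
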
## Step 3: Verify the node groups

If you encounter permission issues with `eksctl`, you can navigate to your AWS account's webpage and copy the
credential environment variables, including `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`,
from the "Command line or programmatic access" page.

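For example, the copied values can be exported in the shell you use for `eksctl` and `kubectl`; the values below are placeholders.

```sh
# Paste the values copied from the "Command line or programmatic access" page.
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export AWS_SESSION_TOKEN="<your-session-token>"
```
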
```sh
eksctl get nodegroup --cluster ${YOUR_EKS_NAME}

# CLUSTER           NODEGROUP       STATUS  CREATED               MIN SIZE  MAX SIZE  DESIRED CAPACITY  INSTANCE TYPE  IMAGE ID                    ASG NAME                TYPE
# ${YOUR_EKS_NAME}  cpu-node-group  ACTIVE  2023-06-05T21:31:49Z  0         1         1                 m5.xlarge      AL2_x86_64                  eks-cpu-node-group-...  managed
# ${YOUR_EKS_NAME}  gpu-node-group  ACTIVE  2023-06-05T22:01:44Z  0         1         1                 g5.12xlarge    BOTTLEROCKET_x86_64_NVIDIA  eks-gpu-node-group-...  managed
```

## Step 4: Install the DaemonSet for NVIDIA device plugin for Kubernetes

If you encounter permission issues with `kubectl`, you can follow "Step 2: Configure your computer to communicate with your cluster"
in the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#).

You can refer to the [Amazon EKS optimized accelerated Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami)
or the [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository for more details.

```sh
# Install the DaemonSet
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml

# Verify that your nodes have allocatable GPUs
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

# Example output:
# NAME                                GPU
# ip-....us-west-2.compute.internal   4
# ip-....us-west-2.compute.internal   <none>
```

## Step 5: Install a KubeRay operator

```sh
# Install both CRDs and KubeRay operator v0.5.0.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0

# The KubeRay operator Pod should be scheduled on the CPU node. If it is not, something is wrong.
```

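To confirm the scheduling, you can check which node each Pod landed on; the command below is plain `kubectl`, and the operator Pod name will carry a generated suffix.

```sh
# The NODE column shows where each Pod runs; the kuberay-operator Pod
# should be on the m5.xlarge (CPU) node.
kubectl get pods -o wide
```
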
## Step 6: Create a RayCluster with Aviary

```sh
# path: deploy/kuberay
kubectl apply -f kuberay.yaml
```

A few things are worth noting:
* The `tolerations` for the workers must match the taints on the GPU node group.
  ```yaml
  # The GPU node group created in Step 2 must carry the matching taint.
  tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"
  ```
* Update `rayStartParams.resources` for Ray scheduling. The `mosaicml--mpt-7b-chat.yaml` file uses both `accelerator_type_cpu` and `accelerator_type_a10`.
  ```yaml
  # Ray head: the head Pod has a resource limit of 2 CPUs.
  rayStartParams:
    resources: '"{\"accelerator_type_cpu\": 2}"'

  # Ray workers: each worker Pod has a resource limit of 48 CPUs and 4 GPUs.
  rayStartParams:
    resources: '"{\"accelerator_type_cpu\": 48, \"accelerator_type_a10\": 4}"'
  ```

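Before deploying a model, it is worth waiting until both Ray Pods are ready; a plain `kubectl get pods` is enough for that (Pod names carry suffixes generated by KubeRay).

```sh
# Both the Ray head Pod and the GPU worker Pod of the `aviary` RayCluster
# should eventually reach the Running state.
kubectl get pods
```
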
## Step 7: Deploy an LLM with Aviary

```sh
# Step 7.1: Log in to the head Pod
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- bash

# Step 7.2: Deploy the `mosaicml/mpt-7b-chat` model
aviary run --model ./models/mosaicml--mpt-7b-chat.yaml

# Step 7.3: Check the Serve application status
serve status

# [Example output]
# name: default
# app_status:
#   status: RUNNING
#   message: ''
#   deployment_timestamp: 1686006910.9571936
# deployment_statuses:
# - name: default_mosaicml--mpt-7b-chat
#   status: HEALTHY
#   message: ''
# - name: default_RouterDeployment
#   status: HEALTHY
#   message: ''

# Step 7.4: List all models
export AVIARY_URL="http://localhost:8000"
aviary models

# [Example output]
# Connecting to Aviary backend at: http://localhost:8000/
# mosaicml/mpt-7b-chat

# Step 7.5: Send a query to `mosaicml/mpt-7b-chat`.
aviary query --model mosaicml/mpt-7b-chat --prompt "What are the top 5 most popular programming languages?"

# [Example output]
# Connecting to Aviary backend at: http://localhost:8000/
# mosaicml/mpt-7b-chat:
# 1. Python
# 2. Java
# 3. JavaScript
# 4. C++
# 5. C#
```

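The commands above run inside the head Pod. If you want to query the backend from your local machine instead, one option (not covered by this guide, so treat it as a sketch) is to port-forward Serve's HTTP port from the head Pod; this assumes Serve listens on port 8000 and that the `aviary` CLI is installed locally.

```sh
# Forward the Serve HTTP port from the head Pod to localhost (assumption: port 8000).
kubectl port-forward $HEAD_POD 8000:8000

# In another local terminal:
export AVIARY_URL="http://localhost:8000"
aviary models
```
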
## Step 8: Clean up resources

**Warning: GPU nodes are extremely expensive. Please remember to delete the cluster if you no longer need it.**

```sh
# Step 8.1: Delete the RayCluster
# path: deploy/kuberay
kubectl delete -f kuberay.yaml

# Step 8.2: Uninstall the KubeRay operator chart
helm uninstall kuberay-operator

# Step 8.3: Delete the Amazon EKS cluster via the AWS Web UI
```

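If you created the cluster with `eksctl` (as in the optional sketch in Step 1) rather than the console, the cluster and its eksctl-managed node groups can also be removed from the CLI.

```sh
# Deletes the EKS cluster and its eksctl-managed node groups.
eksctl delete cluster --name ${YOUR_EKS_NAME}
```
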
The second file added by this commit is the RayCluster manifest referenced in Step 6:

apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: aviary
spec:
  rayVersion: '2.4.0' # should match the Ray version in the image of the containers
  # Ray head Pod template
  headGroupSpec:
    # The `rayStartParams` are used to configure the `ray start` command.
    # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
    # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
    rayStartParams:
      resources: '"{\"accelerator_type_cpu\": 2}"'
      dashboard-host: '0.0.0.0'
    # Pod template
    template:
      spec:
        containers:
          - name: ray-head
            image: anyscale/aviary:latest
            resources:
              limits:
                cpu: 2
                memory: 8Gi
              requests:
                cpu: 2
                memory: 8Gi
            ports:
              - containerPort: 6379
                name: gcs-server
              - containerPort: 8265 # Ray dashboard
                name: dashboard
              - containerPort: 10001
                name: client
  workerGroupSpecs:
    # The number of Pod replicas in this worker group
    - replicas: 1
      minReplicas: 0
      maxReplicas: 1
      # A logical name for this worker group; here it is gpu-group.
      groupName: gpu-group
      rayStartParams:
        resources: '"{\"accelerator_type_cpu\": 48, \"accelerator_type_a10\": 4}"'
      # Pod template
      template:
        spec:
          containers:
            - name: llm
              image: anyscale/aviary:latest
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh","-c","ray stop"]
              resources:
                limits:
                  cpu: "48"
                  memory: "192G"
                  nvidia.com/gpu: 4
                requests:
                  cpu: "36"
                  memory: "128G"
                  nvidia.com/gpu: 4
          # These tolerations must match the taint added to the GPU node group in Step 2.
          tolerations:
            - key: "ray.io/node-type"
              operator: "Equal"
              value: "worker"
              effect: "NoSchedule"