Skip to content
This repository has been archived by the owner on May 28, 2024. It is now read-only.

Commit

Permalink
KubeRay support (#111)
Browse files Browse the repository at this point in the history
<img width="1253" alt="Screen Shot 2023-06-05 at 4 35 41 PM"
src="https://github.com/anyscale/aviary/assets/20109646/9e71db45-dd3b-4fb8-88f8-2ec28a78ae6e">


Follow up:
* Autoscaler: The current image is missing some dependencies for KubeRay
autoscaling.
* Frontend: The frontend cannot be launched directly due to some
dependency issues (e.g. `gradio`, `pymongo`, `boto3`...).

---------

Co-authored-by: Antoni Baum <[email protected]>
  • Loading branch information
kevin85421 and Yard1 authored Jun 7, 2023
1 parent 4c6b7ce commit 5084d32
Show file tree
Hide file tree
Showing 3 changed files with 244 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@ The Gradio app is served using [Ray Serve](https://docs.ray.io/en/latest/serve/i
To run the Aviary frontend locally, you need to set the following environment variable:

```shell
export AVIARY_HOSTNAME=<hostname of the backend, eg. 'http://localhost:8000'>
export AVIARY_URL=<hostname of the backend, eg. 'http://localhost:8000'>
```

Once you have set these environment variables, you can run the frontend with the
Expand Down
173 changes: 173 additions & 0 deletions deploy/kuberay/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,173 @@
# Deploy Aviary on Amazon EKS using KubeRay
* Note that this document will be extended to include Ray autoscaling and the deployment of multiple models in the near future.

## Step 1: Create a Kubernetes cluster on Amazon EKS

Follow the first two steps in this [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#)
to: (1) Create your Amazon EKS cluster (2) Configure your computer to communicate with your cluster.

## Step 2: Create node groups for the Amazon EKS cluster

You can follow "Step 3: Create nodes" in this [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#) to create node groups. The following section provides more detailed information.

### Create a CPU node group

Create a CPU node group for all Pods except Ray GPU workers, such as KubeRay operator, Ray head, and CoreDNS Pods.

* Create a CPU node group
* Instance type: [**m5.xlarge**](https://aws.amazon.com/ec2/instance-types/m5/) (4 vCPU; 16 GB RAM)
* Disk size: 256 GB
* Desired size: 1, Min size: 0, Max size: 1

### Create a GPU node group

Create a GPU node group for Ray GPU workers.

* Create a GPU node group
* Add a Kubernetes taint to prevent CPU Pods from being scheduled on this GPU node group
* Key: ray.io/node-type, Value: worker, Effect: NoSchedule
* AMI type: Bottlerocket NVIDIA (BOTTLEROCKET_x86_64_NVIDIA)
* Instance type: [**g5.12xlarge**](https://aws.amazon.com/ec2/instance-types/g5/) (4 GPU; 96 GB GPU Memory; 48 vCPUs; 192 GB RAM)
* Disk size: 1024 GB
* Desired size: 1, Min size: 0, Max size: 1

Because this tutorial is for deploying 1 LLM, the maximum size of this GPU node group is 1.
If you want to deploy multiple LLMs in this cluster, you may need to increase the value of the max size.

**Warning: GPU nodes are extremely expensive. Please remember to delete the cluster if you no longer need it.**

## Step 3: Verify the node groups

If you encounter permission issues with `eksctl`, you can navigate to your AWS account's webpage and copy the
credential environment variables, including `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`,
from the "Command line or programmatic access" page.

```sh
eksctl get nodegroup --cluster ${YOUR_EKS_NAME}

# CLUSTER NODEGROUP STATUS CREATED MIN SIZE MAX SIZE DESIRED CAPACITY INSTANCE TYPE IMAGE ID ASG NAME TYPE
# ${YOUR_EKS_NAME} cpu-node-group ACTIVE 2023-06-05T21:31:49Z 0 1 1 m5.xlarge AL2_x86_64 eks-cpu-node-group-... managed
# ${YOUR_EKS_NAME} gpu-node-group ACTIVE 2023-06-05T22:01:44Z 0 1 1 g5.12xlarge BOTTLEROCKET_x86_64_NVIDIA eks-gpu-node-group-... managed
```

## Step 4: Install the DaemonSet for NVIDIA device plugin for Kubernetes

If you encounter permission issues with `kubectl`, you can follow "Step 2: Configure your computer to communicate with your cluster"
in the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#).

You can refer to the [Amazon EKS optimized accelerated Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami)
or [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository for more details.

```sh
# Install the DaemonSet
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.9.0/nvidia-device-plugin.yml

# Verify that your nodes have allocatable GPUs
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"

# Example output:
# NAME GPU
# ip-....us-west-2.compute.internal 4
# ip-....us-west-2.compute.internal <none>
```

## Step 5: Install a KubeRay operator

```sh
# Install both CRDs and KubeRay operator v0.5.0.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0

# It should be scheduled on the CPU node. If it is not, something is wrong.
```

## Step 6: Create a RayCluster with Aviary

```sh
# path: deploy/kuberay
kubectl apply -f kuberay.yaml
```

Something is worth noticing:
* The `tolerations` for workers must match the taints on the GPU node group.
```yaml
# Please add the following taints to the GPU node.
tolerations:
- key: "ray.io/node-type"
operator: "Equal"
value: "worker"
effect: "NoSchedule"
```
* Update `rayStartParams.resources` for Ray scheduling. The `mosaicml--mpt-7b-chat.yaml` file uses both `accelerator_type_cpu` and `accelerator_type_a10`.
```yaml
# Ray head: The Ray head has a Pod resource limit of 2 CPUs.
rayStartParams:
resources: '"{\"accelerator_type_cpu\": 2}"'
# Ray workers: The Ray worker has a Pod resource limit of 48 CPUs and 4 GPUs.
rayStartParams:
resources: '"{\"accelerator_type_cpu\": 48, \"accelerator_type_a10\": 4}"'
```

## Step 7: Deploy a LLM model with Aviary

```sh
# Step 7.1: Log in to the head Pod
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers)
kubectl exec -it $HEAD_POD -- bash
# Step 7.2: Deploy a `mosaicml/mpt-7b-chat` model
aviary run --model ./models/mosaicml--mpt-7b-chat.yaml

# Step 7.3: Check the Serve application status
serve status

# [Example output]
# name: default
# app_status:
# status: RUNNING
# message: ''
# deployment_timestamp: 1686006910.9571936
# deployment_statuses:
# - name: default_mosaicml--mpt-7b-chat
# status: HEALTHY
# message: ''
# - name: default_RouterDeployment
# status: HEALTHY
# message: ''

# Step 7.4: List all models
export AVIARY_URL="http://localhost:8000"
aviary models

# [Example output]
# Connecting to Aviary backend at: http://localhost:8000/
# mosaicml/mpt-7b-chat

# Step 7.5: Send a query to `mosaicml/mpt-7b-chat`.
aviary query --model mosaicml/mpt-7b-chat --prompt "What are the top 5 most popular programming languages?"

# [Example output]
# Connecting to Aviary backend at: http://localhost:8000/
# mosaicml/mpt-7b-chat:
# 1. Python
# 2. Java
# 3. JavaScript
# 4. C++
# 5. C#
```

## Step 8: Clean up resources

**Warning: GPU nodes are extremely expensive. Please remember to delete the cluster if you no longer need it.**

```sh
# Step 8.1: Delete the RayCluster
# path: deploy/kuberay
kubectl apply -f kuberay.yaml

# Step 8.2: Uninstall the KubeRay operator chart
helm uninstall kuberay-operator

# Step 8.3: Delete the Amazon EKS cluster via AWS Web UI
```
70 changes: 70 additions & 0 deletions deploy/kuberay/kuberay.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
labels:
controller-tools.k8s.io: "1.0"
name: aviary
spec:
rayVersion: '2.4.0' # should match the Ray version in the image of the containers
# Ray head pod template
headGroupSpec:
# The `rayStartParams` are used to configure the `ray start` command.
# See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
# See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
rayStartParams:
resources: '"{\"accelerator_type_cpu\": 2}"'
dashboard-host: '0.0.0.0'
#pod template
template:
spec:
containers:
- name: ray-head
image: anyscale/aviary:latest
resources:
limits:
cpu: 2
memory: 8Gi
requests:
cpu: 2
memory: 8Gi
ports:
- containerPort: 6379
name: gcs-server
- containerPort: 8265 # Ray dashboard
name: dashboard
- containerPort: 10001
name: client
workerGroupSpecs:
# the pod replicas in this group typed worker
- replicas: 1
minReplicas: 0
maxReplicas: 1
# logical group name, for this called small-group, also can be functional
groupName: gpu-group
rayStartParams:
resources: '"{\"accelerator_type_cpu\": 48, \"accelerator_type_a10\": 4}"'
#pod template
template:
spec:
containers:
- name: llm
image: anyscale/aviary:latest
lifecycle:
preStop:
exec:
command: ["/bin/sh","-c","ray stop"]
resources:
limits:
cpu: "48"
memory: "192G"
nvidia.com/gpu: 4
requests:
cpu: "36"
memory: "128G"
nvidia.com/gpu: 4
# Please add the following taints to the GPU node.
tolerations:
- key: "ray.io/node-type"
operator: "Equal"
value: "worker"
effect: "NoSchedule"

0 comments on commit 5084d32

Please sign in to comment.