Merge pull request #533 from bjwswang/charts
feat: add kuberay operator and configure ray clusters in arcadia
bjwswang authored Jan 10, 2024
2 parents 0dd9a1f + 66d1e50 commit ecccf39
Showing 25 changed files with 51,116 additions and 3 deletions.
63 changes: 63 additions & 0 deletions config/samples/ray.io_v1_raycluster.yaml
@@ -0,0 +1,63 @@
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
  namespace: kuberay-system
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: 0.0.0.0
    template:
      metadata:
        labels:
          app.kubernetes.io/instance: raycluster
          app.kubernetes.io/name: kuberay
      spec:
        containers:
        - image: kubeagi/ray-ml:2.9.0-py39-vllm
          name: ray-head
          resources:
            limits:
              cpu: "1"
              memory: 2G
              nvidia.com/gpu: 1
            requests:
              cpu: "1"
              memory: 2G
              nvidia.com/gpu: 1
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
        volumes:
        - emptyDir: {}
          name: log-volume
  workerGroupSpecs:
  - groupName: workergroup
    replicas: 0
    minReplicas: 0
    maxReplicas: 5
    rayStartParams: {}
    template:
      metadata:
        labels:
          app.kubernetes.io/instance: raycluster
          app.kubernetes.io/name: kuberay
      spec:
        containers:
        - image: kubeagi/ray-ml:2.9.0-py39-vllm
          name: ray-worker
          resources:
            limits:
              cpu: "1"
              memory: 1G
              nvidia.com/gpu: 1
            requests:
              cpu: "1"
              memory: 1G
              nvidia.com/gpu: 1
          volumeMounts:
          - mountPath: /tmp/ray
            name: log-volume
        volumes:
        - emptyDir: {}
          name: log-volume
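The sample starts the worker group at `replicas: 0` (bounded by `minReplicas: 0` and `maxReplicas: 5`), and every worker requests one `nvidia.com/gpu`. A minimal sketch for adding workers, assuming the sample was applied unchanged and that, because the sample does not enable KubeRay's autoscaler, the operator keeps the group at `spec.workerGroupSpecs[0].replicas`. The helper `scale_ray_workers` and its `KUBECTL` override are hypothetical; a JSON patch is used because custom resources do not support strategic merge patches.

```sh
# Hypothetical helper: set the replica count of the first worker group.
# Assumes the group order in the sample (workergroup is index 0) and its maxReplicas of 5.
scale_ray_workers() {
  cluster=$1 namespace=$2 replicas=$3 max=${4:-5}
  # Reject anything that is not a non-negative integer.
  case "$replicas" in
    ''|*[!0-9]*) echo "replicas must be a non-negative integer" >&2; return 1 ;;
  esac
  # Stay within the group's maxReplicas bound.
  if [ "$replicas" -gt "$max" ]; then
    echo "replicas ($replicas) exceeds maxReplicas ($max)" >&2
    return 1
  fi
  # KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run.
  "${KUBECTL:-kubectl}" patch raycluster "$cluster" -n "$namespace" --type=json \
    -p "[{\"op\":\"replace\",\"path\":\"/spec/workerGroupSpecs/0/replicas\",\"value\":${replicas}}]"
}
```

For example, `scale_ray_workers raycluster-kuberay kuberay-system 1` asks for one worker; the new Pod stays Pending unless a node has a free GPU.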
2 changes: 1 addition & 1 deletion deploy/charts/arcadia/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
name: arcadia
description: A Helm chart(KubeBB Component) for KubeAGI Arcadia
type: application
version: 0.2.8
version: 0.2.9
appVersion: "0.1.0"

keywords:
10 changes: 10 additions & 0 deletions deploy/charts/arcadia/templates/config.yaml
@@ -7,6 +7,15 @@ data:
kind: Datasource
name: '{{ .Release.Name }}-minio'
namespace: '{{ .Release.Namespace }}'
{{- if gt (len .Values.ray.clusters) 0 }}
rayClusters:
{{- range .Values.ray.clusters }}
- name: {{ .name }}
headAddress: {{ .headAddress }}
pythonVersion: {{ .pythonVersion }}
dashboardHost: {{ .dashboardHost }}
{{- end }}
{{- end }}
gateway:
apiServer: 'http://{{ .Release.Name }}-fastchat.{{ .Release.Namespace }}.svc.cluster.local:8000/v1'
controller: 'http://{{ .Release.Name }}-fastchat.{{ .Release.Namespace }}.svc.cluster.local:21001'
@@ -18,6 +27,7 @@ data:
kind: VectorStore
name: '{{ .Release.Name }}-vectorstore'
namespace: '{{ .Release.Namespace }}'

#streamlit:
# image: 172.22.96.34/cluster_system/streamlit:v1.29.0
# ingressClassName: portal-ingress
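The new `rayClusters` block can be inspected with `helm template arcadia deploy/charts/arcadia --show-only templates/config.yaml`, which renders it against the chart defaults (the release name `arcadia` is a placeholder). The sketch below is a hypothetical, dependency-free emulation of the same logic: the `if` guard drops the key when `ray.clusters` is empty, and the `range` loop emits one entry per cluster. Indentation is shown relative to `rayClusters:`, not as it appears inside the ConfigMap. Since the template does not quote its values, a YAML parser may read a rendered version such as `3.10` back as a number.

```sh
# Hypothetical emulation of the rayClusters block in templates/config.yaml.
# Each argument describes one cluster as "name|headAddress|pythonVersion|dashboardHost".
render_ray_clusters() {
  # Same effect as `{{- if gt (len .Values.ray.clusters) 0 }}`: no clusters, no key.
  [ "$#" -gt 0 ] || return 0
  echo "rayClusters:"
  for cluster in "$@"; do
    # Split the descriptor on "|" and print one list entry, like the range loop.
    printf '%s\n' "$cluster" | {
      IFS='|' read -r name head py dash
      printf '%s\n' "- name: $name"
      printf '%s\n' "  headAddress: $head"
      printf '%s\n' "  pythonVersion: $py"
      printf '%s\n' "  dashboardHost: $dash"
    }
  done
}
```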
14 changes: 12 additions & 2 deletions deploy/charts/arcadia/values.yaml
@@ -8,7 +8,7 @@ global:
# @param resources Resources to be used
controller:
loglevel: 3
image: kubeagi/arcadia:latest
image: kubeagi/arcadia:v0.1.0-20240110-0dd9a1f
imagePullPolicy: IfNotPresent
resources:
limits:
@@ -21,7 +21,7 @@ controller:
# @section graphql and bff server
# related project: https://github.com/kubeagi/arcadia/tree/main/apiserver
apiserver:
image: kubeagi/arcadia:latest
image: kubeagi/arcadia:v0.1.0-20240110-0dd9a1f
enableplayground: false
port: 8081
ingress:
@@ -146,3 +146,13 @@ postgresql:
primary:
initdb:
scriptsConfigMap: pg-init-data

# @section ray is a unified framework for scaling AI and Python applications. In kubeagi, we use ray for distributed inference
ray:
  # clusters provided by ray
  # For more information on cluster configurations, please refer to http://kubeagi.k8s.com.cn/docs/Configuration/DistributedInference/run-inference-using-ray
  clusters:
    - name: 3090-2-GPUs
      headAddress: raycluster-kuberay-head-svc.kuberay-system.svc:6379
      pythonVersion: 3.9.18
      dashboardHost: raycluster-kuberay-head-svc.kuberay-system.svc:8265
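The default entry points at the head Service of the sample RayCluster (`raycluster-kuberay` in `kuberay-system`), on Ray's default GCS port 6379 and dashboard port 8265. A minimal sketch for targeting another cluster, assuming KubeRay's `<cluster>-head-svc` Service naming and those default ports; `write_ray_values` is a hypothetical helper. Helm replaces lists from a values file rather than merging them, so the file's list supersedes the default `3090-2-GPUs` entry. Quoting `pythonVersion` stops Helm from reading a value such as `3.10` as the number 3.1.

```sh
# Hypothetical helper: write a values override with one ray.clusters entry.
# Args: output file, display name, RayCluster name, namespace, Python version.
write_ray_values() {
  out=$1 label=$2 cluster=$3 namespace=$4 python_version=$5
  # Assumed KubeRay naming for the head Service.
  svc="${cluster}-head-svc.${namespace}.svc"
  cat > "$out" <<EOF
ray:
  clusters:
    - name: ${label}
      headAddress: ${svc}:6379
      pythonVersion: "${python_version}"
      dashboardHost: ${svc}:8265
EOF
}
```

Usage: `write_ray_values ray-values.yaml 3090-2-GPUs raycluster-kuberay kuberay-system 3.9.18`, then `helm upgrade --install arcadia deploy/charts/arcadia -f ray-values.yaml` (release name and namespace are placeholders). An override with `clusters: []` makes the template skip `rayClusters` entirely.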
22 changes: 22 additions & 0 deletions deploy/charts/kuberay-operator/.helmignore
@@ -0,0 +1,22 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
6 changes: 6 additions & 0 deletions deploy/charts/kuberay-operator/Chart.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
apiVersion: v2
description: A Helm chart for Kubernetes
name: kuberay-operator
version: 1.0.0
icon: https://github.com/ray-project/ray/raw/master/doc/source/images/ray_header_logo.png
type: application
117 changes: 117 additions & 0 deletions deploy/charts/kuberay-operator/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# KubeRay Operator

This document provides instructions to install both the CRDs (RayCluster, RayJob, RayService) and the KubeRay operator with a Helm chart.

## Helm

Make sure you are using Helm v3 or later. Currently, [existing CI tests](https://github.com/ray-project/kuberay/blob/master/.github/workflows/helm-lint.yaml) are based on Helm v3.4.1 and v3.9.4.

```sh
helm version
```
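The version requirement can be checked before installing. A minimal sketch, assuming `helm version --short` prints a string such as `v3.9.4+g...`, as Helm 3 does; `require_helm3` and the `HELM` override are hypothetical.

```sh
# Hypothetical helper: succeed only when the Helm client is v3 or later.
require_helm3() {
  version=$("${HELM:-helm}" version --short) || return 1
  # "v3.9.4+gdbc6d8e" -> "3"
  major=${version#v}
  major=${major%%.*}
  case "$major" in
    ''|*[!0-9]*) echo "cannot parse Helm version: $version" >&2; return 1 ;;
  esac
  if [ "$major" -ge 3 ]; then
    echo "ok: $version"
  else
    echo "Helm v3+ is required, found: $version" >&2
    return 1
  fi
}
```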

## Install CRDs and KubeRay operator

* Install a stable version via the Helm repository (available for KubeRay v0.4.0 and later)
```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/

# Install both CRDs and KubeRay operator v1.0.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0

# Check the KubeRay operator Pod in the `default` namespace
kubectl get pods
# NAME                                READY   STATUS    RESTARTS   AGE
# kuberay-operator-6fcbb94f64-mbfnr   1/1     Running   0          17s
```

* Install the nightly version
```sh
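# Step 1: Clone the KubeRay repository
git clone https://github.com/ray-project/kuberay.git
# Step 2: Move to `helm-chart/kuberay-operator`
cd kuberay/helm-chart/kuberay-operator
# Step 3: Install the KubeRay operator from the local chart
helm install kuberay-operator .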
```

* Install the KubeRay operator without installing CRDs
  * In some cases, installing the CRDs and installing the operator require different levels of admin permission, so the two installations can be handled as separate steps by different roles.
  * Use Helm's built-in `--skip-crds` flag to install the operator only. See [this document](https://helm.sh/docs/chart_best_practices/custom_resource_definitions/) for more details.
```sh
# Step 1: Install CRDs only (for cluster admin)
kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=v1.0.0&timeout=90s"
# Step 2: Install the KubeRay operator only (for developers)
helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0 --skip-crds
```
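Whichever installation path was used, a quick check is that the operator Deployment becomes Available and that all three CRDs exist. A minimal sketch, assuming the release is named `kuberay-operator` (so the Deployment is `deployment/kuberay-operator`, consistent with the Pod name shown earlier) and that the CRDs are `rayclusters.ray.io`, `rayjobs.ray.io`, and `rayservices.ray.io`; `verify_kuberay` and the `KUBECTL` override are hypothetical.

```sh
# Hypothetical helper: confirm the operator is up and the Ray CRDs are installed.
verify_kuberay() {
  namespace=${1:-default}
  kubectl=${KUBECTL:-kubectl}
  # Block until the operator Deployment reports Available (or time out).
  "$kubectl" wait --for=condition=Available deployment/kuberay-operator \
    -n "$namespace" --timeout=120s || return 1
  # Fails if any CRD is missing, e.g. after --skip-crds without the admin step.
  "$kubectl" get crd rayclusters.ray.io rayjobs.ray.io rayservices.ray.io
}
```

Without an argument it checks the `default` namespace; pass another namespace if the chart was installed with `-n`.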
## List the chart
To list the `kuberay-operator` release:
```sh
helm ls
# NAME              NAMESPACE  REVISION  UPDATED                                  STATUS    CHART                   APP VERSION
# kuberay-operator  default    1         2023-09-22 02:57:17.306616331 +0000 UTC  deployed  kuberay-operator-1.0.0
```
## Uninstall the Chart
```sh
# Uninstall the `kuberay-operator` release
helm uninstall kuberay-operator
# The operator Pod should be removed.
kubectl get pods
# No resources found in default namespace.
```
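`helm uninstall` removes the operator, but Helm does not delete CRDs that a chart installs from its `crds/` directory, so the RayCluster, RayJob, and RayService CRDs remain. If they must be removed as well, a sketch like the following lists the remaining Ray objects first, because deleting a CRD also deletes every custom resource of that kind. `delete_ray_crds` and the `KUBECTL` override are hypothetical.

```sh
# Hypothetical helper: remove the Ray CRDs left behind by `helm uninstall`.
delete_ray_crds() {
  kubectl=${KUBECTL:-kubectl}
  # Show what will be lost: every RayCluster, RayJob, and RayService in the cluster.
  "$kubectl" get rayclusters.ray.io,rayjobs.ray.io,rayservices.ray.io --all-namespaces || return 1
  # Deleting the CRDs cascades to all of their custom resources.
  "$kubectl" delete crd rayclusters.ray.io rayjobs.ray.io rayservices.ray.io
}
```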
## Working with Argo CD
If you are using [Argo CD](https://argoproj.github.io) to manage the operator, syncing may fail with an error complaining that the CRDs are too long, the same problem reported in [this issue](https://github.com/prometheus-operator/prometheus-operator/issues/4439).
The recommended solution is to split the operator into two Argo CD apps:
* The first app installs only the CRDs, directly with `Replace=true`:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ray-operator-crds
spec:
  project: default
  source:
    repoURL: https://github.com/ray-project/kuberay
    targetRevision: v1.0.0-rc.0
    path: helm-chart/kuberay-operator/crds
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    syncOptions:
    - Replace=true
...
```
* The second app installs the Helm chart with `skipCrds: true` (a feature added in Argo CD 2.3.0):
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ray-operator
spec:
  source:
    repoURL: https://github.com/ray-project/kuberay
    targetRevision: v1.0.0-rc.0
    path: helm-chart/kuberay-operator
    helm:
      skipCrds: true
  destination:
    server: https://kubernetes.default.svc
    namespace: ray-operator
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
...
```
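With the CRDs split into their own app, syncing the CRD app before the operator app means the operator starts with its CRDs already present. A minimal sketch using the Argo CD CLI, assuming the app names from the snippets above and an already logged-in `argocd` session; `sync_ray_apps`, the `ARGOCD` override, and the 300-second timeout are hypothetical choices.

```sh
# Hypothetical helper: sync the CRD app first, then the operator app.
sync_ray_apps() {
  argocd=${ARGOCD:-argocd}
  "$argocd" app sync ray-operator-crds || return 1
  # Wait until the CRD app is synced and healthy before touching the operator.
  "$argocd" app wait ray-operator-crds --timeout 300 || return 1
  "$argocd" app sync ray-operator || return 1
  "$argocd" app wait ray-operator --timeout 300
}
```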