On-demand Kubernetes/OpenShift cluster scaling and aggregated resource provisioning

project-codeflare/instascale

InstaScale

InstaScale is a controller that works with Multi-cluster-app-dispatcher (MCAD) to make aggregated resources available in a Kubernetes cluster without creating pending pods. It uses MachineSets to launch instances on the cloud provider, which are then added to the Kubernetes cluster.

Key features:

  • Acquires the aggregated heterogeneous instances needed for workload execution.
  • Does not clog the Kubernetes control plane.
  • Works with your Kubernetes scheduling system to schedule pods on aggregated resources.
  • Terminates instances on workload completion.

InstaScale and MCAD interaction

  • The user submits one or more multi-GPU jobs.
  • The jobs land in the MCAD queue.
  • When resources are not available, MCAD triggers scaling, i.e. it calls InstaScale.
  • InstaScale looks at the resource requests specified by the user and matches them with the desired MachineSet(s) to get nodes.
  • Once InstaScale has scaled the cluster and aggregate resources are available to run the job, MCAD dispatches the job.
  • When the job completes, the resources obtained for the job are released.
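
The flow above can be pictured as a small state machine. The sketch below is a simplified illustration only, not InstaScale's or MCAD's actual code: the `Phase`, `Cluster`, and `Job` types and the `step` function are hypothetical, and capacity is reduced to a single count of free GPUs.

```go
package main

import "fmt"

// Phase is a hypothetical label for where a job is in the flow.
type Phase string

const (
	Queued     Phase = "Queued"
	Scaling    Phase = "Scaling"
	Dispatched Phase = "Dispatched"
	Completed  Phase = "Completed"
)

// Cluster models aggregate capacity as a single count of free GPUs (an assumption).
type Cluster struct {
	FreeGPUs int
}

// Job is a hypothetical multi-GPU job waiting in the queue.
type Job struct {
	Name     string
	GPUs     int // GPUs the job requests
	Acquired int // GPUs added to the cluster on behalf of this job
	Phase    Phase
}

// step advances the job by one transition of the flow.
func step(c *Cluster, j *Job) {
	switch j.Phase {
	case Queued:
		if c.FreeGPUs >= j.GPUs {
			// Enough aggregate capacity: dispatch the job.
			c.FreeGPUs -= j.GPUs
			j.Phase = Dispatched
		} else {
			// Not enough capacity: trigger scaling.
			j.Phase = Scaling
		}
	case Scaling:
		// Acquire only the shortfall, then return the job to the queue.
		j.Acquired = j.GPUs - c.FreeGPUs
		c.FreeGPUs += j.Acquired
		j.Phase = Queued
	case Dispatched:
		// On completion, capacity acquired for the job is released;
		// only the capacity that existed before scaling becomes free again.
		c.FreeGPUs += j.GPUs - j.Acquired
		j.Acquired = 0
		j.Phase = Completed
	}
}

func main() {
	c := &Cluster{FreeGPUs: 2}
	j := &Job{Name: "train", GPUs: 8, Phase: Queued}
	for j.Phase != Completed {
		step(c, j)
		fmt.Printf("%s: phase=%s free=%d\n", j.Name, j.Phase, c.FreeGPUs)
	}
}
```

The point of the sketch is the ordering: the job is dispatched only after the full aggregate capacity exists, so no pods sit pending while instances are still being launched.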

Development

Pre-requisites

  • Installed Go version 1.19
  • Running OpenShift cluster

Building

  • To build locally: make build
  • To run locally: make run

Image creation

  • To build and release a docker image for the controller: make IMG=quay.io/project-codeflare/instascale:<TAG> image-build image-push
  • Note that the other contents of the Makefile (as well as the config and bin dirs) exist for future operator development and are not currently used.

Deployment

  • Deploy InstaScale (latest) using: make deploy

  • Optionally, to deploy a custom image of InstaScale, use the custom-deploy make target, which builds, pushes, and deploys your image on your Kubernetes cluster:

make custom-deploy ENGINE=<podman or docker> IMG=quay.io/<username>/instascale:<image tag>

Note: This assumes you are logged in to your quay.io account on your local machine and that your kubeconfig points to the cluster you want to deploy InstaScale on.

Running an InstaScale deployment locally with Visual Studio Code

  • Deploy MCAD using the steps here.

  • In Visual Studio Code, update .vscode/launch.json so that "KUBECONFIG" points to your Kubernetes config file.

  • If you changed the namespace in config/default/kustomization.yaml, update the args[] in launch.json to include "--configs-namespace=<YOUR_NAMESPACE>" and "--ocm-secret-namespace=<YOUR_NAMESPACE>".

  • You can now run the local deployment with the debugger.
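
To illustrate how args[] entries of this shape reach a Go program, here is a minimal sketch using the standard library `flag` package. It is not InstaScale's actual flag handling: the `options` type, the `parseArgs` function, and the default value `instascale-system` are assumptions made for illustration; only the two flag names come from the step above.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two namespace settings named in the launch.json step.
type options struct {
	ConfigsNamespace   string
	OCMSecretNamespace string
}

// parseArgs parses launch.json-style args.
// The default namespace used here is an assumption, not a documented default.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("instascale", flag.ContinueOnError)
	fs.StringVar(&o.ConfigsNamespace, "configs-namespace", "instascale-system",
		"namespace that holds the configuration")
	fs.StringVar(&o.OCMSecretNamespace, "ocm-secret-namespace", "instascale-system",
		"namespace that holds instascale-ocm-secret")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// These strings mirror what would be listed under args[] in launch.json.
	o, err := parseArgs([]string{
		"--configs-namespace=demo",
		"--ocm-secret-namespace=demo",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("configs=%s ocm-secret=%s\n", o.ConfigsNamespace, o.OCMSecretNamespace)
}
```

Both namespaces normally need the same value, since the step above changes them together with the kustomization namespace.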

Running locally with an OSD cluster

Running InstaScale locally against an OSD cluster requires extra steps beyond those above.

  • Add the instascale-ocm-secret
    • Get your API token from here
    • Navigate to Workloads -> Secrets
    • Set your project to instascale-system
    • Click Create -> Key/value secret
    • Secret name: instascale-ocm-secret
    • Key: token
    • Value: <YOUR_API_TOKEN>
    • Click Create
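
For reference, the console steps above should result in a Secret roughly like the manifest printed by the sketch below. This is an illustration under assumptions, not an official procedure: the `encodeToken` and `secretManifest` helpers are hypothetical, and the `Opaque` type is assumed to be what a key/value secret maps to. The name, namespace, and key are taken from the steps above.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeToken base64-encodes the token, as the data field of a Secret requires.
func encodeToken(token string) string {
	return base64.StdEncoding.EncodeToString([]byte(token))
}

// secretManifest renders a Secret manifest using the name, namespace,
// and key from the console steps. The Opaque type is an assumption.
func secretManifest(token string) string {
	return fmt.Sprintf(`apiVersion: v1
kind: Secret
metadata:
  name: instascale-ocm-secret
  namespace: instascale-system
type: Opaque
data:
  token: %s
`, encodeToken(token))
}

func main() {
	// Placeholder only: substitute your real API token and do not commit it.
	fmt.Print(secretManifest("<YOUR_API_TOKEN>"))
}
```

Seeing the manifest form makes it easier to check that the key is exactly `token` and that the secret lives in the `instascale-system` project.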

Scaling Machines with a Self-Managed OCP Cluster using AWS

To scale machines of a certain type, you need to create a MachineSet by following the guide here.

  • On your cluster dashboard, go to Compute -> Create MachineSet.
  • Paste in the MachineSet you created from the guide and click Create.
  • Your MachineSet should now appear.
  • Attempt to scale machines of the same machine type as your MachineSet template using InstaScale.
  • The MachineSet replicas should increase by the number of replicas you have specified.
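
The expected replica change can be sketched as follows. This is a simplified model, not InstaScale's implementation: the `MachineSet` and `Request` types and the `planScaling` function are hypothetical, and matching is assumed to be a plain comparison of instance type strings.

```go
package main

import "fmt"

// MachineSet is a hypothetical, simplified stand-in for an OpenShift MachineSet.
type MachineSet struct {
	Name         string
	InstanceType string
	Replicas     int
}

// Request is a hypothetical description of the machines a workload asks for.
type Request struct {
	InstanceType string
	Count        int
}

// planScaling returns the target replica count for every MachineSet that
// matches a request, plus the requests that no MachineSet can serve.
func planScaling(sets []MachineSet, reqs []Request) (map[string]int, []Request) {
	targets := make(map[string]int)
	var unmatched []Request
	for _, r := range reqs {
		matched := false
		for _, ms := range sets {
			if ms.InstanceType != r.InstanceType {
				continue
			}
			// Start from the current replicas the first time a set is used.
			base, seen := targets[ms.Name]
			if !seen {
				base = ms.Replicas
			}
			targets[ms.Name] = base + r.Count
			matched = true
			break
		}
		if !matched {
			unmatched = append(unmatched, r)
		}
	}
	return targets, unmatched
}

func main() {
	sets := []MachineSet{
		{Name: "gpu-workers", InstanceType: "g4dn.xlarge", Replicas: 1},
	}
	reqs := []Request{
		{InstanceType: "g4dn.xlarge", Count: 2},
		{InstanceType: "m5.xlarge", Count: 1}, // no MachineSet of this type exists
	}
	targets, unmatched := planScaling(sets, reqs)
	for _, ms := range sets {
		if t, ok := targets[ms.Name]; ok {
			fmt.Printf("%s: %d -> %d replicas\n", ms.Name, ms.Replicas, t)
		}
	}
	fmt.Printf("unmatched requests: %d\n", len(unmatched))
}
```

The unmatched case is why the MachineSet template must use the same machine type as the one requested: a request with no matching MachineSet cannot be scaled.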

Testing

Run the tests with the following command:

go test -v ./controllers/

Release process

Prerequisite:

  1. Run the instascale-release.yml action.
  2. Verify that the instascale-release.yml action completed successfully.