Development notes - operator-sdk version

Scott Trent edited this page Aug 26, 2024 · 24 revisions

Random notes on developing for the SusQL Operator -- recent operator-sdk version

(ALWAYS UPDATING!!!!!!)

Changes to the main branch automatically rebuild and push the container image via a GitHub Action. Non-main branches and forked repos must be built by hand and pushed to a developer-specific location to avoid overwriting the official images.

Sample steps to build and push bundle and container images

Tips:

  • As of August 26, 2024, it is recommended to locally install operator-sdk version 1.36.1.
  • Set the default namespace to a known working location, e.g., oc project default.
  • To force a newly updated image to be used, bump the version number in the VERSION file.
  • If desired, export CONTAINER_TOOL as docker or podman before building. (The default is docker.)
  • If an official SusQL operator is installed on the cluster, be sure to uninstall it first.
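The VERSION bump mentioned above can be scripted. A minimal sketch, assuming the VERSION file holds a single three-part x.y.z version string (bump_patch is a hypothetical helper, not part of the repo):

```shell
# Hypothetical helper: increment the patch number in a VERSION file.
# Assumes the file contains a single x.y.z version string.
bump_patch() {
  IFS=. read -r major minor patch < "$1"
  printf '%s.%s.%s\n' "$major" "$minor" "$((patch + 1))" > "$1"
}

# usage: bump_patch VERSION
```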
export BUNDLE_IMG="REGISTRYURL/REPOSITORYNAME/susql-controller:v$(cat VERSION)"
export IMG=REGISTRYURL/REPOSITORYNAME/susql-controller
export IMAGE_TAG_BASE=${IMG}
export CONTAINER_TOOL=podman
podman login
make all
make bundle-build bundle-push
make operator-build operator-push

Trivial early sanity testing

make test
make run

(Use Control-C to terminate make run.)
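Note that make run executes the controller locally against the cluster from your current kubeconfig, and the CRDs must already be present there. Assuming this repo has a standard operator-sdk generated Makefile (an assumption, not confirmed here), the sequence would be:

```shell
make install    # apply the CRDs to the currently targeted cluster
make run        # run the controller locally (Control-C to stop)
make uninstall  # remove the CRDs when finished
```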

Deploy to cluster

  • Log in to the cluster on the command line, using the command from "Copy login command" in the upper right corner of the OpenShift web console.
  • Be sure to remove previously installed SusQL operators: operator-sdk cleanup susql-operator
  • Install the new bundle: operator-sdk run bundle ${BUNDLE_IMG}
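After running the bundle, it can help to confirm that the install actually succeeded. A hedged check, assuming the ClusterServiceVersion and pod names contain "susql" and the bundle installed into the current namespace:

```shell
oc get csv | grep -i susql    # the CSV PHASE should reach "Succeeded"
oc get pods | grep -i susql   # the controller pod should be Running
```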

Simple functional verification

cd susql-operator/test
oc create -f labelgroups.yaml
oc create -f training-job-1.yaml
oc create -f training-job-2.yaml

bash labelgroups.sh
sleep 10
bash labelgroups.sh

# remove test artifacts on completion
oc delete -f training-job-2.yaml
oc delete -f training-job-1.yaml
oc delete -f labelgroups.yaml
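Beyond the labelgroups.sh output, results can also be inspected directly on the custom resources before cleanup. A sketch, assuming the CRD is served under the plural name labelgroups and reports accumulated energy in its status (field names are an assumption):

```shell
oc get labelgroups            # list the LabelGroup resources created above
oc get labelgroups -o yaml    # the status stanza should show accumulated energy
```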

Troubleshooting

  • Make sure user workload monitoring is set up correctly. (Changing namespace label settings can be an unsupported action on some clusters; consider verifying within a newly created namespace.)
  • Is the configured Kepler metric source correct?
  • Verify the configuration displayed at install and run time.
  • Double-check that Kepler is functioning (e.g., expected output in OpenShift->Observe->Dashboards).
  • Try looking at OpenShift->Observe->Metrics searches such as:
    • kepler_container_joules_total
    • kepler_container_joules_total{container_namespace="default"}
  • Standard Kepler troubleshooting: https://sustainable-computing.io/usage/trouble_shooting/
  • Look at SusQL controller pod log output

Depending on how the operator was installed, it may be in one of the following namespaces: susql-operator-system, openshift-operators, or default.
oc project default
oc logs $( oc get pod | grep susql-operator | cut -f 1 -d" " )
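The two commands above assume the default namespace. A hedged loop that checks all three candidate namespaces instead (assuming the pod name contains "susql"):

```shell
# Look for the controller pod in each candidate namespace and tail its log.
for ns in susql-operator-system openshift-operators default; do
  pod=$(oc get pods -n "$ns" -o name 2>/dev/null | grep susql | head -n 1)
  [ -n "$pod" ] && oc logs -n "$ns" "$pod" --tail=50
done
```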
  • Verify accessibility and contents of appropriate Prometheus databases.
  • The log level can be changed by editing zapcore.Level(-2) in cmd/main.go and rebuilding the container image. (Eventually, the log level will be made configurable.)
  • To allow CLI access to the SusQL container, change the final tag in the gcr.io line in the Dockerfile from :nonroot to :debug, i.e., FROM gcr.io/distroless/static:debug. This includes busybox and allows entry into the container by specifying an entrypoint of sh. (If you really want to use a full path, it is /busybox/sh.)
  • With a debug container in place, modify the containers.command spec in config/manager/manager.yaml to replace - /manager with - /debug-entrypoint.sh
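Putting the two debug steps together, a sketch of opening a shell in the rebuilt container (assuming the controller deployment is named susql-controller-manager, as in a standard operator-sdk layout; adjust the name and namespace to match your install):

```shell
# Open a busybox shell inside the running controller pod built from the :debug image.
oc exec -it deploy/susql-controller-manager -- /busybox/sh
```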