feat: allow custom compute pod affinities, nodeSelector and tolerations (#935)

* feat: create env var `POD_NAME` and `COMPUTE_POD_AFFINITY` to replace dynamically generated pod affinity for compute pod

Signed-off-by: Guilhem Barthés <[email protected]>

* wip: use affinity from env var

Signed-off-by: Guilhem Barthés <[email protected]>

* chore: affinity and access mode to values

Signed-off-by: ThibaultFy <[email protected]>

* chore: add node selector and tolerations for compute pod

Signed-off-by: ThibaultFy <[email protected]>

* chore: typo yaml

Signed-off-by: ThibaultFy <[email protected]>

* chore: toYaml

Signed-off-by: ThibaultFy <[email protected]>

* chore: toYaml

Signed-off-by: ThibaultFy <[email protected]>

* chore: podname to hostname

Signed-off-by: ThibaultFy <[email protected]>

* Update charts/substra-backend/values.yaml

Co-authored-by: Guilhem Barthés <[email protected]>
Signed-off-by: ThibaultFy <[email protected]>

* chore: change access mode

Signed-off-by: ThibaultFy <[email protected]>

* chore: alpha release

Signed-off-by: ThibaultFy <[email protected]>

* chore: chart doc

Signed-off-by: ThibaultFy <[email protected]>

* chore: access modes

Signed-off-by: ThibaultFy <[email protected]>

* chore: pass persistence.storageClass to backend chart

Signed-off-by: ThibaultFy <[email protected]>

* chore(dev): remove storageClassName

Signed-off-by: ThibaultFy <[email protected]>

* chore: changelog

Signed-off-by: ThibaultFy <[email protected]>

* chore: add logging

Signed-off-by: ThibaultFy <[email protected]>

* chore: more logs

Signed-off-by: ThibaultFy <[email protected]>

* chore: more logs

Signed-off-by: ThibaultFy <[email protected]>

* chore(dev): bump to alpha.4 for more logs

Signed-off-by: ThibaultFy <[email protected]>

* chore: remove debug logging

Signed-off-by: Guilhem Barthés <[email protected]>

* feat: raise uncaught exceptions in `image_transfer/encoder.py::get_manifests_and_list_of_all_blobs`

Signed-off-by: Guilhem Barthés <[email protected]>

---------

Signed-off-by: Guilhem Barthés <[email protected]>
Signed-off-by: ThibaultFy <[email protected]>
Signed-off-by: ThibaultFy <[email protected]>
Co-authored-by: Guilhem Barthés <[email protected]>
ThibaultFy and guilhem-barthes authored Jul 23, 2024
1 parent c26f6cc commit 27621b7
Showing 10 changed files with 113 additions and 78 deletions.
1 change: 1 addition & 0 deletions backend/image_transfer/encoder.py
@@ -98,6 +98,7 @@ def get_manifests_and_list_of_all_blobs(
raise RegistryPreconditionFailedException(
f"{docker_image} is either not scanned yet or not passing the vulnerability checks."
) from e
+raise e
manifests.append(manifest)
blobs_to_pull += blobs
return manifests, blobs_to_pull
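The added `raise e` makes the `except` block re-raise any error it does not translate, instead of swallowing it. A minimal sketch of that pattern follows. `FetchError`, `fetch_manifest`, the `fetch` callable and the HTTP 412 check are hypothetical stand-ins for illustration; only `RegistryPreconditionFailedException` and its message come from the hunk, and the actual condition used in `encoder.py` is not visible in this diff.

```python
from typing import Any, Callable


class RegistryPreconditionFailedException(Exception):
    """Stand-in for the exception of the same name raised in the hunk."""


class FetchError(Exception):
    """Hypothetical low-level error carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"registry answered with HTTP {status_code}")
        self.status_code = status_code


def fetch_manifest(docker_image: str, fetch: Callable[[str], Any]) -> Any:
    """Call `fetch`, translating one known failure and re-raising the rest."""
    try:
        return fetch(docker_image)
    except FetchError as e:
        if e.status_code == 412:  # assumption: the translated case is HTTP 412
            raise RegistryPreconditionFailedException(
                f"{docker_image} is either not scanned yet or not passing the vulnerability checks."
            ) from e
        # Anything else propagates unchanged (the commit spells this `raise e`).
        raise
```

With this shape, an unexpected status surfaces to the caller as the original `FetchError`, while the known case keeps its dedicated exception with the original error attached as `__cause__`.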
2 changes: 2 additions & 0 deletions backend/substrapp/clients/organization.py
@@ -184,7 +184,9 @@ def get(
) -> bytes:
"""Get asset data."""
content = _http_request(_Method.GET, channel, organization_id, url).content

new_checksum = compute_hash(content, key=salt)

if new_checksum != checksum:
raise IntegrityError(f"url {url}: checksum doesn't match {checksum} vs {new_checksum}")
return content
21 changes: 4 additions & 17 deletions backend/substrapp/compute_tasks/compute_pod.py
@@ -2,6 +2,7 @@

import kubernetes
import structlog
+import yaml
from django.conf import settings

from substrapp.kubernetes_utils import delete_pod
@@ -120,22 +121,6 @@ def create_pod(
**container_optional_kwargs,
)

-pod_affinity = kubernetes.client.V1Affinity(
-    pod_affinity=kubernetes.client.V1PodAffinity(
-        required_during_scheduling_ignored_during_execution=[
-            kubernetes.client.V1PodAffinityTerm(
-                label_selector=kubernetes.client.V1LabelSelector(
-                    match_expressions=[
-                        kubernetes.client.V1LabelSelectorRequirement(
-                            key="statefulset.kubernetes.io/pod-name", operator="In", values=[os.getenv("HOSTNAME")]
-                        )
-                    ]
-                ),
-                topology_key="kubernetes.io/hostname",
-            )
-        ]
-    )
-)
image_pull_secret = os.getenv("DOCKER_CONFIG_SECRET_NAME")

if image_pull_secret:
@@ -144,7 +129,9 @@ def create_pod(
image_pull_secrets = None
spec = kubernetes.client.V1PodSpec(
restart_policy="Never",
-affinity=pod_affinity,
+affinity=yaml.safe_load(os.getenv("COMPUTE_POD_AFFINITY")),
+node_selector=yaml.safe_load(os.getenv("COMPUTE_POD_NODE_SELECTOR")),
+tolerations=yaml.safe_load(os.getenv("COMPUTE_POD_TOLERATIONS")),
containers=[container_compute],
volumes=volumes + gpu_volume,
security_context=get_pod_security_context(),
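`os.getenv` returns `None` when a variable is unset, and `yaml.safe_load(None)` raises an error rather than returning an empty value, so the three new calls rely on the chart always setting these variables. A defensive variant could look like the sketch below. `load_yaml_env` is a hypothetical helper and is not part of the commit; it assumes PyYAML, which the hunk above already imports.

```python
import os
from typing import Any

import yaml  # PyYAML, as imported by compute_pod.py in this commit


def load_yaml_env(name: str, default: Any = None) -> Any:
    """Parse the YAML stored in environment variable `name`.

    Returns `default` when the variable is unset, blank, or parses to null,
    instead of passing None to yaml.safe_load.
    """
    raw = os.getenv(name)
    if raw is None or not raw.strip():
        return default
    parsed = yaml.safe_load(raw)
    return default if parsed is None else parsed
```

Under that assumption, the spec could be built with, for example, `node_selector=load_yaml_env("COMPUTE_POD_NODE_SELECTOR")` and `tolerations=load_yaml_env("COMPUTE_POD_TOLERATIONS")`, so a missing variable simply leaves the field unset.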
1 change: 0 additions & 1 deletion backend/substrapp/compute_tasks/image_builder.py
@@ -33,7 +33,6 @@ def push_blob_to_registry(blob: bytes, tag: str) -> None:
def load_remote_function_image(function: orchestrator.Function, channel: str) -> None:
# Ask the backend owner of the function if it's available
container_image_tag = utils.container_image_tag_from_function(function)

function_image_content = organization_client.get(
channel=channel,
organization_id=function.owner,
5 changes: 5 additions & 0 deletions charts/substra-backend/CHANGELOG.md
@@ -1,6 +1,11 @@
# Changelog

<!-- towncrier release notes start -->
+## [26.9.0] - 2024-07-22
+
+# Added
+
+Configuration of compute pod `affinity`, `nodeSelector` and `tolerations` in the `values.yaml` file.

## [26.8.3] - 2024-07-16

4 changes: 2 additions & 2 deletions charts/substra-backend/Chart.yaml
@@ -1,8 +1,8 @@
apiVersion: v2
name: substra-backend
home: https://github.com/Substra
-version: 26.8.3
-appVersion: 0.47.0
+version: "26.9.0"
+appVersion: "0.47.0"
kubeVersion: ">= 1.19.0-0"
description: Main package for Substra
type: application
117 changes: 62 additions & 55 deletions charts/substra-backend/README.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions charts/substra-backend/changes/935.changed
@@ -0,0 +1 @@
+Compute pod `affinity`, `nodeSelector` and `tolerations` are now configured through environment variables defined in the `values.yaml` file.
12 changes: 11 additions & 1 deletion charts/substra-backend/templates/statefulset-worker.yaml
@@ -131,6 +131,16 @@ spec:
valueFrom:
fieldRef:
fieldPath: spec.nodeName
+- name: POD_NAME
+  valueFrom:
+    fieldRef:
+      fieldPath: metadata.name
+- name: COMPUTE_POD_AFFINITY
+  value: {{ toYaml .Values.worker.computePod.affinity | quote }}
+- name: COMPUTE_POD_NODE_SELECTOR
+  value: {{ toYaml .Values.worker.computePod.nodeSelector | quote }}
+- name: COMPUTE_POD_TOLERATIONS
+  value: {{ toYaml .Values.worker.computePod.tolerations | quote }}
- name: COMPUTE_POD_RESOURCES
value: {{ toYaml .Values.worker.computePod.resources | quote }}
- name: COMPUTE_POD_MAX_STARTUP_WAIT_SECONDS
@@ -231,7 +241,7 @@ spec:
- metadata:
name: subtuple
spec:
-accessModes: [ "ReadWriteOnce" ]
+accessModes: {{ .Values.worker.accessModes }}
{{ include "common.storage.class" .Values.worker.persistence }}
resources:
requests:
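The default affinity in `values.yaml` contains the literal `$(POD_NAME)`. Kubernetes expands `$(VAR)` references in an `env` value using variables defined earlier in the same container's list, which is presumably why `POD_NAME` is declared before `COMPUTE_POD_AFFINITY` in this template. The sketch below simulates that ordering rule in plain Python. `expand_env` and the sample pod name are hypothetical, and the simulation is simplified: it does not model the `$$` escape.

```python
import re
from typing import Dict, List, Tuple

# Matches $(NAME) references inside an env value.
_REFERENCE = re.compile(r"\$\(([^)]+)\)")


def expand_env(entries: List[Tuple[str, str]]) -> Dict[str, str]:
    """Resolve $(NAME) references against variables defined earlier in the list.

    References to names not yet defined are left untouched.
    """
    resolved: Dict[str, str] = {}
    for name, value in entries:
        resolved[name] = _REFERENCE.sub(
            lambda m: resolved.get(m.group(1), m.group(0)), value
        )
    return resolved


env = expand_env(
    [
        ("POD_NAME", "backend-worker-0"),  # hypothetical pod name
        ("COMPUTE_POD_AFFINITY", "values:\n- $(POD_NAME)"),
    ]
)
print(env["COMPUTE_POD_AFFINITY"])
```

If the two entries were swapped, the reference would stay as the literal text `$(POD_NAME)`, and the label selector would match no pod.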
27 changes: 25 additions & 2 deletions charts/substra-backend/values.yaml
@@ -283,7 +283,7 @@ server:
##
honorLabels: false

-## @section Substra worker settings
+## @section Substra worker settings. Note that you can access the worker pod name using $(POD_NAME) and its node using $(NODE_NAME).
##
worker:
## @param worker.enabled Enable worker service
@@ -376,6 +376,27 @@ worker:
memory: "1Gi"
limits:
memory: "64Gi"
+## @param worker.computePod.nodeSelector Node labels for pod assignment
+##
+nodeSelector: {}
+## @param worker.computePod.tolerations Toleration labels for pod assignment
+##
+tolerations: []
+## @param worker.computePod.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[0].labelSelector.matchExpressions[0].key Pod affinity rule definition.
+## @param worker.computePod.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[0].labelSelector.matchExpressions[0].operator Pod affinity rule definition.
+## @param worker.computePod.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[0].labelSelector.matchExpressions[0].values Pod affinity rule definition.
+## @param worker.computePod.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution[0].topologyKey Pod affinity rule definition.
+##
+affinity:
+  podAffinity:
+    requiredDuringSchedulingIgnoredDuringExecution:
+      - labelSelector:
+          matchExpressions:
+            - key: statefulset.kubernetes.io/pod-name
+              operator: In
+              values:
+                - $(POD_NAME)
+        topologyKey: kubernetes.io/hostname
events:
## @param worker.events.enabled Enable event service
##
@@ -435,7 +456,9 @@ worker:
## If not set and create is true, a name is generated using the substra.fullname template
##
name: ""

+## @param worker.accessModes Access modes for volume
+##
+accessModes: ["ReadWriteOnce"]
## @section Substra periodic tasks worker settings
##
schedulerWorker: