
feat: add support for TopologySpreadConstraints #546

Merged
1 commit merged into StarRocks:main on Jun 17, 2024

Conversation

davidspek (Contributor)

Description

This PR adds support for Pod Topology Spread Constraints, which are useful for distributing pods evenly across failure domains such as nodes or availability zones. In many situations this cannot be achieved with simple Affinity controls alone.
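For readers who have not used the feature before, here is a minimal, hypothetical sketch of what a topology spread constraint looks like in a pod spec (the field names come from the upstream Kubernetes API; the label is made up for the example, and the exact place where the StarRocks CRDs expose this may differ):

```yaml
# Illustrative only: spread matching pods evenly across nodes, tolerating a
# difference of at most one pod between any two nodes.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname   # one domain per node
    whenUnsatisfiable: ScheduleAnyway     # best effort; never block scheduling
    labelSelector:
      matchLabels:
        app: starrocks-fe                 # hypothetical label; match it to your pods
```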

Related Issue(s)

Please list any related issues and link them here.

Checklist

For the operator, please complete the following checklist:

  • run make generate to generate the code.
  • run golangci-lint run to check the code style.
  • run make test to run the unit tests.
  • run make manifests to update the CRD YAML files.

For the helm chart, please complete the following checklist:

  • Make sure you have updated the values.yaml file of the starrocks chart.
  • In the scripts directory, run bash create-parent-chart-values.sh to update the values.yaml file of the parent chart (kube-starrocks chart).

CLAassistant commented Jun 14, 2024

CLA assistant check
All committers have signed the CLA.

@yandongxiao self-requested a review June 17, 2024 06:57
@yandongxiao (Collaborator) left a comment

Thank you for your PR!

@@ -265,6 +265,7 @@ func Spec(spec v1.SpecInterface, container corev1.Container, volumes []corev1.Vo
 	ServiceAccountName:            spec.GetServiceAccount(),
 	TerminationGracePeriodSeconds: spec.GetTerminationGracePeriodSeconds(),
 	Affinity:                      spec.GetAffinity(),
+	TopologySpreadConstraints:     spec.GetTopologySpreadConstraints(),
Collaborator

Since this is a newer feature in Kubernetes: if the user is running an older version of k8s, such as 1.18, will it cause the deployment to fail?

Collaborator

I have verified it. An unknown field causes a strict decoding error:

# deployments.apps "kube-starrocks-operator" was not valid:
# * patch: Invalid value: "map[spec:map[aaaa:[]]]": strict decoding error: unknown field "spec.aaaa"

davidspek (Contributor Author)

As far as I can quickly find, TopologySpreadConstraints was promoted to beta in Kubernetes 1.18, which was released 4 years ago, or almost half the life of Kubernetes :). I think most projects have given up on supporting such old versions of Kubernetes, since it can prevent them from moving forward with features. It might be possible to enable this feature only after a Kubernetes version check, but I'm not familiar enough with the code to do that quickly, nor am I sure it's worth the effort of maintaining support for Kubernetes versions that people shouldn't have been using for years now.

Collaborator

https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/
According to this blog post, topologySpreadConstraints was introduced in 1.18, which is the oldest version the StarRocks operator supports.

@yandongxiao (Collaborator)

Could you share the scenarios in which you use TopologySpreadConstraints?


davidspek commented Jun 17, 2024

You need TopologySpreadConstraints if you want to evenly distribute pods across availability zones in scenarios where you might have more pods than availability zones. This is especially important because with most cloud providers the persistent disks are bound to an availability zone, meaning that if the pods are not spread across all the availability zones the first time they are scheduled, you will not be able to correct that later (without serious manual effort). A pod anti-affinity rule can work, but only if the number of zones is equal to the number of pods you will ever want to schedule. The motivation section of the docs gives a more complete overview of why this feature was added and why it can be so important (especially when dealing with persistence and availability-zone-bound disks).
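As a concrete illustration of that scenario (field names from the standard Kubernetes pod spec; the label is hypothetical): with six replicas, three zones, and the constraint below, the scheduler places two pods per zone, and because whenUnsatisfiable is DoNotSchedule, a pod that would push one zone more than one pod ahead of another stays pending rather than being placed unevenly.

```yaml
# Sketch: keep replicas balanced across availability zones so that
# zone-bound persistent disks also end up evenly distributed.
topologySpreadConstraints:
  - maxSkew: 1                                # zones may differ by at most one pod
    topologyKey: topology.kubernetes.io/zone  # one domain per availability zone
    whenUnsatisfiable: DoNotSchedule          # hard requirement, not best effort
    labelSelector:
      matchLabels:
        app: starrocks-be                     # hypothetical label for the pods
```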

@yandongxiao merged commit 3017f55 into StarRocks:main Jun 17, 2024
6 checks passed
@Hatlassian

This is awesome, thanks for this!
