Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set some default resource requests on the workspace pod #707

Merged
merged 14 commits into from
Oct 7, 2024
Merged

Conversation

EronWright
Copy link
Contributor

@EronWright EronWright commented Oct 4, 2024

Proposed changes

Implements good defaults for the workspace resource, using a "burstable" approach.
Since a workspace pod's utilization is bursty - with low resource usage during idle times and with high resource usage during deployment ops - the pod requests a small amount of resources (64mb, 100m) to be able to idle. A deployment op is able to use much more memory - all available memory on the host.

Users may customize the resources (e.g. to apply different requests and/or limits). For large/complex Pulumi apps, it might make sense to reserve more memory and/or use #694.

The agent takes some pains to stay within the requested amount, using a programmatic form of the GOMEMLIMIT environment variable. The agent detects the requested amount via the Downward API. We don't use GOMEMLIMIT to avoid propagating it to sub-processes, and because the format is a Kubernetes 'quantity'.

It was observed that zombies weren't being cleaned up, and this was leading to resource exhaustion. Fixed by using tini as the entrypoint process (PID 1).

Related issues (optional)

Closes #698

Comment on lines -37 to -41

// SecurityProfileBaselineDefaultImage is the default image used when the security profile is 'baseline'.
SecurityProfileBaselineDefaultImage = "pulumi/pulumi:latest"
// SecurityProfileRestrictedDefaultImage is the default image used when the security profile is 'restricted'.
SecurityProfileRestrictedDefaultImage = "pulumi/pulumi:latest-nonroot"
Copy link
Contributor Author

@EronWright EronWright Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rationale: moving these constants and the associated 'defaulting' logic to webhook/auto/v1alpha1 where, in the future, a true webhook would apply the defaults eagerly. This PR does NOT implement a webhook, it simply applies the defaults during reconciliation for simplicity.

See the latest in webhook scaffolding: kubernetes-sigs/kubebuilder#4150

Copy link
Contributor Author

@EronWright EronWright Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me say, the benefit of applying defaults eagerly (with a webhook) rather than lazily (during reconciliation) is stability; one may change the default later without affecting existing workloads. The implicit becomes explicit.

This comment was marked as outdated.

operator/cmd/main.go Outdated Show resolved Hide resolved
@EronWright EronWright added the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Oct 4, 2024
Copy link
Contributor

@blampe blampe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love that we killed the zombies!

My gut says the SetMemoryLimit call feels premature -- after all , if we OOM it'll probably be due to npm and not our little agent. But it probably doesn't hurt. We should definitely pull some profiles if we observe a leak.

Comment on lines +38 to +43
// // SetupWorkspaceWebhookWithManager registers the webhook for Workspace in the manager.
// func SetupWorkspaceWebhookWithManager(mgr ctrl.Manager) error {
// return ctrl.NewWebhookManagedBy(mgr).For(&autov1alpha1.Workspace{}).
// WithDefaulter(&WorkspaceCustomDefaulter{}).
// Complete()
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused boilerplate or something to enable later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be used later when we implement a webhook, which I felt was overkill for this PR.

Dockerfile Show resolved Hide resolved
@EronWright
Copy link
Contributor Author

@blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'.
https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications

@EronWright EronWright merged commit 2a67d4c into v2 Oct 7, 2024
8 of 9 checks passed
@EronWright EronWright deleted the issue-698 branch October 7, 2024 21:30
@blampe
Copy link
Contributor

blampe commented Oct 7, 2024

@blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'. https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications

Right, rephrasing my earlier comment I don't think the agent falls into this high-memory category. It can run under 100MiB and handles one request at a time -- its heap should be pretty quiet :) Child processes will eat most of our memory, hence why it felt premature to me, but again it doesn't really matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact/no-changelog-required This issue doesn't require a CHANGELOG update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants