Set some default resource requests on the workspace pod #707

EronWright · 2024-10-04T21:02:26Z

Proposed changes

Implements good defaults for the workspace resource, using a "burstable" approach.
Since a workspace pod's utilization is bursty, with low resource usage when idle and high resource usage during deployment ops, the pod requests only a small amount of resources (64 MB of memory, 100m CPU), enough to idle. A deployment op can use much more memory, up to all available memory on the host.

Users may customize the resources (e.g. to apply different requests and/or limits). For large/complex Pulumi apps, it might make sense to reserve more memory and/or use #694.
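
As a rough illustration of the burstable defaulting described above, the sketch below fills in requests only where the user has not set them and never sets limits. The `Resources` type and the `applyDefaultRequests` function are hypothetical, simplified stand-ins (the operator itself works with the Kubernetes API types), and `64Mi` is an assumed reading of the "64mb" figure.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified stand-in for a container's
// resource requirements; values are Kubernetes quantity strings.
type Resources struct {
	Requests map[string]string
	Limits   map[string]string
}

// defaultRequests mirrors the small idle footprint described in the PR.
// "64Mi" is an assumption about how "64mb" is written as a quantity.
var defaultRequests = map[string]string{
	"memory": "64Mi",
	"cpu":    "100m",
}

// applyDefaultRequests fills in any request the user left unset.
// Limits are passed through untouched, so a pod without user-supplied
// limits may burst above its requests.
func applyDefaultRequests(r Resources) Resources {
	out := Resources{Requests: map[string]string{}, Limits: r.Limits}
	for name, qty := range r.Requests {
		out.Requests[name] = qty
	}
	for name, qty := range defaultRequests {
		if _, ok := out.Requests[name]; !ok {
			out.Requests[name] = qty
		}
	}
	return out
}

func main() {
	// A user who reserves more memory keeps that value; CPU is defaulted.
	custom := Resources{Requests: map[string]string{"memory": "512Mi"}}
	fmt.Println(applyDefaultRequests(custom).Requests)
}
```

Defaulting per resource name, rather than all-or-nothing, lets a user raise only the memory request for a large Pulumi app while still picking up the CPU default.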

The agent takes some pains to stay within the requested amount by setting a soft memory limit programmatically (a SetMemoryLimit call), the in-process equivalent of the GOMEMLIMIT environment variable. The agent detects the requested amount via the Downward API. We don't set the GOMEMLIMIT environment variable itself, both to avoid propagating it to sub-processes and because the requested amount is formatted as a Kubernetes 'quantity'.
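
A minimal sketch of this approach, assuming the Downward API exposes the memory request through an environment variable. The variable name `AGENT_MEMORY_REQUEST` and the helper `parseQuantity` are hypothetical, and the parser handles only a handful of suffixes; this is not the operator's actual code. The one real API relied on is `debug.SetMemoryLimit` from `runtime/debug`, which applies the soft limit in-process only.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseQuantity converts a small subset of Kubernetes quantity strings
// (e.g. "64Mi", "1Gi", "128M", or a plain byte count) into bytes.
// It is a simplified illustration, not a full quantity parser.
func parseQuantity(q string) (int64, error) {
	suffixes := []struct {
		suffix     string
		multiplier int64
	}{
		{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30},
		{"k", 1000}, {"M", 1000 * 1000}, {"G", 1000 * 1000 * 1000},
	}
	for _, s := range suffixes {
		if strings.HasSuffix(q, s.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, s.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * s.multiplier, nil
		}
	}
	return strconv.ParseInt(q, 10, 64)
}

func main() {
	// AGENT_MEMORY_REQUEST is a hypothetical variable name, assumed to be
	// populated from the container's memory request via the Downward API.
	raw := os.Getenv("AGENT_MEMORY_REQUEST")
	if raw == "" {
		return // nothing advertised; keep the Go runtime's default
	}
	limit, err := parseQuantity(raw)
	if err != nil || limit <= 0 {
		fmt.Fprintln(os.Stderr, "ignoring memory request:", raw)
		return
	}
	// The limit applies to this process only; child processes such as
	// language runtimes do not inherit it.
	debug.SetMemoryLimit(limit)
	fmt.Println("soft memory limit set to", limit, "bytes")
}
```

The limit is soft: the Go runtime collects garbage more aggressively as the heap approaches it, but it does not cap memory used by sub-processes.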

It was observed that zombie processes weren't being cleaned up, which was leading to resource exhaustion. Fixed by using tini as the entrypoint process (PID 1).

Related issues (optional)

Closes #698

EronWright · 2024-10-04T21:04:42Z

operator/api/auto/v1alpha1/workspace_types.go

-
-	// SecurityProfileBaselineDefaultImage is the default image used when the security profile is 'baseline'.
-	SecurityProfileBaselineDefaultImage = "pulumi/pulumi:latest"
-	// SecurityProfileRestrictedDefaultImage is the default image used when the security profile is 'restricted'.
-	SecurityProfileRestrictedDefaultImage = "pulumi/pulumi:latest-nonroot"


Rationale: moving these constants and the associated 'defaulting' logic to webhook/auto/v1alpha1 where, in the future, a true webhook would apply the defaults eagerly. This PR does NOT implement a webhook, it simply applies the defaults during reconciliation for simplicity.

See the latest in webhook scaffolding: kubernetes-sigs/kubebuilder#4150

Let me say, the benefit of applying defaults eagerly (with a webhook) rather than lazily (during reconciliation) is stability; one may change the default later without affecting existing workloads. The implicit becomes explicit.
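
The defaulting step can be pictured roughly as follows. The two image constants are the ones removed from workspace_types.go in the diff above; the `WorkspaceSpec` struct and the `applyDefaults` function are simplified, hypothetical stand-ins rather than the operator's real types.

```go
package main

import "fmt"

const (
	// Default images per security profile, as in the moved constants.
	SecurityProfileBaselineDefaultImage   = "pulumi/pulumi:latest"
	SecurityProfileRestrictedDefaultImage = "pulumi/pulumi:latest-nonroot"
)

// WorkspaceSpec is a hypothetical, pared-down view of the workspace spec.
type WorkspaceSpec struct {
	SecurityProfile string // "baseline" or "restricted"
	Image           string
}

// applyDefaults sets the image only when the user left it empty.
// Unknown or empty profiles are left untouched in this sketch.
func applyDefaults(spec *WorkspaceSpec) {
	if spec.Image != "" {
		return
	}
	switch spec.SecurityProfile {
	case "baseline":
		spec.Image = SecurityProfileBaselineDefaultImage
	case "restricted":
		spec.Image = SecurityProfileRestrictedDefaultImage
	}
}

func main() {
	spec := WorkspaceSpec{SecurityProfile: "restricted"}
	applyDefaults(&spec)
	fmt.Println(spec.Image)
}
```

The same function could be called from a webhook or from the reconciler. The difference is that a webhook would write the chosen value into the stored object, so a later change to the default would not alter existing workspaces.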

operator/cmd/main.go

operator/internal/controller/auto/workspace_controller.go

operator/internal/webhook/auto/v1alpha1/workspace_webhook.go

blampe

Love that we killed the zombies!

My gut says the SetMemoryLimit call feels premature -- after all, if we OOM it'll probably be due to npm and not our little agent. But it probably doesn't hurt. We should definitely pull some profiles if we observe a leak.

blampe · 2024-10-07T20:49:41Z

operator/internal/webhook/auto/v1alpha1/workspace_webhook.go

+// // SetupWorkspaceWebhookWithManager registers the webhook for Workspace in the manager.
+// func SetupWorkspaceWebhookWithManager(mgr ctrl.Manager) error {
+// 	return ctrl.NewWebhookManagedBy(mgr).For(&autov1alpha1.Workspace{}).
+// 		WithDefaulter(&WorkspaceCustomDefaulter{}).
+// 		Complete()
+// }


Unused boilerplate or something to enable later?

To be used later when we implement a webhook, which I felt was overkill for this PR.

Dockerfile

EronWright · 2024-10-07T21:20:26Z

@blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'.
https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications

blampe · 2024-10-07T21:53:33Z

> @blampe here's the article that convinced me to give the system a hint that we're trying to stay within the 'requests'. https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications

Right. To rephrase my earlier comment: I don't think the agent falls into this high-memory category. It can run under 100MiB and handles one request at a time -- its heap should be pretty quiet :) Child processes will eat most of our memory, which is why it felt premature to me, but again it doesn't really matter.

EronWright requested review from blampe and rquitales October 4, 2024 21:02

EronWright commented Oct 4, 2024

This comment was marked as outdated.

EronWright commented Oct 4, 2024

operator/cmd/main.go (outdated, resolved)

EronWright commented Oct 4, 2024

operator/internal/controller/auto/workspace_controller.go (resolved)

EronWright added the impact/no-changelog-required label ("This issue doesn't require a CHANGELOG update") Oct 4, 2024

mjeffryes assigned EronWright Oct 4, 2024

rquitales reviewed Oct 7, 2024

operator/internal/webhook/auto/v1alpha1/workspace_webhook.go (outdated, resolved)

rquitales reviewed Oct 7, 2024

operator/internal/webhook/auto/v1alpha1/workspace_webhook.go (outdated, resolved)

EronWright added 13 commits October 7, 2024 14:10

928cbab bugfix for fileserver
c24b663 use tini for pulumi container
14764ef use tini for all containers
fda6898 use workspace dir as working dir
72c9eb0 set agent memory limit
c48402c default workspace resources
796f41c remove webhook scaffolding
f4f2f9a linting
ac134bd fix copyright
ecb323c remove unnecessary code
ea811b6 Update API docs for securityProfile
8008704 fix for make run
d5d708c add port to advertised address

EronWright force-pushed the issue-698 branch from 5f45de8 to d5d708c Compare October 7, 2024 21:10

blampe approved these changes Oct 7, 2024

2fb59f1 bugfix for advertised address

EronWright merged commit 2a67d4c into v2 Oct 7, 2024
8 of 9 checks passed

EronWright deleted the issue-698 branch October 7, 2024 21:30
