Add metrics for currently reconciling stacks #576

Closed
nicu-da opened this issue May 2, 2024 · 1 comment
Labels
  • good-first-issue: Start here if you'd like to start contributing to Pulumi
  • impact/usability: Something that impacts users' ability to use the product easily and intuitively
  • kind/enhancement: Improvements or new features
  • size/S: Estimated effort to complete (1-2 days)

Comments

nicu-da commented May 2, 2024

Hello!

  • Vote on this issue by adding a 👍 reaction
  • If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)

Issue details

Affected area/feature

Operator metrics

Add a metric, similar to stacks_failing, that tracks the stacks that are currently being reconciled. For reference, the existing metric is documented as:

stacks_failing - a set of gauge time series, labelled by namespace, that gives the number of stacks currently failing (stack.status.lastUpdate.state is failed)

Unlike controller_runtime_active_workers, the new metric should be labelled with the stack name. That would make it possible to visualize which stacks are currently being updated and to add alerts if a stack takes too long to update.

nicu-da added the kind/enhancement and needs-triage labels on May 2, 2024
blampe added the impact/usability, good-first-issue, and size/S labels and removed the needs-triage label on May 8, 2024
@cleverguy25
Added to epic #586

blampe added this to the 0.111 milestone on Oct 1, 2024
rquitales added a commit that referenced this issue Oct 10, 2024
### Proposed changes

This PR exposes the metrics service in our manifests and adds new unit
tests to ensure the metrics logic is correct.

### Technical changes

1. Added the `sigs.k8s.io/controller-runtime/pkg/metrics/filters` dependency
so that metrics are secure by default (guidance from the kubebuilder
book/scaffold)
2. Added new kustomize patches and bases to expose the metrics server in
a typical PKO deployment
3. Refactored the `updateStackCallback` function
4. Added the `stacks_reconciling` metric
5. Added ginkgo tests to exercise the Program and Stack metrics
6. Fixed some bugs in our metrics as determined through the unit tests

### Testing

- Manually tested a deployed PKO installation with the Prometheus
Operator
- Added unit tests (using ginkgo)

Note: e2e tests will be added in a follow-up PR.

### Related issues (optional)

Closes: #576
Projects
Status: Done