
Flux monitoring docs lost prometheus alert ReconciliationFailure during transition to kube-state-metrics #1686

Open
kingdonb opened this issue Oct 3, 2023 · 1 comment

Comments

@kingdonb
Member

kingdonb commented Oct 3, 2023

When we created the new Flux Custom Metrics guide in the Monitoring docs, we lost something:

https://github.com/fluxcd/website/blob/6e5991208cfbc0b1dbeae83855fb68f10230e01a/content/en/flux/guides/monitoring.md#metrics

Buried at the bottom of this Metrics section was an example similar to the one in the Flagger docs, which shows how to build an alert when a Canary fails so that someone can intervene. The link above has a ReconciliationFailure alert that is no longer in the Flux docs. We should figure out where to put it back; I'm sure many people have used it!

At the time I found the doc less than helpful because it doesn't make clear how to add a new alert to Prometheus with the kube-prometheus-stack chart. I did this:

https://github.com/kingdonb/flux2/blob/ddf3c495133a2e49e20c97588887f01bb2f6b104/manifests/monitoring/kube-prometheus-stack/release.yaml#L460-L468

I don't suggest we do that. It must be possible to create a new PrometheusRule resource alongside the Flux Monitoring deployment of kube-prometheus-stack. I couldn't figure it out in the limited time I had, so I went with "let's rewrite all of the default alerts, but add one more", because patching an array inside Helm values is difficult. 😬
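A minimal sketch of the "separate PrometheusRule" approach, written as a small Python script that emits the manifest as JSON (which `kubectl apply -f` accepts). Several details are assumptions, not taken from this issue: the PromQL expression assumes the kube-state-metrics custom resource config exposes `gotk_resource_info` with `ready`, `name`, `exported_namespace` and `customresource_kind` labels; the `release` label assumes the chart's default Prometheus `ruleSelector` matches on the Helm release name; and the names `flux-alerts`, `monitoring` and `kube-prometheus-stack` are hypothetical placeholders.

```python
"""Emit a PrometheusRule manifest carrying a ReconciliationFailure alert.

ASSUMPTIONS (hypothetical, verify against your cluster):
- metric and label names produced by the kube-state-metrics config for Flux
- the `release` label matching the kube-prometheus-stack ruleSelector
"""
import json

# Assumed PromQL: any Flux object whose Ready condition is False.
EXPR = (
    'max by (exported_namespace, name, customresource_kind) '
    '(gotk_resource_info{ready="False"}) > 0'
)


def reconciliation_failure_rule(release_label: str, namespace: str,
                                for_duration: str = "10m") -> dict:
    """Return a PrometheusRule manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": "flux-alerts",  # hypothetical name
            "namespace": namespace,
            # Assumed: Prometheus only loads rules carrying this label.
            "labels": {"release": release_label},
        },
        "spec": {
            "groups": [{
                "name": "flux",
                "rules": [{
                    "alert": "ReconciliationFailure",
                    "expr": EXPR,
                    # Only fire once the failure has persisted this long.
                    "for": for_duration,
                    "labels": {"severity": "page"},
                    "annotations": {
                        # {{ $labels.x }} is expanded by Prometheus, not Python.
                        "summary": "{{ $labels.customresource_kind }} "
                                   "{{ $labels.exported_namespace }}/{{ $labels.name }} "
                                   "reconciliation is failing.",
                    },
                }],
            }],
        },
    }


if __name__ == "__main__":
    manifest = reconciliation_failure_rule("kube-prometheus-stack", "monitoring")
    print(json.dumps(manifest, indent=2))
```

Committing the printed manifest next to the HelmRelease keeps the chart's default alerts untouched, so nothing in the Helm values array has to be rewritten.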

This might be a good place to provide another example of how to install a chart that manages CRDs, together with a custom resource of one of those types alongside it.

I see how we lost this now: it was buried at the bottom of the Metrics section, the only relevant bit left, without a subheading of its own. Let's add it back? (I just wanted to document this because I have limited time again today, and the day is almost over!)

@kingdonb
Member Author

kingdonb commented Oct 3, 2023

This issue should probably have gone in fluxcd/website.

@stefanprodan stefanprodan transferred this issue from fluxcd/flux2 Oct 3, 2023