Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade agent to prometheus 2.51 #6754

Closed
wants to merge 16 commits into from

Conversation

ptodev
Copy link
Contributor

@ptodev ptodev commented Mar 22, 2024

PR Description

Upgrading to Prometheus 0.51.0.

Which issue(s) this PR fixes

Fixes grafana/alloy#227

Notes to the Reviewer

There are various TODOs in the code many of which need to be looked into. I'm also not sure how to implement this change. It seems like the default list of scrape_protocols changes if you enable native histograms.

When I start static mode, I get an error such as this:

panic: failed to register "pkg.translator.prometheus.PermissiveLabelSanitization": gate is already registered

This seems to be because feature gate code from OTel was copied to the Prometheus codebase. I tried to resolve this by replacing...

import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/translator/prometheus"

...with...

"github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus"

...but it didn't work.

A big part of this PR is not using globally registered metrics for Prometheus Scrape and Service Discovery. The way we initialise the SD and scrape managers has changed. When testing this, it's important that we catch issues with double metric registrations and with improper unregistrations. Try this:

  • Test as much relevant code in static mode and in flow mode as possible.
  • Try set up multiple instances of the same Flow component, multiple instances of the static mode scrape jobs, etc.
  • Try reloading the config by hitting the reload endpoint. This will tell you if unregistration was done properly.

For example, maybe something like this is a good start: static.yml.txt

I also haven't checked if there are any other config options which we need to add.

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@clayton-cornell clayton-cornell requested a review from a team March 23, 2024 00:09
@thampiotr thampiotr self-assigned this Mar 25, 2024
@thampiotr
Copy link
Contributor

thampiotr commented Mar 26, 2024

Had to use two forks:

@thampiotr thampiotr force-pushed the 6580-upgrade-agent-to-prometheus-250 branch from 7b405c3 to 2f5e452 Compare March 26, 2024 17:59
//TODO(ptodev): It looks messy to have a refresh metrics separate from the other SD metrics.
// Can we make this cleaner? The reason it was made like this currently,
// is so that metrics exposed by Prometheus executables don't change.
refreshMetrics := discovery.NewRefreshMetrics(o.Registerer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels ugly, so we dont have to Register the the refreshMetrics but we do need to unregister them? And then since we can get a newdiscover on the channel we have to unregister there. The fact they aren't balanced from looking at the code makes it rough.

wg.Add(1)
go func() {
c.runDiscovery(newCtx, disc)
c.sdMetrics.Unregister()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it should break if we create the discovering multiple time in the Update.

@@ -48,14 +48,14 @@ Name | Type | Description | Default | Required
`forward_to` | `list(MetricsReceiver)` | List of receivers to send scraped metrics to. | | yes
`job_name` | `string` | The value to use for the job label if not already set. | component name | no
`extra_metrics` | `bool` | Whether extra metrics should be generated for scrape targets. | `false` | no
`enable_protobuf_negotiation` | `bool` | Whether to enable protobuf negotiation with the client. | `false` | no
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go into breaking changes documentation?

@rfratto rfratto added variant/static Related to Grafana Agent Static. variant/flow Relatd to Grafana Agent Flow. outdated-dependency Depdency of the project that is out of date labels Apr 9, 2024
@thampiotr
Copy link
Contributor

This is replaced now with grafana/alloy#794 in Alloy.

@thampiotr thampiotr closed this May 16, 2024
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Jun 16, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. outdated-dependency Depdency of the project that is out of date variant/flow Relatd to Grafana Agent Flow. variant/static Related to Grafana Agent Static.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade Alloy to Prometheus 2.50
4 participants