Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove querier wait time metric. #11233

Merged
merged 9 commits into from
Dec 6, 2023

Conversation

jeschkies
Copy link
Contributor

What this PR does / why we need it:
We would like to know how long a querier worker is idle to understand if workstealing would have an impact. The original metric was too noisy and its cardinality was too high. Instead, we are going to log the wait time.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@@ -64,7 +65,9 @@ func (pm *processorManager) concurrency(n int) {
n = 0
}

workerId := 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am guessing the workerId does not matter much here. but there is a chance here of re-using workerIds if the concurrency value changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I wonder what happens then. Are we restarting the querier?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when more schedulers get added? which means each worker gets a smaller concurrency value.
not something that is often done in practice, so it should be alright I think

Copy link
Contributor

@ashwanthgoli ashwanthgoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

github-actions bot commented Nov 21, 2023

Trivy scan found the following vulnerabilities:

@jeschkies jeschkies enabled auto-merge (squash) November 22, 2023 07:26
Copy link
Contributor

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the new native histograms be helpful here as well? these are much less costly and have less relative error between buckets

there's essentially a single series per native histogram metric and buckets are populated sparsely in storage

@@ -58,7 +58,7 @@ func (fp *frontendProcessor) notifyShutdown(ctx context.Context, conn *grpc.Clie
}

// runOne loops, trying to establish a stream to the frontend to begin request processing.
func (fp *frontendProcessor) processQueriesOnSingleStream(ctx context.Context, conn *grpc.ClientConn, address string) {
func (fp *frontendProcessor) processQueriesOnSingleStream(ctx context.Context, conn *grpc.ClientConn, address, _ string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the _ param intentional here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, processQueriesOnSingleStream implements the processor interface and we don't use the worker ID in the case of the frontend processor.

@@ -39,7 +39,7 @@ func TestRecvFailDoesntCancelProcess(t *testing.T) {
running.Store(true)
defer running.Store(false)

mgr.processQueriesOnSingleStream(ctx, cc, "test:12345")
mgr.processQueriesOnSingleStream(ctx, cc, "test:12345", "1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "1" necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Since it's ignored we don't need it here.

@jeschkies jeschkies merged commit 5b8d0e6 into grafana:main Dec 6, 2023
7 checks passed
@jeschkies jeschkies deleted the karsten/move-queuing-time branch December 7, 2023 05:42
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
**What this PR does / why we need it**:
We would like to know how long a querier worker is idle to understand if
workstealing would have an impact. The original metric was too noisy and
its cardinality was too high. Instead, we are going to log the wait
time.

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Co-authored-by: Danny Kopping <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants