Stop processing split queries on first error. #11067

Merged (2 commits) on Oct 27, 2023

@jeschkies (Contributor)

What this PR does / why we need it:
The TestMetricsTripperware test became flaky after #10688. The race condition in the splitter existed before, but that change exposed it. When an error occurred, e.g. when the query would be too large, the splitter would return. However, there was a small time window in which the loop could still be fed a new request. #10688 amplified this effect because it removed the serialization of requests and responses.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add the add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory.

@kavirajk (Contributor) left a comment

Nice 👍

@dannykopping (Contributor) left a comment

Was super pairing up with you to find and fix this 💪 great work!

Review thread on pkg/querier/queryrange/roundtrip_test.go (outdated, resolved)
@jeschkies enabled auto-merge (squash) on October 27, 2023, 11:08
@grafanabot (Collaborator)

Hello @jeschkies!
Backport pull requests need to be either:

  • Pull requests which address bugs,
  • Urgent fixes which need product approval, in order to get merged,
  • Docs changes.

Please, if the current pull request addresses a bug fix, label it with the type/bug label.
If it already has the product approval, please add the product-approved label. For docs changes, please add the type/docs label.
If the pull request modifies CI behaviour, please add the type/ci label.
If none of the above applies, please consider removing the backport label and targeting the next major/minor release.
Thanks!

@jeschkies added the type/bug label (Something is not working as expected) on Oct 27, 2023
@jeschkies merged commit edae9d3 into grafana:main on Oct 27, 2023
10 checks passed
@grafanabot (Collaborator)

The backport to k173 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-11067-to-k173 origin/k173
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x edae9d32eaf170d88628efc6374d546ce5c68cd4

When the conflicts are resolved, stage and commit the changes:

git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Push the branch to GitHub:
git push --set-upstream origin backport-11067-to-k173
# Create the PR body template
PR_BODY=$(gh pr view 11067 --json body --template 'Backport edae9d32eaf170d88628efc6374d546ce5c68cd4 from #11067{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title "[k173] Stop processing split queries on first error." --body-file - --label "size/S" --label "type/bug" --label "backport" --base k173 --milestone k173 --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# Push the branch to GitHub:
git push --set-upstream origin backport-11067-to-k173

# Create a pull request where the `base` branch is `k173` and the `compare`/`head` branch is `backport-11067-to-k173`.

# Remove the local backport branch
git switch main
git branch -D backport-11067-to-k173

@jeschkies deleted the karsten/fix-race-condition branch on October 27, 2023, 11:53
rhnasc pushed a commit to inloco/loki that referenced this pull request on Apr 12, 2024
Co-authored-by: Danny Kopping

4 participants