Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add jitter to retention schedule #5361

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

trinity-1686a
Copy link
Contributor

Description

add optional jitter to retention policy schedule, so that numerous indexes with the same schedule don't hammer down the metastore when they all execute their retention policy
fix #5353

How was this PR tested?

unit test

Copy link

github-actions bot commented Aug 28, 2024

On SSD:

Average search latency is 1.01x that of the reference (lower is better).
Ref run id: 3063, ref commit: 60368df
Link

On GCS:

Average search latency is 1.08x that of the reference (lower is better).
Ref run id: 3064, ref commit: 60368df
Link

@guilload
Copy link
Member

I wonder if we could avoid exposing users to one more knob by jittering by default. We would have to pick a value between [next evaluation..next evaluation+1], maybe biased towards the beginning of the interval. WDYT? Would that be confusing for users?

@trinity-1686a
Copy link
Contributor Author

trinity-1686a commented Sep 12, 2024

i think it would be fine for policies which happen often enough (once every day or more often), but could get confusing if policies put in more delays, especially when entered in cron format. For instance if i put 0 0 * * 0 (every sunday at midnight), I wouldn't be too surprised if the task starts a few minutes or maybe an hour late, but i would be surprised if it started doing something on tuesday (or any other day). We could also run into issues where if we do something daily (0utc assumed), and for some reason it's company policy to restart servers at 1am, only 1/24 indexes would actually be cleaned each day (the other 23/24 are scheduled after 1am, then we restart, look at what the next time to run would be, it's midnight + some delay, sleep for the rest of the day)

We could do something where the jitter_secs is min(1h, time_between_2_schedules), which sounds generally less surprising, and maybe (or maybe not) allow manual configuration to something higher than 1h?

@trinity-1686a trinity-1686a requested review from guilload and removed request for fulmicoton September 13, 2024 16:23
@guilload
Copy link
Member

Yes, min(1h, time_between_2_schedules) is perfect. I meant something along those lines with "biased".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make sure the janitor does not spam the metastore too much
2 participants