-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add maturity date in split metadata and use it to filter mature splits for merges #3585
Conversation
e1cd0b2
to
614f575
Compare
Why? Types are really great for this...
Is the idea here to make sure that existing splits end up mature after the upgrade of quickwit? |
Yes indeed, what I really would like is to have a struct at the seconds precision only.
Not really. In the case where you want to filter splits by Now the truth is that I'm not satisfied with the solution...
Let's add an enum for split maturity. |
d2b814b
to
6416b25
Compare
} | ||
|
||
// Serializer/Deserializer of `Duration` in milliseconds. | ||
pub mod serde_duration_millisecs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde_with has an implem too if you want to add the dependency (I agree it is always a tiny bit sad)
I'd prefer secs than millisecs here. I'm not too fond of reading pinball high scores. :)
Also I think we store secs for the timestamps, so millisecs does not make much sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is a plan led by @guilload about putting all split metadata "timestamps" in milliseconds and adding a small wrapper Timestamp
to avoid having i64
everywhere.
So we will eventually move to a world where everything is in milliseconds. Thus the duration in millis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Milliseconds are much better for unit testing. We have a bunch of time::sleep(1s) in our metastore unit tests, making them slow and flaky, because of the poor resolution of our timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this particular duration, it does not impact the unit tests.
On the contrary, we don't control create_timestamp
, update_timestamp
, and publish_timestamp
values, thus the need for sleep
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Good job.
I had a couple of nitpicks. Can you at least consider putting duration in secs (it will be a pain to change in the future)? Strongest point being: the timestamp we compre it with is in secs.
The other change I would like you to consider is
Bound<OffsetDatetime>
instead of Option<SplitMaturityFilter>
.
b57923f
to
c56a05d
Compare
…and avoid confusing the reader.
a60abc3
to
3e2c6a2
Compare
3e2c6a2
to
f9a25ef
Compare
Fix #3576
MergePolicy update
This PR modifies the
MergePolicy
by replacingis_mature
method bysplit_maturity
Migration
Splits from the old quickwit version will be considered mature.