Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add maturity date in split metadata and use it to filter mature splits for merges #3585

Merged
merged 13 commits into from
Jul 6, 2023

Conversation

fmassot
Copy link
Contributor

@fmassot fmassot commented Jun 27, 2023

Fix #3576

MergePolicy update

This PR modifies the MergePolicy by replacing is_mature method by split_maturity

trait MergePolicy {
    fn split_maturity(
        &self,
        split_num_docs: usize,
        split_num_merge_ops: usize,
    ) -> SplitMaturity;
}

/// `SplitMaturity` defines the maturity of a split, is is either `Mature`
/// or becomes mature after a given period.
#[derive(Clone, Copy, Debug, Default, Eq, Serialize, Deserialize, PartialEq)]
#[serde(tag = "type")]
#[serde(rename_all = "snake_case")]
pub enum SplitMaturity {
    #[default]
    Mature,
    Immature {
        #[serde(with = "serde_duration_millisecs")]
        maturation_period: Duration,
    },
}

Migration

Splits from the old quickwit version will be considered mature.

@fmassot fmassot force-pushed the fmassot/add-maturity-date-in-split-metadata branch 8 times, most recently from e1cd0b2 to 614f575 Compare June 28, 2023 15:17
@fmassot fmassot marked this pull request as ready for review June 28, 2023 15:18
@fmassot fmassot requested a review from guilload June 28, 2023 15:18
@fulmicoton
Copy link
Contributor

Ideally, I don't want to use Duration but to have a number of seconds directly.

Why? Types are really great for this...

The split metadata comes with two new methods, one returning the maturity_timestamp. This maturity_timestamp is set to 0 if no time to maturity is defined for this split.
I could have hidden this hackish maturity_timestamp = 0 in the metastore implementations, but I thought having this rule defined at the SplitMetadata level could be helpful for readers.

Is the idea here to make sure that existing splits end up mature after the upgrade of quickwit?

@fmassot
Copy link
Contributor Author

fmassot commented Jul 3, 2023

Ideally, I don't want to use Duration but to have a number of seconds directly.

Why? Types are really great for this...

Yes indeed, what I really would like is to have a struct at the seconds precision only.

Is the idea here to make sure that existing splits end up mature after the upgrade of quickwit?

Not really. In the case where you want to filter splits by maturity_timestamp, I thought it could be useful to expose publicly the way it is built.

Now the truth is that I'm not satisfied with the solution...

I'm thinking of this other solution:

- time_to_maturity: Duration instead of Option<Duration>. If the maturity split is already mature, time_to_maturity is 0.
- maturity_timestamp: i64, equals to create_timestamp + time_to_maturity

This solution has one drawback: the time of the system can be different between servers, so one indexer can potentially fetch splits that are already mature;

Let's add an enum for split maturity.

@fmassot fmassot force-pushed the fmassot/add-maturity-date-in-split-metadata branch 3 times, most recently from d2b814b to 6416b25 Compare July 4, 2023 21:51
}

// Serializer/Deserializer of `Duration` in milliseconds.
pub mod serde_duration_millisecs {
Copy link
Contributor

@fulmicoton fulmicoton Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serde_with has an implem too if you want to add the dependency (I agree it is always a tiny bit sad)

I'd prefer secs than millisecs here. I'm not too fond of reading pinball high scores. :)
Also I think we store secs for the timestamps, so millisecs does not make much sense here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is a plan led by @guilload about putting all split metadata "timestamps" in milliseconds and adding a small wrapper Timestamp to avoid having i64 everywhere.

So we will eventually move to a world where everything is in milliseconds. Thus the duration in millis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Milliseconds are much better for unit testing. We have a bunch of time::sleep(1s) in our metastore unit tests, making them slow and flaky, because of the poor resolution of our timestamps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this particular duration, it does not impact the unit tests.

On the contrary, we don't control create_timestamp, update_timestamp, and publish_timestamp values, thus the need for sleep.

Copy link
Contributor

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Good job.

I had a couple of nitpicks. Can you at least consider putting duration in secs (it will be a pain to change in the future)? Strongest point being: the timestamp we compre it with is in secs.

The other change I would like you to consider is
Bound<OffsetDatetime> instead of Option<SplitMaturityFilter>.

@fmassot fmassot force-pushed the fmassot/add-maturity-date-in-split-metadata branch from b57923f to c56a05d Compare July 5, 2023 07:36
@fmassot fmassot force-pushed the fmassot/add-maturity-date-in-split-metadata branch 3 times, most recently from a60abc3 to 3e2c6a2 Compare July 6, 2023 14:09
@fmassot fmassot force-pushed the fmassot/add-maturity-date-in-split-metadata branch from 3e2c6a2 to f9a25ef Compare July 6, 2023 15:16
@fmassot fmassot merged commit 8a684d1 into main Jul 6, 2023
7 checks passed
@fmassot fmassot deleted the fmassot/add-maturity-date-in-split-metadata branch July 6, 2023 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Merge pipeline failed when index has too many splits
3 participants