Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix agg split_size parameter, add docs and test #4627

Merged
merged 3 commits into from
Feb 27, 2024
Merged

fix agg split_size parameter, add docs and test #4627

merged 3 commits into from
Feb 27, 2024

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Feb 25, 2024

fix aggregation parameter split_size propagation to segment_size
improve split_size docs
add split_size parameter test

@PSeitz PSeitz changed the title fix agg split_size parameter, add docs test fix agg split_size parameter, add docs and test Feb 25, 2024
@PSeitz PSeitz requested a review from fmassot February 25, 2024 16:46
// Forward split_size (also shard_size) parameter to segment_size, as
// this is the same in context of Quickwit
if let Some(split_size) = &terms_agg.split_size {
terms_agg.segment_size = Some(*split_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. So we have an argument called split_size introduced in tantivy even though the parameter makes no sense there.

And it is unused... So you are adding glue code to actually have us write split_size into segment_size?

Can't we just remove the split_size property in tantivy and populate the segment_size thing directly?

Is this a serialization issue?

Copy link
Contributor Author

@PSeitz PSeitz Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_size which aliases to shard_size is supposed to set the number of terms that get send from a node to the root node. That part is not handled in tantivy and therefore unused, but would still make sense in the request structure.
Since we have just one segment in Quickwit currently, we can set it to segment_size.

We could alias shard_size directly to segment_size, but that would be kind of incorrect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of a better solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could truncate the results to split_size before sending the results between nodes, independently of segment_size. But I think we can just forward the parameter for now, until we have more insight how split_size and segment_size should differ.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split_size which aliases to shard_size is supposed to set the number of terms that get send from a node to the root node. That part is not handled in tantivy and therefore unused, but would still make sense in the request structure.
Since we have just one segment in Quickwit currently, we can set it to segment_size.

That makes no sense at all.

Copy link
Contributor Author

@PSeitz PSeitz Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending data between nodes is not handled in tantivy, it's still obviously useful to control that with a parameter.

No idea why you would think that is nonsense.

@@ -532,15 +532,22 @@ By default, the top 10 terms with the most documents are returned. Larger values

###### **split_size**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should only call it shard_size?

@@ -532,15 +532,22 @@ By default, the top 10 terms with the most documents are returned. Larger values

###### **split_size**

The get more accurate results, we fetch more than size from each segment/split.
The get more accurate results, we fetch more than size from each segment/split. Aliases to `shard_size`.
Copy link
Contributor

@fulmicoton fulmicoton Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The get more accurate results, we fetch more than size from each segment/split. Aliases to `shard_size`.
The get more accurate results, we fetch more than size from each segment/split.
The parameter also admits `split_size` as an alias for opensearch/elasticsearch compatibility reasons.

fix aggregation parameter split_size propagation
improve split_size docs
add split_size parameter test
rename split_size to shard_size
update docs
remove variable propagation
key: nice
- doc_count: 2
key: cool
doc_count_error_upper_bound: 0
sum_other_doc_count: 0
---
# Test term aggs with split_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shard

Suggested change
# Test term aggs with split_size
# Test term aggs with shard_size

@fulmicoton
Copy link
Contributor

fulmicoton commented Feb 27, 2024

@PSeitz Don't we need to update the tantivy version for the test to pass, yet the tests are passing how did that work?

Copy link
Contributor

@fulmicoton fulmicoton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to update the tantivy version, or was this done in a different PR?

@PSeitz
Copy link
Contributor Author

PSeitz commented Feb 27, 2024

I think we need to update the tantivy version, or was this done in a different PR?

Yes, that was done in a different PR

@PSeitz PSeitz enabled auto-merge (squash) February 27, 2024 02:36
@PSeitz PSeitz merged commit c81467d into main Feb 27, 2024
4 checks passed
@PSeitz PSeitz deleted the agg_docs branch February 27, 2024 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants