use search_after in scroll #4280

Merged: 7 commits merged into main from trinity--search-after-scroll on Jan 25, 2024

Conversation

trinity-1686a (Contributor)

Description

fix #3748
Use search_after as an implementation detail in scroll. We lose the ability to replay a request with the same scroll_id when pages span more than 1k documents (which isn't technically allowed anyway), but we gain that the underlying request is always for the same number of documents, rather than growing linearly as before.

Fix a bug where asking for pages of more than 1k docs would yield a first page of 1k docs, with the right number of docs only on subsequent pages.
Make it so that we can always answer a page with at most a single underlying query (before, it took up to 10 queries: a page of 10k docs was served by issuing 10 requests of 1k docs each).
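
For illustration, here is a minimal sketch of the idea (the names are hypothetical, not the actual Quickwit types): with offset-based paging, serving page N means fetching offset + page_size hits, a cost that grows linearly with each page, while a search_after anchor keeps every underlying query at a constant size.

```rust
#[derive(Clone, Debug)]
struct SortValue(u64); // stand-in for the real sort-key type

struct PageRequest {
    max_hits: usize,
    // `None` on the first page; afterwards, the sort value of the last hit
    // already returned, so the engine skips directly past it.
    search_after: Option<SortValue>,
}

fn next_page_request(page_size: usize, last_hit: Option<&SortValue>) -> PageRequest {
    PageRequest {
        // constant cost per page, instead of `offset + page_size`
        max_hits: page_size,
        search_after: last_hit.cloned(),
    }
}

fn main() {
    // first page: no anchor; later pages: anchored after the last hit seen
    let first = next_page_request(1_000, None);
    let later = next_page_request(1_000, Some(&SortValue(42)));
    assert!(first.search_after.is_none());
    assert_eq!(later.max_hits, 1_000);
}
```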

How was this PR tested?

Tested manually and updated unit tests.

@@ -491,6 +491,7 @@ async fn search_partial_hits_phase_with_scroll(
)
.next_page(leaf_search_resp.partial_hits.len() as u64);

scroll_ctx.truncate_start();
fulmicoton (Contributor)

My understanding is that you are calling truncate_start to handle the case where the request's max hits is > SCROLL_BATCH_LEN. If so, I don't understand why it isn't truncate (as opposed to truncate_start) that we want to call here.

If this is the use case you are targeting, I suggest we:

  • do not cache anything if max_hits is too large
  • return an error and log a warning if scroll is used with max_hits too large (see the sketch below).
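
A minimal sketch of the suggested guard, with assumed names (SCROLL_BATCH_LEN matches the constant discussed in this thread; the function itself is hypothetical):

```rust
const SCROLL_BATCH_LEN: usize = 1_000;

/// Refuse (and log) scroll requests whose page size exceeds what a single
/// underlying query can serve.
fn check_scroll_max_hits(max_hits: usize) -> Result<(), String> {
    if max_hits > SCROLL_BATCH_LEN {
        // a real implementation would use the project's logging facilities
        eprintln!("warn: scroll used with max_hits={max_hits} > {SCROLL_BATCH_LEN}");
        return Err(format!(
            "scroll does not support max_hits > {SCROLL_BATCH_LEN} (got {max_hits})"
        ));
    }
    Ok(())
}
```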

trinity-1686a (Contributor Author)

The idea is that the first elements in the cache have already been consumed. To avoid storing too many elements (i.e. to reduce memory pressure), we throw some away so as to keep at most SCROLL_BATCH_LEN of them.
This is arguably not that useful; we could keep only one element (to know the values to provide for search_after) and throw away everything else.
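
A minimal sketch of that truncation, assuming illustrative field names (not the actual Quickwit code):

```rust
const SCROLL_BATCH_LEN: usize = 1_000;

struct ScrollCache {
    // stand-in for the cached partial hits; the front of the vec holds the
    // oldest, already-consumed hits
    cached_hits: Vec<u64>,
}

impl ScrollCache {
    /// Drop consumed hits from the front so that at most SCROLL_BATCH_LEN
    /// elements stay cached, bounding memory per scroll context.
    fn truncate_start(&mut self) {
        if self.cached_hits.len() > SCROLL_BATCH_LEN {
            let excess = self.cached_hits.len() - SCROLL_BATCH_LEN;
            self.cached_hits.drain(..excess);
        }
    }
}
```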

fulmicoton (Contributor)

Got it.

I'd rather remove the truncate_start call. If I understand correctly, it is there to optimize memory.
The complexity added is non-trivial.

The truncation relies on a very bad part of the Elasticsearch spec: scrolling is not idempotent.
Calling Elasticsearch twice with the same scroll id advances the scroll. (This is bad engineering because it prevents retries on all kinds of errors.)
In addition, the function has a behavior not suggested by its name: it truncates, but attempts to leave at least one element.

> we throw away elements so as to keep at most SCROLL_BATCH_LEN elems

Even without the truncation, I think we already keep at most SCROLL_BATCH_LEN today (maybe that is a change you made recently).

trinity-1686a (Contributor Author), Jan 23, 2024

So, do we want to keep scrolling idempotent? Before this PR it was, though at the cost of each page being slower and more memory-intensive to create than the previous one.
After this PR, with the original truncate_start (or without calling it at all, but using search_after-based scrolling), it is idempotent in some cases, but not always (and never when a page is more than SCROLL_BATCH_LEN).
With the last commit, it is rarely idempotent.

fulmicoton (Contributor) left a comment

Not really a change request: I did not understand the truncate_start stuff. Please have a look at my comment.

fulmicoton (Contributor)

As discussed offline, let's put the search_after key in the scroll key, avoid caching when max hits is too large, and make the feature idempotent.
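
For illustration, a sketch of that direction with assumed field types (ScrollKeyAndStartOffset and Ulid do appear in the diff below; the fields here are illustrative): embedding the search_after anchor in the scroll token itself means replaying the same token always re-issues the same underlying query, which is what makes the feature idempotent.

```rust
#[derive(Clone, Debug)]
struct ScrollKeyAndStartOffset {
    scroll_ulid: u128,                // stand-in for the real Ulid type
    start_offset: u64,                // number of hits already served
    search_after: Option<(u64, u64)>, // sort value + doc id of the last hit
}

impl ScrollKeyAndStartOffset {
    /// Build the token for the next page. Because the anchor is embedded in
    /// the token, a given token always maps to the same underlying query,
    /// so replaying it is safe.
    fn next_page(mut self, hits_returned: u64, last_hit: Option<(u64, u64)>) -> Self {
        self.start_offset += hits_returned;
        self.search_after = last_hit;
        self
    }
}
```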

) -> ScrollKeyAndStartOffset {
let scroll_ulid: Ulid = Ulid::new();
// technically we could only initialize search_after on first call to next_page, and use
// default() before, but that feels like partial initilization.
fulmicoton (Contributor)

Suggested change
// default() before, but that feels like partial initilization.
// default() before, but that feels like partial initialization.

@trinity-1686a trinity-1686a enabled auto-merge (squash) January 25, 2024 10:27
@trinity-1686a trinity-1686a merged commit ca38897 into main Jan 25, 2024
4 checks passed
@trinity-1686a trinity-1686a deleted the trinity--search-after-scroll branch January 25, 2024 10:30
Successfully merging this pull request may close these issues:

Improve scroll api to rely on search_after parameter