
Added Scroll API #3705

Merged 2 commits into main from issue/3551-scroll-api on Aug 7, 2023
Conversation

fulmicoton
Contributor

@fulmicoton fulmicoton commented Aug 3, 2023

Solution described in docs/internals/scroll.md.

Tests
Unit tests for the simplistic KV store.
Unit test for the scroll API
Rest API test comparing output from ElasticSearch.

Closing #3551


That id is then meant to be sent to a scroll REST API.
Successive calls to this API then return pages incrementally.
Scroll is limited to 10k items.
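A sketch of the mechanics described above (all names are illustrative, not Quickwit's actual types): the first request captures the result set and returns a scroll id, and each follow-up call with that id pages through the captured set, capped at 10k items.

```rust
use std::collections::HashMap;

// Illustrative cap mirroring the 10k-item limit mentioned above.
const MAX_SCROLL_DOCS: usize = 10_000;

// Minimal in-memory sketch of a scroll store.
struct ScrollStore {
    // scroll_id -> (captured doc ids, cursor)
    contexts: HashMap<u64, (Vec<u32>, usize)>,
    next_id: u64,
}

impl ScrollStore {
    fn new() -> Self {
        ScrollStore { contexts: HashMap::new(), next_id: 0 }
    }

    // Capture the result set at request time and hand back a scroll id.
    fn open(&mut self, mut docs: Vec<u32>) -> u64 {
        docs.truncate(MAX_SCROLL_DOCS);
        let id = self.next_id;
        self.next_id += 1;
        self.contexts.insert(id, (docs, 0));
        id
    }

    // Return the next page and advance the cursor.
    fn next_page(&mut self, scroll_id: u64, page_size: usize) -> Vec<u32> {
        let (docs, cursor) = self
            .contexts
            .get_mut(&scroll_id)
            .expect("unknown scroll id");
        let end = (*cursor + page_size).min(docs.len());
        let page = docs[*cursor..end].to_vec();
        *cursor = end;
        page
    }
}

fn main() {
    let mut store = ScrollStore::new();
    let id = store.open((0..25).collect());
    assert_eq!(store.next_page(id, 10), (0..10).collect::<Vec<u32>>());
    assert_eq!(store.next_page(id, 10), (10..20).collect::<Vec<u32>>());
    assert_eq!(store.next_page(id, 10), (20..25).collect::<Vec<u32>>());
    println!("scrolled 25 docs in 3 pages");
}
```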
Member


Airmail can go up to 20k.

Contributor Author

@fulmicoton fulmicoton Aug 7, 2023


Understood. This is configurable in Elasticsearch. For the moment we don't have any limit whatsoever.

The scrolled results should be consistent with the state of the original index.
For this reason we need to capture the state of the index at the point of the original request.

If a network error happens between the client and the server at page N, there is no way for the client to request the re-emission of page N.
Member


Do we "get it right" in the Quickwit implementation and then "expose it wrong" in the ES API?

Contributor Author

@fulmicoton fulmicoton Aug 7, 2023


We do it right and expose it right, actually. People can retry with the scroll id that failed due to a network error, and it will work as you would expect in the case of Quickwit.
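A sketch of why the retry works, assuming (as the thread suggests) that the scroll id pins down both the captured context and the page offset, so serving a page is a pure read; the names here are illustrative, not Quickwit's actual types:

```rust
// Illustrative scroll id: it identifies the captured context and the offset
// of the page it refers to.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ScrollId {
    context: u64,
    from: usize,
}

// Serve the page the id points at, plus the id for the following page.
fn fetch_page(docs: &[u32], id: ScrollId, page_size: usize) -> (Vec<u32>, ScrollId) {
    let end = (id.from + page_size).min(docs.len());
    (docs[id.from..end].to_vec(), ScrollId { context: id.context, from: end })
}

fn main() {
    let docs: Vec<u32> = (0..30).collect();
    let (page1, id1) = fetch_page(&docs, ScrollId { context: 42, from: 0 }, 10);
    assert_eq!(page1.len(), 10);
    // Simulate a network error: the response to the first attempt is lost,
    // so the client re-sends the exact same scroll id...
    let (lost, _) = fetch_page(&docs, id1, 10);
    let (retried, _) = fetch_page(&docs, id1, 10);
    // ...and gets the same page back, because nothing mutated server-side.
    assert_eq!(lost, retried);
}
```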

@@ -32,6 +32,7 @@ pub mod pubsub;
pub mod rand;
pub mod rendezvous_hasher;
pub mod runtimes;
pub mod shared_consts;
Member


nit: I think quickwit-proto is better suited for this usage since it's becoming the crate where we expose our public and internal APIs.

Contributor Author

@fulmicoton fulmicoton Aug 7, 2023


Hmmm... I am not sure about that one. Maybe if we renamed it quickwit-public-api or something like that.
The trouble is that both quickwit-common and quickwit-proto pull in all kinds of stuff.

#[derive(Clone, PartialEq, ::prost::Message)]
pub struct PutKvRequest {
#[prost(bytes = "vec", tag = "1")]
pub key: ::prost::alloc::vec::Vec<u8>,
Member

@guilload guilload Aug 4, 2023


nit: We should use Bytes instead of Vec<u8>.

Contributor Author


why?
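A plausible rationale, though none is given in the thread: `bytes::Bytes` is a reference-counted byte buffer, so cloning it (as request handling often does) is O(1) and copy-free, while cloning a `Vec<u8>` copies the whole buffer. A stdlib analogy using `Arc<[u8]>` in place of `Bytes`:

```rust
use std::sync::Arc;

fn main() {
    // Cloning a Vec<u8> allocates and copies the whole buffer.
    let vec_key: Vec<u8> = vec![7u8; 1024];
    let vec_clone = vec_key.clone();
    assert!(vec_key.as_ptr() != vec_clone.as_ptr()); // two distinct buffers

    // Cloning an Arc<[u8]> only bumps a refcount; bytes::Bytes behaves
    // similarly, and additionally supports zero-copy slicing.
    let arc_key: Arc<[u8]> = Arc::from(vec![7u8; 1024]);
    let arc_clone = Arc::clone(&arc_key);
    assert!(arc_key.as_ptr() == arc_clone.as_ptr()); // shared buffer
}
```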

.count()
.await;

if successful_replication == 0 {
Member


nit: I'd display a warning if any attempt failed and an error if all of them failed. It would also be nice to display the number of failures and which nodes failed. Given the urgency of this PR, feel free to ignore this comment.

Contributor Author


excellent point.

I went for warn whenever an attempt failed, and error if all failed and there was at least one node.
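The rule settled on here can be sketched as a small decision function (names illustrative, not the actual code):

```rust
#[derive(Debug, PartialEq, Eq)]
enum ReplicationLog {
    Nothing,
    Warn,
    Error,
}

// Warn whenever at least one replication attempt failed; error when every
// attempt failed and there was at least one node to replicate to.
fn replication_log_level(num_nodes: usize, num_successes: usize) -> ReplicationLog {
    if num_nodes > 0 && num_successes == 0 {
        ReplicationLog::Error
    } else if num_successes < num_nodes {
        ReplicationLog::Warn
    } else {
        ReplicationLog::Nothing
    }
}

fn main() {
    assert_eq!(replication_log_level(3, 3), ReplicationLog::Nothing);
    assert_eq!(replication_log_level(3, 1), ReplicationLog::Warn);
    assert_eq!(replication_log_level(3, 0), ReplicationLog::Error);
    // No nodes at all: nothing to report.
    assert_eq!(replication_log_level(0, 0), ReplicationLog::Nothing);
}
```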

}

#[instrument(skip(search_request, cluster_client))]
async fn search_partial_hits_round_with_scroll(
Member


Suggested change
async fn search_partial_hits_round_with_scroll(
async fn search_partial_hits_phase_with_scroll(

Member


It's common to hear about two-phase search or execution for search engines like ES or ours.

Contributor Author


ah yes!

if let Ok(Some(search_after_resp)) = client.get_kv(get_request.clone()).await {
return Some(search_after_resp);
}
}
Member


I'd also display a warning here.

Contributor Author


I am not fond of warn here. It would trigger if you add a node, for instance.

Contributor Author


I made it an info
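The lookup pattern in the excerpt, with a miss treated as routine (logged at info level) rather than as a warning, can be sketched as follows; this is an illustrative, synchronous stand-in for the async `get_kv` calls, not the actual code:

```rust
// Each entry stands in for one node's reply to get_kv: Ok(Some(_)) is a hit,
// Ok(None) a miss (e.g. a freshly added node), Err(_) a transport failure.
fn first_hit(replies: &[Result<Option<String>, String>]) -> Option<String> {
    for reply in replies {
        match reply {
            Ok(Some(value)) => return Some(value.clone()),
            // A miss or a failed node is expected during topology changes;
            // the real code logs this at info level and moves on.
            Ok(None) | Err(_) => continue,
        }
    }
    None
}

fn main() {
    let replies = vec![
        Ok(None),                            // new node, no entry yet
        Err("connection reset".to_string()), // transport failure
        Ok(Some("scroll-ctx".to_string())),  // hit
    ];
    assert_eq!(first_hit(&replies), Some("scroll-ctx".to_string()));
    assert_eq!(first_hit(&[Ok(None)]), None);
}
```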

See Design details in docs/internals/scroll.md

Closes #3551
@fulmicoton fulmicoton merged commit 5d2c66e into main Aug 7, 2023
3 checks passed
@fulmicoton fulmicoton deleted the issue/3551-scroll-api branch August 7, 2023 08:31
/// Maximum capacity of the search after cache.
///
/// For the moment this value is hardcoded.
/// TODO make configurable.
Contributor

@fmassot fmassot Aug 7, 2023


can you open an issue on this?

Also, I don't understand the comment:

    /// Assuming a search context of 1MB, this can
    /// amount to up to 1GB.

Are you talking about the scroll context? If not, what is the search context?
And another question: where does the 1000 factor come from (1MB -> 1GB)?

For the Airmail project, the scroll API will be used for exporting data, up to 20k hits. Does this mean we need 20GB of RAM on a searcher?
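For what it's worth, the arithmetic in the quoted comment only works out if the hardcoded cache capacity is 1,000 entries; both numbers below are assumptions read off that comment, not confirmed values:

```rust
fn main() {
    // Assumed values from the doc comment under discussion, not confirmed.
    let cache_capacity: u64 = 1_000;    // hardcoded max entries (assumption)
    let context_size: u64 = 1_000_000;  // ~1 MB per search context (assumption)
    let worst_case_bytes = cache_capacity * context_size;
    assert_eq!(worst_case_bytes, 1_000_000_000); // ~1 GB, matching the comment
    println!("worst-case cache footprint: {} bytes", worst_case_bytes);
}
```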
