
Added Scroll API #3705

Merged 2 commits into main from issue/3551-scroll-api on Aug 7, 2023
Conversation

fulmicoton
Contributor

@fulmicoton fulmicoton commented Aug 3, 2023

Solution described in docs/internals/scroll.md.

Tests
Unit tests for the simplistic KV store.
Unit test for the scroll API
Rest API test comparing output from ElasticSearch.

Closing #3551


That id is then meant to be sent to a scroll REST API.
Successive calls to this API then return pages incrementally.
Scroll is limited to 10k items.
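A sketch of the mechanics described above (all names are illustrative, not Quickwit's actual types): the first request captures the result set and returns a scroll id, and each follow-up call with that id pages through the captured set, capped at 10k items.

```rust
use std::collections::HashMap;

// Illustrative cap mirroring the 10k-item limit mentioned above.
const MAX_SCROLL_DOCS: usize = 10_000;

// Minimal in-memory sketch of a scroll store.
struct ScrollStore {
    // scroll_id -> (captured doc ids, cursor)
    contexts: HashMap<u64, (Vec<u32>, usize)>,
    next_id: u64,
}

impl ScrollStore {
    fn new() -> Self {
        ScrollStore { contexts: HashMap::new(), next_id: 0 }
    }

    // Capture the result set at request time and hand back a scroll id.
    fn open(&mut self, mut docs: Vec<u32>) -> u64 {
        docs.truncate(MAX_SCROLL_DOCS);
        let id = self.next_id;
        self.next_id += 1;
        self.contexts.insert(id, (docs, 0));
        id
    }

    // Return the next page and advance the cursor.
    fn next_page(&mut self, scroll_id: u64, page_size: usize) -> Vec<u32> {
        let (docs, cursor) = self
            .contexts
            .get_mut(&scroll_id)
            .expect("unknown scroll id");
        let end = (*cursor + page_size).min(docs.len());
        let page = docs[*cursor..end].to_vec();
        *cursor = end;
        page
    }
}

fn main() {
    let mut store = ScrollStore::new();
    let id = store.open((0..25).collect());
    assert_eq!(store.next_page(id, 10), (0..10).collect::<Vec<u32>>());
    assert_eq!(store.next_page(id, 10), (10..20).collect::<Vec<u32>>());
    assert_eq!(store.next_page(id, 10), (20..25).collect::<Vec<u32>>());
    println!("scrolled 25 docs in 3 pages");
}
```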
Member


Airmail can go up to 20k.

Contributor Author

@fulmicoton fulmicoton Aug 7, 2023


Understood. This is configurable in Elasticsearch. For the moment we don't have any limit whatsoever.

The scrolled results should be consistent with the state of the original index.
For this reason we need to capture the state of the index at the point of the original request.

If a network error happens between the client and the server at page N, there is no way for the client to request the re-emission of page N.
Member


Do we "get it right" in the Quickwit implementation and then "expose it wrong" in the ES API?

Contributor Author

@fulmicoton fulmicoton Aug 7, 2023


We do it right and expose it right, actually. People can retry with the scroll id that failed due to a network error, and it will work as you would expect in the case of Quickwit.
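A sketch of why the retry works, assuming (as the thread suggests) that the scroll id pins down both the captured context and the page offset, so serving a page is a pure read; the names here are illustrative, not Quickwit's actual types:

```rust
// Illustrative scroll id: it identifies the captured context and the offset
// of the page it refers to.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ScrollId {
    context: u64,
    from: usize,
}

// Serve the page the id points at, plus the id for the following page.
fn fetch_page(docs: &[u32], id: ScrollId, page_size: usize) -> (Vec<u32>, ScrollId) {
    let end = (id.from + page_size).min(docs.len());
    (docs[id.from..end].to_vec(), ScrollId { context: id.context, from: end })
}

fn main() {
    let docs: Vec<u32> = (0..30).collect();
    let (page1, id1) = fetch_page(&docs, ScrollId { context: 42, from: 0 }, 10);
    assert_eq!(page1.len(), 10);
    // Simulate a network error: the response to the first attempt is lost,
    // so the client re-sends the exact same scroll id...
    let (lost, _) = fetch_page(&docs, id1, 10);
    let (retried, _) = fetch_page(&docs, id1, 10);
    // ...and gets the same page back, because nothing mutated server-side.
    assert_eq!(lost, retried);
}
```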

@@ -32,6 +32,7 @@ pub mod pubsub;
pub mod rand;
pub mod rendezvous_hasher;
pub mod runtimes;
pub mod shared_consts;
Member


nit: I think quickwit-proto is better suited for this usage since it's becoming the crate where we expose our public and internal APIs.

Contributor Author

@fulmicoton fulmicoton Aug 7, 2023


Hmmm... I am not sure about that one. Maybe if we renamed it quickwit-public-api or something like that.
The trouble is that both quickwit-common and quickwit-proto pull in all kinds of stuff.

#[derive(Clone, PartialEq, ::prost::Message)]
pub struct PutKvRequest {
#[prost(bytes = "vec", tag = "1")]
pub key: ::prost::alloc::vec::Vec<u8>,
Member

@guilload guilload Aug 4, 2023


nit: We should use Bytes instead of Vec<u8>.

Contributor Author


why?
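A plausible rationale, though none is given in the thread: `bytes::Bytes` is a reference-counted byte buffer, so cloning it (as request handling often does) is O(1) and copy-free, while cloning a `Vec<u8>` copies the whole buffer. A stdlib analogy using `Arc<[u8]>` in place of `Bytes`:

```rust
use std::sync::Arc;

fn main() {
    // Cloning a Vec<u8> allocates and copies the whole buffer.
    let vec_key: Vec<u8> = vec![7u8; 1024];
    let vec_clone = vec_key.clone();
    assert!(vec_key.as_ptr() != vec_clone.as_ptr()); // two distinct buffers

    // Cloning an Arc<[u8]> only bumps a refcount; bytes::Bytes behaves
    // similarly, and additionally supports zero-copy slicing.
    let arc_key: Arc<[u8]> = Arc::from(vec![7u8; 1024]);
    let arc_clone = Arc::clone(&arc_key);
    assert!(arc_key.as_ptr() == arc_clone.as_ptr()); // shared buffer
}
```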

.count()
.await;

if successful_replication == 0 {
Member


nit: I'd display a warning if any attempt failed and an error if all of them failed. It would also be nice to display the number of failures and which nodes failed. Given the urgency of this PR, feel free to ignore this comment.

Contributor Author


excellent point.

I went for warn whenever an attempt failed, and error if all failed and there was at least one node.
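The rule settled on here can be sketched as a small decision function (names illustrative, not the actual code):

```rust
#[derive(Debug, PartialEq, Eq)]
enum ReplicationLog {
    Nothing,
    Warn,
    Error,
}

// Warn whenever at least one replication attempt failed; error when every
// attempt failed and there was at least one node to replicate to.
fn replication_log_level(num_nodes: usize, num_successes: usize) -> ReplicationLog {
    if num_nodes > 0 && num_successes == 0 {
        ReplicationLog::Error
    } else if num_successes < num_nodes {
        ReplicationLog::Warn
    } else {
        ReplicationLog::Nothing
    }
}

fn main() {
    assert_eq!(replication_log_level(3, 3), ReplicationLog::Nothing);
    assert_eq!(replication_log_level(3, 1), ReplicationLog::Warn);
    assert_eq!(replication_log_level(3, 0), ReplicationLog::Error);
    // No nodes at all: nothing to report.
    assert_eq!(replication_log_level(0, 0), ReplicationLog::Nothing);
}
```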

}

#[instrument(skip(search_request, cluster_client))]
async fn search_partial_hits_round_with_scroll(
Member


Suggested change
async fn search_partial_hits_round_with_scroll(
async fn search_partial_hits_phase_with_scroll(

Member


It's common to hear about two-phase search or execution for search engines like ES or ours.

Contributor Author


ah yes!

if let Ok(Some(search_after_resp)) = client.get_kv(get_request.clone()).await {
return Some(search_after_resp);
}
}
Member


I'd also display a warning here.

Contributor Author


I am not fond of warn here. It would trigger if you add a node, for instance.

Contributor Author


I made it an info
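The lookup pattern in the excerpt, with a miss treated as routine (logged at info level) rather than as a warning, can be sketched as follows; this is an illustrative, synchronous stand-in for the async `get_kv` calls, not the actual code:

```rust
// Each entry stands in for one node's reply to get_kv: Ok(Some(_)) is a hit,
// Ok(None) a miss (e.g. a freshly added node), Err(_) a transport failure.
fn first_hit(replies: &[Result<Option<String>, String>]) -> Option<String> {
    for reply in replies {
        match reply {
            Ok(Some(value)) => return Some(value.clone()),
            // A miss or a failed node is expected during topology changes;
            // the real code logs this at info level and moves on.
            Ok(None) | Err(_) => continue,
        }
    }
    None
}

fn main() {
    let replies = vec![
        Ok(None),                            // new node, no entry yet
        Err("connection reset".to_string()), // transport failure
        Ok(Some("scroll-ctx".to_string())),  // hit
    ];
    assert_eq!(first_hit(&replies), Some("scroll-ctx".to_string()));
    assert_eq!(first_hit(&[Ok(None)]), None);
}
```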

See Design details in docs/internals/scroll.md

Closes #3551
@fulmicoton fulmicoton merged commit 5d2c66e into main Aug 7, 2023
3 checks passed
@fulmicoton fulmicoton deleted the issue/3551-scroll-api branch August 7, 2023 08:31
/// Maximum capacity of the search after cache.
///
/// For the moment this value is hardcoded.
/// TODO make configurable.
Contributor

@fmassot fmassot Aug 7, 2023


can you open an issue on this?

Also, I don't understand the comment:

    /// Assuming a search context of 1MB, this can
    /// amount to up to 1GB.

Are you talking about the scroll context? If not, what is the search context?
And another question: where does the 1000 factor come from (1MB -> 1GB)?

For the Airmail project, the scroll API will be used for exporting data, up to 20k hits. Does this mean we need 20GB of RAM on a searcher?
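For what it's worth, the arithmetic in the quoted comment only works out if the hardcoded cache capacity is 1,000 entries; both numbers below are assumptions read off that comment, not confirmed values:

```rust
fn main() {
    // Assumed values from the doc comment under discussion, not confirmed.
    let cache_capacity: u64 = 1_000;    // hardcoded max entries (assumption)
    let context_size: u64 = 1_000_000;  // ~1 MB per search context (assumption)
    let worst_case_bytes = cache_capacity * context_size;
    assert_eq!(worst_case_bytes, 1_000_000_000); // ~1 GB, matching the comment
    println!("worst-case cache footprint: {} bytes", worst_case_bytes);
}
```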
