# Scroll API

The scroll API has been implemented to offer compatibility with Elasticsearch.
Both the API and the implementation are quirky; they are detailed in this document.

## API description

You can find information about the scroll API here:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-results
- https://www.elastic.co/guide/en/elasticsearch/reference/current/scroll-api.html

The user runs a regular search request with a `scroll` param.
The search result then contains the normal response, but a `_scroll_id` property is added to the search body.

That id is then meant to be sent to a scroll REST API.
Successive calls to this API will then return pages incrementally.

## Quirks and difficulties

The scrolled results should be consistent with the state of the original index.
For this reason, we need to capture the state of the index at the point of the original request.

The Elasticsearch API is needlessly broken, as it returns the same `scroll_id` over and over.
If a network error happens between the client and the server at page N, there is no way for the client to ask for the re-emission of page N: page N+1 will be returned on the next call.

## Implementation

The scroll context contains:
- the details of the original query (we need to be able to re-emit paginated queries)
- the list of split metadata used for the query
- a cached list of partial docs (not the doc contents, just their addresses and scores), to avoid searching on every single scroll request
- the total number of results, in order to append that information to our response.
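That state can be sketched as a Rust struct. This is an illustrative sketch only; all type and field names below are assumptions, not Quickwit's actual definitions:

```rust
/// Sketch of a cached partial doc: an address plus a score,
/// not the document content itself. (Names are assumptions.)
pub struct PartialDocSketch {
    pub split_id: String, // which split the doc lives in
    pub doc_id: u32,      // the doc address within that split
    pub score: f32,
}

/// Sketch of the scroll context described above.
pub struct ScrollContextSketch {
    /// The original query, kept so paginated queries can be re-emitted.
    pub query: String,
    /// The split metadata captured at the time of the original request.
    pub split_ids: Vec<String>,
    /// Cached partial docs, so most scroll calls can skip the search phase.
    pub cached_partial_docs: Vec<PartialDocSketch>,
    /// Total number of hits, appended to every scroll response.
    pub total_num_hits: u64,
    /// Offset of the next page to serve.
    pub start_offset: u64,
}
```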
We use a simple leaderless KV store to keep the state required to run the scroll API.
We generate a scroll ULID and use it to get a list of the servers with the best affinity according
to rendezvous hashing. We then go through them in order and attempt to put that key on up to 2 servers.
Failures of these PUTs are silent.
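The placement step can be illustrated with a minimal rendezvous (highest-random-weight) hashing sketch. The hasher choice and the function name are assumptions for illustration, not Quickwit's actual code:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Rank servers by hashing (key, server) pairs and keep the
/// `replica_count` servers with the highest hash values.
/// Any server set change only reshuffles the affected keys.
pub fn select_servers<'a>(
    scroll_ulid: &str,
    servers: &[&'a str],
    replica_count: usize,
) -> Vec<&'a str> {
    let mut scored: Vec<(u64, &'a str)> = servers
        .iter()
        .map(|&server| {
            let mut hasher = DefaultHasher::new();
            (scroll_ulid, server).hash(&mut hasher);
            (hasher.finish(), server)
        })
        .collect();
    // Highest hash wins: best-affinity servers come first.
    scored.sort_by(|left, right| right.0.cmp(&left.0));
    scored
        .into_iter()
        .take(replica_count)
        .map(|(_, server)| server)
        .collect()
}
```

The key property used here is determinism: every node computes the same ranking for a given ULID, so readers can recompute where the context was put without any coordination.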
For each call to scroll, one of two things can happen:
- the partial docs for the requested page are in the partial doc cache. We just run the `fetch_docs` phase
  and update the context with the new `start_offset`.
- the partial docs for the requested page are not in the partial doc cache. We then run a new search query.

We attempt to fetch `SCROLL_CACHE_CAPACITY` partial docs in order to fill the partial doc address cache for subsequent calls.
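The two branches above can be sketched as follows, in a simplified model where the cache is a window of doc addresses. The names and the `u32` doc-address type are illustrative assumptions:

```rust
/// What a scroll call should do next, in this simplified model.
pub enum ScrollAction<'a> {
    /// Cache hit: go straight to the fetch_docs phase with these addresses.
    FetchDocs(&'a [u32]),
    /// Cache miss: re-emit the paginated search query.
    RunSearch,
}

/// Decide whether the requested page is fully covered by the cached
/// window `cache`, whose first entry corresponds to `cache_start`.
pub fn next_page_action(
    cache: &[u32],
    cache_start: usize,
    start_offset: usize,
    page_len: usize,
) -> ScrollAction<'_> {
    match start_offset.checked_sub(cache_start) {
        // The whole page fits inside the cached window: serve it directly.
        Some(rel) if rel + page_len <= cache.len() => {
            ScrollAction::FetchDocs(&cache[rel..rel + page_len])
        }
        // Page starts before or extends past the window: search again.
        _ => ScrollAction::RunSearch,
    }
}
```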
## A strange `scroll_id`

For more robustness, the scroll id is the concatenation of:
- a ULID, used as the address of the scroll context
- the `start_offset`.

The idea here is that if the PUT request failed, we can still return the right results even if we have an obsolete version of the `ScrollContext`.
We indeed take the max of the `start_offset` supplied in the `scroll_id` and the one present in the `ScrollContext`.
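A sketch of that scheme, assuming the two parts are joined with a separator (the `:` separator and the function names are illustrative assumptions, not the actual encoding):

```rust
/// Split a scroll id of the (assumed) form "<ulid>:<start_offset>"
/// back into its two parts. Returns None on a malformed id.
pub fn parse_scroll_id(scroll_id: &str) -> Option<(&str, u64)> {
    let (ulid, offset_str) = scroll_id.rsplit_once(':')?;
    let start_offset = offset_str.parse().ok()?;
    Some((ulid, start_offset))
}

/// If the PUT updating the context failed, the stored context may be
/// stale: trust whichever start offset is larger.
pub fn effective_start_offset(from_scroll_id: u64, from_context: u64) -> u64 {
    from_scroll_id.max(from_context)
}
```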
## Quickwit-specific quirks

Our state is pretty volatile. Some scrolls may end up broken if we were to remove 2 servers within 30 minutes.

The scroll lifetime does not extend the life of a scroll context.
We do not do anything to prevent splits from being GCed, and we rely only on the grace period to make sure this does not happen.

The ES API does not always update the `_scroll_id`; it does not seem to change across calls.
A misimplemented client might therefore appear to work correctly on a single shard with Elasticsearch and yet not work with Quickwit.
// Copyright (C) 2023 Quickwit, Inc.
//
// Quickwit is offered under the AGPL v3.0 and as commercial software.
// For commercial licensing, contact us at [email protected].
//
// AGPL:
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License as
// published by the Free Software Foundation, either version 3 of the
// License, or (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Affero General Public License for more details.
//
// You should have received a copy of the GNU Affero General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.

use std::time::Duration;

/// We cannot safely delete splits right away, as:
/// - in-flight queries could have selected these splits,
/// - scroll queries may also hold a point in time on these splits.
///
/// We deal with this problem by introducing a grace period. A split is first marked as deleted,
/// and hence won't be selected for search. After a few minutes, once it is reasonably safe to assume
/// that all queries involving this split have terminated, we effectively delete the split.
/// This duration is controlled by `DELETION_GRACE_PERIOD`.
pub const DELETION_GRACE_PERIOD: Duration = Duration::from_secs(60 * 32); // 32 min