Adding a split cache in Searchers (#3857)
Quickwit includes a split cache. It can be useful for specific workloads:

- to improve performance
- to reduce the cost associated with GET requests.
The split cache stores entire split files on disk.
It works under the following configurable constraints:

- number of concurrent downloads
- amount of disk space
- number of on-disk files.
Searchers get tipped off by indexers about the existence of splits for which they (the searchers) have the best affinity.
They may also learn about the existence of splits upon read requests.

The searcher is then in charge of maintaining an in-memory data structure with a bounded list of splits it knows about and their score.
The current admission/eviction strategy is a simple LRU (least recently used) policy.

We consider downloading the most recently accessed split that is not already in the cache.
If the limits have been reached, we only proceed to evict a split if one of the splits currently
in the cache has been accessed less recently than that candidate.
fulmicoton authored Sep 21, 2023
1 parent 66c8e4f commit b9a2215
Showing 47 changed files with 2,069 additions and 166 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@
**/flamegraph.svg
local/**
quickwit/quickwit-ui/package-lock.json
**/.DS_Store


# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
24 changes: 24 additions & 0 deletions docs/internals/searcher-split-cache.md
@@ -0,0 +1,24 @@

# Searcher split cache

Quickwit includes a split cache. It can be useful for specific workloads:
- to improve performance
- to reduce the cost associated with GET requests.

The split cache stores entire split files on disk.
It works under the following configurable constraints:
- number of concurrent downloads
- amount of disk space
- number of on-disk files.
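As a rough illustration, the three constraints could be modeled as a small limits struct checked against the cache's current usage. This is a minimal Rust sketch under stated assumptions: `SplitCacheLimits`, `SplitCacheUsage` and their fields are hypothetical names chosen for this example, not Quickwit's actual configuration keys.

```rust
// Hypothetical sketch: names are illustrative, not Quickwit's actual API or config keys.

/// The three configurable constraints of the split cache.
#[derive(Debug, Clone, Copy)]
pub struct SplitCacheLimits {
    /// Maximum number of split downloads running at the same time.
    pub max_concurrent_downloads: usize,
    /// Maximum total size of the cached split files, in bytes.
    pub max_num_bytes: u64,
    /// Maximum number of split files kept on disk.
    pub max_num_splits: usize,
}

/// Current usage of the cache, as tracked by the searcher.
#[derive(Debug, Default, Clone, Copy)]
pub struct SplitCacheUsage {
    pub num_downloads_in_flight: usize,
    pub num_bytes_on_disk: u64,
    pub num_splits_on_disk: usize,
}

impl SplitCacheLimits {
    /// Returns true if another download may start right now.
    pub fn can_start_download(&self, usage: &SplitCacheUsage) -> bool {
        usage.num_downloads_in_flight < self.max_concurrent_downloads
    }

    /// Returns true if a split of `split_num_bytes` fits on disk
    /// without evicting anything (both the file-count and byte limits hold).
    pub fn fits_without_eviction(&self, usage: &SplitCacheUsage, split_num_bytes: u64) -> bool {
        usage.num_splits_on_disk < self.max_num_splits
            && usage.num_bytes_on_disk.saturating_add(split_num_bytes) <= self.max_num_bytes
    }
}
```

In this sketch the download limit is kept separate from the disk limits: a saturated download pool only delays work, while reaching the disk-space or file-count limit is what forces an eviction decision.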

Searchers get tipped off by indexers about the existence of splits for which they (the searchers) have the best affinity.
They may also learn about the existence of splits upon read requests.

The searcher is then in charge of maintaining an in-memory data structure with a bounded list of splits it knows about and their score.
The current admission/eviction strategy is a simple LRU (least recently used) policy.
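The bounded split table can be sketched as follows, assuming a logical clock for recency (the LRU score) and a cap on the number of splits that are known but not yet cached. `SplitTable`, `report` (an indexer tip), `touch` (a read request) and the other names are hypothetical; the linear scans keep the example short and would likely be replaced by an ordered index in a real implementation.

```rust
use std::collections::HashMap;

// Hypothetical sketch: names are illustrative, not Quickwit's actual API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SplitStatus {
    /// Known to the searcher but not downloaded yet.
    Candidate,
    /// Stored in the on-disk cache.
    OnDisk,
}

#[derive(Debug, Clone, Copy)]
struct SplitEntry {
    status: SplitStatus,
    /// Logical timestamp of the last access: the LRU score.
    last_accessed: u64,
}

/// Bounded, in-memory table of the splits a searcher knows about.
pub struct SplitTable {
    entries: HashMap<String, SplitEntry>,
    clock: u64,
    max_candidates: usize,
}

impl SplitTable {
    pub fn new(max_candidates: usize) -> Self {
        SplitTable { entries: HashMap::new(), clock: 0, max_candidates }
    }

    /// Advances the logical clock; every timestamp is unique.
    fn tick(&mut self) -> u64 {
        self.clock += 1;
        self.clock
    }

    /// An indexer tipped us off about a split: remember it if it is unknown.
    pub fn report(&mut self, split_id: &str) {
        if !self.entries.contains_key(split_id) {
            let now = self.tick();
            let entry = SplitEntry { status: SplitStatus::Candidate, last_accessed: now };
            self.entries.insert(split_id.to_string(), entry);
            self.enforce_candidate_bound();
        }
    }

    /// A read request accessed a split: refresh its recency, or register it.
    pub fn touch(&mut self, split_id: &str) {
        let now = self.tick();
        self.entries
            .entry(split_id.to_string())
            .and_modify(|entry| entry.last_accessed = now)
            .or_insert(SplitEntry { status: SplitStatus::Candidate, last_accessed: now });
        self.enforce_candidate_bound();
    }

    pub fn mark_on_disk(&mut self, split_id: &str) {
        if let Some(entry) = self.entries.get_mut(split_id) {
            entry.status = SplitStatus::OnDisk;
        }
    }

    pub fn knows(&self, split_id: &str) -> bool {
        self.entries.contains_key(split_id)
    }

    /// The most recently accessed split that is not cached yet.
    pub fn best_candidate(&self) -> Option<&str> {
        self.entries
            .iter()
            .filter(|(_, e)| e.status == SplitStatus::Candidate)
            .max_by_key(|(_, e)| e.last_accessed)
            .map(|(id, _)| id.as_str())
    }

    /// Forgets the least recently accessed candidates until at most
    /// `max_candidates` remain. On-disk splits are bounded by the disk limits instead.
    fn enforce_candidate_bound(&mut self) {
        loop {
            let mut num_candidates = 0;
            let mut oldest: Option<(u64, String)> = None;
            for (id, e) in &self.entries {
                if e.status == SplitStatus::Candidate {
                    num_candidates += 1;
                    if oldest.as_ref().map_or(true, |(t, _)| e.last_accessed < *t) {
                        oldest = Some((e.last_accessed, id.clone()));
                    }
                }
            }
            match oldest {
                Some((_, id)) if num_candidates > self.max_candidates => {
                    self.entries.remove(&id);
                }
                _ => break,
            }
        }
    }
}
```

With `max_candidates = 2`, reporting `a` and `b`, touching `a`, then reporting `c` makes the table forget `b`, the least recently accessed candidate, while `c` becomes the best candidate.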

We consider downloading the most recently accessed split that is not already in the cache.
If the limits have been reached, we only proceed to evict a split if one of the splits currently
in the cache has been accessed less recently than that candidate.
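The admission/eviction rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: `decide` and `Decision` are hypothetical names, `limits_reached` stands only for the disk-space and file-count limits, and a single eviction is assumed to free enough room for the candidate.

```rust
// Hypothetical sketch: names are illustrative, not Quickwit's actual API.

/// Outcome of considering the best candidate for download.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Limits not reached: download the candidate.
    Download,
    /// Limits reached: evict the least recently accessed cached split, then download.
    EvictThenDownload { evict: String },
    /// Every cached split was accessed more recently than the candidate: do nothing.
    Skip,
}

/// `candidate_last_accessed`: logical timestamp of the most recently accessed
/// split that is not cached yet.
/// `on_disk`: `(split_id, last_accessed)` for every cached split.
pub fn decide(candidate_last_accessed: u64, on_disk: &[(String, u64)], limits_reached: bool) -> Decision {
    if !limits_reached {
        return Decision::Download;
    }
    // LRU: the eviction victim is the least recently accessed cached split,
    // and it is only evicted if the candidate was accessed more recently.
    match on_disk.iter().min_by_key(|(_, last_accessed)| *last_accessed) {
        Some((split_id, last_accessed)) if *last_accessed < candidate_last_accessed => {
            Decision::EvictThenDownload { evict: split_id.clone() }
        }
        _ => Decision::Skip,
    }
}
```

A real cache may need to evict several splits before a large candidate fits. The comparison against the candidate's recency is what prevents a split that has not been accessed in a while from pushing out a more recently used one.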

