Adding a split cache in Searchers (#3857)

Quickwit includes a split cache. It can be useful for specific workloads:
- to improve performance
- to reduce the cost associated with GET requests.

The split cache stores entire split files on disk. It works under the following configurable constraints:
- number of concurrent downloads
- amount of disk space
- number of on-disk files.

Searchers are tipped by indexers about the existence of splits for which they have the best affinity. They may also learn about the existence of splits upon read requests.

The searcher is then in charge of maintaining an in-memory data structure with a bounded list of the splits it knows about and their scores. The current admission/eviction strategy is a simple LRU policy.

If the most recently accessed split is not already in cache, we consider downloading it. If the limits have been reached, we only proceed to eviction if one of the splits currently in cache has been accessed less recently than that split.
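The bounded list of known splits can be pictured as a small recency-scored table. The sketch below is not Quickwit's implementation: `KnownSplits`, `touch`, `score`, and the use of a logical clock as the LRU score are hypothetical choices made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical table of splits a searcher knows about (not Quickwit's actual type).
/// Both indexer tips and read requests are assumed to go through `touch`.
pub struct KnownSplits {
    capacity: usize,
    /// Logical clock, bumped on every `touch`; higher means more recently seen.
    clock: u64,
    /// Split id -> tick of its last mention (the LRU score in this sketch).
    last_seen: HashMap<String, u64>,
}

impl KnownSplits {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        KnownSplits { capacity, clock: 0, last_seen: HashMap::new() }
    }

    /// Records that `split_id` was just mentioned (tip or read request), then
    /// enforces the bound by forgetting the least recently mentioned split.
    pub fn touch(&mut self, split_id: &str) {
        self.clock += 1;
        self.last_seen.insert(split_id.to_string(), self.clock);
        if self.last_seen.len() > self.capacity {
            // Linear scan for the smallest tick; acceptable for a sketch.
            let oldest = self
                .last_seen
                .iter()
                .min_by_key(|&(_, &tick)| tick)
                .map(|(id, _)| id.clone());
            if let Some(oldest) = oldest {
                self.last_seen.remove(&oldest);
            }
        }
    }

    /// Returns the recency score of a known split, or `None` if it was forgotten.
    pub fn score(&self, split_id: &str) -> Option<u64> {
        self.last_seen.get(split_id).copied()
    }

    pub fn len(&self) -> usize {
        self.last_seen.len()
    }
}
```

With a capacity of 2, touching `split-a`, `split-b`, `split-a` again, then `split-c` forgets `split-b`, since it is the least recently mentioned. A production version would more likely keep an ordered index (for example a `BTreeMap` keyed by tick) to avoid the linear scan.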
quickwit-oss · Sep 21, 2023 · b9a2215
1 parent 66c8e4f

Showing 47 changed files with 2,069 additions and 166 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,6 +6,7 @@
 **/flamegraph.svg
 local/**
 quickwit/quickwit-ui/package-lock.json
+**/.DS_Store
 
 
 # Remove Cargo.lock from gitignore if creating an executable, leave it for libraries

diff --git a/docs/internals/searcher-split-cache.md b/docs/internals/searcher-split-cache.md
@@ -0,0 +1,24 @@
+
+# Searcher split cache
+
+Quickwit includes a split cache. It can be useful for specific workloads:
+- to improve performance
+- to reduce the cost associated with GET requests.
+
+The split cache stores entire split files on disk.
+It works under the following configurable constraints:
+- number of concurrent downloads
+- amount of disk space
+- number of on-disk files.
+
+Searchers are tipped by indexers about the existence of splits for which they have the best affinity.
+They may also learn about the existence of splits upon read requests.
+
+The searcher is then in charge of maintaining an in-memory data structure with a bounded list of the splits it knows about and their scores.
+The current admission/eviction strategy is a simple LRU policy.
+
+If the most recently accessed split is not already in cache, we consider downloading it.
+If the limits have been reached, we only proceed to eviction if one of the splits currently
+in cache has been accessed less recently than that split.
+
+
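The admission/eviction rule described in the split cache document can be sketched as a pure decision function. This is an illustrative sketch, not Quickwit's code: `Limits`, `CachedSplit`, `Candidate`, `Decision`, and `decide` are hypothetical names, and the sketch assumes each split's LRU score is a last-access tick where a larger value means more recent.

```rust
/// Hypothetical limits mirroring the three configurable constraints.
pub struct Limits {
    pub max_num_bytes: u64,
    pub max_num_splits: usize,
    pub max_concurrent_downloads: usize,
}

/// A split already on disk, with its (assumed) last-access tick.
pub struct CachedSplit {
    pub split_id: String,
    pub num_bytes: u64,
    pub last_access: u64,
}

/// The most recently accessed split that is not in cache yet.
pub struct Candidate {
    pub split_id: String,
    pub num_bytes: u64,
    pub last_access: u64,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Enough room: download without evicting anything.
    Download,
    /// Evict these splits (least recently accessed first), then download.
    EvictThenDownload(Vec<String>),
    /// Leave the cache unchanged.
    Skip,
}

pub fn decide(
    cached: &[CachedSplit],
    candidate: &Candidate,
    num_downloads_in_flight: usize,
    limits: &Limits,
) -> Decision {
    // Respect the cap on concurrent downloads first.
    if num_downloads_in_flight >= limits.max_concurrent_downloads {
        return Decision::Skip;
    }
    let mut num_bytes: u64 = cached.iter().map(|split| split.num_bytes).sum();
    let mut num_splits = cached.len();
    // Would the candidate fit given the current disk usage and file count?
    let fits = |bytes: u64, splits: usize| {
        bytes + candidate.num_bytes <= limits.max_num_bytes && splits < limits.max_num_splits
    };
    if fits(num_bytes, num_splits) {
        return Decision::Download;
    }
    // Only splits accessed less recently than the candidate may be evicted.
    let mut victims: Vec<&CachedSplit> = cached
        .iter()
        .filter(|split| split.last_access < candidate.last_access)
        .collect();
    victims.sort_by_key(|split| split.last_access);
    let mut evicted = Vec::new();
    for victim in victims {
        num_bytes -= victim.num_bytes;
        num_splits -= 1;
        evicted.push(victim.split_id.clone());
        if fits(num_bytes, num_splits) {
            return Decision::EvictThenDownload(evicted);
        }
    }
    // Even evicting every older split would not make room: keep the cache as is.
    Decision::Skip
}
```

Keeping the decision free of I/O makes the policy easy to test in isolation; in this sketch, a candidate older than everything in cache is never admitted once the limits are reached, which is the LRU behavior the document describes.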