[Discuss] Design for Tiered File Cache & Block Level fetch #9987
Comments
@shwetathareja @biesps @andrross
Thanks @ankitkala, this is great! A few questions:
Thanks @ankitkala
You might want to consider ARC for eviction strategy.
How do we ensure Tier 1 and Tier 2 caches are correctly accounted for? Thrashing will cause the mmapped file access to be moved to disk, unless you can pin/lock the file explicitly. Another important thing is: can we design it to be tier agnostic and think of the block as the basic unit of cache?
For prefetching, it might be worth fetching the top-k blocks in advance for better performance. Essentially, read the required data block plus an additional amount of data that varies depending on the access pattern. We can start by prefetching one extra block and keep doubling it on every subsequent read, up to a certain max threshold.
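A rough sketch of that doubling read-ahead heuristic (the `PrefetchPolicy` class, its method names, and the cap of 64 blocks are illustrative assumptions, not part of the proposal):

```java
// Sketch of the read-ahead doubling heuristic described above.
// The cap and the reset trigger are assumptions for illustration.
final class PrefetchPolicy {
    private static final int MAX_PREFETCH_BLOCKS = 64; // illustrative max threshold
    private int nextPrefetch = 1;                       // start with one extra block

    /** Number of extra blocks to fetch for the current read; doubles for the next one. */
    int blocksToPrefetch() {
        int current = nextPrefetch;
        nextPrefetch = Math.min(nextPrefetch * 2, MAX_PREFETCH_BLOCKS);
        return current;
    }

    /** Reset the window, e.g. when a non-sequential access breaks the pattern. */
    void reset() {
        nextPrefetch = 1;
    }
}
```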
The intent was to have different flavors of blocked index inputs (e.g. opened against a file in the remote store vs. on local disk).
Correct. In case the warm shards and hot shards are on the same node, we can definitely run into this.
Sure. Like you called out, we can definitely explore getting rid of the segmented cache in favor of a finer-grained locking mechanism.
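For illustration, one possible shape of that finer-grained locking (a sketch only, not the proposed implementation): instead of locking a whole cache segment, each load can be serialized per key via `ConcurrentHashMap.computeIfAbsent`, so concurrent loads of different blocks never contend.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: per-key locking instead of segment-wide locks.
final class FineGrainedCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    /** Loads each key at most once; only callers racing on the same key serialize. */
    V getOrLoad(K key, Function<K, V> loader) {
        return entries.computeIfAbsent(key, loader);
    }
}
```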
Interesting!! Let me explore this route as well.
Yeah, total usage in Tier 1 & Tier 2 is not going to be a true indicator of the actual usage.
Wanted to understand a bit more on the tier-agnostic part. IndexInputs are the basic unit of cache today. A block as the basic unit makes sense, but the downside is that we will need to store the data on disk as blocks as well.
Ack.
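As a small illustration of treating the block (rather than the whole file/IndexInput) as the cache unit, a tier-agnostic key could be derived from the file name plus a block index; the block size and field names below are assumptions, not values from the design:

```java
// Sketch of a tier-agnostic cache key where the block, not the file, is the unit.
// The 8 MB block size is an illustrative assumption.
record BlockKey(String fileName, long blockIndex) {
    static final long BLOCK_SIZE = 8L * 1024 * 1024;

    /** Maps an absolute file offset to the block that contains it. */
    static BlockKey forOffset(String fileName, long fileOffset) {
        return new BlockKey(fileName, fileOffset / BLOCK_SIZE);
    }
}
```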
If files are opened for read, there is no isolation between tier 1 and tier 2; tier 2 files can still thrash the blocks of tier 1. To minimise this we should:
Have you also evaluated the cache eviction policy, comparing a clock-based cache replacement policy against LRU? LRU may have certain deficiencies in cases like a burst of references to infrequently used blocks, such as sequential scans through large files, which may cause the replacement of frequently referenced blocks in the cache. An effective replacement algorithm would be able to prevent hot blocks from being evicted by cold blocks.
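For reference, a minimal sketch of the clock (second-chance) idea: every cached slot carries a reference bit that is set on access, the eviction hand clears bits as it sweeps, and only slots whose bit is already clear get evicted, so a one-off sequential scan is less likely to flush frequently referenced blocks than with plain LRU. The class below is an illustration, not the proposed implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal clock (second-chance) eviction sketch over a list of slots.
final class ClockEviction<E> {
    private final List<E> slots = new ArrayList<>();
    private final List<Boolean> referenced = new ArrayList<>();
    private int hand = 0;

    void add(E entry) {
        slots.add(entry);
        referenced.add(Boolean.TRUE);
    }

    /** Call on a cache hit so the sweep gives this slot a second chance. */
    void touch(int slot) {
        referenced.set(slot, Boolean.TRUE);
    }

    /** Sweep the hand, clearing reference bits, and evict the first unreferenced slot. */
    E evictOne() {
        while (true) {
            if (referenced.get(hand)) {
                referenced.set(hand, Boolean.FALSE); // second chance
                hand = (hand + 1) % slots.size();
            } else {
                E victim = slots.remove(hand);
                referenced.remove(hand);
                if (hand >= slots.size()) {
                    hand = 0;
                }
                return victim;
            }
        }
    }
}
```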
I've added this as an enhancement item: #12809
Coming from the RFC for the Multi-tiered File Cache, this issue captures the low-level components and the flows.
The Multi-tiered File Cache works along with the Directory abstraction (for storing Lucene segment files), which allows Lucene indices (a.k.a. shards) to be agnostic of the data locality (memory, local disk, remote store). Internally it implements a multi-tiered cache (Tier 1: memory mapped, Tier 2: disk, Tier 3: remote store) and takes care of managing the data across all tiers as well as its movement.
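As a rough illustration of the fall-through this implies (a sketch; the interfaces and method names are assumptions, not the actual OpenSearch API), a block read consults the memory tier first, falls back to local disk, and finally fetches from the remote store while promoting the block back up:

```java
// Sketch of the tier fall-through described above; types and methods are illustrative.
interface Tier {
    byte[] get(String file, long blockIndex);            // null on miss
    void put(String file, long blockIndex, byte[] data); // store/promote into this tier
}

final class TieredFileCache {
    private final Tier memoryTier;   // Tier 1: memory mapped blocks
    private final Tier diskTier;     // Tier 2: local disk blocks
    private final Tier remoteStore;  // Tier 3: remote store, source of truth

    TieredFileCache(Tier memoryTier, Tier diskTier, Tier remoteStore) {
        this.memoryTier = memoryTier;
        this.diskTier = diskTier;
        this.remoteStore = remoteStore;
    }

    byte[] readBlock(String file, long blockIndex) {
        byte[] block = memoryTier.get(file, blockIndex);
        if (block != null) {
            return block;
        }
        block = diskTier.get(file, blockIndex);
        if (block == null) {
            // Miss in both local tiers: fetch from the remote store and persist locally.
            block = remoteStore.get(file, blockIndex);
            diskTier.put(file, blockIndex, block);
        }
        // Promote into the memory tier for subsequent reads.
        memoryTier.put(file, blockIndex, block);
        return block;
    }
}
```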
Use case:
Ideally, all types of indices (local / warm / remote-backed hot) can be managed with the File Cache, where the tiers and their promotion criteria can differ depending on the shard type. For the current scope, however, we intend to enable this support for remote-backed warm indices, as they gain the maximum benefit from this abstraction.
This component will allow us to lazy load the shard's data for reads and writes on an on-demand basis. This is helpful as it'll allow us to write data to a warm shard without loading the entire data set, thereby improving the time to recovery. Similarly, we can also explore enabling support for warm shards with 0 replicas (for non-critical indices). Another benefit (with block-level fetch) is the lower memory footprint of the shard on the node.
This component will also allow us to introduce a working set model for every shard, where the working set for a shard is defined as the set of files (or blocks of files) which the shard is currently using (for reads/writes/merges, etc.). With the working set model, shards will have the capability to lazy load the data they need into the shard's working set. Since the working set of a shard is going to be a function of time, it is expected that the files (or blocks of files) that are part of the working set will be evicted from and added to the file cache over time on an on-demand basis.
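For illustration, a working set could be tracked per shard as the set of blocks touched within a recent window; the window length and types below are assumptions for the sketch, not values from the design:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-shard working set: blocks touched within a recent time window.
final class ShardWorkingSet {
    private static final long WINDOW_MILLIS = 5 * 60 * 1000L; // assumed 5-minute window
    private final Map<String, Long> lastAccess = new ConcurrentHashMap<>(); // block id -> last access

    /** Record that a block was just read or written by the shard. */
    void touch(String blockId) {
        lastAccess.put(blockId, System.currentTimeMillis());
    }

    /** Blocks not touched within the window fall out of the working set (and become evictable). */
    void expireOld() {
        long cutoff = System.currentTimeMillis() - WINDOW_MILLIS;
        lastAccess.values().removeIf(ts -> ts < cutoff);
    }

    int size() {
        return lastAccess.size();
    }
}
```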
Caveats:
Tier 3 is the source of truth for the shard. As soon as new data gets uploaded to the remote store, the File Cache starts tracking the new files.
How it works
User Flows:
Movement across Tiers:
Future Enhancements: