POC for ssd cache #3723

Closed
wants to merge 25 commits into from

imotov
Collaborator

@imotov imotov commented Aug 8, 2023

Description

Proof of concept for adding an SSD cache, to check that this architecture is feasible. It is still in the very early stages, and not all features are implemented.

To set it up, add the following node configuration:

storage:
  cache:
    storage_uri: file:///tmp/storage
    max_cache_storage_disk_usage: "5G"
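
For illustration only, here is a minimal sketch of how a value such as "5G" could be turned into a byte count. The function name `parse_disk_usage` is hypothetical, and the decimal units (G = 10^9) are an assumption; the parser this PR actually relies on may behave differently.

```rust
/// Parses a human-readable size such as "5G" into a byte count.
/// Assumption: suffixes are decimal (K = 10^3, M = 10^6, G = 10^9, T = 10^12).
/// This is a hypothetical helper, not the parser used by the PR.
fn parse_disk_usage(text: &str) -> Option<u64> {
    let text = text.trim();
    // Split the numeric prefix from the optional unit suffix.
    let split_at = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    let (digits, unit) = text.split_at(split_at);
    // An empty numeric prefix (e.g. "G") fails to parse and yields None.
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim().to_ascii_uppercase().as_str() {
        "" | "B" => 1,
        "K" | "KB" => 1_000,
        "M" | "MB" => 1_000_000,
        "G" | "GB" => 1_000_000_000,
        "T" | "TB" => 1_000_000_000_000,
        _ => return None,
    };
    // Reject values that would overflow a u64.
    value.checked_mul(multiplier)
}
```

Under these assumptions, `parse_disk_usage("5G")` yields `Some(5_000_000_000)` and an unknown suffix yields `None`.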

On the index side, add the following setting to the config:

    "index_uri": "cache://file:///tmp/path_to/index/index_id",

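As a rough sketch of what the `cache://` prefix implies, the wrapped URI can be recovered by stripping the prefix and resolving the remainder as the underlying storage. The helper `strip_cache_prefix` is hypothetical and is not the resolver implemented in this PR. It also accepts the shorter `cache:` form listed under the outstanding issues.

```rust
/// Returns the underlying storage URI wrapped by a cache URI.
/// Hypothetical helper: accepts both `cache://<uri>` (the form used here)
/// and `cache:<uri>` (the form proposed as a follow-up).
fn strip_cache_prefix(index_uri: &str) -> Option<&str> {
    let inner = index_uri
        .strip_prefix("cache://")
        .or_else(|| index_uri.strip_prefix("cache:"))?;
    // The wrapped URI must carry its own scheme, e.g. `file://` or `s3://`.
    if inner.contains("://") {
        Some(inner)
    } else {
        None
    }
}
```

With this sketch, `cache://file:///tmp/storage` resolves to `file:///tmp/storage`, while a URI without the cache prefix returns `None` and would be handled by the regular storage resolver.
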
Fixes #3443

How was this PR tested?

Still need to add tests.

Outstanding issues

This is still WIP. The following tasks still need to be finished for the MVP:

  • Clean up cache storage configuration (needs some reasonable defaults)
  • Comprehensive unit testing of subcomponents
  • Rehydration of the current cache split state on the node startup
  • Limiting the maximum space used by downloaded and in-flight splits
  • Postponing split rebalancing until the cluster reaches the desired number of nodes, or until a certain time has passed after the number-of-nodes threshold was breached.
  • Throttling the number of simultaneous split downloads and the network load on search nodes.
  • Make storage copy_to_storage more robust
  • Optimize reading metadata with a large rate of change
  • Modify URI format to cache:file:///tmp
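
To make the space-limiting task above more concrete, here is one possible shape for the bookkeeping. All names (`SplitCacheBudget`, `reserve`) are hypothetical, and the oldest-first eviction order is an assumption chosen for brevity, not the policy of this PR. Space is reserved before a download starts, so in-flight splits count against the limit.

```rust
use std::collections::VecDeque;

/// Hypothetical bookkeeping for cached splits; not the PR's actual types.
struct SplitCacheBudget {
    max_bytes: u64,
    used_bytes: u64,
    // Oldest split first; each entry is (split_id, size_in_bytes).
    splits: VecDeque<(String, u64)>,
}

impl SplitCacheBudget {
    fn new(max_bytes: u64) -> Self {
        Self { max_bytes, used_bytes: 0, splits: VecDeque::new() }
    }

    /// Reserves space for a split before its download starts.
    /// Returns the ids evicted to make room (oldest first, an assumed policy),
    /// or None if the split is larger than the whole budget.
    fn reserve(&mut self, split_id: &str, num_bytes: u64) -> Option<Vec<String>> {
        if num_bytes > self.max_bytes {
            return None;
        }
        let mut evicted = Vec::new();
        // Invariant: used_bytes <= max_bytes, so the subtraction cannot underflow.
        while num_bytes > self.max_bytes - self.used_bytes {
            let (old_id, old_bytes) = self.splits.pop_front()?;
            self.used_bytes -= old_bytes;
            evicted.push(old_id);
        }
        self.splits.push_back((split_id.to_string(), num_bytes));
        self.used_bytes += num_bytes;
        Some(evicted)
    }
}
```

A caller would delete the returned splits from local disk before starting the download. A real implementation would also need to release the reservation when a download fails, which this sketch leaves out.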

Beyond MVP:

  • Refactor Storage Configuration

@imotov imotov force-pushed the issue/3443-ssd-cache branch 3 times, most recently from 9bf68cb to ea172ae Compare August 18, 2023 03:40
Cache storage is resolved only once and cached.
Cache uri is a Uri.
Counters are not copied all over the place. The Factory one was actually disconnected from the rest.
Removed redundant `Arc` indirection
@imotov
Collaborator Author

imotov commented Sep 5, 2023

Superseded by #3786

@imotov imotov closed this Sep 5, 2023
@guilload guilload deleted the issue/3443-ssd-cache branch October 2, 2023 21:19
Development

Successfully merging this pull request may close these issues.

SSD Split "cache"
2 participants