Search before asking
I had searched in the issues and found no similar issues.
Motivation
Hi folks,
First of all - I just wanted to say that this is an awesome project 🙂
Secondly -
I wondered whether it's possible to load data to Kvrocks via RocksDB's IngestExternalFile.
The use case is real-world.
I currently work on a system that relies on (non-distributed) RocksDB, and we'd like to possibly start using Kvrocks instead.
Every once in a while, we use an offline, "bulky" Spark process which essentially generates a complete view of the RocksDB database.
This is done by creating SST files directly, which is pretty cool*.
The system then downloads these files locally and just points RocksDB to use them.
This way, we can leverage Spark's super-scalable compute to create a dataset (of ~20B tiny records) which would otherwise take a long, long time to write to an empty RocksDB database.
Q:
Since Kvrocks uses RocksDB as its backend, I wondered - how hard would it be to do something like this?
Thanks!
Technically: we create a Spark dataframe with the data in two columns, and we sort the data globally by key.
Then, a custom Spark module we've written creates an SST file using the RocksJava binding.
Each dataframe partition is turned into a separate SST file.
Since the data is sorted and keys are strictly unique, the SST files are non-overlapping, and thus are ingested into the bottommost level of RocksDB.
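For illustration, the per-partition writer boils down to something like the following (a minimal sketch only; the class name, paths, and raw byte-array keys/values are placeholders, and a Kvrocks-compatible version would of course have to emit Kvrocks' own key encoding):

```java
import org.rocksdb.EnvOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileWriter;

public class PartitionSstWriter {
    // Writes one Spark partition's (already sorted) key/value pairs into a single SST file.
    public static void writePartition(Iterable<byte[][]> sortedKvs, String sstPath)
            throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options();
             EnvOptions envOptions = new EnvOptions();
             SstFileWriter writer = new SstFileWriter(envOptions, options)) {
            writer.open(sstPath);
            // Keys must be added in strictly increasing order; the global sort
            // in Spark guarantees this within each partition.
            for (byte[][] kv : sortedKvs) {
                writer.put(kv[0], kv[1]);
            }
            writer.finish();
        }
    }
}
```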
Solution
I assume that a solution would involve the following components:
An offline library to create Kvrocks-compatible SSTs (i.e. conforming to this)
A server API which can be given a list of SSTs to download and create a Kvrocks set from, using RocksDB's IngestExternalFile (a minimal sketch of that call follows below).
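For the second component, I'd guess the core of it is roughly this RocksJava-style call (just a sketch; inside Kvrocks this would presumably be done from the C++ side against its storage engine, and the handle and paths here are hypothetical):

```java
import java.util.List;

import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SstIngestor {
    // Ingests a list of already-downloaded, non-overlapping SST files into an open DB.
    public static void ingest(RocksDB db, List<String> downloadedSstPaths)
            throws RocksDBException {
        try (IngestExternalFileOptions opts = new IngestExternalFileOptions()) {
            // Move the downloaded files into the DB directory instead of copying them.
            opts.setMoveFiles(true);
            // Because the files are globally sorted and non-overlapping, RocksDB can
            // place them in the lowest level that doesn't conflict with existing data.
            db.ingestExternalFile(downloadedSstPaths, opts);
        }
    }
}
```

The harder part is presumably making the SST contents match Kvrocks' on-disk encoding, which is what the first component would handle.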
Are you willing to submit a PR?
I'm willing to submit a PR!
Yes, some users have also proposed supporting the ingestion of external files: #1301, #1628. The solution you mentioned is the right way to implement this feature, but AFAIK no community volunteer is working on it for now. You're welcome to contribute if you'd like to do that.