zdbd: allow large data chunks transfer for clone #159

Open
maxux opened this issue Dec 22, 2023 · 1 comment

maxux commented Dec 22, 2023

In order to quickly clone a namespace to another zdb instance, a dedicated command that transfers full chunks of a data file in one shot would be really valuable to take advantage of the full line speed.

There is already a DATA RAW command which fetches a specific entry based on its offset, but that's inefficient when there are lots of small entries.

This command would obviously be restricted to administrators, since it could leak data and slow down the process.
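
For illustration, this is roughly what the per-entry path looks like from a client today, as a minimal sketch assuming a redis-py connection on zdb's default port and a DATA RAW [dataid] [offset] argument order (both are assumptions, not taken from this issue):

```python
import redis

# Hypothetical connection details: zdb speaks the Redis protocol (port 9900 assumed).
zdb = redis.Redis(host="localhost", port=9900)

# Per-entry fetch: one round trip per entry. With millions of small entries this
# never saturates the link, which is the inefficiency described above.
# The DATA RAW argument order (dataid, offset) is an assumption.
entries = []
for dataid, offset in [(0, 0), (0, 64), (0, 128)]:  # illustrative offsets only
    entries.append(zdb.execute_command("DATA", "RAW", dataid, offset))
```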


maxux commented Mar 8, 2024

Implementation started on branch development-v2-data-segment. There is already a working version which can export and import parts of the data, on a local database.

This feature allowed me to clone a full namespace of 31 GB (locally) in 1 minute 58 seconds without any external tools.

There are two new commands (only available to administrators):

  • DATA EXPORT [dataid] [offset]
  • DATA IMPORT [dataid] [offset] [data]

When doing an EXPORT, zdb sends a 4 MB chunk of data_id:data_offset in one shot to the client. The client can't choose the chunk size: 4 MB is a hardcoded size that seems good to avoid locking zdb, takes advantage of the line bandwidth, stays below any hard limit set at the Redis protocol level and doesn't consume a lot of memory.
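
A rough sketch of how a client could consume those fixed-size chunks, assuming a redis-py connection, that -EOF comes back as an error reply, and that the next offset simply advances by the size of the returned chunk (none of this is confirmed above):

```python
import redis

def export_datafile(client: redis.Redis, dataid: int):
    """Yield (offset, chunk) pairs for one data file until the server answers -EOF.

    Sketch only: the command spelling, the error text and the offset arithmetic
    are assumptions based on the description above.
    """
    offset = 0
    while True:
        try:
            chunk = client.execute_command("DATA", "EXPORT", dataid, offset)
        except redis.ResponseError as err:
            if "EOF" in str(err):
                return  # end of this data file
            raise
        yield offset, chunk
        offset += len(chunk)  # server sends at most 4 MB per call
```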

Import works the same way, except that you can only import into the current (last) data_id; you can't import into an already closed (immutable) data file. In addition, this feature is only allowed on a frozen namespace to avoid any side changes. It is designed to clone a namespace from scratch; it can't be used to clone a similar namespace if the data are not exactly the same.

Workflow when importing:

  • (Ensure the namespace is empty)
  • Freeze the namespace (with NSSET freeze)
  • Check the current data_id and data_offset via NSINFO and fetch that from the master
  • Import the data
  • When -EOF is reached, jump to the next file with NSJUMP and keep cloning
  • When -EOF is reached and data_id and data_offset are the same as on the master, the data are in sync.

There is already a script in place which does that: tools/export-import/eximport.py
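
For readers without the branch checked out, here is a minimal sketch of that clone loop in the spirit of eximport.py; hostnames, the namespace name, the NSINFO field names and the exact command replies are assumptions, not copied from the real script:

```python
#!/usr/bin/env python3
"""Sketch of the import workflow above; not the real tools/export-import/eximport.py."""
import redis

NAMESPACE = "backup"  # hypothetical namespace name

master = redis.Redis(host="master-host", port=9900)
target = redis.Redis(host="target-host", port=9900)

for c in (master, target):
    c.execute_command("SELECT", NAMESPACE)

# Freeze the (empty) target namespace so nothing changes while cloning.
target.execute_command("NSSET", NAMESPACE, "freeze", 1)

def current_position(client):
    """Read data_id/data_offset from NSINFO (field names are assumptions)."""
    raw = client.execute_command("NSINFO", NAMESPACE).decode()
    info = {}
    for line in raw.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return int(info["data_current_id"]), int(info["data_current_offset"])

dataid, offset = 0, 0
while True:
    try:
        chunk = master.execute_command("DATA", "EXPORT", dataid, offset)
    except redis.ResponseError as err:
        if "EOF" not in str(err):
            raise
        master_id, _ = current_position(master)
        if master_id == dataid:
            break  # master is still on this data file: in sync (offset check omitted)
        target.execute_command("NSJUMP")  # open the next data file (assumed to take no arguments)
        dataid, offset = dataid + 1, 0
        continue
    target.execute_command("DATA", "IMPORT", dataid, offset, chunk)
    offset += len(chunk)

print(f"clone finished at data_id={dataid} offset={offset}")
```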

The next step is getting the index ready. The best solution in my opinion is implementing an INDEX REBUILD based on the data files, so the index can be recreated from scratch from the data files. There is already an issue discussing that (#160); that would be nice.
