zdbd: allow large data chunks transfer for clone #159

Open
maxux opened this issue Dec 22, 2023 · 1 comment

maxux commented Dec 22, 2023

In order to quickly clone a namespace to another zdb instance, a dedicated command that transfers full chunks of a data file in one shot would be really valuable to take advantage of the full line speed.

There is already a DATA RAW command which fetches a specific entry based on its offset, but that's inefficient when there are lots of small entries.

This command would obviously be restricted to administrators, since it could leak data and slow down the process.
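
For illustration, this is roughly what the per-entry path looks like from a client today, as a minimal sketch assuming a redis-py connection on zdb's default port and a DATA RAW [dataid] [offset] argument order (both are assumptions, not taken from this issue):

```python
import redis

# Hypothetical connection details: zdb speaks the Redis protocol (port 9900 assumed).
zdb = redis.Redis(host="localhost", port=9900)

# Per-entry fetch: one round trip per entry. With millions of small entries this
# never saturates the link, which is the inefficiency described above.
# The DATA RAW argument order (dataid, offset) is an assumption.
entries = []
for dataid, offset in [(0, 0), (0, 64), (0, 128)]:  # illustrative offsets only
    entries.append(zdb.execute_command("DATA", "RAW", dataid, offset))
```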


maxux commented Mar 8, 2024

Implementation started on branch development-v2-data-segment. There is already a working version which can export and import parts of the data, on a local database.

This feature allowed me to clone a full namespace of 31 GB (locally) in 1 minute 58 seconds without any external tools.

There are two new commands (only available to administrators):

  • DATA EXPORT [dataid] [offset]
  • DATA IMPORT [dataid] [offset] [data]

When doing an EXPORT, zdb sends a 4 MB chunk of data_id:data_offset in one shot to the client. The client can't choose the chunk size: 4 MB is a hardcoded size that seems good to avoid locking zdb, takes advantage of the line bandwidth, stays below any hard limit set at the Redis protocol level and doesn't consume a lot of memory.
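
A rough sketch of how a client could consume those fixed-size chunks, assuming a redis-py connection, that -EOF comes back as an error reply, and that the next offset simply advances by the size of the returned chunk (none of this is confirmed above):

```python
import redis

def export_datafile(client: redis.Redis, dataid: int):
    """Yield (offset, chunk) pairs for one data file until the server answers -EOF.

    Sketch only: the command spelling, the error text and the offset arithmetic
    are assumptions based on the description above.
    """
    offset = 0
    while True:
        try:
            chunk = client.execute_command("DATA", "EXPORT", dataid, offset)
        except redis.ResponseError as err:
            if "EOF" in str(err):
                return  # end of this data file
            raise
        yield offset, chunk
        offset += len(chunk)  # server sends at most 4 MB per call
```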

Import works the same way, except that you can only import into the current (last) data_id; you can't import into an already closed (immutable) data file. In addition, this feature is only allowed on a frozen namespace to avoid any side changes. It is designed to clone a namespace from scratch; it can't be used to clone a similar namespace if the data are not exactly the same.

Workflow when importing:

  • (Ensure the namespace is empty)
  • Freeze the namespace (with NSSET freeze)
  • Check the current data_id and data_offset via NSINFO and fetch that from the master
  • Import the data
  • When -EOF is reached, jump to the next file with NSJUMP and keep cloning
  • When -EOF is reached and data_id and data_offset are the same as on the master, the data are in sync.

There is already a script in place which does that: tools/export-import/eximport.py
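
For readers without the branch checked out, here is a minimal sketch of that clone loop in the spirit of eximport.py; hostnames, the namespace name, the NSINFO field names and the exact command replies are assumptions, not copied from the real script:

```python
#!/usr/bin/env python3
"""Sketch of the import workflow above; not the real tools/export-import/eximport.py."""
import redis

NAMESPACE = "backup"  # hypothetical namespace name

master = redis.Redis(host="master-host", port=9900)
target = redis.Redis(host="target-host", port=9900)

for c in (master, target):
    c.execute_command("SELECT", NAMESPACE)

# Freeze the (empty) target namespace so nothing changes while cloning.
target.execute_command("NSSET", NAMESPACE, "freeze", 1)

def current_position(client):
    """Read data_id/data_offset from NSINFO (field names are assumptions)."""
    raw = client.execute_command("NSINFO", NAMESPACE).decode()
    info = {}
    for line in raw.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            info[key.strip()] = value.strip()
    return int(info["data_current_id"]), int(info["data_current_offset"])

dataid, offset = 0, 0
while True:
    try:
        chunk = master.execute_command("DATA", "EXPORT", dataid, offset)
    except redis.ResponseError as err:
        if "EOF" not in str(err):
            raise
        master_id, _ = current_position(master)
        if master_id == dataid:
            break  # master is still on this data file: in sync (offset check omitted)
        target.execute_command("NSJUMP")  # open the next data file (assumed to take no arguments)
        dataid, offset = dataid + 1, 0
        continue
    target.execute_command("DATA", "IMPORT", dataid, offset, chunk)
    offset += len(chunk)

print(f"clone finished at data_id={dataid} offset={offset}")
```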

The next step is getting the index ready. The best solution in my opinion is implementing an INDEX REBUILD based on the data files, so the index can be recreated from scratch from the data files. There is already an issue discussing that (#160); that would be nice.
