NBSNEBIUS-258: Support Zero Copy for RDMA Data Path on Disk Agent (#1324) #1617

Merged
budevg merged 1 commit into stable-23-3 from users/evgenybud/merge-23-3-v1
Jul 16, 2024

Conversation

budevg
Collaborator

@budevg budevg commented Jul 16, 2024

  • NBSNEBIUS-258: Support Zero Copy for RDMA Data Path on Disk Agent

The Disk Agent currently copies data buffers multiple times for READ/WRITE
requests received using RDMA transport.

For WRITE requests, the RDMA data buffer is first copied into the memory of
TWriteBlocksRequest and then into a disk block-aligned buffer allocated by
Storage.

For READ requests, disk data is first read into a disk block-aligned buffer
allocated by Storage and then copied into the TReadBlocksResponse message. This
message is then serialized into the RDMA buffer.

To avoid these expensive copies and maintain compatibility with older clients,
we introduce the RDMA_PROTO_FLAG_RDATA flag, which signals the data layout
relative to the allocated RDMA buffer.

Previously, the data layout was:

             buffer
|--------------+-------+------+--------|
| TProtoHeader | Proto | Data | unused |
|--------------+-------+------+--------|

With this layout, the data offset in memory is not guaranteed to be aligned to
512/4096 bytes, even though the RDMA buffer is allocated in 4096-byte chunks.
O_DIRECT I/O submitted through libaio requires block-aligned memory buffers when
writing to the underlying block device, so a different data layout is needed.
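To illustrate why the old layout breaks alignment, here is a minimal sketch of the offset arithmetic. The helper names (`IsAligned`, `LegacyDataOffset`) and the example sizes are hypothetical and are not taken from the Disk Agent code; the real header and proto sizes may differ.

```cpp
#include <cstddef>

// Hypothetical helper: true if `value` is a multiple of `alignment`.
constexpr bool IsAligned(std::size_t value, std::size_t alignment)
{
    return alignment != 0 && value % alignment == 0;
}

// Old layout: Data immediately follows TProtoHeader and Proto, so its
// offset from the start of the buffer is the sum of their sizes.
// `headerSize` and `protoSize` are illustrative parameters only.
constexpr std::size_t LegacyDataOffset(
    std::size_t headerSize,
    std::size_t protoSize)
{
    return headerSize + protoSize;
}
```

For example, with an assumed 16-byte header and a 37-byte serialized proto, `LegacyDataOffset(16, 37)` is 53, which is not a multiple of 512, so the data cannot be handed to an O_DIRECT write without first copying it into an aligned buffer.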

With RDMA_PROTO_FLAG_RDATA, the data layout becomes:

|--------------+-------+--------+------|
| TProtoHeader | Proto | unused | Data |
|--------------+-------+--------+------|

Data is now placed at the end of the buffer. Since the Data size is a multiple
of 512/4096 bytes (depending on the device block size) and the buffer size is a
multiple of 4096 bytes, the data offset in memory is 512/4096-byte aligned, so
the data can be passed to libaio without copying.
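The alignment argument for the new layout can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the names `ChunkSize`, `RoundUp`, `TailBufferSize`, and `TailDataOffset` are hypothetical, and the sketch assumes the buffer's base address is itself 4096-byte aligned.

```cpp
#include <cstddef>

// Assumption: RDMA buffers are allocated in 4096-byte chunks, as the
// description states.
constexpr std::size_t ChunkSize = 4096;

// Round `n` up to the nearest multiple of `multiple`.
constexpr std::size_t RoundUp(std::size_t n, std::size_t multiple)
{
    return (n + multiple - 1) / multiple * multiple;
}

// Smallest whole-chunk buffer that holds the header, the proto, and the data.
constexpr std::size_t TailBufferSize(
    std::size_t headerSize,
    std::size_t protoSize,
    std::size_t dataSize)
{
    return RoundUp(headerSize + protoSize + dataSize, ChunkSize);
}

// New layout: Data occupies the last `dataSize` bytes of the buffer.
// If bufferSize is a multiple of 4096 and dataSize is a multiple of the
// block size (512 or 4096), the difference is a multiple of the block size.
constexpr std::size_t TailDataOffset(
    std::size_t bufferSize,
    std::size_t dataSize)
{
    return bufferSize - dataSize;
}
```

With an assumed 16-byte header, a 37-byte proto, and 1024 bytes of data on a 512-byte-block device, the buffer is a single 4096-byte chunk and the data starts at offset 3072, which is a multiple of 512.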

@budevg budevg requested review from drbasic and qkrorlqr July 16, 2024 08:27
Contributor

Note

This is an automated comment that will be appended during run.

🟢 default-linux-x86-64-relwithdebinfo: all tests PASSED for commit 15d6bd5.

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|-------|--------|--------|--------|---------|--------|
| 1551  | 1551   | 0      | 0      | 0       | 0      |

@budevg budevg merged commit b06277c into stable-23-3 Jul 16, 2024
6 of 7 checks passed
@budevg budevg deleted the users/evgenybud/merge-23-3-v1 branch July 16, 2024 11:59