Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: encode & decode size-delimited messages #2424

Merged
merged 1 commit into from
Nov 12, 2024

Conversation

jszumski
Copy link
Collaborator

Adds the ability to encode and decode a size-delimited message collection in Swift.

The intended use case is to decode a stream of messages recorded to disk (e.g., Bazel's Build Event Protocol). Similar implementations exist in Java (writeDelimitedTo and parseDelimitedFrom) and other languages in protocolbuffers/protobuf#10229.

// Use the size of the struct as an initial estimate for the space needed.
let structSize = MemoryLayout.size(ofValue: T.self)

let fullBuffer = WriteBuffer(capacity: structSize * values.count)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this calculation attempt to reserve some capacity for the varints or just let the buffer expand naturally?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to allocate slightly more than slightly less (expanding requires copying memory). I would change to (8 + structSize) * values.count where the 8 is 64 bits for the varint.

There's probably already plenty of buffer realistically, since I think the struct size is almost always larger than what is written, but I'd still go with what I mentioned and we can do some profiling and optimizing at a later date.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to reserve space 👍

@@ -36,7 +36,7 @@ final class WriteBuffer {
init(capacity: Int = 0) {
self.capacity = 0

if capacity > 0 {
if capacity >= 0 {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was required to ensure storage was allocated in all cases (see testAppendEmptyFirst)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little wary of calling malloc(0) in expand(). In C at least, the behavior for this can vary from runtime to runtime.

Can we achieve the same thing by adding something like this into append:

guard !data.isEmpty else { return }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added guards where appropriate

try data.withUnsafeBytes { buffer in
// Handle the empty-data case.
guard let baseAddress = buffer.baseAddress, buffer.count > 0 else {
values = [try T(from: .empty)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the spec specify that "no bytes" should be one empty object vs. an empty array? (Or can we verify with protoc?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been an empty array

// Use the size of the struct as an initial estimate for the space needed.
let structSize = MemoryLayout.size(ofValue: T.self)

let fullBuffer = WriteBuffer(capacity: structSize * values.count)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to allocate slightly more than slightly less (expanding requires copying memory). I would change to (8 + structSize) * values.count where the 8 is 64 bits for the varint.

There's probably already plenty of buffer realistically, since I think the struct size is almost always larger than what is written, but I'd still go with what I mentioned and we can do some profiling and optimizing at a later date.

}

// write this value's size + contents to the main buffer
fullBuffer.writeVarint(UInt64(writer.buffer.count), at: fullBuffer.count)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: We have some tricks inside the writer to avoid this write-then-copy inside of the ProtoWriter (see beginLengthDelimitedEncode). They're probably not worth doing here, but for future archaeologists who might be trying to optimize this I'll mention.

Basically, most messages will be in the size range that requires (IIRC) two bytes of varint, so we reserve two bytes, then write out value into the full buffer. If it happens to not take two bytes for its size, only then do we go back and move it to add more or less room for the prefix size.

In the future, we could extract this into an extension on WriteBuffer, like:

// This will reserve the length bytes, then write the buffer, then adjust and fill in the length
// bytes based on what was written.
buffer.writeLengthEncoded { tempBuffer in
   let writer = ProtoWriter(data: tempBuffer, ...)
   try value.encode(to: writer)
}

Again, all a good future enhancement. Not required now.

@@ -36,7 +36,7 @@ final class WriteBuffer {
init(capacity: Int = 0) {
self.capacity = 0

if capacity > 0 {
if capacity >= 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little wary of calling malloc(0) in expand(). In C at least, the behavior for this can vary from runtime to runtime.

Can we achieve the same thing by adding something like this into append:

guard !data.isEmpty else { return }

@jszumski jszumski force-pushed the jszumski/swift-size-delimited-messages branch 2 times, most recently from 41fdf6a to 7baa8cc Compare July 12, 2023 19:27
@jszumski jszumski force-pushed the jszumski/swift-size-delimited-messages branch 3 times, most recently from 3a6274d to 15f3084 Compare November 8, 2024 20:36
Copy link
Member

@oldergod oldergod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make sure all your comments are sentences?, they all look good but some are missing a period.

@jszumski jszumski force-pushed the jszumski/swift-size-delimited-messages branch from 15f3084 to 986c030 Compare November 11, 2024 15:01
@jszumski jszumski merged commit a262267 into master Nov 12, 2024
11 checks passed
@jszumski jszumski deleted the jszumski/swift-size-delimited-messages branch November 12, 2024 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants