Steps to reproduce

1. Construct a batch of sufficiently large messages while respecting the per-message limits (max 10 MB for data + attributes + ordering key, max 100 attributes, max 256 B per attribute key, max 1024 B per attribute value, max 1024 B per ordering key) so that it exceeds either the 10 MB serialized size limit per PublishRequest or the 10 MiB request size limit per call to publish (the latter is moot because of the first limit unless explicit size validation is enabled).
2. Publish the batch.
3. Observe the thrown exception: either an early rejection (>10 MiB) or an internal error on request_size (>10 MB, <10 MiB) if explicit size validation isn't enabled for the topic.
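For reference, a minimal repro sketch with the Java client. The project/topic names and thresholds are placeholders, and the attribute-heavy messages mirror the example further below to maximize the uncounted framing overhead; on newer gax versions the delay threshold may take a java.time.Duration instead:

```java
import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

public class OversizedBatchRepro {
  public static void main(String[] args) throws Exception {
    // Let the batcher accumulate tracked bytes right up to the 10 MB request
    // limit; the per-message and per-attribute protobuf framing that the
    // batcher does not count can then push the serialized PublishRequest over.
    Publisher publisher =
        Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
            .setBatchingSettings(
                BatchingSettings.newBuilder()
                    .setElementCountThreshold(1_000L)
                    .setRequestByteThreshold(10_000_000L)
                    .setDelayThreshold(Duration.ofSeconds(5))
                    .build())
            .build();

    // 100 attributes with empty values and 128-character ASCII keys.
    PubsubMessage.Builder msg = PubsubMessage.newBuilder();
    for (int i = 0; i < 100; i++) {
      msg.putAttributes(String.format("%0128d", i), "");
    }
    // The returned futures fail with the size error; ignored for brevity.
    for (int i = 0; i < 736; i++) {
      publisher.publish(msg.build());
    }
    publisher.publishAllOutstanding();
    publisher.shutdown();
  }
}
```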
External references such as API reference guides
The API reference does not document these limits adequately, since it suggests that explicit size validation is used.
Any additional information below
Discovered this while reviewing size validation in PubsubIO for Apache Beam; see apache/beam#31800 for details.
Happy to open a PR with fixes when I get a chance, just wanted to highlight this ahead of time for discussion.
This shouldn't be an issue if a user recovers by lowering the byte size per batch (<9 MB is generally safe with the current per-message limits), but in my opinion the client library should track the outgoing request size accurately on the user's behalf.
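As a stopgap, that workaround is just a conservative byte threshold (a sketch; the 9 MB figure is the estimate above, not an official bound):

```java
import com.google.api.gax.batching.BatchingSettings;
import org.threeten.bp.Duration;

// Keep tracked batch bytes comfortably below the 10 MB request limit so the
// uncounted protobuf framing overhead cannot push the request over it.
BatchingSettings conservativeBatching =
    BatchingSettings.newBuilder()
        .setElementCountThreshold(1_000L)
        .setRequestByteThreshold(9_000_000L)
        .setDelayThreshold(Duration.ofMillis(100))
        .build();
```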
Also note that allowing users to use the REST API for Pub/Sub as proposed in #2075 would complicate matters, since the serialized size of publish requests sent as JSON can be smaller than the serialized size of the same requests after they're transcoded into protobuf. For this message type a protobuf message can be up to ~38 KB smaller or up to ~0.9 MB larger than its JSON counterpart.
An example of this would be a JSON publish request of 736 messages, each message with no data and 100 attributes, each attribute with no value and a key of 128 alphanumeric ASCII characters. The JSON request does not exceed 10 MB, but the protobuf-transcoded request that's passed to Pub/Sub exceeds it by ~145 KB. That triggers the internal validation error without explicit size validation, but I guess this will become a non-issue when explicit size validation is used by default.
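That example can be checked with the generated protobuf classes and JsonFormat (the topic name is a placeholder; insignificant whitespace is omitted from the JSON to keep the comparison fair):

```java
import java.nio.charset.StandardCharsets;

import com.google.protobuf.util.JsonFormat;
import com.google.pubsub.v1.PublishRequest;
import com.google.pubsub.v1.PubsubMessage;

public class TranscodedSizeDemo {
  public static void main(String[] args) throws Exception {
    // One message: 100 attributes with empty values and 128-character keys.
    PubsubMessage.Builder msg = PubsubMessage.newBuilder();
    for (int i = 0; i < 100; i++) {
      msg.putAttributes(String.format("%0128d", i), "");
    }

    // 736 such messages in a single PublishRequest.
    PublishRequest.Builder request =
        PublishRequest.newBuilder().setTopic("projects/p/topics/t");
    for (int i = 0; i < 736; i++) {
      request.addMessages(msg);
    }

    PublishRequest built = request.build();
    String json = JsonFormat.printer().omittingInsignificantWhitespace().print(built);
    // The protobuf wire form pays tag/length framing for every message and
    // attribute map entry, so it comes out larger than the compact JSON.
    System.out.println("JSON bytes:     " + json.getBytes(StandardCharsets.UTF_8).length);
    System.out.println("protobuf bytes: " + built.getSerializedSize());
  }
}
```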
sjvanrossum changed the title from "Request batching does not accurately track encoding overhead" to "Request batching does not track encoding overhead accurately" on Jul 9, 2024
This reduces the risk of publish failures in the corner case where a Publisher is configured without compression and with a maximum byte size threshold for batching:
The change in #2113 prevents an OutstandingBatch from producing a slightly oversized PublishRequest in this scenario, so long as a single PubsubMessage does not exceed the limit on its own, though that would be considered a user error to begin with.
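To illustrate the kind of accounting this needs (a hypothetical helper, not the actual code in #2113): the wire cost of a message inside a PublishRequest is its serialized size plus the repeated-field tag and length-prefix varint, which CodedOutputStream can compute directly.

```java
import com.google.protobuf.CodedOutputStream;
import com.google.pubsub.v1.PublishRequest;
import com.google.pubsub.v1.PubsubMessage;

final class RequestSizeAccounting {
  // Bytes the message occupies inside a serialized PublishRequest: its own
  // serialized size plus the messages-field tag and length-prefix varint.
  static int wireSizeInRequest(PubsubMessage message) {
    return CodedOutputStream.computeMessageSize(
        PublishRequest.MESSAGES_FIELD_NUMBER, message);
  }

  private RequestSizeAccounting() {}
}
```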
@michaelpri10 as far as I'm aware, the proposed change makes no assumptions about Pub/Sub's limits (since those are subject to change) or about the effect of compression, since a byte size threshold above the max request size could be valid for highly compressible batches if explicit size validation is used.
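For context, publisher-side compression is opt-in on the Java client (a sketch, assuming a client version that exposes these builder options):

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.TopicName;

Publisher compressedPublisher =
    Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
        .setEnableCompression(true)
        .setCompressionBytesThreshold(1_000L) // compress batches larger than ~1 KB
        .build();
```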
Hello @sjvanrossum, apologies for such a long delay in getting to this ticket, but thank you for raising this issue and proposing a fix. I've left a few comments on #2113 to address, but overall the approach you've taken looks correct.