
Request batching does not track encoding overhead accurately #2106

Closed
sjvanrossum opened this issue Jul 8, 2024 · 2 comments · Fixed by #2113
Labels
api: pubsub (Issues related to the googleapis/java-pubsub API)

Comments

@sjvanrossum
Contributor

Steps to reproduce

  1. Set the byte size threshold per batched request to >9.1 MB.
  2. Construct a batch of sufficiently large messages, while respecting the per-message limits (max 10 MB for data + attributes + ordering key, max 100 attributes, max 256 B per attribute key, max 1024 B per attribute value, max 1024 B per ordering key), so that the batch exceeds either the 10 MB serialized size limit per PublishRequest or the 10 MiB request size limit per publish call (the latter only matters when explicit size validation is enabled, since the 10 MB limit triggers first otherwise). A sketch of these steps follows the list.
  3. Publish the batch.
  4. Observe the thrown exception: either an early rejection (>10 MiB) or an internal error on request_size (>10 MB, <10 MiB) if explicit size validation isn't enabled for the topic.
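
A rough sketch of these steps (project and topic names are hypothetical, the message shape is just one way to fill a batch, and the padTo helper is made up for the example):

import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

public class OversizedBatchRepro {
  public static void main(String[] args) throws Exception {
    // Step 1: a byte threshold above 9.1 MB, here the API maximum.
    BatchingSettings batchingSettings = BatchingSettings.newBuilder()
        .setElementCountThreshold(1000L)
        .setRequestByteThreshold(Publisher.getApiMaxRequestBytes())
        .setDelayThreshold(Duration.ofMillis(100))
        .build();

    Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
        .setBatchingSettings(batchingSettings)
        .build();

    // Step 2: messages that are individually valid (~1 MB of data plus 100
    // attributes with 256 B keys and 1024 B values) but collectively fill a
    // batch up to the byte threshold.
    PubsubMessage.Builder template = PubsubMessage.newBuilder()
        .setData(ByteString.copyFrom(new byte[1_000_000]));
    for (int i = 0; i < 100; i++) {
      template.putAttributes(padTo("attr" + i, 256), padTo("", 1024));
    }
    PubsubMessage message = template.build();

    // Step 3: publish enough of them to fill at least one batch to the threshold.
    for (int i = 0; i < 12; i++) {
      publisher.publish(message);
    }

    // Step 4: per the steps above, the serialized PublishRequest for a full batch
    // can exceed the 10 MB limit because the batching accounting misses the
    // encoding overhead; the affected publish futures then fail on request_size
    // (or are rejected early above 10 MiB).
    publisher.shutdown();
  }

  // Right-pads s with 'x' characters up to the requested length.
  private static String padTo(String s, int length) {
    StringBuilder sb = new StringBuilder(length).append(s);
    while (sb.length() < length) {
      sb.append('x');
    }
    return sb.toString();
  }
}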

External references such as API reference guides

The API reference does not document these limits adequately, since it suggests that explicit size validation is always applied.

Any additional information below

I discovered this while reviewing size validation in PubsubIO for Apache Beam; see apache/beam#31800 for details.
I'm happy to open a PR with fixes when I get a chance, but I wanted to highlight this ahead of time for discussion.

This shouldn't be an issue if a user recovers by lowering the byte size threshold per batch (<9 MB is generally safe with the current per-message limits), but in my opinion the client library should track the outgoing request size accurately on the user's behalf.
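
For example, the workaround on the user's side amounts to something like the following (9 MB is an assumed safe margin, not a documented constant; the project and topic names are placeholders):

import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

// Keep the batching byte threshold comfortably below the 10 MB serialized
// request limit so the untracked encoding overhead cannot push a batch over it.
BatchingSettings conservativeBatching = BatchingSettings.newBuilder()
    .setElementCountThreshold(100L)
    .setRequestByteThreshold(9_000_000L)
    .setDelayThreshold(Duration.ofMillis(10))
    .build();

Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "my-topic"))
    .setBatchingSettings(conservativeBatching)
    .build();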

Also note that allowing users to use the REST API for Pub/Sub, as proposed in #2075, would complicate matters, since the serialized size of a publish request sent as JSON can be smaller than the serialized size of the same request after it is transcoded into protobuf. For this message type, a protobuf message can be up to ~38 KB smaller or up to ~0.9 MB larger than its JSON counterpart.
An example would be a JSON publish request of 736 messages, each with no data and 100 attributes, each attribute with no value and a key of 128 alphanumeric ASCII characters. The JSON request does not exceed 10 MB, but the protobuf-transcoded request that is passed to Pub/Sub exceeds it by ~145 KB. That triggers the internal validation error when explicit size validation isn't enabled, but I expect this to become a non-issue once explicit size validation is used by default.
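
For reference, this comparison can be checked by building the request described above and printing its protobuf serialized size next to its JSON rendering (the topic name is a placeholder; exact byte counts depend on the encoders in use):

import com.google.protobuf.util.JsonFormat;
import com.google.pubsub.v1.PublishRequest;
import com.google.pubsub.v1.PubsubMessage;
import java.nio.charset.StandardCharsets;

public class JsonVsProtoSize {
  public static void main(String[] args) throws Exception {
    // 100 distinct 128-character alphanumeric keys, reused in every message.
    String[] keys = new String[100];
    for (int i = 0; i < keys.length; i++) {
      StringBuilder sb = new StringBuilder(128).append("key").append(i);
      while (sb.length() < 128) {
        sb.append('a');
      }
      keys[i] = sb.toString();
    }

    // One message with no data, empty attribute values, and the 100 keys above.
    PubsubMessage.Builder messageBuilder = PubsubMessage.newBuilder();
    for (String key : keys) {
      messageBuilder.putAttributes(key, "");
    }
    PubsubMessage message = messageBuilder.build();

    // A publish request of 736 such messages.
    PublishRequest.Builder requestBuilder = PublishRequest.newBuilder()
        .setTopic("projects/my-project/topics/my-topic");
    for (int i = 0; i < 736; i++) {
      requestBuilder.addMessages(message);
    }
    PublishRequest request = requestBuilder.build();

    int protoBytes = request.getSerializedSize();
    int jsonBytes = JsonFormat.printer()
        .omittingInsignificantWhitespace()
        .print(request)
        .getBytes(StandardCharsets.UTF_8)
        .length;
    System.out.println("protobuf: " + protoBytes + " B, JSON: " + jsonBytes + " B");
  }
}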

@product-auto-label bot added the "api: pubsub" label on Jul 8, 2024.
@sjvanrossum changed the title from "Request batching does not accurately track encoding overhead" to "Request batching does not track encoding overhead accurately" on Jul 9, 2024.
@sjvanrossum
Contributor Author

sjvanrossum commented Jul 12, 2024

This reduces the risk of publish failures in the corner case where a Publisher is configured without compression and with a maximal byte size threshold for batching:

BatchingSettings batchingSettings = BatchingSettings
  .newBuilder()
  .setRequestByteThreshold(Publisher.getApiMaxRequestBytes())
  .build();

The change in #2113 prevents an OutstandingBatch from producing a slightly oversized PublishRequest for this scenario so long as a single PubsubMessage does not exceed this limit, but that would be considered a user error to begin with.
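
For context on the "encoding overhead" in the title: the messages field of a PublishRequest is a repeated message field, so each batched message costs its serialized size plus a field tag and a varint length prefix, and the topic string adds its own framing. A sketch of accounting for the full request size, using the field numbers from the current pubsub.proto (topic = 1, messages = 2), though not necessarily the exact bookkeeping used in #2113:

import com.google.protobuf.CodedOutputStream;
import com.google.pubsub.v1.PubsubMessage;
import java.util.List;

final class RequestSizeEstimator {
  // Serialized size of a PublishRequest containing these messages, including
  // the framing of the topic field and the per-message tag and length prefix
  // of the repeated messages field.
  static long serializedRequestSize(String topic, List<PubsubMessage> batch) {
    long size = CodedOutputStream.computeStringSize(1, topic);
    for (PubsubMessage message : batch) {
      // computeMessageSize = tag + varint length prefix + message.getSerializedSize()
      size += CodedOutputStream.computeMessageSize(2, message);
    }
    return size;
  }
}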

@michaelpri10 as far as I'm aware, the proposed change makes no assumptions about Pub/Sub limits (since those are subject to change) or about the effect of compression, since a byte size threshold above the max request size could be valid for highly compressible batches if explicit size validation is used.

@michaelpri10
Contributor

Hello @sjvanrossum, apologies for such a long delay in getting to this ticket, but thank you for raising this issue and proposing a fix. I've left a few comments on #2113 to address, but overall the approach you've taken looks correct.
