-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(retry): Avoid panicking if responses come early #3216
Conversation
linkerd/http/retry/src/replay.rs
Outdated
pub fn is_capped(&self) -> bool { | ||
pub fn is_capped(&self) -> Option<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝️ this is the core of the proposed change.
if there's an outstanding clone still being polled, return None
.
ae4ec8b
to
233f68a
Compare
This comment was marked as resolved.
This comment was marked as resolved.
62c2cf7
to
a249faf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w00t
#3216 (comment) Co-Authored-By: Oliver Gould <[email protected]> Signed-off-by: katelyn martin <[email protected]>
`linkerd-http-retry` and `linkerd-retry` provide generic `tower` middleware to allow services to retry requests that fail. part of this middleware hinges on a `ReplayBody<B>` that will lazily buffer request bodies' data for use in subsequent attempts. if a request fails, the retry middleware will attempt to send another request, assuming the original body was able to completely fit into this buffer. `ReplayBody<B>` makes a subtle assummption, most succinctly stated in this excerpt of an internal method's documentation: ```rust // linkerd/http/retry/src/replay.rs impl<B: Body> ReplayBody<B> { /// This panics if another clone has currently acquired the state, based on /// the assumption that a retry body will not be polled until the previous /// request has been dropped. fn acquire_state<'a>( state: &'a mut Option<BodyState<B>>, shared: &Mutex<Option<BodyState<B>>>, ) -> &'a mut BodyState<B> { // ... } } ``` this assumption is slightly at odds with the request/response lifecycle permitted within the HTTP/2 specification. see RFC 9113 § 8.1, "_HTTP Message Framing_" (emphasis added): > .. > An HTTP request/response exchange fully consumes a single stream. A > request starts with the HEADERS frame that puts the stream into the > "open" state. **The request ends with a frame with the END_STREAM flag > set**, which causes the stream to become "half-closed (local)" for the > client and "half-closed (remote)" for the server. A response stream > starts with zero or more interim responses in HEADERS frames, followed > by a HEADERS frame containing a final status code. > > An HTTP response is complete after the server sends -- or the client > receives -- a frame with the END_STREAM flag set (including any > CONTINUATION frames needed to complete a field block). **A server can > send a complete response prior to the client sending an entire request > if the response does not depend on any portion of the request that has > not been sent and received.** <https://www.rfc-editor.org/rfc/rfc9113.html#section-8.1-11> because of this, a retry may panic when checking if the previous request body was capped, if a server delivers a response before the request is complete. this has been observed when retrying [wire-grpc](https://github.com/square/wire) requests, manifesting in a panic with this message: ```text thread 'main' panicked at 'if our `state` was `None`, the shared state must be `Some`', /__w/linkerd2-proxy/linkerd2-proxy/linkerd/http-retry/src/replay.rs:152:22 ``` this commit refactors `ReplayBody::is_capped()` so that it will no longer panic if there is an outstanding body still being polled. rather, it will return `Some(true)` or `Some(false)` if the previous body was capped, or `None` if it has not finished streaming. the related logic in the `linkerd-http-retry` library is updated to refrain from attempting a retry if a response is received before the request stream was completed. Signed-off-by: katelyn martin <[email protected]>
#3216 (comment) Co-Authored-By: Oliver Gould <[email protected]> Signed-off-by: katelyn martin <[email protected]>
2305731
to
426017d
Compare
Nice catch, this one had totally stumped me! :) |
linkerd-http-retry
andlinkerd-retry
provide generictower
middleware to allow services to retry requests that fail.
part of this middleware hinges on a
ReplayBody<B>
that will lazilybuffer request bodies' data for use in subsequent attempts. if a request
fails, the retry middleware will attempt to send another request,
assuming the original body was able to completely fit into this buffer.
ReplayBody<B>
makes a subtle assummption, most succinctly stated inthis excerpt of an internal method's documentation:
this assumption is slightly at odds with the request/response lifecycle
permitted within the HTTP/2 specification. see RFC 9113 § 8.1, "HTTP
Message Framing" (emphasis added):
https://www.rfc-editor.org/rfc/rfc9113.html#section-8.1-11
because of this, a retry may panic when checking if the previous request
body was capped, if a server delivers a response before the request is
complete. this has been observed when retrying
wire-grpc requests, manifesting in a
panic with this message:
this commit refactors
ReplayBody::is_capped()
so that it will nolonger panic if there is an outstanding body still being polled.
rather, it will return
Some(true)
orSome(false)
if the previousbody was capped, or
None
if it has not finished streaming.the related logic in the
linkerd-http-retry
library is updated torefrain from attempting a retry if a response is received before the
request stream was completed.