Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(retry): Avoid panicking if responses come early #3216

Merged
merged 2 commits into from
Sep 23, 2024

Conversation

cratelyn
Copy link
Collaborator

@cratelyn cratelyn commented Sep 22, 2024

linkerd-http-retry and linkerd-retry provide generic tower
middleware to allow services to retry requests that fail.

part of this middleware hinges on a ReplayBody<B> that will lazily
buffer request bodies' data for use in subsequent attempts. if a request
fails, the retry middleware will attempt to send another request,
assuming the original body was able to completely fit into this buffer.

ReplayBody<B> makes a subtle assummption, most succinctly stated in
this excerpt of an internal method's documentation:

// linkerd/http/retry/src/replay.rs
impl<B: Body> ReplayBody<B> {
    /// This panics if another clone has currently acquired the state, based on
    /// the assumption that a retry body will not be polled until the previous
    /// request has been dropped.
        fn acquire_state<'a>(
        state: &'a mut Option<BodyState<B>>,
        shared: &Mutex<Option<BodyState<B>>>,
    ) -> &'a mut BodyState<B> {
        // ...
    }
}

this assumption is slightly at odds with the request/response lifecycle
permitted within the HTTP/2 specification. see RFC 9113 § 8.1, "HTTP
Message Framing
" (emphasis added):

An HTTP request/response exchange fully consumes a single stream. A
request starts with the HEADERS frame that puts the stream into the
"open" state. The request ends with a frame with the END_STREAM flag
set
, which causes the stream to become "half-closed (local)" for the
client and "half-closed (remote)" for the server. A response stream
starts with zero or more interim responses in HEADERS frames, followed
by a HEADERS frame containing a final status code.

An HTTP response is complete after the server sends -- or the client
receives -- a frame with the END_STREAM flag set (including any
CONTINUATION frames needed to complete a field block). A server can
send a complete response prior to the client sending an entire request
if the response does not depend on any portion of the request that has
not been sent and received.

https://www.rfc-editor.org/rfc/rfc9113.html#section-8.1-11

because of this, a retry may panic when checking if the previous request
body was capped, if a server delivers a response before the request is
complete. this has been observed when retrying
wire-grpc requests, manifesting in a
panic with this message:

thread 'main' panicked at 'if our `state` was `None`, the shared state must be `Some`', /__w/linkerd2-proxy/linkerd2-proxy/linkerd/http-retry/src/replay.rs:152:22

this commit refactors ReplayBody::is_capped() so that it will no
longer panic if there is an outstanding body still being polled.
rather, it will return Some(true) or Some(false) if the previous
body was capped, or None if it has not finished streaming.

the related logic in the linkerd-http-retry library is updated to
refrain from attempting a retry if a response is received before the
request stream was completed.

Comment on lines 144 to 147
pub fn is_capped(&self) -> bool {
pub fn is_capped(&self) -> Option<bool> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☝️ this is the core of the proposed change.

if there's an outstanding clone still being polled, return None.

@cratelyn cratelyn marked this pull request as ready for review September 22, 2024 22:46
@cratelyn cratelyn requested a review from a team as a code owner September 22, 2024 22:46
@cratelyn

This comment was marked as resolved.

@cratelyn cratelyn force-pushed the kate/fix-grpc-retry-panics branch 2 times, most recently from 62c2cf7 to a249faf Compare September 23, 2024 15:07
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w00t

cratelyn added a commit that referenced this pull request Sep 23, 2024
#3216 (comment)

Co-Authored-By: Oliver Gould <[email protected]>
Signed-off-by: katelyn martin <[email protected]>
cratelyn and others added 2 commits September 23, 2024 17:27
`linkerd-http-retry` and `linkerd-retry` provide generic `tower`
middleware to allow services to retry requests that fail.

part of this middleware hinges on a `ReplayBody<B>` that will lazily
buffer request bodies' data for use in subsequent attempts. if a request
fails, the retry middleware will attempt to send another request,
assuming the original body was able to completely fit into this buffer.

`ReplayBody<B>` makes a subtle assummption, most succinctly stated in
this excerpt of an internal method's documentation:

```rust
// linkerd/http/retry/src/replay.rs
impl<B: Body> ReplayBody<B> {
    /// This panics if another clone has currently acquired the state, based on
    /// the assumption that a retry body will not be polled until the previous
    /// request has been dropped.
        fn acquire_state<'a>(
        state: &'a mut Option<BodyState<B>>,
        shared: &Mutex<Option<BodyState<B>>>,
    ) -> &'a mut BodyState<B> {
        // ...
    }
}
```

this assumption is slightly at odds with the request/response lifecycle
permitted within the HTTP/2 specification. see RFC 9113 § 8.1, "_HTTP
Message Framing_" (emphasis added):

> ..

> An HTTP request/response exchange fully consumes a single stream. A
> request starts with the HEADERS frame that puts the stream into the
> "open" state. **The request ends with a frame with the END_STREAM flag
> set**, which causes the stream to become "half-closed (local)" for the
> client and "half-closed (remote)" for the server. A response stream
> starts with zero or more interim responses in HEADERS frames, followed
> by a HEADERS frame containing a final status code.
>
> An HTTP response is complete after the server sends -- or the client
> receives -- a frame with the END_STREAM flag set (including any
> CONTINUATION frames needed to complete a field block). **A server can
> send a complete response prior to the client sending an entire request
> if the response does not depend on any portion of the request that has
> not been sent and received.**

<https://www.rfc-editor.org/rfc/rfc9113.html#section-8.1-11>

because of this, a retry may panic when checking if the previous request
body was capped, if a server delivers a response before the request is
complete. this has been observed when retrying
[wire-grpc](https://github.com/square/wire) requests, manifesting in a
panic with this message:

```text
thread 'main' panicked at 'if our `state` was `None`, the shared state must be `Some`', /__w/linkerd2-proxy/linkerd2-proxy/linkerd/http-retry/src/replay.rs:152:22
```

this commit refactors `ReplayBody::is_capped()` so that it will no
longer panic if there is an outstanding body still being polled.
rather, it will return `Some(true)` or `Some(false)` if the previous
body was capped, or `None` if it has not finished streaming.

the related logic in the `linkerd-http-retry` library is updated to
refrain from attempting a retry if a response is received before the
request stream was completed.

Signed-off-by: katelyn martin <[email protected]>
#3216 (comment)

Co-Authored-By: Oliver Gould <[email protected]>
Signed-off-by: katelyn martin <[email protected]>
@cratelyn cratelyn merged commit 67ed121 into main Sep 23, 2024
15 checks passed
@cratelyn cratelyn deleted the kate/fix-grpc-retry-panics branch September 23, 2024 21:51
@cratelyn cratelyn removed the request for review from olix0r September 23, 2024 21:51
@cratelyn cratelyn self-assigned this Sep 23, 2024
@hawkw
Copy link
Contributor

hawkw commented Sep 23, 2024

Nice catch, this one had totally stumped me! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants