Fix issue where successful server streaming calls can incorrectly be marked as failing due to context cancellation #16
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There is a race condition in
openTracingClientStream
that means that successful server streaming calls' client-generated spans can be incorrectly marked as failing due to context cancellation.This happens because whenever the gRPC library's
ClientStream
implementation encounters an error inRecvMsg()
,SendMsg()
,Header()
orCloseSend()
, it immediately cancels its context (the one returned byContext()
). This includes whenio.EOF
is returned byRecvMsg()
, which is used to indicate the successful end of a stream.However, this context cancellation then races with the interceptor calling
finishFunc
with the error returned byRecvMsg()
,SendMsg()
,Header()
orCloseSend()
. Either this call wins the race, in which case the true status of the call is recorded, or the context cancellation wins, and the span is recorded as failing due to context cancellation by the call tofinishFunc
here.On top of this, the docs for
grpc.ClientStream
warn that theContext()
method should not be called until after the first call toHeader()
orRecvMsg()
has returned, however the current implementation callsContext()
before either of these methods are called.This PR fixes both of these issues by using the provided context rather than the stream's context to wait for context cancellation, as this context is not cancelled when the stream is exhausted or fails.
This has two drawbacks:
ClientConn
is closed, but will instead wait until another method on the stream returns an error. This seemed preferable to the alternative, more complex, implementation which would require capturing the stream context after the first call toHeader()
orRecvMsg()
, and implementing some form of locking to ensure that the error returned byRecvMsg()
,SendMsg()
,Header()
orCloseSend()
wins the race with the cancellation of the stream's context.SpanDecoratorFunc
will now be the provided context, rather than the stream's context.Note that while I can observe this behaviour regularly in our application, I had to artificially delay the call to
finishFunc
on line 213 ofclient.go
to have the test reliably fail before introducing the fix.