-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Set span error only for 5xx response range #1196
feat: Set span error only for 5xx response range #1196
Conversation
74f912c
to
6acd6ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this PR and for your great research on rack.hijack
! 🚀
instrumentation/rack/test/opentelemetry/instrumentation/rack/middlewares/event_handler_test.rb
Outdated
Show resolved
Hide resolved
instrumentation/rack/test/opentelemetry/instrumentation/rack/middlewares/event_handler_test.rb
Outdated
Show resolved
Hide resolved
...mentation/rack/test/opentelemetry/instrumentation/rack/middlewares/tracer_middleware_test.rb
Outdated
Show resolved
Hide resolved
...mentation/rack/test/opentelemetry/instrumentation/rack/middlewares/tracer_middleware_test.rb
Outdated
Show resolved
Hide resolved
instrumentation/rack/lib/opentelemetry/instrumentation/rack/middlewares/event_handler.rb
Outdated
Show resolved
Hide resolved
@@ -109,7 +109,7 @@ def on_finish(request, response) | |||
span = OpenTelemetry::Instrumentation::Rack.current_span | |||
return unless span.recording? | |||
|
|||
add_response_attributes(span, response) if response | |||
add_response_attributes(span, response) if response && !request.env.key?('rack.hijack?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@j15e The more I think about your comments in slack, the more I agree with this implementation:
I see, so maybe changing the status check to > 500 would be a simpler and more pragmatic fix, is that what you think?1
When we take a closer look at the spec:
Span Status MUST be left unset if HTTP status code was in the 1xx, 2xx or 3xx ranges, unless there was another error (e.g., network error receiving the response body; or 3xx codes with max redirects exceeded), in which case status MUST be set to Error.
...
For HTTP status codes in the 5xx range, as well as any other code the client failed to interpret, span status SHOULD be set to Error.
I interpret that to mean that the code should only set the status to error in cases where the response is in the 5xx
ranges . There are other non-5xx cases that the code isn't currently accounting for but the spec isn't terribly explicit about it other than the case of the Max Redirects Exceeded, which is usually a client side constraint.
Given the description the spec I think it is safe to change the code to look for Bad HTTP status codes in the 5xx range and leaving the span status unset in all other cases.
This will remove our explicit checks for the Rack Hijack implementation and allow the response context propagators to execute.
Footnotes
This also avoids setting span in error when a Rails ActionCable hijacked response returns a -1 status code
6acd6ee
to
34982e4
Compare
@@ -208,7 +207,7 @@ def detach_context(request) | |||
end | |||
|
|||
def add_response_attributes(span, response) | |||
span.status = OpenTelemetry::Trace::Status.error unless GOOD_HTTP_STATUSES.include?(response.status.to_i) | |||
span.status = OpenTelemetry::Trace::Status.error if response.server_error? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found out that here we have access to the final Rack::Response object so this neat server_error?
method can be re-used (not the case in the middleware instrument version though)
Thanks for your kind and detailed feedback! I adjusted the PR to change the way the status code is set (only for the 5xx response code range) rather than make a special case for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thank you! I learned a lot during this review 😄
👋 Hello!
I ran into a very specific problem with the Rack instrumentation.
When a Rails Action Cable requests get traced by the Rack instrument, the response code returned by Rails is -1 and the body is empty as defined in Rack Hijacking spec.
In a nutshell, Rack hijacking exists to allow an app to take over the response streaming object to do special things like switching protocol to a web socket (just like Action Cable does).
This currently makes the Rack instrumentation add an error on the span because
-1
is (rightfully) outside the defined good http statues (100..499).opentelemetry-ruby-contrib/instrumentation/rack/lib/opentelemetry/instrumentation/rack/middlewares/event_handler.rb
Line 46 in c2ffafc
opentelemetry-ruby-contrib/instrumentation/rack/lib/opentelemetry/instrumentation/rack/middlewares/event_handler.rb
Lines 210 to 215 in c2ffafc
This change will simply ignore the response object when the env marked the request as hijacked, which I think is was a Rack user would expect in this case.
Special thanks to @arielvalentin for pointing me to the Rack detailed spec when I initially asked the question about this problem in Slack just here.