feat: 🦠 Metrics prototype for Rack instrumentation #1129

Open
wants to merge 13 commits into
base: main
15 changes: 14 additions & 1 deletion instrumentation/base/lib/opentelemetry/instrumentation/base.rb
@@ -189,7 +189,7 @@ def infer_version
end
end

- attr_reader :name, :version, :config, :installed, :tracer
+ attr_reader :name, :version, :config, :installed, :tracer, :meter

alias installed? installed

@@ -205,6 +205,8 @@ def initialize(name, version, install_blk, present_blk,
@installed = false
@options = options
@tracer = OpenTelemetry::Trace::Tracer.new
# check to see if the API is defined here because the config isn't available yet
@meter = OpenTelemetry::Metrics::Meter.new if defined?(OpenTelemetry::Metrics)
end
# rubocop:enable Metrics/ParameterLists

@@ -221,9 +223,20 @@ def install(config = {})

instance_exec(@config, &@install_blk)
@tracer = OpenTelemetry.tracer_provider.tracer(name, version)
install_meter
@installed = true
end

def install_meter
@meter = OpenTelemetry.meter_provider.meter(name, version: version) if metrics_enabled?
end

def metrics_enabled?
return @metrics_enabled if defined?(@metrics_enabled)

@metrics_enabled ||= defined?(OpenTelemetry::Metrics) && @config[:send_metrics]
end

# Whether or not this instrumentation is installable in the current process. Will
# be true when the instrumentation defines an install block, is not disabled
# by environment or config, and the target library present and compatible.
@@ -29,7 +29,9 @@ class Instrumentation < OpenTelemetry::Instrumentation::Base
option :untraced_requests, default: nil, validate: :callable
option :response_propagators, default: [], validate: :array
# This option is only valid for applications using Rack 2.0 or greater
option :use_rack_events, default: true, validate: :boolean
option :use_rack_events, default: true, validate: :boolean
# TODO: This option currently exclusively uses the event handler, should we support old and new Rack?
option :send_metrics, default: false, validate: :boolean

# Temporary Helper for Sinatra and ActionPack middleware to use during installation
#
@@ -203,6 +203,7 @@ def detach_context(request)
token, span = request.env[OTEL_TOKEN_AND_SPAN]
span.finish
OpenTelemetry::Context.detach(token)
record_http_server_request_duration_metric(span)
rescue StandardError => e
OpenTelemetry.handle_error(exception: e)
end
@@ -262,6 +263,43 @@ def create_span(parent_context, request)
span.add_event('http.proxy.request.started', timestamp: request_start_time) unless request_start_time.nil?
span
end

# Metrics stuff
HTTP_SERVER_REQUEST_DURATION_ATTRS_FROM_SPAN = %w[http.method http.scheme http.route http.status_code http.host].freeze

def metrics_enabled?
OpenTelemetry::Instrumentation::Rack::Instrumentation.instance.metrics_enabled?
end

def meter
return unless metrics_enabled?

OpenTelemetry::Instrumentation::Rack::Instrumentation.instance.meter
end

def http_server_request_duration_histogram
return unless metrics_enabled?

@http_server_request_duration_histogram ||= meter.create_histogram(
'http.server.request.duration',
unit: 's',
description: 'Duration of HTTP server requests.'
)
end

def record_http_server_request_duration_metric(span)
return unless metrics_enabled?
Collaborator

I have a bunch of ideas and questions floating around in my head that could use your input, since you're a metrics SDK expert.

It may result in a separate dispatch call, but have you considered creating a separate handler for metric generation? That would remove the need for having conditionals in the tracer middleware.

What if our user has an o11y 1.0 mindset and only wanted to generate metrics and not traces? Would that not mean the span would be non_recording and the metric would always generate invalid histogram entries?

Is there anything in the spec that states this should be generated from span timestamps?

Is there an alternative implementation that generates metrics in the span processor pipeline?

E.g., is there a processor that implements on_finish and generates metrics for server and client spans regardless of the instrumentation?

What about one that decorates the BatchSpanProcessor export loop and generates metrics in the BSP thread, to keep metric generation out of the critical path of the user's request?
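
To make the "off the critical path" idea concrete, here is a minimal sketch that assumes nothing about the SDK. `DeferredRecorder` and `ListHistogram` are hypothetical stand-ins, not OpenTelemetry classes, and this is not how the BatchSpanProcessor itself works. It only shows the request thread paying for a queue push while a background thread does the recording.

```ruby
# Hypothetical sketch: hand measurements to a background thread so the request
# thread only pays for a queue push. This is NOT the BatchSpanProcessor; it only
# illustrates the "record off the critical path" idea with stdlib primitives.
class DeferredRecorder
  STOP = Object.new.freeze # sentinel that tells the worker to exit

  def initialize(histogram)
    @histogram = histogram
    @queue = Queue.new
    @worker = Thread.new { drain }
  end

  # Called on the request thread: a cheap enqueue, no recording work.
  def record(value, attributes: {})
    @queue << [value, attributes]
  end

  # Flush pending measurements and stop the worker.
  def shutdown
    @queue << STOP
    @worker.join
  end

  private

  # Runs on the background thread until the sentinel arrives.
  def drain
    loop do
      item = @queue.pop
      break if item.equal?(STOP)

      value, attributes = item
      @histogram.record(value, attributes: attributes)
    end
  end
end

# Stand-in for a real histogram instrument (hypothetical).
class ListHistogram
  attr_reader :recordings

  def initialize
    @recordings = []
  end

  def record(value, attributes: {})
    @recordings << [value, attributes]
  end
end

histogram = ListHistogram.new
recorder = DeferredRecorder.new(histogram)
recorder.record(0.25, attributes: { 'http.method' => 'GET' })
recorder.shutdown
```

The trade-off in a design like this is that measurements still queued at process exit are lost unless `shutdown` is called, which is the same kind of concern a batching span exporter has.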

Contributor

> Is there anything in the spec that states this should be generated from span timestamps?

I don't think the spec says where we should generate metrics. AFAIK, the metrics spec only covers what the metrics should look like and how to export them.

> Is there an alternative implementation that generates metrics in the span processor pipeline?

To generate the metrics from a span processor, the processor would need a meter and would have to create the instrument in on_start or on_finish. I'm thinking of something like this:

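def initialize
  # ...
  @meter = ::OpenTelemetry.meter_provider.meter('sample_meter')
end

def on_start(span, parent_context)
  # ...
  # create the instrument once rather than on every span start
  @histogram ||= @meter.create_histogram('histogram_name', unit: 'ms', description: 'measure sample data')
end

def on_finish(span)
  # ...
  # span timestamps are in nanoseconds; convert to milliseconds to match the unit
  span_time = (span.end_timestamp - span.start_timestamp) / 1_000_000.0
  @histogram.record(span_time, attributes: {})
end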

Collaborator

Perhaps.

What are other language sigs doing?

Is what I'm suggesting out of step with the pattern other language implementations follow?

Contributor Author

Love these questions! I have some research to do to follow-up, but here are my current thoughts.

The instrumentation is based on Python's approach with the aiohttp-server. The asgi instrumentation has an example of what we could do when OTEL_SEMCONV_STABILITY_OPT_IN comes around.

> It may result in a separate dispatch call, but have you considered creating a separate handler for metric generation? That would remove the need for having conditionals in the tracer middleware.

I think we could take that approach! The examples I've seen so far tend to record the metric as part of the span creation, but we don't have to and that might be safer until metrics are stable.

> What if our user has an o11y 1.0 mindset and only wanted to generate metrics and not traces? Would that not mean the span would be non_recording and the metric would always generate invalid histogram entries?

This is a great thing to consider. I think that'd be the case with this current implementation, and I'd like to make some tweaks for this scenario.

I don't see an option to create only metrics and not traces in the Python instrumentation, but perhaps this exists in other languages. I'll do a bit more digging. My read on the spec is that it should leverage the span if it's available, but if the span is not available, it should still record the metric.

> Is there anything in the spec that states this should be generated from span timestamps?

The spec says: "When this metric is reported alongside an HTTP server span, the metric value SHOULD be the same as the HTTP server span duration."

AFAIK, we don't save duration on a span, just a start and end timestamp, so I figured this would be the way to create a duration that's the same as the HTTP server span duration. I might be interpreting the spec incorrectly. We could get the start and end timestamps in an independent fashion. That may be closer to what Python does.
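
As a rough illustration of getting the timestamps "in an independent fashion", the sketch below times a block with the monotonic clock and never touches a span. `measure_duration` is a hypothetical helper, not part of this PR or of the OpenTelemetry API. A value measured this way would be close to, but not necessarily identical to, the span's own duration, so whether it satisfies the spec's SHOULD is an open question.

```ruby
# Measure elapsed time with the monotonic clock, independent of any span.
# Returns the block's result and the elapsed time in seconds (Float).
# Hypothetical helper for illustration only.
def measure_duration
  started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  finished_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  [result, finished_at - started_at]
end

result, duration = measure_duration { :response }
```

The monotonic clock is used here instead of wall-clock time so that system clock adjustments cannot produce negative or inflated durations.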

> Is there an alternative implementation that generates metrics in the span processor pipeline?

I haven't seen this, and this isn't something I considered. I'll poke around other languages and see if I can find an example. I haven't looked too much at the BSP metrics, I'll take a look to get a better sense of how they work.

Contributor Author

@arielvalentin - Thanks for your patience!

I spent some time looking at metrics instrumentation in other languages. The most common design I found was to create metrics within the same instrumentation file as spans, but to calculate the duration and collect the attributes in separate ways. It seems like metrics are stable in those languages, so this might not have always been the case.
When it comes to creating a separate handler for metrics, I don't feel strongly one way or the other. Would you prefer to keep them in separate handlers?

Here are some of the files I looked at. The links should be to the code that records the metric. In most cases, the same file also creates the histogram.

For the span processor question, I didn't come across use of a span processor to collect metrics. Since span processors are part of the SDK and not the API, and instrumentation should only rely on the API, this doesn't seem like quite the right fit. In addition, since spans and metrics should be independent of each other, I don't think we want a span processor to control the creation of another signal. What do you think?


It looks like there are some features coming with the meter configurator that allow users to control what data is collected by instrumentation scope. This could help someone control what telemetry they want from a given instrumentation scope, and support that o11y 1.0 mindset of sending only metrics, but not traces. I'll look into prototyping the experimental spec and use that to control whether metrics and traces are sent. We could probably do this with configs in the instrumentation too.

So as a follow-up, I'd like to:

  • Separate metrics from traces in the instrumentation. The metrics data should not use the span directly for its attributes/duration.
  • See what it's like to make an event handler specifically for metrics
  • Prototype the experimental metrics configuration features to control whether metrics/traces are sent
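
The first two follow-ups could look roughly like the sketch below. It follows the `on_start` / `on_finish` shape of a Rack::Events handler, but everything named here (`MetricsEventHandler`, `START_KEY`, `FakeRequest`, `FakeResponse`, `RecordingHistogram`) is a hypothetical stand-in so the example runs without Rack or the OpenTelemetry gems. It is not the implementation in #1213.

```ruby
# Stand-ins mirroring the small part of Rack::Request / Rack::Response used here.
FakeRequest = Struct.new(:request_method, :scheme, :env)
FakeResponse = Struct.new(:status)

# Stand-in for a real histogram instrument (hypothetical).
class RecordingHistogram
  attr_reader :recordings

  def initialize
    @recordings = []
  end

  def record(value, attributes: {})
    @recordings << [value, attributes]
  end
end

# Hypothetical metrics-only handler. It takes its duration from the monotonic
# clock and its attributes from the request and response, never from a span.
class MetricsEventHandler
  START_KEY = 'example.metrics.request_start' # hypothetical env key

  def initialize(histogram)
    @histogram = histogram
  end

  # Remember when the request started.
  def on_start(request, _response)
    request.env[START_KEY] = now
  end

  # Record the elapsed time in seconds with request/response attributes.
  def on_finish(request, response)
    started_at = request.env.delete(START_KEY)
    return if started_at.nil?

    attributes = {
      'http.method' => request.request_method,
      'http.scheme' => request.scheme
    }
    attributes['http.status_code'] = response.status if response
    @histogram.record(now - started_at, attributes: attributes)
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

histogram = RecordingHistogram.new
handler = MetricsEventHandler.new(histogram)
request = FakeRequest.new('GET', 'https', {})
handler.on_start(request, nil)
handler.on_finish(request, FakeResponse.new(200))
```

Because the handler keeps its own start time in the request env, it would keep producing valid measurements even when the tracing handler is absent or the span is non-recording.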

Let me know if there's anything I missed! Thanks again for your feedback!

Contributor Author

@arielvalentin @xuan-cao-swi - I have a prototype ready that creates metrics in a separate event handler: #1213

I decided to put it in a different PR to start to make it easier to compare the two approaches.

Next on my list is to prototype the metrics configuration work, but that PR will likely live in the core repo.


# find span duration
# (end - start) / a billion to convert nanoseconds to seconds
duration = (span.end_timestamp - span.start_timestamp) / Float(10**9)
# glean attributes
attrs = span.attributes.select { |k, _v| HTTP_SERVER_REQUEST_DURATION_ATTRS_FROM_SPAN.include?(k) }
# set error
attrs['error.type'] = span.status.description if span.status.code == OpenTelemetry::Trace::Status::ERROR

http_server_request_duration_histogram.record(duration, attributes: attrs)
end
end
end
end