Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling #327

Merged
merged 32 commits into from
Nov 15, 2023
Merged

Profiling #327

merged 32 commits into from
Nov 15, 2023

Conversation

seemk
Copy link
Contributor

@seemk seemk commented Sep 20, 2023

Description

Add support for CPU profiling.

Brings in new environment variables:

  • SPLUNK_PROFILER_ENABLED - disabled by default
  • SPLUNK_PROFILER_LOGS_ENDPOINT - defaults to http://localhost:4317
  • SPLUNK_PROFILER_CALL_STACK_INTERVAL - thread sampling (stacktrace taking) interval in milliseconds, defaults to 1000
  • SPLUNK_PROFILER_INCLUDE_INTERNAL_STACKS - whether to include stacktraces from internal profiler threads (sampler thread and logs processor/exporter), disabled by default.

The profiler / sampling works from a different thread, where call stacks for all threads are periodically sampled. These are then serialized to protobuf (pprof format), gzipped and encoded as base64. The base64 content are sent to collector as OTel logs. For the exporting we can leverage the OTel logs SDK, which makes it quite convenient.

To correlate active span contexts with sampled stacktraces, opentelemetry.context.{attach,detach} are monkeypatched, to store a mapping of thread ID -> span context. This mapping is available for the sampler thread.

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

  • Tested manually
  • Added automated tests

Checklist:

  • Changelogs have been updated
  • Unit tests have been added/updated
  • Documentation has been updated

@signalfx signalfx deleted a comment from mergify bot Oct 5, 2023
@nstawski
Copy link
Contributor

nstawski commented Oct 5, 2023

@seemk please rebase this pull against main

@nstawski
Copy link
Contributor

nstawski commented Oct 6, 2023

@seemk also the lint job is failing. Looks like just an extra newline

@nstawski nstawski added the Skip Changelog Skip Changelog label Oct 6, 2023
@seemk seemk marked this pull request as ready for review October 12, 2023 20:56
@seemk seemk requested review from a team as code owners October 12, 2023 20:56
@seemk seemk changed the title DRAFT: Profiling Profiling Oct 12, 2023
@tsloughter-splunk
Copy link

Is there a design doc or can you describe how it is working?

Not knowing much about this the one thing that caught my eye when trying to review is the global batch log processor, but I assume that is the best solution? :)

But do other profilers use their own log processor/exporters instead of using one that might already be created by the application for logs?

@seemk
Copy link
Contributor Author

seemk commented Oct 30, 2023

Is there a design doc or can you describe how it is working?

Not knowing much about this the one thing that caught my eye when trying to review is the global batch log processor, but I assume that is the best solution? :)

But do other profilers use their own log processor/exporters instead of using one that might already be created by the application for logs?

We went over the workings during a call, but I'll write a summary into the PR description.

As for the global batch processor, I just used it for convenience to get the export pipeline set up. Other distros (Node) didn't have the logs SDK available when profiling was implemented and had to have a custom exporter for the logs protos.

Copy link
Contributor

@pmcollins pmcollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this Siim. A few questions/comments below.

splunk_otel/distro.py Show resolved Hide resolved
splunk_otel/profiling/__init__.py Show resolved Hide resolved
tests/unit/test_profiling.py Outdated Show resolved Hide resolved
splunk_otel/profiling/__init__.py Show resolved Hide resolved
splunk_otel/profiling/__init__.py Show resolved Hide resolved
splunk_otel/profiling/__init__.py Show resolved Hide resolved
splunk_otel/profiling/__init__.py Outdated Show resolved Hide resolved
splunk_otel/profiling/__init__.py Outdated Show resolved Hide resolved
@seemk
Copy link
Contributor Author

seemk commented Nov 1, 2023

Is there a design doc or can you describe how it is working?

Not knowing much about this the one thing that caught my eye when trying to review is the global batch log processor, but I assume that is the best solution? :)

But do other profilers use their own log processor/exporters instead of using one that might already be created by the application for logs?

Added a small description how the "profiler" operates, it turned out to be pretty small in Python.

@seemk
Copy link
Contributor Author

seemk commented Nov 2, 2023

Added filtering of internal threads (enabled by default) and stop_profiling

@seemk
Copy link
Contributor Author

seemk commented Nov 7, 2023

  • Added reinitialization of the profiler thread on forks (for forking servers such as gunicorn).
  • Added LoggerProvider instead of working with the batch processor directly. The logger provider has exit handlers, where it will shut down the exporting pipeline.

Copy link
Contributor

@pmcollins pmcollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can modify how we access the Profiler global variable, such that its global state is accessed once by a global function, but all other access is via function arguments.

splunk_otel/profiling/__init__.py Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Nov 14, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@pmcollins
Copy link
Contributor

I have read the CLA Document and I hereby sign the CLA

srv-gh-o11y-gdi-cla added a commit to splunk/cla-agreement that referenced this pull request Nov 14, 2023
@pmcollins
Copy link
Contributor

Looks great, thanks for taking my feedback.

@seemk
Copy link
Contributor Author

seemk commented Nov 15, 2023

@pmcollins Any outstanding issues you want addressed?

@pmcollins
Copy link
Contributor

pmcollins commented Nov 15, 2023

@pmcollins Any outstanding issues you want addressed?

LGTM.

@johnbley johnbley merged commit cf0a885 into main Nov 15, 2023
16 checks passed
@johnbley johnbley deleted the profiling-poc branch November 15, 2023 14:07
@github-actions github-actions bot locked and limited conversation to collaborators Nov 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Skip Changelog Skip Changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants