fix(core): add level of indirection for provider.py contextvars #10525

Open, wants to merge 42 commits into base: main

sanchda
Contributor

@sanchda sanchda commented Sep 5, 2024

Whenever a contextvar is reassociated (set to a new value), the underlying HAMT data structure clones a node. This clone operation dereferences the stored Python objects, which can cause segmentation faults if other libraries mismanage the reference counts of their objects, causing those objects to be GC'd prematurely.
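
To illustrate the premise, here is a minimal sketch (not taken from the patch) showing that each `ContextVar.set()` produces a new immutable mapping instead of mutating the existing one: snapshots taken after each set keep their own value. The names `var` and `snapshots_after_sets` are hypothetical, and the HAMT node cloning itself is not observable from pure Python.

```python
import contextvars

# Hypothetical variable used only for this illustration.
var = contextvars.ContextVar("var", default=None)


def snapshots_after_sets(n):
    """Set the variable n times, snapshotting the context after each set."""
    snaps = []
    for i in range(n):
        var.set(i)  # rebinds the variable; the context gets a new mapping
        snaps.append(contextvars.copy_context())
    return snaps


# Run in a copied context so the caller's context is left untouched.
snaps = contextvars.copy_context().run(snapshots_after_sets, 3)

# Earlier snapshots are unaffected by later sets.
values = [snap[var] for snap in snaps]
print(values)
```

Each snapshot retaining its own value is what implies a fresh mapping per `set()` call, which is the operation this PR tries to perform less often.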

This patch stores a single wrapper object in the contextvar, then updates a reference inside that wrapper to propagate the information we need. In our normal testing fixture, we cause as many as 69 realloc (and clone) events in a single process (I measured this by patching CPython itself to emit a log). With this patch, that number is down to 1, and that one doesn't originate from provider.py.
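
The indirection described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in provider.py: `_Holder`, `_active`, `activate`, `active`, and the `set_calls` counter are all hypothetical names introduced here.

```python
import contextvars


class _Holder:
    """Hypothetical mutable wrapper stored once in the contextvar."""

    __slots__ = ("value",)

    def __init__(self, value=None):
        self.value = value


_active = contextvars.ContextVar("active")
set_calls = 0  # counts ContextVar.set() calls, for demonstration only


def _holder():
    """Return this context's holder, creating it on first use."""
    global set_calls
    try:
        return _active.get()
    except LookupError:
        holder = _Holder()
        _active.set(holder)  # the only place the contextvar is rebound
        set_calls += 1
        return holder


def activate(item):
    # Mutate the wrapper instead of calling ContextVar.set() again.
    _holder().value = item
    return item


def active():
    return _holder().value


for span in ("a", "b", "c"):
    activate(span)

print(active(), set_calls)
```

One trade-off of this sketch: contexts copied after the holder is stored share the same holder object, so a real implementation has to decide how to keep tasks and threads isolated. How the patch handles that is not shown in this description.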

I have a standalone reproduction for the noted behavior here. The repro isn't very clever about how it manages the lifetimes of GC'd objects--the issues we see in the wild are a little bit more subtle, since they don't segfault during normal scope cleanup (unlike mine).

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

Contributor

github-actions bot commented Sep 5, 2024

CODEOWNERS have been resolved as:

releasenotes/notes/fix-contextvar-cloning-49adaf7fdf36e8fb.yaml         @DataDog/apm-python
ddtrace/_trace/provider.py                                              @DataDog/apm-sdk-api-python

@emmettbutler
Collaborator

interesting and promising

@pr-commenter

pr-commenter bot commented Sep 5, 2024

Benchmarks

Benchmark execution time: 2024-10-08 14:24:22

Comparing candidate commit 0712bbc in PR branch sanchda/make_contextvars_indirect with baseline commit f757fbf in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 339 metrics, 53 unstable metrics.

@taegyunkim
Contributor

Could you run `hatch run lint:fmt` to format the code and trigger the rest of CircleCI?

@datadog-dd-trace-py-rkomorn

datadog-dd-trace-py-rkomorn bot commented Sep 5, 2024

Datadog Report

Branch report: sanchda/make_contextvars_indirect
Commit report: 708d8f4
Test service: dd-trace-py

✅ 0 Failed, 1142 Passed, 144 Skipped, 28m 48.41s Total duration (9m 30.07s time saved)

@wconti27 wconti27 removed their request for review September 30, 2024 15:48
Contributor

@wconti27 wconti27 left a comment

Going to let @emmettbutler handle this review

ddtrace/_trace/provider.py (outdated, resolved)
@sanchda
Contributor Author

sanchda commented Oct 1, 2024

@wconti27 thanks for bringing @emmettbutler into the discussion--can you toggle your review? Merges are currently blocked since your last review required changes. 🙇

@sanchda sanchda disabled auto-merge October 3, 2024 17:29
@sanchda
Contributor Author

sanchda commented Oct 3, 2024

/merge

@dd-devflow

dd-devflow bot commented Oct 3, 2024

🚂 MergeQueue: waiting for PR to be ready

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

@taegyunkim
Contributor

Whenever I see failing tests in CI for this PR, I've been pushing the retry button. I've pressed it enough times to believe those are actual failures.

@dd-devflow

dd-devflow bot commented Oct 3, 2024

⚠️ MergeQueue: This merge request was unqueued

If you need support, contact us on Slack #devflow!

@sanchda
Contributor Author

sanchda commented Oct 4, 2024

Whenever I see failing tests in CI for this PR, I've been pushing the retry button. I've pressed it enough times to believe those are actual failures.

but what if it passes if I retry just one more time (jk, checking)

7 participants