Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement release_all_elements method on StreamTracer #30

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

lucila
Copy link
Contributor

@lucila lucila commented Jan 16, 2024

What and why? 🤔

We want to track when filters are the cause of stream exhaustion. This can happen when a filter releases all elements, the remaining elements is empty and if we don't set retry and overfetch properly, it can cause stream exhaustion.

On this PR we implement release_all_elements on StreamTracer. We call this method on the filter step of StreamFilter so that we can track when a filter releases all the elements on the filter step.

Testing Steps ✍️

Provide adequate testing steps (including examples if necessary).
Don't assume someone knows how to test your changes; be thorough.

  • Bullet point checklists
  • Are useful for some tests

You're it! 👑

@Automattic/stream-builders

…m StreamFilter when all elements are released
@lucila lucila requested a review from a team as a code owner January 16, 2024 15:17
@lucila lucila requested a review from sanmai January 16, 2024 15:17
@coveralls
Copy link

coveralls commented Jan 16, 2024

Coverage Status

coverage: 79.439% (+0.04%) from 79.4%
when pulling e3022ab on lucila/tick-all-elements-released
into 6cca8f6 on main.

Comment on lines +126 to +128
if (count($elements) === $filtered->get_released_count()) {
$tracer->release_all_elements($this, $filtered->get_released_count());
}
Copy link
Contributor

@nicola-barbieri nicola-barbieri Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we ALSO report the % of released elements ?
we can interpret it more clearly: e.g. this filter is filtering out 80% of the posts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we report the % of released elements, do we need the release_all tick?

this would be the diff for the % of released elements implementation:

diff --git a/lib/Tumblr/StreamBuilder/StreamFilters/StreamFilter.php b/lib/Tumblr/StreamBuilder/StreamFilters/StreamFilter.php
index 621cb17..70a772e 100644
--- a/lib/Tumblr/StreamBuilder/StreamFilters/StreamFilter.php
+++ b/lib/Tumblr/StreamBuilder/StreamFilters/StreamFilter.php
@@ -118,7 +118,8 @@ abstract class StreamFilter extends Templatable
             $tracer->end_filter(
                 $this,
                 $filtered->get_released_count(),
-                [$t0, microtime(true) - $t0]
+                [$t0, microtime(true) - $t0],
+                count($elements)
             );
             foreach ($filtered->get_released() as $element) {
                 $tracer->release_element($this, $element);
diff --git a/lib/Tumblr/StreamBuilder/StreamTracers/StreamTracer.php b/lib/Tumblr/StreamBuilder/StreamTracers/StreamTracer.php
index 8963fc2..8736ad3 100644
--- a/lib/Tumblr/StreamBuilder/StreamTracers/StreamTracer.php
+++ b/lib/Tumblr/StreamBuilder/StreamTracers/StreamTracer.php
@@ -387,21 +387,28 @@ abstract class StreamTracer
      * @param StreamFilter $filter The stream filter is applied.
      * @param int $released_count The number of items being filtered.
      * @param array $timing Zero indexed tuple of the start time and duration of the operation (in that order)
+     * @param int|null $total_count The number of elements that went through the filter step, out of which $release_count were filtered.
      * @return void
      */
     final public function end_filter(
         StreamFilter $filter,
         int $released_count,
-        array $timing
+        array $timing,
+        ?int $total_count = 0
     ): void {
+        $meta = [static::META_COUNT => $released_count];
+
+        if ($total_count > 0) {
+            $ratio = $released_count / $total_count;
+            $meta['released_ratio'] = $ratio;
+        }
+
         $this->trace_event(
             static::CATEGORY_FILTER,
             $filter,
             static::EVENT_END,
             $timing,
-            [
-                static::META_COUNT => $released_count,
-            ]
+            $meta
         );
     }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we report the % of released elements, do we need the release_all tick?

I would keep the release_all tick because for our use-case it's more easy to monitor that, but % release is more informative.

Copy link

@cmantas cmantas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I agree with logging both the % and the boolean of exhaustion, but I don't think doing one of the 2 is a blocker.

@lucila lucila merged commit 7fcff60 into main Jan 16, 2024
9 checks passed
@lucila lucila deleted the lucila/tick-all-elements-released branch January 16, 2024 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants