Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing bug that was causing the Indexer to stay busy in cooperative m… #4452

Merged
merged 4 commits into from
Jan 25, 2024

Conversation

fulmicoton
Copy link
Contributor

@fulmicoton fulmicoton commented Jan 24, 2024

…ode.

Removing the difference between Idle and Processing.
It was originally introduced with the idea of measure how long actors
where busy or idle, but tokio-console does that job very well now.

The code was relying on the enforcement of the non trivial state machine:
calling .idle() in the pause state had no effect for instance.

The real bug however showed up with the following sequence of event:

  • indexer drains its queue
  • on_drain is called
  • indexer "goes to sleep" by putting itself in the Pause state.
  • a low priority message is queued into the low priority queue.
  • indexer receives a high priority message (Observe is sent every second).
  • indexer would go through the loop logic consuming the entire pipeline,
    but only consuming high priority messages. The inbox.is_empty() exit
    condiition is never satisfied, so that this loop would keep going,
    yielding at each iteration, until the Resume Command is received.

This is done on the indexer runtime.
The task would always be busy, but yield all of the time, consuming the
whatever room is available on the thread it is consuming.

Closes #4448

Test

The bug was doing a using yielding loop, so unit testing is a difficult.
I tested locally and on k8s.

A small pipeline of 100KB/s now takes <1% cpu vs 1 full core before.
Outside of merges, indexing 100200kb=20MB on a single node takes 1 cpu instead of 4.
Outside of merges, indexing 1000
200kb=20MB on a 10 nodes takes 1 cpu instead of 4.

@fulmicoton fulmicoton force-pushed the issue/4448-fix-loop-with-cooperative-indexing branch 7 times, most recently from c91386f to a198b31 Compare January 24, 2024 15:23
@fulmicoton fulmicoton marked this pull request as ready for review January 24, 2024 15:23
…ode.

Removing the difference between Idle and Processing.
It was originally introduced with the idea of measure how long actors
where busy or idle, but tokio-console does that job very well now.

The code was relying on the enforcement of the non trivial state machine:
calling .idle() in the pause state had no effect for instance.

The real bug however showed up with the following sequence of event:
- indexer drains its queue
- on_drain is called
- indexer "goes to sleep" by putting itself in the Pause state.
- a low priority message is queued into the low priority queue.
- indexer receives a high priority message (Observe is sent every second).
- indexer would go through the loop logic consuming the entire pipeline,
  but only consuming high priority messages. The `inbox.is_empty()` exit
  condiition is never satisfied, so that this loop would keep going,
  yielding at each iteration, until the Resume Command is received.

This is done on the indexer runtime.
The task would always be busy, but yield all of the time, consuming the
whatever room is available on the thread it is consuming.

Closes #4448
@fulmicoton fulmicoton force-pushed the issue/4448-fix-loop-with-cooperative-indexing branch from a198b31 to eaaaaa7 Compare January 24, 2024 15:33
@fulmicoton fulmicoton merged commit a2e3b92 into main Jan 25, 2024
4 checks passed
@fulmicoton fulmicoton deleted the issue/4448-fix-loop-with-cooperative-indexing branch January 25, 2024 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

High cpu usage
2 participants