Test injection_queue_depth_multi_thread is flaky #6847
I presume it's these two recent jobs that timed out that have been affected by the flakiness. If the assert at tokio/tokio/tests/rt_unstable_metrics.rs line 663 (83e922f) fails, the main thread panics and we never get to synchronise at tokio/tokio/tests/rt_unstable_metrics.rs line 667 (83e922f), causing the test to run forever (or until CI times out).
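For illustration, here is a hypothetical reduction of that failure mode (my own sketch, not the actual test): a blocking task parks on a barrier that the test thread only reaches after its assertions, so a failed assert turns into a hang when the runtime is dropped during unwinding.

```rust
use std::sync::{Arc, Barrier};

fn main() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .build()
        .unwrap();

    // A blocking task that parks until the test thread meets it at the barrier.
    let barrier = Arc::new(Barrier::new(2));
    let b = Arc::clone(&barrier);
    rt.spawn_blocking(move || {
        b.wait();
    });

    // Stand-in for the failing metrics assertion.
    assert_eq!(1, 2);

    // Never reached when the assert fails: the synchronisation point.
    barrier.wait();

    // During panic unwinding, dropping the runtime waits for in-flight
    // blocking tasks, which are in turn stuck on the barrier, so the
    // test hangs instead of reporting a failure.
    drop(rt);
}
```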
Yes. Good point with the assert. That explains why it times out instead of failing normally.
Making it fail instead of hanging would be nice. I submitted a PR for that. (But the flakiness is not fixed by this.)
Interestingly enough, after I pulled your changes, the test still stalls instead of failing. It seems like I was wrong about the assert. At least I built this highly professional bash script, which I'm going to share here in case someone else wants to use it, that finally allowed me to trigger the flakiness on my local x86_64 Linux system:

```bash
run_test() {
  cargo test \
    --all-features \
    --test rt_metrics \
    injection_queue_depth_multi_thread \
    -- --nocapture
}

i=0
while run_test; do
  let i++
  echo -e "$i: \e[32m☑\e[0m"
done
```

Now that I can litter my fork with debug print statements, I'll investigate some more 🙂
My preliminary findings after adding debug prints before and after that line are that we deadlock on tokio/tokio/tests/rt_metrics.rs line 84 (82628b8).
Uhm, isn't the flakiness just a mundane case of accidentally blocking the runtime before it can schedule the second task?
The runtime isn't supposed to get blocked by this. If there is an idle worker thread and a runnable task, the worker thread must pick up a runnable task. The only exception is the LIFO slot, which is not relevant here.
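As a concrete illustration of that invariant (my own sketch, not code from the test): with two workers, blocking one of them outright must not prevent the idle worker from running a second, runnable task.

```rust
use std::time::Duration;

fn main() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .build()
        .unwrap();

    rt.block_on(async {
        // Task A blocks its worker thread (bad practice, but it must
        // not stall the rest of the runtime).
        tokio::spawn(async {
            std::thread::sleep(Duration::from_secs(1));
        });

        // Task B is runnable, so the idle second worker must pick it up;
        // with the invariant intact, this completes long before task A.
        tokio::spawn(async {
            println!("task B ran on the idle worker");
        })
        .await
        .unwrap();
    });
}
```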
Could there be some weirdness around parked threads? The deadlocked test parks the second worker before it gets the second task in its work queue and does not unpark it again on my machine. |
Looking at the source, my guess is one worker gets both tasks off the injection queue in one batch and doesn’t notify a peer to steal. @jofas would you be able to try to isolate this case as a loom test? |
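For anyone unfamiliar with loom: such a test would model the suspected interleaving directly and let loom enumerate every schedule. As a rough sketch of the shape it could take (a toy queue and condvar standing in for tokio's injection queue and worker parking, not the real scheduler internals), loom fails the model by reporting the execution in which a batched pop plus a single notification leaves the peer parked forever:

```rust
// loom mirrors the std sync APIs; loom tests are usually gated behind cfg(loom).
use loom::sync::{Arc, Condvar, Mutex};
use loom::thread;

#[test]
fn batched_push_with_single_notify_deadlocks() {
    loom::model(|| {
        // Two units of work behind a mutex; the condvar models parking.
        let shared = Arc::new((Mutex::new(0usize), Condvar::new()));

        // "Injection queue" producer: pushes two tasks but wakes one worker.
        let producer = {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                let (work, cvar) = &*shared;
                *work.lock().unwrap() += 2;
                cvar.notify_one(); // the bug: the peer is never notified
            })
        };

        // Two "workers", each trying to claim one unit of work.
        let workers: Vec<_> = (0..2)
            .map(|_| {
                let shared = Arc::clone(&shared);
                thread::spawn(move || {
                    let (work, cvar) = &*shared;
                    let mut n = work.lock().unwrap();
                    while *n == 0 {
                        n = cvar.wait(n).unwrap();
                    }
                    *n -= 1;
                })
            })
            .collect();

        producer.join().unwrap();
        for w in workers {
            w.join().unwrap();
        }
        // loom reports a deadlock: in the interleaving where both workers
        // park before the producer runs, only one of them is ever woken.
    });
}
```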
I'd love to give it a try for sure. This is quite exciting. @Darksonn may I ask you questions in case I get stuck on this? |
Yes, feel free to send questions my way. A loom test would be a good start. |
This test has been observed to fail in CI: tokio/tokio/tests/rt_unstable_metrics.rs, lines 642 to 668 (83e922f).

To close this issue, figure out why it is failing and fix it.
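For context, the pattern such a test exercises looks roughly like the following (a sketch against the public unstable-metrics API, not the actual test body; it assumes building with RUSTFLAGS="--cfg tokio_unstable"):

```rust
use tokio::runtime::Builder;

fn main() {
    let rt = Builder::new_multi_thread()
        .worker_threads(2)
        .build()
        .unwrap();
    let metrics = rt.metrics();

    // Tasks spawned from outside the runtime land on the injection
    // (global) queue until a worker picks them up.
    let h1 = rt.spawn(async {});
    let h2 = rt.spawn(async {});

    // Inherently racy: the workers may already have drained the queue,
    // which is why assertions on this value need careful synchronisation.
    println!("injection_queue_depth = {}", metrics.injection_queue_depth());

    rt.block_on(async {
        h1.await.unwrap();
        h2.await.unwrap();
    });
}
```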