scx_rustland_core: performance regression due to kernel change #788
Never let the CPU go idle.

This is a stress test to demonstrate issue #788. The expected behavior is that CPUs never go idle, because ops.update_idle() immediately re-kicks them. However, on Linux 6.12 the CPUs still enter idle states, which indicates that in certain cases ops.update_idle() is not being invoked by the sched_ext core as expected. This is likely due to the pick_next_task()/put_prev_task() rework in the sched core.

WARNING: do not run this for too long or it may burn your CPUs.

Signed-off-by: Andrea Righi <[email protected]>
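The reasoning behind this stress test can be illustrated with a toy model. It is a plain Python sketch, not the actual BPF scheduler: `StressModel`, `core_enters_idle()` and the `calls_update_idle` parameter are hypothetical names standing in for the sched core and ops.update_idle(), and the kick only represents what scx_bpf_kick_cpu() would do. Because every reported idle entry is kicked immediately, any CPU that stays idle must correspond to an idle entry for which the callback never ran.

```python
from dataclasses import dataclass, field


@dataclass
class StressModel:
    """Toy model of the #788 stress test (hypothetical, not the BPF code)."""

    kicks: int = 0
    idle_cpus: set = field(default_factory=set)

    def update_idle(self, cpu: int, idle: bool) -> None:
        # Stands in for ops.update_idle(): whenever a CPU is reported idle,
        # kick it right away (in BPF this would be scx_bpf_kick_cpu()).
        if idle:
            self.kicks += 1
        # Either way the CPU ends up busy: kicked awake, or leaving idle.
        self.idle_cpus.discard(cpu)

    def core_enters_idle(self, cpu: int, calls_update_idle: bool) -> None:
        # Stands in for the sched core putting `cpu` into idle.
        # `calls_update_idle` models whether the core notifies the scheduler.
        self.idle_cpus.add(cpu)
        if calls_update_idle:
            self.update_idle(cpu, True)


model = StressModel()
model.core_enters_idle(0, calls_update_idle=True)   # kicked right away
model.core_enters_idle(1, calls_update_idle=False)  # callback skipped
print(sorted(model.idle_cpus), model.kicks)  # [1] 1
```

In this model, CPU 1 is the only one left idle, matching the idle time the real test exposes on 6.12 when the notification is missed.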
I think I found a much easier reproducer, see 7f9b009. It seems that in 6.12, … I don't have a fix yet; I'm just sharing the reproducer for now and will investigate more on the kernel side.
With the consolidation of put_prev_task()/set_next_task(), we are now skipping the sched_ext ops.stopping()/ops.running() transitions when the previous and next tasks are the same; see commit 436f3ee ("sched: Combine the last put_prev_task() and the first set_next_task()").

While this optimization makes sense in general, it can negatively impact performance in user-space schedulers that expect to handle such transitions when tasks exhaust their time slice (see SCX_OPS_ENQ_LAST). For example, scx_rustland suffers a significant performance regression (e.g., gaming benchmarks drop from ~60fps to ~10fps).

To fix this, ensure that put_prev_task()/set_next_task() are never skipped when the scx scheduling class is enabled, allowing the scx class to handle such transitions.

This change restores the previous behavior, fixing the performance regression in scx_rustland.

Link: sched-ext/scx#788
Fixes: 7c65ae8 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Andrea Righi <[email protected]>
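The skipped transition described in this patch can be sketched as a toy model. It is deliberately simplified and hypothetical: `context_switch()` and its `scx_enabled`/`patched` flags are illustrative names, and the real kernel switch path does far more than append two log entries. It only shows the decision: when the same task is picked again, the consolidated path skips both calls (and thus ops.stopping()/ops.running()), while the proposed fix keeps them while the scx class is enabled.

```python
from typing import List


def context_switch(prev: str, nxt: str, scx_enabled: bool,
                   patched: bool, log: List[str]) -> None:
    """Toy model of the switch path after put_prev_task() and
    set_next_task() were combined (hypothetical, not kernel code)."""
    # The optimization: if the same task is picked again, skip both calls.
    # The proposed fix disables the skip while the scx class is enabled.
    if prev == nxt and not (patched and scx_enabled):
        return
    log.append(f"stopping({prev})")  # put_prev_task() -> ops.stopping()
    log.append(f"running({nxt})")    # set_next_task() -> ops.running()


before: List[str] = []
context_switch("A", "A", scx_enabled=True, patched=False, log=before)

after: List[str] = []
context_switch("A", "A", scx_enabled=True, patched=True, log=after)

print(before)  # []
print(after)   # ['stopping(A)', 'running(A)']
```

A user-space scheduler that relies on seeing the stopping/running pair when a task exhausts its slice gets nothing in the `before` case, which is the behavior change the patch reverts for sched_ext.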
FYI, https://lore.kernel.org/lkml/[email protected]/T/#u seems to fix this regression.
Prevent CPUs from going idle when the user-space scheduler still has pending work to complete. Keeping the CPU alive allows it to consume tasks from the user-space scheduler more efficiently, preventing bubbles in the scheduling pipeline.

To achieve this, trigger a CPU kick from ops.update_idle() and set a flag in the CPU context to prevent it from going idle. Then keep kicking the CPU from ops.dispatch() until the flag is cleared, which happens when no more tasks are pending or when the CPU exits idle because a task starts running on it.

This fixes the performance regression introduced by the put_prev_task_scx() behavior change in Linux 6.12 (see #788).

Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Andrea Righi <[email protected]>
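The keep-alive flag logic described in this commit can be sketched as a small state machine. This is a hypothetical Python model, not the scx_rustland_core BPF code: `CpuCtx`, `keep_alive`, `pending` and `KeepAliveModel` are illustrative names, `kick()` only records what scx_bpf_kick_cpu() would do, and the model's dispatch() does nothing except decide whether to kick again.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CpuCtx:
    keep_alive: bool = False  # hypothetical per-CPU "don't go idle" flag


class KeepAliveModel:
    """Toy model of the keep-alive logic (not the scx_rustland_core code)."""

    def __init__(self, nr_cpus: int) -> None:
        self.ctx = [CpuCtx() for _ in range(nr_cpus)]
        self.pending = 0            # tasks still queued in user space
        self.kicks: List[int] = []  # CPUs kicked, in order

    def kick(self, cpu: int) -> None:
        self.kicks.append(cpu)  # stands in for scx_bpf_kick_cpu()

    def update_idle(self, cpu: int, idle: bool) -> None:
        if not idle:
            # A task started running on the CPU: stop forcing it awake.
            self.ctx[cpu].keep_alive = False
        elif self.pending > 0:
            # CPU is going idle while work is pending: flag it and kick it.
            self.ctx[cpu].keep_alive = True
            self.kick(cpu)

    def dispatch(self, cpu: int) -> None:
        ctx = self.ctx[cpu]
        if not ctx.keep_alive:
            return
        if self.pending == 0:
            ctx.keep_alive = False  # nothing left to consume: allow idle
        else:
            self.kick(cpu)          # keep the CPU cycling through dispatch


m = KeepAliveModel(nr_cpus=2)
m.pending = 3
m.update_idle(0, True)   # flag set, first kick
m.dispatch(0)            # still pending: kick again
m.pending = 0
m.dispatch(0)            # queue drained: flag cleared, no kick
print(m.kicks, m.ctx[0].keep_alive)  # [0, 0] False
```

The flag is what turns a one-shot kick from ops.update_idle() into repeated kicks from ops.dispatch(), so the CPU keeps polling the user-space scheduler until one of the two exit conditions is met.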
This commit in the kernel introduces a pretty bad performance regression in all the scx_rustland_core schedulers:
7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
The system becomes completely unresponsive when it's saturated, and it's very easy to reproduce (e.g., by starting a parallel kernel build with scx_rustland active).
I think the reason is one (or both) of these behavior changes:
But I haven't figured out exactly why. I've been playing a bit with SCX_ENQ_LAST, unsuccessfully, so I'm just opening the issue for now. Any pointers on how to attack this?