scx_rustland fixes and improvements #804

Merged
merged 12 commits into from
Oct 16, 2024

Conversation

arighi
Contributor

@arighi arighi commented Oct 15, 2024

Some changes to improve stability and performance of the scx_rustland_core schedulers and properly support the 6.12 kernel.

Summary of the changes:

  • scx_rustland_core now provides nvcsw, slice and dsq_vtime directly (user-space scheduler can now access this information quickly)
  • keep CPUs alive with pending tasks (this fixes issue #788, "scx_rustland_core: performance regression due to kernel change")
  • restart scheduler on hotplug events (this makes scx_rustland_core schedulers more reliable)
  • scx_rustland improvements: use the new metrics provided by scx_rustland_core, apply some idle selection changes from bpfland, prioritize WAKE_SYNC tasks
  • scx_rlfifo: make it more work-conserving, so that it can be used as a better stress test for the sched_ext core

Re-align idle selection logic with some of the latest improvements done
in scx_bpfland.

Signed-off-by: Andrea Righi <[email protected]>
Provide additional task metrics to user-space schedulers via QueuedTask:
 - nvcsw: total number of voluntary context switches
 - slice: task time slice "budget" (from p->scx.slice)
 - dsq_vtime: current task vtime (from p->scx.dsq_vtime)

In this way user-space schedulers can quickly access these metrics to
implement better scheduling policy.

Signed-off-by: Andrea Righi <[email protected]>
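
As an illustration of how a user-space scheduler might use these metrics, here is a minimal sketch. The `TaskMetrics` struct is a hypothetical stand-in that only mirrors the three fields listed above; it is not the real `QueuedTask` definition from scx_rustland_core. The helper names and the idea of deriving a per-second rate from `nvcsw` are also assumptions made for the example.

```rust
/// Hypothetical mirror of the metrics described in this commit.
/// NOT the real scx_rustland_core `QueuedTask` type.
#[derive(Debug, Clone, Copy)]
pub struct TaskMetrics {
    pub pid: i32,
    pub nvcsw: u64,     // total number of voluntary context switches
    pub slice: u64,     // task time slice "budget"
    pub dsq_vtime: u64, // current task vtime
}

/// Voluntary context switches per second between two samples of a task.
/// `prev_nvcsw` and `prev_ns` are the values recorded at the previous
/// sample; `now_ns` is the current timestamp in nanoseconds.
pub fn nvcsw_rate(prev_nvcsw: u64, prev_ns: u64, task: &TaskMetrics, now_ns: u64) -> u64 {
    let delta_t = now_ns.saturating_sub(prev_ns);
    if delta_t == 0 {
        // No time elapsed: report no rate rather than dividing by zero.
        return 0;
    }
    let delta = task.nvcsw.saturating_sub(prev_nvcsw);
    delta.saturating_mul(1_000_000_000) / delta_t
}

/// Assumed policy for the example: a task that voluntarily releases the
/// CPU at least `threshold` times per second is treated as interactive.
pub fn is_interactive(rate: u64, threshold: u64) -> bool {
    rate >= threshold
}
```

Because the counter arrives with each queued task, a policy like this needs no extra lookup per task; the threshold itself would have to be tuned by the scheduler author.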
Assign an infinite time slice to the user-space scheduler itself, so
that it can completely drain all the pending tasks and voluntarily
release the CPU when it's done.

This yields more consistent performance and also lets us remove the
speculative user-space scheduler wakeup from ops.stopping().

Signed-off-by: Andrea Righi <[email protected]>
User-space schedulers may still hit some stalls during CPU hotplugging
events.

There is no reason to overcomplicate the code by trying to handle
hotplug events within the scx_rustland_core framework: we can simply
handle the scheduler restart performed by the scx core.

This makes CPU hotplugging more reliable with scx_rustland_core-based
schedulers.

Signed-off-by: Andrea Righi <[email protected]>
Prevent CPUs from going idle when the user-space scheduler has some
pending activities to complete.

Keeping the CPU alive allows tasks to be consumed from the user-space
scheduler more efficiently, preventing bubbles in the scheduling
pipeline.

To achieve this, trigger a CPU kick from ops.update_idle() and set a
flag in the CPU context to prevent it from going idle. Then keep kicking
the CPU from ops.dispatch() until the flag is cleared, which occurs when
no more tasks are pending or when the CPU exits idle as a task starts
running on it.

This fixes the performance regression introduced by the
put_prev_task_scx() behavior change in Linux 6.12 (see #788).

Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Andrea Righi <[email protected]>
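
The keep-alive logic described in this commit can be modeled as a small state machine. The sketch below is plain Rust rather than the actual BPF code, and every name in it (`CpuCtx`, `keep_alive`, `kicks`, the method signatures) is hypothetical. It only illustrates when the flag is set, when the CPU keeps getting kicked, and when the flag is cleared.

```rust
/// Hypothetical per-CPU context; the real one lives in the BPF backend.
#[derive(Debug, Default)]
pub struct CpuCtx {
    /// Set while the CPU must not go idle.
    pub keep_alive: bool,
    /// Number of kicks issued, counted here only to make the model observable.
    pub kicks: u32,
}

impl CpuCtx {
    fn kick(&mut self) {
        // Stand-in for a real CPU kick.
        self.kicks += 1;
    }

    /// Models the ops.update_idle() step: `going_idle` is true when the CPU
    /// is entering idle, `pending` is the number of tasks still pending in
    /// the user-space scheduler.
    pub fn update_idle(&mut self, going_idle: bool, pending: u64) {
        if !going_idle {
            // The CPU exits idle as a task starts running: clear the flag.
            self.keep_alive = false;
            return;
        }
        if pending > 0 {
            // Pending work: mark the CPU and kick it so it does not idle.
            self.keep_alive = true;
            self.kick();
        }
    }

    /// Models the ops.dispatch() step: keep kicking until the flag clears.
    pub fn dispatch(&mut self, pending: u64) {
        if !self.keep_alive {
            return;
        }
        if pending == 0 {
            // Nothing left to consume: let the CPU go idle.
            self.keep_alive = false;
            return;
        }
        self.kick();
    }
}
```

In this model a CPU with the flag set is kicked once per dispatch round until either the pending count drops to zero or the CPU leaves idle.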
@arighi arighi force-pushed the rustland-fixes branch 5 times, most recently from 85c57ab to ff41cae Compare October 16, 2024 11:11
Do not kick a CPU from rs_select_cpu() (called by the user-space
scheduler), since we may not immediately dispatch the task.

Instead, always try to wake up the task's assigned CPU after dispatching
to a global DSQ, ensuring it can be consumed immediately.

Signed-off-by: Andrea Righi <[email protected]>
With user-space scheduling we don't usually dispatch a task immediately
after selecting an idle CPU, so there's not much benefit in trying to
optimize the WAKE_SYNC scenario (when a task is waking up another task
and releasing the CPU) when picking an idle CPU.

Therefore, get rid of the WAKE_SYNC logic in select_cpu() and rely on
the user-space logic (that has access to the WAKE_SYNC information) to
handle this particular case.

Signed-off-by: Andrea Righi <[email protected]>
Bump up the minor version to reflect the new backward-compatible
functionality added.

Signed-off-by: Andrea Righi <[email protected]>
Use the nvcsw metric from the scx_rustland_core backend, instead of
retrieving this metric in user-space via procfs.

Signed-off-by: Andrea Righi <[email protected]>
Update vruntime by adding the used virtual time slice of each task as
soon as it is scheduled.

Signed-off-by: Andrea Righi <[email protected]>
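
A minimal sketch of this kind of accounting follows. The commit message does not say how the used time slice is converted into virtual time, so the weight scaling below (used time multiplied by 100 and divided by the task weight) is an assumption made for the example, as are the names `TaskInfo`, `scale_by_weight_inverse` and `charge_on_schedule`.

```rust
/// Hypothetical per-task state kept by a user-space scheduler.
#[derive(Debug)]
pub struct TaskInfo {
    pub vruntime: u64,
    pub weight: u64,
}

/// Convert used wall time into virtual time: heavier tasks are charged
/// less. The base weight of 100 is an assumption for this sketch.
pub fn scale_by_weight_inverse(used_ns: u64, weight: u64) -> u64 {
    // Guard against a zero weight to avoid dividing by zero.
    used_ns.saturating_mul(100) / weight.max(1)
}

/// Charge the used (virtual) time slice to the task at the moment it is
/// scheduled, and return the updated vruntime.
pub fn charge_on_schedule(task: &mut TaskInfo, used_slice_ns: u64) -> u64 {
    let charged = scale_by_weight_inverse(used_slice_ns, task.weight);
    task.vruntime = task.vruntime.saturating_add(charged);
    task.vruntime
}
```

Charging at schedule time means that any ordering decision made afterwards already sees the task's latest vruntime.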
scx_rustland is now effectively a deadline-based scheduler and not a
pure vruntime-based scheduler.

Clarify this in the source code. No functional change.

Signed-off-by: Andrea Righi <[email protected]>
Make scx_rlfifo even simpler and keep dispatching tasks even if the CPUs
are all busy.

This makes it a better stress test for the scx_rustland_core backend,
since it uses both the per-CPU DSQs and the global shared DSQ.

Signed-off-by: Andrea Righi <[email protected]>
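
The work-conserving FIFO behavior can be sketched as follows. This is a self-contained model, not the scx_rlfifo source: the `Target` enum, the `dispatch_all` function and the representation of idle CPUs as a plain vector are all hypothetical. The only point illustrated is that the queue is always fully drained, with a per-CPU target used while idle CPUs are available and the shared target used once all CPUs are busy.

```rust
use std::collections::VecDeque;

/// Hypothetical dispatch target: a specific CPU's queue or the shared one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Cpu(usize),
    Shared,
}

/// Drain `queue` in FIFO order. Each task goes to an idle CPU if one is
/// left in `idle_cpus` (taken from the end of the vector), otherwise to
/// the shared target. Dispatching never stops just because all CPUs are busy.
pub fn dispatch_all(queue: &mut VecDeque<i32>, idle_cpus: &mut Vec<usize>) -> Vec<(i32, Target)> {
    let mut out = Vec::with_capacity(queue.len());
    while let Some(pid) = queue.pop_front() {
        let target = match idle_cpus.pop() {
            Some(cpu) => Target::Cpu(cpu),
            None => Target::Shared,
        };
        out.push((pid, target));
    }
    out
}
```

With three queued tasks and a single idle CPU, the first task gets the idle CPU and the other two go to the shared target, which exercises both dispatch paths.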
@arighi arighi added this pull request to the merge queue Oct 16, 2024
Merged via the queue into main with commit 2ea47af Oct 16, 2024
42 checks passed
@arighi arighi deleted the rustland-fixes branch October 16, 2024 18:39
2 participants