Starvation scenario in a particular setup of pools and execution streams #277
Comments
Possible solutions:
I agree with the assessment and also think the 2nd option would be the best. I don't think it's ever going to be a "good" configuration to set up a pool so that it can only execute things when work stealing occurs from the network progress loop; it's not intuitive that this would guarantee progress. Not worth development time for a complex solution at any rate. This is a good catch in terms of figuring out what configurations we want to allow.
Right now I'm testing a configuration that should be working fine, yet it's deadlocking as well. I looked at the Argobots implementation of schedulers and found this: https://github.com/pmodels/argobots/blob/main/src/sched/basic_wait.c#L217 This function is called to re-order the pools so that the private pools come first and the pools with more permissive access come after. All my pools are "mpmc", so you would think the order doesn't matter... except
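For reference, here is a minimal Argobots sketch of the kind of setup under discussion: two "mpmc" pools attached, in a chosen order, to an execution stream running the predefined basic_wait scheduler (the one whose sched_sort_pools() is linked above). This is illustrative only, not the exact configuration from this issue.

```c
/* Illustrative Argobots setup: two MPMC pools served by one execution
 * stream running the predefined basic_wait scheduler.  Not the actual
 * configuration from this issue. */
#include <stdio.h>
#include <abt.h>

static void hello(void *arg)
{
    (void)arg;
    printf("hello from a ULT pushed to the second pool\n");
}

int main(void)
{
    ABT_init(0, NULL);

    /* Both pools are FIFO_WAIT with multi-producer/multi-consumer access,
     * so in principle their relative order should not matter. */
    ABT_pool pools[2];
    ABT_pool_create_basic(ABT_POOL_FIFO_WAIT, ABT_POOL_ACCESS_MPMC,
                          ABT_TRUE, &pools[0]);
    ABT_pool_create_basic(ABT_POOL_FIFO_WAIT, ABT_POOL_ACCESS_MPMC,
                          ABT_TRUE, &pools[1]);

    /* The basic_wait scheduler internally re-orders these pools (private
     * pools first) before entering its scheduling loop. */
    ABT_xstream xstream;
    ABT_xstream_create_basic(ABT_SCHED_BASIC_WAIT, 2, pools,
                             ABT_SCHED_CONFIG_NULL, &xstream);

    /* Push a trivial ULT to the second pool and wait for it. */
    ABT_thread thread;
    ABT_thread_create(pools[1], hello, NULL, ABT_THREAD_ATTR_NULL, &thread);
    ABT_thread_join(thread);
    ABT_thread_free(&thread);

    ABT_xstream_join(xstream);
    ABT_xstream_free(&xstream);
    ABT_finalize();
    return 0;
}
```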
@carns I have another scenario that deadlocks, even though the progress pool is the last one of its execution stream. This is caused by the fact that we don't take into account that the execution stream could have other pools it could pull from. Not only that, but we don't give it an opportunity to context-switch out of the progress loop.
This time we have 2 xstreams.
The progress pool is set to be the last pool of its execution stream. The main function, which runs on the primary pool, then creates a ULT and submits it to a pool. My assumption is that it's again a case of a scheduler not getting back to pulling from its first pool: when the main ULT calls join, it context-switches to the primary ES's scheduler. The scheduler sees that all the ULTs on its first pool are blocked, so it switches to taking from its second pool. In the progress loop we should then get a size of 0, so we don't yield on the next line; pending is 0 and size = 0, so we don't enter the condition and don't yield either.

In other words, we never give the progress loop an opportunity to yield back to the scheduler. I think we need to yield at some point no matter what; otherwise, in any configuration in which the progress pool shares its ES with another pool, we could starve that other pool for as long as Margo has nothing more to do than run the progress loop.
The progress loop code discussed above is fixed by #278.
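To make the missing yield described in the previous comments more concrete, here is a hedged sketch of a progress loop that yields back to the Argobots scheduler unconditionally on every iteration. The function name, variables, and structure are illustrative and do not reproduce Margo's actual progress loop; only the behaviour being discussed (the 100ms default timeout is the one mentioned earlier in the thread).

```c
/* Hedged sketch, not Margo's actual implementation: a Mercury progress
 * loop that yields on every iteration instead of only when its own pool
 * looks non-empty. */
#include <abt.h>
#include <mercury.h>

static void progress_loop(hg_context_t *ctx, ABT_pool progress_pool,
                          const volatile int *shutdown_flag)
{
    while (!*shutdown_flag) {
        size_t size = 0;
        ABT_pool_get_size(progress_pool, &size);

        if (size > 0) {
            /* Other ULTs are queued on this pool: poll without blocking. */
            HG_Progress(ctx, 0);
        } else {
            /* Nothing else queued here: block in HG_Progress up to 100ms. */
            HG_Progress(ctx, 100);
        }

        unsigned int count = 0;
        HG_Trigger(ctx, 0, 1, &count);

        /* Yield unconditionally, so a scheduler that shares this ES with
         * other pools gets a chance to pull work from them even though
         * this pool never looks empty to it. */
        ABT_thread_yield();
    }
}
```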
I have narrowed down a deadlock issue to the reproducer shown hereafter.
The scenario is as follows: I initialize margo with 2 pools ("__primary__" and "rpc"), and one execution stream ("__primary__") associated with these two pools in that order. The process then sends an RPC to itself, and hangs indefinitely. Adding print statements shows that the last call issued is margo_forward; the RPC handler doesn't execute. Attaching to the process with GDB shows that it is blocked in HG_Progress.

I initially thought this was because, once we send the RPC, there are no runnable ULTs in the progress pool apart from the progress ULT, so HG_Progress would block, but @carns confirmed that it would only block until a certain timeout (100ms by default).

So what I think actually happens is that the progress ULT is never "blocked" from an Argobots perspective: it hard-blocks periodically in HG_Progress but never on an Argobots mutex, for example, so the progress pool is technically never empty from the point of view of the xstream's scheduler. The consequence is that the scheduler never attempts to pull ULTs from the "rpc" pool, causing starvation.

Note that this is not specific to RPC ULTs: any ULT submitted to the "rpc" pool will never run in this setup.

I wonder if this is something we should try to provide a fix for, though I have no idea how we would do that. This configuration seems simple enough and pretty reasonable: it effectively means "do network progress in priority, and execute ULTs from that other pool only if there is really nothing else to do, including network progress". This could be useful for really low-priority ULTs such as periodic diagnostics.
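The reproducer itself is not included above. The following is a hedged sketch of what such a setup could look like, assuming Margo's JSON configuration keys (argobots/pools/xstreams, progress_pool, rpc_pool) and the margo_init_ext()/margo_init_info API; the "na+sm" transport, the "hello" RPC, and all other names are illustrative, not the original reproducer.

```c
/* Hedged sketch of a reproducer matching the description above: one
 * xstream whose scheduler serves the progress pool first and the "rpc"
 * pool second, then an RPC forwarded to self.  JSON keys and the
 * margo_init_info usage reflect my understanding of Margo's API and may
 * not match the project exactly. */
#include <margo.h>

static void hello_handler(hg_handle_t h);
DECLARE_MARGO_RPC_HANDLER(hello_handler)

static void hello_handler(hg_handle_t h)
{
    margo_respond(h, NULL);
    margo_destroy(h);
}
DEFINE_MARGO_RPC_HANDLER(hello_handler)

int main(void)
{
    const char *config =
        "{"
        "  \"argobots\": {"
        "    \"pools\": ["
        "      { \"name\": \"__primary__\", \"kind\": \"fifo_wait\", \"access\": \"mpmc\" },"
        "      { \"name\": \"rpc\",         \"kind\": \"fifo_wait\", \"access\": \"mpmc\" }"
        "    ],"
        "    \"xstreams\": ["
        "      { \"name\": \"__primary__\","
        "        \"scheduler\": { \"type\": \"basic_wait\","
        "                         \"pools\": [ \"__primary__\", \"rpc\" ] } }"
        "    ]"
        "  },"
        "  \"progress_pool\": \"__primary__\","
        "  \"rpc_pool\": \"rpc\""
        "}";

    struct margo_init_info info = {0};
    info.json_config = config;
    margo_instance_id mid = margo_init_ext("na+sm", MARGO_SERVER_MODE, &info);

    hg_id_t id = MARGO_REGISTER(mid, "hello", void, void, hello_handler);

    hg_addr_t self;
    margo_addr_self(mid, &self);
    hg_handle_t h;
    margo_create(mid, self, id, &h);
    margo_forward(h, NULL);   /* hangs here in the configuration described above */

    margo_destroy(h);
    margo_addr_free(mid, self);
    margo_finalize(mid);
    return 0;
}
```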