A few notes on work_stealing #217
While benchmarking this library and digging into a lot of segmentation violations, I think I just hit this. There is also a similar problem in the NUMA version (https://github.com/boostorg/fiber/blob/develop/include/boost/fiber/numa/algo/work_stealing.hpp#L41) and the shared-work scheduler (https://github.com/boostorg/fiber/blob/develop/include/boost/fiber/algo/shared_work.hpp#L40).
Ran into this today: I tried to create `work_stealing` with 1 thread, but it hung.
Hi all,
I wanted to implement a thread pool for executing fibers using the `work_stealing` algorithm from the library. While doing this I ran into a couple of problems. Maybe these are intended limitations of the library, but I did not anticipate them from just reading the documentation, so I thought I would post what I noticed while using Boost.Fiber.

**You cannot use multiple instances of the `work_stealing` scheduler concurrently in an application:**
`work_stealing` maintains the list and number of its instances in two static class members (fiber/include/boost/fiber/algo/work_stealing.hpp, lines 39 to 40 in 1bc393b). Therefore, we cannot have two disjoint groups of threads using `work_stealing` to distribute the fibers among themselves.

**You cannot use multiple instances of the `work_stealing` scheduler sequentially in an application:**
The static member `counter_` is initialized with `0` (fiber/src/algo/work_stealing.cpp, line 26 in 1bc393b) and incremented as soon as a thread arrives (lines 36 to 37). The value (saved in `id_`) is then used as the vector index of the current thread. However, there is no way to reset `counter_` to `0`. Hence, we cannot use `work_stealing` a second time.

**You need at least two threads to execute the `work_stealing` algorithm:**
Otherwise, the single thread may try to steal from a random thread that is different from itself, which can never succeed.
It may not make much sense to run the algorithm with a single thread, but this way you cannot simply change one parameter to get a single worker thread performing the tasks.
The first points I have circumvented by adapting the code of `work_stealing` as `pooled_work_stealing` and moving the static variable into a `pool_ctx` class (see lenerd/fiber_tp).

**You need to keep the "main" fibers running until all fibers managed by the scheduler are executed:**
There is something written about this in the documentation: Keep workers running. Initially I understood this paragraph (and especially its last sentence) to mean that I need to prevent the "pool" from becoming empty (similar to using `io_context::work` in Boost.Asio). However, if I understood the behavior correctly, the "main" fiber associated with the thread's stack needs to stay alive until all other fibers are completed.

This leads to the next point:
Assume we have such a pool of worker threads and we have created all fibers that we want to run in this pool. Moreover, let there be a fiber which is currently blocked, e.g., waiting for a future whose value is set by some thread outside of the pool. Before I am allowed to close the pool and let the "main" fibers terminate, I have to be sure that this fiber is also completed.
Is there a way to find out whether there are still (possibly blocked) fibers waiting to be completed?
That's it so far. Thanks for the nice library.