Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few notes on work_stealing #217

Open
lenerd opened this issue Oct 30, 2019 · 2 comments
Open

A few notes on work_stealing #217

lenerd opened this issue Oct 30, 2019 · 2 comments

Comments

@lenerd
Copy link

lenerd commented Oct 30, 2019

Hi all,

I wanted to implement a thread pool for executing fibers using the work_stealing algorithm from the library. While doing this I ran into a couple of problems. Maybe these are intended limitations of the library but I have not anticipated them by just reading the documentation. So I thought I just post here what I noted while using Boost.Fiber.

  • You cannot use multiple instances of work_stealing scheduler concurrently in an application:

    work_stealing maintains the list and number of its instances in two static class members:

    static std::atomic< std::uint32_t > counter_;
    static std::vector< intrusive_ptr< work_stealing > > schedulers_;

    Therefore, we cannot have two disjoint groups of threads using work_stealing to distribute the fibers among themselves.

  • You cannot use multiple instances of work_stealing scheduler sequentially in an application:

    The static member counter_ is initialized with 0

    std::atomic< std::uint32_t > work_stealing::counter_{ 0 };

    and incremented as soon as a thread arrives
    work_stealing::work_stealing( std::uint32_t thread_count, bool suspend) :
    id_{ counter_++ },

    The value (saved in id_) is then used as vector index of the current thread. However, there is no way to reset counter_ to 0. Hence, we cannot use work_stealing a second time.

  • You need to have at least two threads to execute work_stealing algorithm:

    Otherwise, the single thread may try to steal from a random thread that is different from itself and cannot succeed.

    It may not make much sense to perform the algorithm with a single thread but this way you cannot just change one parameter in order get a single worker thread performing the tasks.

The first points I have circumvented by adapting the code of work_stealing as pooled_work_stealing and moving the static variable into a pool_ctx class (see lenerd/fiber_tp).

  • You need to keep the "main" fibers running until all fibers managed by the scheduler are executed:
    There is something written in the documentation: Keep workers running

    In order to keep the worker thread running, the fiber associated with the thread stack (which is called “main” fiber) is blocked. For instance the “main” fiber might wait on a condition_variable. For a gracefully shutdown condition_variable is signalled and the “main” fiber returns. The scheduler gets destructed if all fibers of the worker thread have been terminated.

    Initially I understood this paragraph (and especially the last sentence) in the way that I need to prevent the "pool" to get empty (similarly to using io_context::work in Boost.Asio). However, if I understood the behavior correctly, the "main" fiber associated with the thread stack needs to stay alive until all other fibers are completed.

This leads to the next point:

  • Assume we have such a pool of worker threads and we have created all fibers that we want to run in this pool. Moreover, let there be a fiber which is currently blocked, e.g., waiting for a future whose value is set by some thread outside of the pool. Before I am allowed to close the pool and let the "main" fibers terminate, I have to be sure that this fiber is also completed.

    Is there a way to find out if there are still (possibly blocked) fibers waiting to get completed?

That's it so far. Thanks for the nice library.

@keryell
Copy link
Contributor

keryell commented Mar 11, 2020

While benchmarking this library and diving in a lot of segmentation violations, I think I just hit this static member problem...

There is also a similar problem in the NUMA version https://github.com/boostorg/fiber/blob/develop/include/boost/fiber/numa/algo/work_stealing.hpp#L41 and the shared-work scheduler https://github.com/boostorg/fiber/blob/develop/include/boost/fiber/algo/shared_work.hpp#L40

keryell added a commit to keryell/teaching_C-plus-plus that referenced this issue Mar 12, 2020
Still a bug related to the fact it is not possible to have different
work-stealing schedulers boostorg/fiber#217
@yotann
Copy link

yotann commented Mar 10, 2022

Ran into this today:

You need to have at least two threads to execute work_stealing algorithm

I tried to create work_stealing with 1 thread, but it hung in this_fiber::sleep_for(). This limitation should be documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants