Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: support dynamic thread stack allocation #44379

Conversation

cfriedt
Copy link
Member

@cfriedt cfriedt commented Mar 30, 2022

Opening this up for review. It's a biggie, and has some wide-reaching implications.

TL;DR - we should be able to support automatic pthread stack allocation with this. It's in the Roadmap (#51211).

However, at least with GCC, this does a lot more though. With compliant POSIX threads <pthread.h>, we also get

  • ISO C11 <threads.h>
  • ISO C++11 <thread> (<mutex>, <condition_variable>, <semaphore>), etc.

Later, we can optimize-out the POSIX layer of course, which I would like to take great care in doing with Keith and Stephanos.

Taking a step back, excluding coroutines, that represents the de-facto threading model for approximately the last 25 years, and probably the de-facto threading model for most other programming languages for the foreseeable future.

  • MicroPython threads
  • Rust threads
  • Lua threads
  • Zig threads
  • ...

The API supports both safety-critical and non-safety critical because it uses the same interface for both pool-based and heap-based thread stack allocations.

Covered in my talks at EOSS 2023.

Thanks for the help everyone 🙏 ❤️
@andrewboie , @pfalcon, @andyross , @peter-mitsis , @nashif , @dcpleung , @ceolin , @stephanosio , @keith-packard ... I really could keep going.

Fixes #26999

There is some icing on the top added by @ceolin for k_object management in #59773

BELOW FOR POSTERITY

Just a WIP at the moment. Working fine for cortex-m3.

$ west build -p auto -b qemu_cortex_m3 -t run tests/kernel/threads/dynamic_thread_stack
...
*** Booting Zephyr OS build zephyr-v3.0.0-2000-g3ccf6cd86f23  ***
Running test suite dynamic_thread_stack
===================================================================
START - test_dynamic_thread_stack
Hello, dynamic world!
 PASS - test_dynamic_thread_stack in 0.1 seconds
===================================================================
Test suite dynamic_thread_stack succeeded
===================================================================
PROJECT EXECUTION SUCCESSFUL

@github-actions github-actions bot added area: API Changes to public APIs area: Kernel area: Tests Issues related to a particular existing or missing test labels Mar 30, 2022
drivers/mm/Kconfig Outdated Show resolved Hide resolved
@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch 7 times, most recently from b5b819c to f58ea3a Compare March 31, 2022 02:51
@cfriedt
Copy link
Member Author

cfriedt commented Mar 31, 2022

@dcpleung - would it make sense to separate the sys_mm_* - arch_mm_* redirection layer for into its own PR?

@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch 3 times, most recently from 53f2ccd to 2993fbf Compare March 31, 2022 11:07
@cfriedt cfriedt added the dev-review To be discussed in dev-review meeting label Mar 31, 2022
@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch from 2993fbf to 03fbcb4 Compare March 31, 2022 11:26
@dcpleung
Copy link
Member

Architectures have arch_mem_map() and arch_mem_unmap(). Adding a bunch of arch_mm_drv_*() is going to confuse developers. Maybe add a system mm driver to utilize arch_mem_*() instead?

@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch 2 times, most recently from 12621e6 to 8ec3552 Compare April 6, 2022 10:06
@cfriedt
Copy link
Member Author

cfriedt commented Apr 6, 2022

Architectures have arch_mem_map() and arch_mem_unmap(). Adding a bunch of arch_mm_drv_*() is going to confuse developers. Maybe add a system mm driver to utilize arch_mem_*() instead?

I didn't bother creating a sys interface, it seemed to be overkill for now.

@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch 6 times, most recently from 774f93b to b87c6de Compare April 6, 2022 12:57
@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch from d32fbd4 to b97445a Compare June 30, 2023 16:59
@cfriedt
Copy link
Member Author

cfriedt commented Jun 30, 2023

some yaml fixups... hopefully I didn't break anything this time 🤣

@cfriedt
Copy link
Member Author

cfriedt commented Jun 30, 2023

Flavio & I will likely merge our PRs tomorrow, so no rush at the moment.

Add support for dynamic thread stack allocation

Signed-off-by: Christopher Friedt <[email protected]>
@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch from b97445a to 2a012bb Compare July 11, 2023 22:05
@cfriedt
Copy link
Member Author

cfriedt commented Jul 11, 2023

@ceolin & I discussed it, and we opted to merge this PR first, and subsequently merge #59773 afterward. The second PR fine-tunes userspace, mmu, and mpu support.

This will open a few interesting doors.

Test that automatic thread stack allocation works for
both user and kernel threads.

Signed-off-by: Christopher Friedt <[email protected]>
@cfriedt cfriedt force-pushed the issue/26999/k-thread-create-dynamic-stack branch from 2a012bb to 71ca732 Compare July 11, 2023 23:44
@cfriedt
Copy link
Member Author

cfriedt commented Jul 11, 2023

  • reformat tests/kernel/threads/dynamic_thread_stack/testcase.yaml in whatever way yamllint is cool with 🤷‍♂️

Copy link
Collaborator

@keith-packard keith-packard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should always allocate stacks from the heap; reduces code complexity, avoids issues with the pre-allocated stacks being the wrong size and doesn't consume any more memory.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 12, 2023

I think you should always allocate stacks from the heap; reduces code complexity, avoids issues with the pre-allocated stacks being the wrong size and doesn't consume any more memory.

@keith-packard - I think that makes sense for a large majority of use cases, in particular userspace code, and we actually can get that behaviour via Kconfig (by specifying the pool size to be zero and enabling heap allocation).

The pool option is really there to support high reliability / safety critical if they choose to use it.

@andyross
Copy link
Contributor

The pool option is really there to support high reliability / safety critical if they choose to use it.

Even fragmentation is an issue in an RTOS. Stacks are big, memories are small, uptimes are (sometimes very) long. A dynamic spawned thread that works fine in testing in legacy code might suddenly die after two years in the field. A core Zephyr paradigm is that the system heap should be 100% optional. Everything that can be should be allocatable statically.

I'm actually dealing with a bunch of issues like this currently while fuzzing SOF, which is very heap-centric. The allocation errors can be handled correctly (and are), but the resulting runtime failures are still failures, and prevent reaching further states. The impact there is actually paradoxical: the random input tends to leak a lot of allocations and run out of space quickly, so in fact fuzz coverage drops like a rock after the first few thousand commands. We're seeing failures come back from the (extremely parallel) oss-fuzz rig that I never saw over days and days of running locally.

@ceolin
Copy link
Member

ceolin commented Jul 12, 2023

I think you should always allocate stacks from the heap; reduces code complexity, avoids issues with the pre-allocated stacks being the wrong size and doesn't consume any more memory.

@keith-packard - I think that makes sense for a large majority of use cases, in particular userspace code, and we actually can get that behaviour via Kconfig (by specifying the pool size to be zero and enabling heap allocation).

The pool option is really there to support high reliability / safety critical if they choose to use it.

yep ... One thing though that I am thinking now is that having two separated API is probably better. The reason is that whe you are allocating from the pool the stack size is defined at build time by CONFIG_DYNAMIC_THREAD_STACK_SIZE (so the given parameter is not really needed and is confusing) while allocating from the heap we don't have this constraint. Also, K_USER does not make sense IMHO, since if you have userspace enabled they stacks declared in the pool are already user capable.

@keith-packard
Copy link
Collaborator

yep ... One thing though that I am thinking now is that having two separated API is probably better.

Oh, so just exposing the stack_alloc_pool and stack_alloc_dyn apis directly instead of wrapping them in another function. That avoids problems with applications expecting pool allocations and getting surprised with dyn stacks.

One more request -- can we have the stack automatically freed when the thread exits?

@cfriedt
Copy link
Member Author

cfriedt commented Jul 13, 2023

yep ... One thing though that I am thinking now is that having two separated API is probably better.

Oh, so just exposing the stack_alloc_pool and stack_alloc_dyn apis directly instead of wrapping them in another function. That avoids problems with applications expecting pool allocations and getting surprised with dyn stacks.

If that's the general consensus, it would probably be better to not even expose pool allocation this way - just have a separate K_THREAD_POOL() that only ever allocates at build time.

It would also make the pthread use case more complicated, which I was hoping to avoid, since that was the entire reason for doing this in a way to silently support both safety and non-safety in the same way.

@romkell, @andyross - do you have any preferences?

One more request -- can we have the stack automatically freed when the thread exits?

That was tried a long time ago in a couple of different ways and it tended to be problematic and a bit racey.

Additionally, there was a vito from the safety side, because invariably then the malloc-ing code would be intermixed with the non-mallocing code.

@romkell
Copy link
Contributor

romkell commented Jul 13, 2023

@romkell, @andyross - do you have any preferences?

My major point concerning safety is that we retain the API where we instantiate every thing statically on the appl. side (and pass to init function etc.).
To have a stack pool to separate the memory for stacks from other heap based mem allocation may bring some advantages. I still would want to avoid dyn. stacks for safety application.
When exposing both APIs for pool and heap for the threads, the appl. developer can do any kind of mixed (pool and heap) dynamic stack allocation in the appl. Do we want to support that?
When hiding underneath one API function the application developer decides whether to use the pool or heap approach via KConfig. What is the use case to allow both pool and heap stacks in one application?
For simplicity I would prefer one API, but can perfectly live with both exposed, and leave responsibility to the appl. developer.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 13, 2023

yep ... One thing though that I am thinking now is that having two separated API is probably better. The reason is that whe you are allocating from the pool the stack size is defined at build time by CONFIG_DYNAMIC_THREAD_STACK_SIZE (so the given parameter is not really needed and is confusing) while allocating from the heap we don't have this constraint.

Technically, that is a maximum size for pooled stacks (maybe the Kconfig could be named better). The size parameter can still provide non-redundant information when size < max (especially in the case where an MMU or Stack Canaries are present).

In any case though, given that both the pure-heap, pure-pool, and mixed cases are realizable with a single API, I think it would eliminate a non-trivial amount of code duplication and maybe some future bugs with virtually zero overhead (since kconfig-controlled code paths are dead-stripped by the linker in the current implementation).

To me that's a big selling feature that makes supporting the standard threading model in C, C++, pthreads (not to mention other languages) dramatically easier.

Just at a glance, it does not seem possible to support statically allocated C11 or C++11 thread stacks at all, externally to the implementation.

https://en.cppreference.com/w/c/thread

https://en.cppreference.com/w/cpp/thread/thread/thread

I think it may be possible in Rust to specify a statically allocated thread stack.

So there is a significant probability we would be forced to duplicate all of that heap / pool logic for several language runtimes.

Also, K_USER does not make sense IMHO, since if you have userspace enabled they stacks declared in the pool are already user capable.

K_USER does make sense for heap-allocated stacks though.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 13, 2023

Mentioned offline as well, but it would likely make sense to support flags for pool or heap as well

@ceolin
Copy link
Member

ceolin commented Jul 13, 2023

that the system heap should be 100% optional. Everything that can be should be allocatable statically.

I think that was well ... just the K_USER that option that I was talking about to be clear. Particularly because of the different behavior that this can lead on heap/pool allocation. Though we can choose the preferred way, if it fails the fallback is using the other one.

@ceolin
Copy link
Member

ceolin commented Jul 13, 2023

yep ... One thing though that I am thinking now is that having two separated API is probably better.

Oh, so just exposing the stack_alloc_pool and stack_alloc_dyn apis directly instead of wrapping them in another function. That avoids problems with applications expecting pool allocations and getting surprised with dyn stacks.

If that's the general consensus, it would probably be better to not even expose pool allocation this way - just have a separate K_THREAD_POOL() that only ever allocates at build time.

It would also make the pthread use case more complicated, which I was hoping to avoid, since that was the entire reason for doing this in a way to silently support both safety and non-safety in the same way.

@romkell, @andyross - do you have any preferences?

One more request -- can we have the stack automatically freed when the thread exits?

That was tried a long time ago in a couple of different ways and it tended to be problematic and a bit racey.

Additionally, there was a vito from the safety side, because invariably then the malloc-ing code would be intermixed with the non-mallocing code.

I agree, that is error prone too, someone may try to re-use the allocated stack in a different thread ...

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should drop a +1 to move this along. Really I think this is very reasonable. It's a big area with lots of edges (especially once we start talking about userspace!), and no API is going to meet everyone's needs. This seems straightforward and useful enough for common cases, and apps with complicated needs can use it as a reference if necessary.

config DYNAMIC_THREAD_STACK_SIZE
int "Size of each pre-allocated thread stack"
default 1024 if !64BIT
default 2048 if 64BIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default MMU_PAGE_SIZE if MMU as stack needs to use the full page for access control (especially for userspace).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userspace is coming in my pr ... do not worry about it now please.

@cfriedt
Copy link
Member Author

cfriedt commented Jul 13, 2023

Mentioned offline as well, but it would likely make sense to support flags for pool or heap as well

So if it's desirable, k_thread_stack_alloc() could accept flags like

K_THREAD_STACK_FROM_HEAP, or K_THREAD_STACK_FROM_POOL.

to indicate a suitable allocation strategy as well (a default of zero could imply either).

@cfriedt
Copy link
Member Author

cfriedt commented Jul 13, 2023

Should drop a +1 to move this along. Really I think this is very reasonable. It's a big area with

Amen to that. Lets get Flavio's changes in as well so people can mix things up and develop organically.

Better to do it early in the release cycle.

@ceolin
Copy link
Member

ceolin commented Jul 13, 2023

Mentioned offline as well, but it would likely make sense to support flags for pool or heap as well

So if it's desirable, k_thread_stack_alloc() could accept flags like

K_THREAD_STACK_FROM_HEAP, or K_THREAD_STACK_FROM_POOL.

to indicate a suitable allocation strategy as well (a default of zero could imply either).

That is what I was thinking ... this way we can remove the build symbols about the allocation preferences and remove all fallbacks ... that is way is clear for the user from where he is getting the memory. Note that is not a blocker for me, we can do it incrementally.

@cfriedt cfriedt merged commit 1323b1a into zephyrproject-rtos:main Jul 13, 2023
25 checks passed
@cfriedt cfriedt deleted the issue/26999/k-thread-create-dynamic-stack branch October 7, 2023 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Kernel area: Tests Issues related to a particular existing or missing test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a Stack Allocation API