Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix get_stack_base in linux v2 #1485

Merged

Conversation

bongkyu7-kim
Copy link
Contributor

In case of pthread_attr_getstack() returns np_stack_size = 1MB,

  • calculated stack_base = stack_limit + stack_size(default: 4MB)
  • real stack_base = stack_limit + np_stack_size(1MB) stack_base is wrong because calculated with 4MB default stack size.

Due to this wrong stack_base, task_dispatcher::can_steal() can be always false. It causes abnormal long loop in task_dispatcher::receive_or_steal_task().

Correct stack_size by pthread_attr_getstacksize() in linux.

Description

Add a comprehensive description of proposed changes

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

@pavelkumbrasev

Other information

@bongkyu7-kim bongkyu7-kim mentioned this pull request Aug 22, 2024
14 tasks
size_t np_stack_size = 0;
pthread_attr_t np_attr_stack;
if (0 == pthread_getattr_np(pthread_self(), &np_attr_stack)) {
if (0 == pthread_attr_getstacksize(&np_attr_stack, &np_stack_size)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this call be sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is sufficient.
If stack size is not set by pthread_attr_setstacksize(), inside pthread_attr_getstacksize() can handle it. (by RLIMIT_STACK)

Comment on lines 206 to 208
#else
return a.my_threading_control->worker_stack_size();
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add implementation for the rest of OS? (The same logic as for get_stack_base)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of OS are no portable way to get stack size.
get_stack_base() like the below.

  • windows: get stack_base and stack_limit (no stack size)
  • linux: pthread_attr_np API
  • others: no way for stack base and size -> just using current stack pointer

@pavelkumbrasev
Copy link
Contributor

@aleksei-fedotov what do you think?

@bongkyu7-kim
Copy link
Contributor Author

@pavelkumbrasev Any update for this? I'm waiting for an reply.

@aleksei-fedotov
Copy link
Contributor

In case of pthread_attr_getstack() returns np_stack_size = 1MB

  • Does not returning 1MB by phtread_attr_getstack() mean that the stack size is indeed that size?

  • And why would pthread_attr_getstacksize() return a different value?
    Even if so, it seems reasonable to use the smallest value between these two. Simply due to safety reasons - the program might crash otherwise.

And maybe the commentary also need to be updated.

@bongkyu7-kim
Copy link
Contributor Author

@aleksei-fedotov
pthread_attr_getstack() returns 1MB correctly.
The problem is not to use the returned stack size(1MB), but to use worker_stack_size(default 4MB) for calculated_stealing_threshold as shown in the below code.

stack_size = a.my_threading_control->worker_stack_size();  <-- default 4MB
std::uintptr_t stack_base = get_stack_base(stack_size);  <-- only return stack_base, stack_size is not updated
task_dispatcher& task_disp = td.my_arena_slot->default_task_dispatcher();
td.enter_task_dispatcher(task_disp, calculate_stealing_threshold(stack_base, stack_size));   <-- calculated by wrong stack_size(4MB)

@aleksei-fedotov
Copy link
Contributor

The problem is <...> to use worker_stack_size(default 4MB) for calculated_stealing_threshold

@bongkyu7-kim Yeah, but according to this part of the comment that was done on purpose.

To be clear, I don't disagree with your proposal. I just disagree with the implementation approach you chose. I think we can do better.

The issues I see with the patch:

  • There is a double effort obtaining essentially the same value for the stack size - calling pthread_attr_getstack and pthread_attr_getstacksize, not to mention excessive pthread_attr_t instantiation and destruction.
  • The code becomes more confusing as now we have two separate places that do the same thing.
  • The commentary part becomes outdated and contradicts to what the code does.

So, to improve the code I propose the following:

  • Incorporate the new get_stack_size() into get_stack_base() with, of course, renaming the latter as it becomes returning not only stack base but its size as well. Perhaps, to something like get_stack_attributes()?
  • Still preserve the fallback behavior to use worker stack size for macOS and IA-64 for Linux OS. Perhaps, use dedicated macros to differentiate.
  • Updating the comment to reflect the changes. Perhaps, with moving its corresponding parts closer to the respective parts of the implementation.

@bongkyu7-kim
Copy link
Contributor Author

@aleksei-fedotov

  • I have already proposed in an incoporate way such as comment(Fix get_stack_base in linux #1389 (comment)), but it has been rejected and is now patched.
    Should I modify it as below code and change the function name to get_stack_attributes?
--- a/src/tbb/governor.cpp
+++ b/src/tbb/governor.cpp
@@ -137,7 +137,7 @@ bool governor::does_client_join_workers(const rml::tbb_client &client) {
     3) If the user app strives to conserve the memory by cutting stack size, it
     should do this for TBB workers too (as in the #1).
 */
-static std::uintptr_t get_stack_base(std::size_t stack_size) {
+static std::uintptr_t get_stack_base(std::size_t &stack_size) {
     // Stacks are growing top-down. Highest address is called "stack base",
     // and the lowest is "stack limit".
 #if __TBB_USE_WINAPI
@@ -165,7 +165,8 @@ static std::uintptr_t get_stack_base(std::size_t stack_size) {
 #endif /* __linux__ */
     std::uintptr_t stack_base{};
     if (stack_limit) {
-        stack_base = reinterpret_cast<std::uintptr_t>(stack_limit) + stack_size;
+        stack_base = reinterpret_cast<std::uintptr_t>(stack_limit) + np_stack_size;
+        stack_size = np_stack_size;
     } else {
         // Use an anchor as a base stack address.
         int anchor{};
  • I think we don't need to change for macOS and IA-64 for Linux OS.
#if __linux__ && !__bg__
// macOS is not __linux__, not enter here
// IA-64 for Linux OS is __bg__(Blue Gene system), not enter here
#endif /* __linux__ */

@aleksei-fedotov
Copy link
Contributor

@bongkyu7-kim I looked at your previous PR #1389.

So, I suggest having the following changes:

  • Have the get_stack_base to return stack base as well as its size. I.e., change the signature to static void get_stack_attributes(std::uintptr_t& stack_base, std::size_t& stack_size, std::size_t fallback_stack_size).
  • Apply your patch from PR Fix get_stack_base in linux #1389, but pass a.my_threading_control->worker_stack_size()as the value of fallback_stack_size parameter. Use pthread-* functions as usual to try obtaining real stack_limit and np_stack_size.
  • Before computing stack_base check that obtained np_stack_size is greater than zero. Otherwise, use fallback value.

@bongkyu7-kim
Copy link
Contributor Author

@aleksei-fedotov
I updated patch by your suggestion. Please review it.
Thanks.

Copy link
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bongkyu7-kim almost done. I believe one more change and we finish it.

Good that we avoided changing the commentary part as it remains relevant.

src/tbb/governor.cpp Outdated Show resolved Hide resolved
In case of pthread_attr_getstack() returns np_stack_size = 1MB,
- calculated stack_base = stack_limit + stack_size(default: 4MB)
- real stack_base = stack_limit + np_stack_size(1MB)
stack_base is wrong because calculated with 4MB stack size.

Due to this wrong stack_base, task_dispatcher::can_steal() can be always false.
It causes abnormal long loop in task_dispatcher::receive_or_steal_task().

Correct this problem using np_stack_size instead of stack_size.

Signed-off-by: Bongkyu Kim <[email protected]>
Copy link
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the patch looks good to me. Nice job, @bongkyu7-kim!

I suppose PR #1389 can be closed as well?

@bongkyu7-kim
Copy link
Contributor Author

@aleksei-fedotov
Thanks for your review. I closed PR #1389.

@aleksei-fedotov aleksei-fedotov merged commit 9d85bf3 into oneapi-src:master Oct 7, 2024
23 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants