[C++] Acero hash join ManyJoins test failing on several nightly builds #43040
One peculiarity of this test is that it uses a lot of stack space (the C program stack), a bit over 2MB. So far, I suspect this is what causes the failure. Surprisingly, when I ran this case locally (on my Mac M1) with ASAN enabled, it did report:
Since I don't have Alpine or Emscripten environments available, I'm going to file a PR and experiment with the stack size. Is it possible to run the failed tasks on a particular PR? @jorisvandenbossche Any idea? Thank you.

Also, maybe not directly related to this issue: I'm curious why the existing CI doesn't detect the ASAN failure mentioned above. I'd like to look at some recent ASAN test runs in CI or the nightlies. @jorisvandenbossche Do you have any pointers? Thank you again.
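One way to experiment with stack size without an Alpine or Emscripten environment is to run the same deep workload on a POSIX thread whose stack size is set explicitly with `pthread_attr_setstacksize`. The following is a minimal sketch, assuming a platform with POSIX threads; `RecurseAndSum`, `RunWithStackSize`, and the 1 KiB-per-level buffer are hypothetical stand-ins for the nested joins, not Arrow APIs. Note that a stack that is too small still crashes the process with a segmentation fault rather than producing an error return.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

namespace {

// Hypothetical workload: a recursion of controllable depth, standing in for
// the nested joins of the ManyJoins test. Each level pins a 1 KiB buffer
// (an arbitrary assumption) on the stack while it recurses.
int RecurseAndSum(int depth) {
  assert(depth >= 0);
  volatile char pad[1024];
  pad[0] = 1;
  if (depth == 0) return pad[0];
  int below = RecurseAndSum(depth - 1);
  // Reading pad after the call keeps this frame live (no tail call).
  return below + pad[0];
}

struct Job {
  int depth;
  int result;
};

void* RunJob(void* arg) {
  auto* job = static_cast<Job*>(arg);
  job->result = RecurseAndSum(job->depth);
  return nullptr;
}

}  // namespace

// Runs RecurseAndSum(depth) on a new thread with an explicit stack size, so
// the same workload can be tried under different stack limits.
// Returns -1 if the thread could not be set up with that stack size.
int RunWithStackSize(int depth, std::size_t stack_bytes) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) return -1;
  // Fails (EINVAL) if stack_bytes is below PTHREAD_STACK_MIN.
  if (pthread_attr_setstacksize(&attr, stack_bytes) != 0) {
    pthread_attr_destroy(&attr);
    return -1;
  }
  Job job{depth, 0};
  pthread_t thread;
  int rc = pthread_create(&thread, &attr, RunJob, &job);
  pthread_attr_destroy(&attr);
  if (rc != 0) return -1;
  pthread_join(thread, nullptr);
  return job.result;
}
```

For example, `RunWithStackSize(16, 1 << 20)` returns 17 (one per recursion level, including the base case). Raising `depth` while shrinking `stack_bytes` is how one would bracket the size at which the workload starts to crash.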
Yes, we can trigger those from a PR (I am not sure whether being a collaborator is sufficient to trigger them or whether you need to be a committer, but in any case I can trigger those builds for you).
I am not very familiar with our ASAN/UBSAN test builds (cc @pitrou).
I verified that the issue is caused by the many joins taking a large amount of stack space. PR #43042 is filed to solve this (and includes the experiment results). The original test was too aggressive: it had 64 joins, while 16 is actually enough. Sorry for breaking the nightly builds for such a long time :(
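To get a feel for why cutting the depth from 64 to 16 helps, stack usage can be estimated by comparing the address of a local at the top of a recursion with one at its deepest frame. The following is a minimal sketch, not Acero code: `DeepestAddress`, `ApproxStackBytes`, and the 4 KiB per-level buffer are hypothetical stand-ins for one nested join, and the measurement assumes a downward-growing stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for one nested join level: each call keeps a 4 KiB
// buffer alive on the stack while it recurses. The size is an assumption,
// not Acero's real per-join frame size.
std::uintptr_t DeepestAddress(int depth) {
  assert(depth >= 0);
  volatile char frame_payload[4096];
  frame_payload[0] = static_cast<char>(depth);
  if (depth == 0) {
    return reinterpret_cast<std::uintptr_t>(&frame_payload[0]);
  }
  std::uintptr_t deepest = DeepestAddress(depth - 1);
  // Use the buffer after the call so the frame stays live (no tail call).
  frame_payload[1] = frame_payload[0];
  return deepest;
}

// Approximate bytes of stack consumed by `depth` nested levels, assuming the
// stack grows downward (as on common targets such as x86-64 and AArch64).
std::size_t ApproxStackBytes(int depth) {
  volatile char top_marker = 0;
  const auto top = reinterpret_cast<std::uintptr_t>(&top_marker);
  return static_cast<std::size_t>(top - DeepestAddress(depth));
}

// Prints the estimate for the new (16) and old (64) nesting depths.
void PrintStackEstimates() {
  const int depths[] = {16, 64};
  for (int depth : depths) {
    std::printf("depth %2d -> ~%zu KiB of stack\n", depth,
                ApproxStackBytes(depth) / 1024);
  }
}
```

Because every level's frame stays live until the recursion unwinds, the estimate grows roughly linearly with depth, so quartering the depth roughly quarters the peak stack usage of this toy workload. The real per-join frame size depends on Acero's code, the compiler, and build flags such as ASAN.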
The Ubuntu ASAN build seems to be running fine. I guess the difference may be related to the operating system (mine is OSX). PR #43042 fixes the ASAN error on OSX too, so I'm not going to dig any deeper into this.
### Rationale for this change

The current recursion depth of 64 in the many-join test is too aggressive, so a stack (the C program stack) overflow may happen on Alpine or Emscripten, causing issues like #43040.

### What changes are included in this PR?

Reduce the recursion depth to 16, which is still strong enough for the purpose of #41335, which introduced this test.

### Are these changes tested?

The change is itself a test.

### Are there any user-facing changes?

None.

* GitHub Issue: #43040

Authored-by: Ruoxi Sun <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Issue resolved by pull request 43042
The ManyJoins test (part of arrow-acero-hash-join-node-test), added in #41335, seems to be causing nightly failures (cc @zanmato1984).

test-alpine-linux-cpp

Here there is no useful output except for "Segmentation fault", but I assume it is coming from HashJoin.ManyJoins, because that is the one test without an "OK" (and the last passing build was from the day the PR was merged).

test-ubuntu-22.04-cpp-emscripten

Here there is some log output from the failure:

(cc @joemarshall in case this output gives you any pointers)
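The "no OK" heuristic used above to pin the Alpine segfault on HashJoin.ManyJoins can be automated. The following is a minimal sketch, assuming the standard GoogleTest console markers (`[ RUN      ]`, `[       OK ]`, `[  FAILED  ]`); `UnfinishedTests` is a hypothetical helper, not part of Arrow's CI tooling.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the names of GoogleTest cases that printed a "[ RUN      ]" line but
// never a matching "[       OK ]" or "[  FAILED  ]" line, i.e. the tests that
// were still running when the process died (e.g. with "Segmentation fault").
std::vector<std::string> UnfinishedTests(const std::string& log) {
  const std::string kRun = "[ RUN      ] ";
  const std::string kOk = "[       OK ] ";
  const std::string kFailed = "[  FAILED  ] ";
  assert(kOk.size() == kFailed.size());  // both markers have the same width
  std::vector<std::string> running;
  std::istringstream in(log);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind(kRun, 0) == 0) {
      running.push_back(line.substr(kRun.size()));
    } else if (line.rfind(kOk, 0) == 0 || line.rfind(kFailed, 0) == 0) {
      // Result lines look like "[       OK ] Suite.Name (12 ms)".
      const std::string rest = line.substr(kOk.size());
      const std::string name = rest.substr(0, rest.find(' '));
      for (auto it = running.begin(); it != running.end(); ++it) {
        if (*it == name) {
          running.erase(it);
          break;
        }
      }
    }
  }
  return running;
}
```

For example, given a made-up log in which a hypothetical `HashJoin.Basic` runs and prints its OK line, followed by `[ RUN      ] HashJoin.ManyJoins` and then `Segmentation fault`, the function returns only `HashJoin.ManyJoins`.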