GH-41720: [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark #41716

Merged 3 commits on May 23, 2024


ZhangHuiGui
Collaborator

@ZhangHuiGui ZhangHuiGui commented May 19, 2024

Rationale for this change

My local build configuration includes some basic benchmarks, and I hit this compilation error today. It seems that the change to QueryContext::Init in #41334 was not propagated to hash_join_benchmark.cc, and CI did not catch the problem.

What changes are included in this PR?

Remove the first argument from the QueryContext::Init call in hash_join_benchmark.cc.

Are these changes tested?

Needn't

Are there any user-facing changes?

No

@ZhangHuiGui
Collaborator Author

@zanmato1984 PTAL?
@kou Should our CI include full benchmark compilation? It seems that hash_join_benchmark, which is only built when ARROW_BUILD_OPENMP_BENCHMARKS is enabled, is not currently compiled in CI.

Collaborator

@zanmato1984 zanmato1984 left a comment


+1. Thanks for catching this!

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 19, 2024
@zanmato1984
Collaborator

Though the fix is small, I don't think it qualifies as a MINOR change because it modifies code. Could you open an issue and link this PR to it? Thanks. @ZhangHuiGui

@kou
Member

kou commented May 19, 2024

Can we measure how much -DARROW_BUILD_OPENMP_BENCHMARKS=ON increases CI time?

The following change will enable the benchmark on "C++ / AMD64 Conda C++ AVX2":

diff --git a/docker-compose.yml b/docker-compose.yml
index a1d8f60a26..0c9784cc0d 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -285,6 +285,7 @@ services:
     environment:
       <<: [*common, *ccache, *sccache, *cpp]
       ARROW_BUILD_BENCHMARKS: "ON"
+      ARROW_BUILD_OPENMP_BENCHMARKS: "ON"
       ARROW_BUILD_EXAMPLES: "ON"
       ARROW_ENABLE_TIMING_TESTS:  # inherit
       ARROW_EXTRA_ERROR_CONTEXT: "ON"

@ZhangHuiGui ZhangHuiGui changed the title MINOR: [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark GH-41720: [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark May 20, 2024

⚠️ GitHub issue #41720 has been automatically assigned in GitHub to PR creator.

@ZhangHuiGui
Collaborator Author

ZhangHuiGui commented May 20, 2024

Can we measure how much -DARROW_BUILD_OPENMP_BENCHMARKS=ON increases CI time?

The following change will enable the benchmark on "C++ / AMD64 Conda C++ AVX2":

diff --git a/docker-compose.yml b/docker-compose.yml
index a1d8f60a26..0c9784cc0d 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -285,6 +285,7 @@ services:
     environment:
       <<: [*common, *ccache, *sccache, *cpp]
       ARROW_BUILD_BENCHMARKS: "ON"
+      ARROW_BUILD_OPENMP_BENCHMARKS: "ON"
       ARROW_BUILD_EXAMPLES: "ON"
       ARROW_ENABLE_TIMING_TESTS:  # inherit
       ARROW_EXTRA_ERROR_CONTEXT: "ON"

Thanks, I added this to the current PR, but I’m not sure how to compare it with the previous CI compilation time.

@kou
Member

kou commented May 20, 2024

We can use CI logs for it.

Anyway, the parameter wasn't used in that run: https://github.com/apache/arrow/actions/runs/9152098620/job/25159059866?pr=41716#step:6:864

Could you also add this?

diff --git a/ci/scripts/cpp_build.sh b/ci/scripts/cpp_build.sh
index a1f40fc360..6a3a53f253 100755
--- a/ci/scripts/cpp_build.sh
+++ b/ci/scripts/cpp_build.sh
@@ -120,6 +120,7 @@ else
     -DARROW_BUILD_BENCHMARKS=${ARROW_BUILD_BENCHMARKS:-OFF} \
     -DARROW_BUILD_EXAMPLES=${ARROW_BUILD_EXAMPLES:-OFF} \
     -DARROW_BUILD_INTEGRATION=${ARROW_BUILD_INTEGRATION:-OFF} \
+    -DARROW_BUILD_OPENMP_BENCHMARKS=${ARROW_BUILD_OPENMP_BENCHMARKS:-OFF} \
     -DARROW_BUILD_SHARED=${ARROW_BUILD_SHARED:-ON} \
     -DARROW_BUILD_STATIC=${ARROW_BUILD_STATIC:-ON} \
     -DARROW_BUILD_TESTS=${ARROW_BUILD_TESTS:-OFF} \
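
The effect of the `${VAR:-OFF}` pattern in the diff above can be sketched in a few lines of shell. This is a minimal sketch, not the actual `cpp_build.sh`: `build_flags` is a hypothetical helper introduced only for illustration, and it prints flags instead of invoking CMake. It shows that a variable set in the environment only becomes a `-D` flag if the script forwards it explicitly, and that it falls back to `OFF` when unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cpp_build.sh): prints the -D flags that
# would be handed to cmake, using the same ${VAR:-OFF} default expansion
# as the diff above.
build_flags() {
  printf '%s\n' \
    "-DARROW_BUILD_BENCHMARKS=${ARROW_BUILD_BENCHMARKS:-OFF}" \
    "-DARROW_BUILD_OPENMP_BENCHMARKS=${ARROW_BUILD_OPENMP_BENCHMARKS:-OFF}"
}

# Variable unset: the forwarded flag falls back to OFF.
unset ARROW_BUILD_OPENMP_BENCHMARKS
build_flags

# Variable set in the environment (as a docker-compose service would do):
# the script forwards it as ON.
ARROW_BUILD_OPENMP_BENCHMARKS=ON build_flags
```

Without a forwarding line like the second one in `build_flags`, setting the variable in the environment would have no effect on the generated flags.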

@ZhangHuiGui
Collaborator Author

off:
AMD64 Conda C++ AVX2
succeeded 7 hours ago in 26m 43s

on:
AMD64 Conda C++ AVX2
succeeded 4 hours ago in 5m 42s

It seems to have decreased a lot...

@kou
Member

kou commented May 20, 2024

It's caused by ccache.

The "off" case wasn't cached: https://github.com/apache/arrow/actions/runs/9152098620/job/25159059866?pr=41716#step:6:2944

   Hits:               14 / 882 ( 1.59%)
  Misses:            868 / 882 (98.41%)

The "on" case cached: https://github.com/apache/arrow/actions/runs/9154084580/job/25164035145?pr=41716#step:6:2956

  Hits:              895 / 1765 (50.71%)
  Misses:            870 / 1765 (49.29%)

I think that about 6min with cache isn't a problem.
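
The hit percentages quoted from the ccache output can be reproduced from the raw counts. The following is a small sketch using `awk`; `hit_rate` is a hypothetical helper written for this illustration and is not a ccache command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of cache hits out of all cacheable calls,
# rounded to two decimals.
hit_rate() {
  awk -v hits="$1" -v total="$2" \
    'BEGIN { printf "%.2f%%\n", 100 * hits / total }'
}

hit_rate 14 882     # "off" run: almost nothing was cached
hit_rate 895 1765   # "on" run: about half of the calls were cache hits
```

The two calls print `1.59%` and `50.71%`, matching the logs, so the two wall-clock times were measured under very different cache conditions and are not directly comparable.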

@pitrou pitrou merged commit c8f89d0 into apache:main May 23, 2024
64 of 65 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label May 23, 2024
@github-actions github-actions bot added the awaiting committer review Awaiting committer review label May 23, 2024

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit c8f89d0.

There were 8 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 47 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…text::Init called in hash_join_benchmark (apache#41716)

### Rationale for this change
My local build configuration includes some basic benchmarks, and I hit this compilation error today. It seems that the change to `QueryContext::Init` in apache#41334 was not propagated to `hash_join_benchmark.cc`, and CI did not catch the problem.

### What changes are included in this PR?
Remove the first argument from the `QueryContext::Init` call in `hash_join_benchmark.cc`.

### Are these changes tested?
Needn't

### Are there any user-facing changes?
No

* GitHub Issue: apache#41720

Lead-authored-by: ZhangHuiGui <[email protected]>
Co-authored-by: ZhangHuiGui <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>