[libc] Add Multithreaded GPU Benchmarks #98964
@llvm/pr-subscribers-libc

Author: None (jameshu15869)

Changes

This PR runs benchmarks on 32 threads (a single warp on NVPTX) by default and adds an option for single-threaded benchmarks. A benchmark can be restricted to a single thread with the `SINGLE_THREADED_BENCHMARK()` macro. I chose to use a flag here so that other options could be added in the future.

Full diff: https://github.com/llvm/llvm-project/pull/98964.diff

4 Files Affected:
diff --git a/libc/benchmarks/gpu/CMakeLists.txt b/libc/benchmarks/gpu/CMakeLists.txt
index eaeecbdacd23e..8c409bc6ef3ea 100644
--- a/libc/benchmarks/gpu/CMakeLists.txt
+++ b/libc/benchmarks/gpu/CMakeLists.txt
@@ -10,6 +10,10 @@ function(add_benchmark benchmark_name)
"LINK_LIBRARIES" # Multi-value arguments
${ARGN}
)
+ # We run benchmarks for a single warp with and give the
+ # option to run only a single thread
+ set(BENCHMARK_NUM_THREADS 32)
+
if(NOT libc.src.time.clock IN_LIST TARGET_LLVMLIBC_ENTRYPOINTS)
message(FATAL_ERROR "target does not support clock")
endif()
@@ -19,6 +23,8 @@ function(add_benchmark benchmark_name)
LINK_LIBRARIES
LibcGpuBenchmark.hermetic
${BENCHMARK_LINK_LIBRARIES}
+ LOADER_ARGS
+ --threads ${BENCHMARK_NUM_THREADS}
${BENCHMARK_UNPARSED_ARGUMENTS}
)
get_fq_target_name(${benchmark_name} fq_target_name)
diff --git a/libc/benchmarks/gpu/LibcGpuBenchmark.cpp b/libc/benchmarks/gpu/LibcGpuBenchmark.cpp
index 23fff3e8180f7..2094d33e1e9e7 100644
--- a/libc/benchmarks/gpu/LibcGpuBenchmark.cpp
+++ b/libc/benchmarks/gpu/LibcGpuBenchmark.cpp
@@ -114,8 +114,10 @@ void Benchmark::run_benchmarks() {
all_results.reset();
gpu::sync_threads();
- auto current_result = b->run();
- all_results.update(current_result);
+ if (!(b->flags & BenchmarkFlags::SINGLE_THREADED) || id == 0) {
+ auto current_result = b->run();
+ all_results.update(current_result);
+ }
gpu::sync_threads();
if (id == 0)
diff --git a/libc/benchmarks/gpu/LibcGpuBenchmark.h b/libc/benchmarks/gpu/LibcGpuBenchmark.h
index 1f813f8655de6..53f35768e1bf1 100644
--- a/libc/benchmarks/gpu/LibcGpuBenchmark.h
+++ b/libc/benchmarks/gpu/LibcGpuBenchmark.h
@@ -74,16 +74,19 @@ struct BenchmarkResult {
clock_t total_time = 0;
};
+enum BenchmarkFlags { SINGLE_THREADED = 0x1 };
+
BenchmarkResult benchmark(const BenchmarkOptions &options,
cpp::function<uint64_t(void)> wrapper_func);
class Benchmark {
const cpp::function<uint64_t(void)> func;
const cpp::string_view name;
+ const uint8_t flags;
public:
- Benchmark(cpp::function<uint64_t(void)> func, char const *name)
- : func(func), name(name) {
+ Benchmark(cpp::function<uint64_t(void)> func, char const *name, uint8_t flags)
+ : func(func), name(name), flags(flags) {
add_benchmark(this);
}
@@ -104,6 +107,11 @@ class Benchmark {
#define BENCHMARK(SuiteName, TestName, Func) \
LIBC_NAMESPACE::benchmarks::Benchmark SuiteName##_##TestName##_Instance( \
- Func, #SuiteName "." #TestName)
+ Func, #SuiteName "." #TestName, 0)
+
+#define SINGLE_THREADED_BENCHMARK(SuiteName, TestName, Func) \
+ LIBC_NAMESPACE::benchmarks::Benchmark SuiteName##_##TestName##_Instance( \
+ Func, #SuiteName "." #TestName, \
+ LIBC_NAMESPACE::benchmarks::BenchmarkFlags::SINGLE_THREADED)
#endif
diff --git a/libc/benchmarks/gpu/src/ctype/isalnum_benchmark.cpp b/libc/benchmarks/gpu/src/ctype/isalnum_benchmark.cpp
index 6f8d247902f76..d9c1a804ec506 100644
--- a/libc/benchmarks/gpu/src/ctype/isalnum_benchmark.cpp
+++ b/libc/benchmarks/gpu/src/ctype/isalnum_benchmark.cpp
@@ -7,6 +7,8 @@ uint64_t BM_IsAlnum() {
return LIBC_NAMESPACE::latency(LIBC_NAMESPACE::isalnum, x);
}
BENCHMARK(LlvmLibcIsAlNumGpuBenchmark, IsAlnum, BM_IsAlnum);
+SINGLE_THREADED_BENCHMARK(LlvmLibcIsAlNumGpuBenchmark, IsAlnumSingleThread,
+ BM_IsAlnum);
uint64_t BM_IsAlnumCapital() {
char x = 'A';
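The gating condition added to `Benchmark::run_benchmarks()` can be checked on the host without a GPU. The sketch below is an illustration, not code from the PR: it copies the predicate from the diff and counts how many thread ids in a 32-thread launch would execute the benchmark body. The helpers `should_run` and `count_active` are hypothetical names, and the enum is given an explicit `uint8_t` underlying type here only to match the type of the `flags` member.

```cpp
#include <cstdint>

// Same flag value as the enum added in LibcGpuBenchmark.h.
// Assumption: the explicit underlying type is added for this sketch only.
enum BenchmarkFlags : uint8_t { SINGLE_THREADED = 0x1 };

// Hypothetical helper: true when the thread with this id should run the
// benchmark body. Mirrors the condition from the diff:
//   !(b->flags & BenchmarkFlags::SINGLE_THREADED) || id == 0
constexpr bool should_run(uint8_t flags, uint32_t id) {
  return !(flags & SINGLE_THREADED) || id == 0;
}

// Hypothetical helper: number of threads out of num_threads that pass the gate.
constexpr uint32_t count_active(uint8_t flags, uint32_t num_threads) {
  uint32_t active = 0;
  for (uint32_t id = 0; id < num_threads; ++id)
    if (should_run(flags, id))
      ++active;
  return active;
}

// With no flags set, all 32 threads of the launch run the body.
static_assert(count_active(0, 32) == 32, "default benchmark uses every thread");
// With SINGLE_THREADED set, only thread 0 runs it; the others just synchronize.
static_assert(count_active(SINGLE_THREADED, 32) == 1, "only thread 0 runs");
```

Because the flags are a bitmask, a later option could take the next bit (for example `0x2`) without changing this predicate, which seems to be the extensibility the description has in mind.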
Hmm, I feel like it's easy enough to just set this on each one. It's also difficult because the warp size varies with the hardware (and compilation settings) on AMDGPU. Having a helper for a single-threaded run is probably fine.
Ah, do you mean have the macro say how many threads to use? e.g. |
Nah I just mean each time we register a benchmark it should just say how many it wants. One thread is a reasonable default since that's what the loader defaults to.
How many threads should we launch the loader with? I mean like if we always run benchmarks with |
It should launch with whatever the user requested when they wrote the |
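As a rough illustration of the alternative discussed above, where each registration states how many threads it wants and one thread is the default, a host-side sketch might look like the following. Nothing here is from the PR or from LLVM libc: `BenchmarkSpec`, `make_benchmark`, and `participates` are hypothetical names chosen for the example.

```cpp
#include <cstdint>

// Hypothetical record of what a registration asked for.
struct BenchmarkSpec {
  const char *name;
  uint32_t num_threads;
};

// Hypothetical registration helper. The default of one thread follows the
// reviewer's remark that this is what the loader defaults to.
constexpr BenchmarkSpec make_benchmark(const char *name,
                                       uint32_t num_threads = 1) {
  return BenchmarkSpec{name, num_threads};
}

// Hypothetical gate: a thread takes part only if its id is below the count
// requested by the benchmark, regardless of how many threads were launched.
constexpr bool participates(const BenchmarkSpec &spec, uint32_t id) {
  return id < spec.num_threads;
}

// A registration that says nothing runs on thread 0 only.
static_assert(make_benchmark("IsAlnum").num_threads == 1, "default is 1");
// A registration asking for 32 threads admits ids 0..31 and rejects id 32.
static_assert(participates(make_benchmark("IsAlnumWarp", 32), 31), "in range");
static_assert(!participates(make_benchmark("IsAlnumWarp", 32), 32), "out of range");
```

Compared with a single `SINGLE_THREADED` bit, an explicit count avoids hard-coding a warp size of 32, which matters given the point that the warp size differs on AMDGPU.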
It's probably easier to just forward the loader args through the unparsed arguments, but either way works.
Do you mean you would prefer something that looks more like
I was thinking explicitly having |
Summary: This PR runs benchmarks on 32 threads (a single warp on NVPTX) by default and adds an option for single-threaded benchmarks. A benchmark can be restricted to a single thread with the `SINGLE_THREADED_BENCHMARK()` macro. I chose to use a flag here so that other options could be added in the future. Differential Revision: https://phabricator.intern.facebook.com/D60250873