[CUDAX] Add a way to combine thread hierarchies #2746

Open
wants to merge 3 commits into
base: main
Conversation

pciolkosz
Contributor

@pciolkosz pciolkosz commented Nov 8, 2024

To implement combining kernel configurations, we first need a way to combine thread hierarchies.

When the two hierarchies overlap, one of them has to take priority over the other. I implemented this as a member function `combine` on the hierarchy type rather than as a free function, so that the call is visibly asymmetric: on overlap, the object on which `combine` is called takes priority over the other hierarchy.

I decided not to support the case where one hierarchy sits "in the middle" of the other. For a hierarchy ABC, the supported overlaps are with AB, BC, or CD, but not with B alone.
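The overlap rules above can be illustrated with a small runtime sketch. This is a toy model in plain C++, not the cudax implementation: `Level`, `Hierarchy`, the free function `combine`, and the letter-named levels are all hypothetical names invented for illustration, and the handling of non-adjacent hierarchies (rejected here) is an assumption. The real `combine` is a member function on the hierarchy type and presumably resolves these cases at compile time.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model (hypothetical): a level is a letter ('A' is the outermost)
// plus an extent. This does not use any cudax types.
struct Level {
    char name;
    unsigned extent;
};

// Levels are stored outermost first and are assumed contiguous (A, B, C, ...).
using Hierarchy = std::vector<Level>;

// Returns the combined hierarchy, or std::nullopt for unsupported shapes.
// On levels present in both inputs, the extent from `lhs` wins.
std::optional<Hierarchy> combine(const Hierarchy& lhs, const Hierarchy& rhs) {
    if (lhs.empty()) return rhs;
    if (rhs.empty()) return lhs;

    const char lt = lhs.front().name, lb = lhs.back().name;
    const char rt = rhs.front().name, rb = rhs.back().name;

    // One hierarchy strictly inside the other ("in the middle"): rejected.
    if ((rt > lt && rb < lb) || (lt > rt && lb < rb)) return std::nullopt;
    // Assumption: a gap between the two means they cannot be stacked.
    if (rt > lb + 1 || lt > rb + 1) return std::nullopt;

    Hierarchy out;
    // Levels of rhs that sit above lhs.
    for (const Level& l : rhs) if (l.name < lt) out.push_back(l);
    // All of lhs, which has priority on any shared level.
    out.insert(out.end(), lhs.begin(), lhs.end());
    // Levels of rhs that sit below lhs.
    for (const Level& l : rhs) if (l.name > lb) out.push_back(l);
    return out;
}

// Usage: ABC combined with CD keeps C from ABC and appends D;
// ABC combined with B alone is rejected.
void demo() {
    const Hierarchy abc = {{'A', 2}, {'B', 4}, {'C', 8}};
    const Hierarchy cd = {{'C', 16}, {'D', 32}};
    const Hierarchy b = {{'B', 64}};

    const auto r = combine(abc, cd);
    assert(r && r->size() == 4);
    assert((*r)[2].extent == 8);  // C comes from abc, not cd
    assert((*r)[3].name == 'D');  // D is taken from cd
    assert(!combine(abc, b));     // "in the middle" is unsupported
}
```

Because the receiver wins on shared levels, swapping the arguments can change the result, which is the asymmetry the member-function form is meant to make visible.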

I also made some small cleanups:
- `can_stack_to_top` -> `can_rhs_stack_on_lhs`, because the old name was confusing about the order of the operands.
- `get_first_level_type` -> `get_first/last_level`, because `get_first_level_type::level_type` reads oddly.

@pciolkosz pciolkosz requested a review from a team as a code owner November 8, 2024 00:59
Contributor

github-actions bot commented Nov 8, 2024

🟩 CI finished in 1h 14m: Pass: 100%/54 | Total: 4h 38m | Avg: 5m 09s | Max: 16m 28s | Hits: 61%/238
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 38m | Avg: 5m 09s | Max: 16m 28s | Hits: 61%/238

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 25m | Avg:  5m 18s | Max: 16m 28s | Hits:  61%/238   
      🟩 arm64              Pass: 100%/4   | Total: 13m 36s | Avg:  3m 24s | Max:  3m 28s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 37m | Avg:  5m 07s | Max: 16m 21s | Hits:  61%/119   
      🟩 12.5               Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 09s
      🟩 12.6               Pass: 100%/33  | Total:  2h 49m | Avg:  5m 08s | Max: 16m 28s | Hits:  61%/119   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 37m | Avg:  5m 07s | Max: 16m 21s | Hits:  61%/119   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 09s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 49m | Avg:  5m 08s | Max: 16m 28s | Hits:  61%/119   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 38m | Avg:  5m 09s | Max: 16m 28s | Hits:  61%/238   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  8m 10s | Avg:  4m 05s | Max:  4m 26s
      🟩 Clang10            Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  4m 30s
      🟩 Clang11            Pass: 100%/4   | Total: 15m 05s | Avg:  3m 46s | Max:  3m 54s
      🟩 Clang12            Pass: 100%/4   | Total: 15m 03s | Avg:  3m 45s | Max:  4m 00s
      🟩 Clang13            Pass: 100%/4   | Total: 15m 11s | Avg:  3m 47s | Max:  3m 56s
      🟩 Clang14            Pass: 100%/4   | Total: 27m 05s | Avg:  6m 46s | Max: 15m 34s
      🟩 Clang15            Pass: 100%/2   | Total:  8m 23s | Avg:  4m 11s | Max:  4m 35s
      🟩 Clang16            Pass: 100%/4   | Total: 15m 24s | Avg:  3m 51s | Max:  4m 22s
      🟩 Clang17            Pass: 100%/2   | Total:  8m 17s | Avg:  4m 08s | Max:  4m 09s
      🟩 Clang18            Pass: 100%/2   | Total: 20m 05s | Avg: 10m 02s | Max: 16m 20s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 33s | Avg:  3m 46s | Max:  3m 48s
      🟩 GCC10              Pass: 100%/4   | Total: 14m 53s | Avg:  3m 43s | Max:  4m 01s
      🟩 GCC11              Pass: 100%/4   | Total: 15m 21s | Avg:  3m 50s | Max:  4m 00s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 02m | Avg:  8m 59s | Max: 16m 28s
      🟩 GCC13              Pass: 100%/3   | Total:  9m 52s | Avg:  3m 17s | Max:  3m 28s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  7m 23s | Avg:  7m 23s | Max:  7m 23s | Hits:  61%/119   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s | Hits:  61%/119   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 09s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 20m | Avg:  4m 41s | Max: 16m 20s
      🟩 GCC                Pass: 100%/20  | Total:  1h 50m | Avg:  5m 31s | Max: 16m 28s
      🟩 MSVC               Pass: 100%/2   | Total: 15m 22s | Avg:  7m 41s | Max:  7m 59s | Hits:  61%/238   
      🟩 NVHPC              Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 09s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 38m | Avg:  5m 09s | Max: 16m 28s | Hits:  61%/238   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 18m | Avg:  4m 03s | Max:  7m 59s | Hits:  61%/238   
      🟩 Test               Pass: 100%/5   | Total:  1h 19m | Avg: 15m 59s | Max: 16m 28s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 22s | Avg:  3m 22s | Max:  3m 22s
      🟩 90a                Pass: 100%/1   | Total:  3m 05s | Avg:  3m 05s | Max:  3m 05s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 17m | Avg:  4m 43s | Max: 16m 28s
      🟩 20                 Pass: 100%/25  | Total:  2h 21m | Avg:  5m 39s | Max: 16m 20s | Hits:  61%/238   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Contributor

🟩 CI finished in 1h 27m: Pass: 100%/54 | Total: 4h 44m | Avg: 5m 15s | Max: 20m 26s | Hits: 61%/246
  • 🟩 cudax: Pass: 100%/54 | Total: 4h 44m | Avg: 5m 15s | Max: 20m 26s | Hits: 61%/246

    🟩 cpu
      🟩 amd64              Pass: 100%/50  | Total:  4h 30m | Avg:  5m 24s | Max: 20m 26s | Hits:  61%/246   
      🟩 arm64              Pass: 100%/4   | Total: 14m 16s | Avg:  3m 34s | Max:  4m 10s
    🟩 ctk
      🟩 12.0               Pass: 100%/19  | Total:  1h 42m | Avg:  5m 22s | Max: 20m 26s | Hits:  61%/123   
      🟩 12.5               Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max:  8m 24s
      🟩 12.6               Pass: 100%/33  | Total:  2h 46m | Avg:  5m 01s | Max: 17m 04s | Hits:  61%/123   
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/19  | Total:  1h 42m | Avg:  5m 22s | Max: 20m 26s | Hits:  61%/123   
      🟩 nvcc12.5           Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max:  8m 24s
      🟩 nvcc12.6           Pass: 100%/33  | Total:  2h 46m | Avg:  5m 01s | Max: 17m 04s | Hits:  61%/123   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/54  | Total:  4h 44m | Avg:  5m 15s | Max: 20m 26s | Hits:  61%/246   
    🟩 cxx
      🟩 Clang9             Pass: 100%/2   | Total:  8m 33s | Avg:  4m 16s | Max:  4m 23s
      🟩 Clang10            Pass: 100%/2   | Total:  7m 58s | Avg:  3m 59s | Max:  4m 20s
      🟩 Clang11            Pass: 100%/4   | Total: 15m 23s | Avg:  3m 50s | Max:  4m 02s
      🟩 Clang12            Pass: 100%/4   | Total: 15m 06s | Avg:  3m 46s | Max:  4m 04s
      🟩 Clang13            Pass: 100%/4   | Total: 14m 32s | Avg:  3m 38s | Max:  3m 43s
      🟩 Clang14            Pass: 100%/4   | Total: 26m 03s | Avg:  6m 30s | Max: 15m 25s
      🟩 Clang15            Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  3m 55s
      🟩 Clang16            Pass: 100%/4   | Total: 14m 35s | Avg:  3m 38s | Max:  3m 57s
      🟩 Clang17            Pass: 100%/2   | Total:  7m 48s | Avg:  3m 54s | Max:  3m 59s
      🟩 Clang18            Pass: 100%/2   | Total: 18m 21s | Avg:  9m 10s | Max: 14m 25s
      🟩 GCC9               Pass: 100%/2   | Total:  7m 26s | Avg:  3m 43s | Max:  3m 54s
      🟩 GCC10              Pass: 100%/4   | Total: 14m 37s | Avg:  3m 39s | Max:  4m 03s
      🟩 GCC11              Pass: 100%/4   | Total: 14m 59s | Avg:  3m 44s | Max:  4m 01s
      🟩 GCC12              Pass: 100%/7   | Total:  1h 06m | Avg:  9m 32s | Max: 20m 26s
      🟩 GCC13              Pass: 100%/3   | Total: 10m 51s | Avg:  3m 37s | Max:  4m 10s
      🟩 MSVC14.36          Pass: 100%/1   | Total:  8m 04s | Avg:  8m 04s | Max:  8m 04s | Hits:  61%/123   
      🟩 MSVC14.39          Pass: 100%/1   | Total:  9m 30s | Avg:  9m 30s | Max:  9m 30s | Hits:  61%/123   
      🟩 NVHPC24.7          Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max:  8m 24s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/30  | Total:  2h 16m | Avg:  4m 32s | Max: 15m 25s
      🟩 GCC                Pass: 100%/20  | Total:  1h 54m | Avg:  5m 44s | Max: 20m 26s
      🟩 MSVC               Pass: 100%/2   | Total: 17m 34s | Avg:  8m 47s | Max:  9m 30s | Hits:  61%/246   
      🟩 NVHPC              Pass: 100%/2   | Total: 16m 04s | Avg:  8m 02s | Max:  8m 24s
    🟩 gpu
      🟩 v100               Pass: 100%/54  | Total:  4h 44m | Avg:  5m 15s | Max: 20m 26s | Hits:  61%/246   
    🟩 jobs
      🟩 Build              Pass: 100%/49  | Total:  3h 22m | Avg:  4m 07s | Max:  9m 30s | Hits:  61%/246   
      🟩 Test               Pass: 100%/5   | Total:  1h 22m | Avg: 16m 26s | Max: 20m 26s
    🟩 sm
      🟩 90                 Pass: 100%/1   | Total:  3m 06s | Avg:  3m 06s | Max:  3m 06s
      🟩 90a                Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
    🟩 std
      🟩 17                 Pass: 100%/29  | Total:  2h 20m | Avg:  4m 49s | Max: 20m 26s
      🟩 20                 Pass: 100%/25  | Total:  2h 24m | Avg:  5m 46s | Max: 17m 04s | Hits:  61%/246   
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
CUB
Thrust
+/- CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

🏃‍ Runner counts (total jobs: 54)

# Runner
43 linux-amd64-cpu16
5 linux-amd64-gpu-v100-latest-1
4 linux-arm64-cpu16
2 windows-amd64-cpu16

Labels
None yet
Projects
Status: Todo
2 participants