Time profiles for DY+4j (and DY+3j) have high 'python/bash' component - especially for cuda #1000

Open
valassi opened this issue Sep 16, 2024 · 0 comments

valassi commented Sep 16, 2024

Documenting/Analysing further results of DY+4jet tests in #948

This is an issue that I had already mentioned for DY+3j in #994 for cuda. In DY+4j the effect is even larger, and it also becomes visible (to a lesser extent) for cpp and fortran. So I am splitting it off into a separate issue.

pp_dy4j.mad/fortran/output.txt (#events: 81)
[GridPackCmd.launch] OVERALL TOTAL    21707.6095 seconds
[madevent COUNTERS]  PROGRAM TOTAL    21546.1
[madevent COUNTERS]  Fortran Overhead 1579.09
[madevent COUNTERS]  Fortran MEs      19967
--------------------------------------------------------------------------------
pp_dy4j.mad/cppnone/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    26745.1639 seconds
[madevent COUNTERS]  PROGRAM TOTAL    26584.9
[madevent COUNTERS]  Fortran Overhead 1608.51
[madevent COUNTERS]  CudaCpp MEs      24910.4
[madevent COUNTERS]  CudaCpp HEL      66.0341
--------------------------------------------------------------------------------
pp_dy4j.mad/cppsse4/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    14398.4664 seconds
[madevent COUNTERS]  PROGRAM TOTAL    14231.3
[madevent COUNTERS]  Fortran Overhead 1647.03
[madevent COUNTERS]  CudaCpp MEs      12550.6
[madevent COUNTERS]  CudaCpp HEL      33.7035
--------------------------------------------------------------------------------
pp_dy4j.mad/cppavx2/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    7335.2356 seconds
[madevent COUNTERS]  PROGRAM TOTAL    7114.43
[madevent COUNTERS]  Fortran Overhead 1683.7
[madevent COUNTERS]  CudaCpp MEs      5415.48
[madevent COUNTERS]  CudaCpp HEL      15.2596
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512y/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    6831.8971 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6649.98
[madevent COUNTERS]  Fortran Overhead 1669.94
[madevent COUNTERS]  CudaCpp MEs      4966.24
[madevent COUNTERS]  CudaCpp HEL      13.8066
--------------------------------------------------------------------------------
pp_dy4j.mad/cpp512z/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    7136.2962 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6958.96
[madevent COUNTERS]  Fortran Overhead 1636.28
[madevent COUNTERS]  CudaCpp MEs      5305.14
[madevent COUNTERS]  CudaCpp HEL      17.5447
--------------------------------------------------------------------------------
pp_dy4j.mad/cuda/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    2523.7488 seconds
[madevent COUNTERS]  PROGRAM TOTAL    2234.93
[madevent COUNTERS]  Fortran Overhead 1820.36
[madevent COUNTERS]  CudaCpp MEs      97.9622
[madevent COUNTERS]  CudaCpp HEL      316.613
--------------------------------------------------------------------------------

Specifically, the python/bash component ("GridPackCmd OVERALL TOTAL" minus "madevent PROGRAM TOTAL") is

  • ~290s (2524-2235) for cuda
  • ~180s (6832-6650) for cpp 512y
  • ~160s (21708-21546) for fortran
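
For reference, the subtraction above can be reproduced directly from the output.txt summaries. The following is only a minimal sketch: the helper name `python_bash_overhead` is hypothetical (it is not part of the repository), and it assumes the summary lines keep the exact layout quoted in this issue.

```python
import re

# Excerpt of the timing summaries quoted above (three of the seven backends).
LOG = """\
pp_dy4j.mad/fortran/output.txt (#events: 81)
[GridPackCmd.launch] OVERALL TOTAL    21707.6095 seconds
[madevent COUNTERS]  PROGRAM TOTAL    21546.1
pp_dy4j.mad/cpp512y/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    6831.8971 seconds
[madevent COUNTERS]  PROGRAM TOTAL    6649.98
pp_dy4j.mad/cuda/output.txt (#events: 195)
[GridPackCmd.launch] OVERALL TOTAL    2523.7488 seconds
[madevent COUNTERS]  PROGRAM TOTAL    2234.93
"""


def python_bash_overhead(text):
    """Return {backend: OVERALL TOTAL - PROGRAM TOTAL} in seconds.

    Hypothetical helper: assumes each backend section starts with a
    'pp_dy4j.mad/<backend>/output.txt' line, as in the summaries above.
    """
    result = {}
    backend = overall = None
    for line in text.splitlines():
        m = re.match(r"pp_dy4j\.mad/(\w+)/output\.txt", line)
        if m:
            # New backend section: reset the pending OVERALL TOTAL.
            backend, overall = m.group(1), None
            continue
        m = re.search(r"OVERALL TOTAL\s+([\d.]+)", line)
        if m:
            overall = float(m.group(1))
            continue
        m = re.search(r"PROGRAM TOTAL\s+([\d.]+)", line)
        if m and backend is not None and overall is not None:
            result[backend] = overall - float(m.group(1))
    return result


overheads = python_bash_overhead(LOG)
for backend, seconds in overheads.items():
    print(f"{backend:8s} {seconds:7.1f} s")
```

With the unrounded inputs the cuda difference comes out somewhat below 300s, while the cpp512y and fortran differences stay close to 180s and 160s.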

So again, this non-ME component becomes more visible (and more of a concern) for the faster MEs like cuda and simd. In addition, it seems to be higher in absolute terms for cuda.
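
To put a number on "more visible", one can express the python/bash component as a fraction of the overall wall-clock time. This is a rough sketch only: the values are copied by hand from the summaries above, and `overhead_fraction` is a hypothetical helper introduced for illustration.

```python
# (OVERALL TOTAL, PROGRAM TOTAL) in seconds, copied from the summaries above.
timings = {
    "fortran": (21707.6095, 21546.1),
    "cpp512y": (6831.8971, 6649.98),
    "cuda": (2523.7488, 2234.93),
}


def overhead_fraction(overall, program):
    """Fraction of the gridpack wall-clock time spent outside madevent."""
    return (overall - program) / overall


fractions = {name: overhead_fraction(*t) for name, t in timings.items()}
for name, frac in fractions.items():
    print(f"{name:8s} {100 * frac:5.1f}% of OVERALL TOTAL is outside madevent")
```

On these numbers the fraction grows from below 1% for fortran to a few percent for cpp512y and to more than 10% for cuda, which is consistent with the observation above. It does not explain why the absolute value is also larger for cuda.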

To be understood...

valassi added a commit to valassi/madgraph4gpu that referenced this issue Sep 16, 2024
Note also:
- CudaCpp HEL seems still very high for cuda? madgraph5#999
- 'Python/Bash' component (difference between gridpack and total madevent) seems high in cudacpp? madgraph5#1000