remove unnecessary first pass on all bridge events in cudacpp helicity calculation from madevent (and improve timers) #960
Conversation
Using more /tmp/avalassi/input_ggttggg_test
256 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! ICONFIG number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)

For CUDACPP_RUNTIME_VECSIZEUSED=256 ./madevent_fortran < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 3.2359s
[COUNTERS] Fortran Overhead ( 0 ) : 0.0972s
[COUNTERS] Fortran MEs ( 1 ) : 3.1387s for 256 events => throughput is 8.16E+01 events/s

For CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 7.1173s
[COUNTERS] Fortran Overhead ( 0 ) : 3.4293s
[COUNTERS] CudaCpp MEs ( 2 ) : 3.6880s for 256 events => throughput is 6.94E+01 events/s

For CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 1.5505s
[COUNTERS] Fortran Overhead ( 0 ) : 0.7714s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.7791s for 256 events => throughput is 3.29E+02 events/s
…1.f and counters.cc, remove "counters_smatrix1_" functions and calls, which are not used anywhere

There is a small but noticeable difference in ggttggg (probably much more in simpler processes?)

For CUDACPP_RUNTIME_VECSIZEUSED=256 ./madevent_fortran < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 3.1335s
[COUNTERS] Fortran Overhead ( 0 ) : 0.0983s
[COUNTERS] Fortran MEs ( 1 ) : 3.0352s for 256 events => throughput is 8.43E+01 events/s
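For context on the timer cleanup above: the madevent counters are chrono-based timers exposed to Fortran through extern "C" symbols in counters.cc. The snippet below is only a minimal sketch of that pattern, with illustrative names (counters_example_*) rather than the exact functions in the repository.

#include <chrono>
#include <cstdio>

namespace
{
  std::chrono::steady_clock::time_point exampleStart; // timestamp of the last 'start' call
  double exampleSeconds = 0;                          // accumulated wall-clock time
  int exampleCalls = 0;                               // number of start/stop pairs
}

extern "C"
{
  // Trailing underscores so that Fortran can call these as 'counters_example_start()' etc.
  void counters_example_start_() { exampleStart = std::chrono::steady_clock::now(); }
  void counters_example_stop_()
  {
    const std::chrono::duration<double> d = std::chrono::steady_clock::now() - exampleStart;
    exampleSeconds += d.count();
    exampleCalls++;
  }
  void counters_example_finalise_()
  {
    std::printf( "[COUNTERS] Example ( n ) : %9.4fs for %d calls\n", exampleSeconds, exampleCalls );
  }
}

Removing an unused counter like "counters_smatrix1_" then simply means deleting its start/stop pair and the corresponding Fortran call sites.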
…l fix in gg_tt.mad)
…CudaCpp helicities madgraph5#958

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 7.0962s
[COUNTERS] Fortran Overhead ( 0 ) : 0.0969s
[COUNTERS] CudaCpp MEs ( 2 ) : 3.6843s for 256 events => throughput is 6.95E+01 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 3.3149s for 256 events => throughput is 7.72E+01 events/s

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 1.5576s
[COUNTERS] Fortran Overhead ( 0 ) : 0.1012s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.7721s for 256 events => throughput is 3.32E+02 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.6843s for 256 events => throughput is 3.74E+02 events/s
…udaCpp helicities madgraph5#958 (remove event count and throughput)
…e.inc interface, add parameter goodHelOnly as in Bridge to quit after a few events in cudacpp helicity computation (fix madgraph5#958 aka madgraph5#546)

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 4.1082s
[COUNTERS] Fortran Overhead ( 0 ) : 0.0979s
[COUNTERS] CudaCpp MEs ( 2 ) : 3.8176s for 256 events => throughput is 6.71E+01 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.1927s

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
[COUNTERS] PROGRAM TOTAL : 0.9085s
[COUNTERS] Fortran Overhead ( 0 ) : 0.0995s
[COUNTERS] CudaCpp MEs ( 2 ) : 0.7692s for 256 events => throughput is 3.33E+02 events/s
[COUNTERS] CudaCpp HEL ( 3 ) : 0.0398s

(Also fix clang formatting in counters)
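The goodHelOnly flag is what lets the Fortran side stop the cudacpp computation right after the helicity-filtering step, instead of also computing matrix elements for the full event vector. The class below is only an illustrative sketch of that control flow (hypothetical names), not the actual Bridge API.

#include <vector>

// Sketch: an ME "sequence" that can quit right after the good-helicity
// filtering step when goodHelOnly is true.
class MEBridgeSketch
{
public:
  void sequence( const std::vector<double>& momenta, std::vector<double>& matrixElements, bool goodHelOnly )
  {
    if( !m_goodHelsComputed )
    {
      computeGoodHelicities( momenta ); // helicity filtering needs only a few events
      m_goodHelsComputed = true;
    }
    if( goodHelOnly ) return; // early exit: skip the full ME computation
    computeMatrixElements( momenta, matrixElements ); // full computation on all events
  }

private:
  void computeGoodHelicities( const std::vector<double>& /*momenta*/ ) { /* filter helicities */ }
  void computeMatrixElements( const std::vector<double>& /*momenta*/, std::vector<double>& mes )
  {
    for( auto& me : mes ) me = 0; // placeholder for the real computation
  }
  bool m_goodHelsComputed = false;
};

In this picture, madevent makes one initial call with goodHelOnly=true (timed under "CudaCpp HEL") and then normal calls with goodHelOnly=false for the actual event loop (timed under "CudaCpp MEs"), which is why the HEL counter drops from seconds to a small fraction of a second in the numbers above.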
…small issue to fix)
…alformed patches

The only files that still need to be patched are
- 3 in patch.common: Source/makefile, Source/genps.inc, SubProcesses/makefile
- 3 in patch.P1: auto_dsig1.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
…df (tested on a command line)
… added to generateAndCompare.sh
…ggg by a template
Hi @oliviermattelaer this is essentially ready for review. Can you please review? I renamed it as "remove unnecessary first pass on all bridge events in cudacpp helicity calculation from madevent (and improve timers)". Essentially, what this does is:
So I would say that all is understood. This was a nasty performance overhead. Now cudacpp should look a bit better with respect to fortran, especially if only a few events are generated.
I guess it would make more sense to include this in the master_goodhel branch, since the two are in a way related.
This would allow moving HELONLY from a boolean to a float and passing limhel to the code, so that this helicity selection is done as it should be.
Or we can merge this one, and then I do the switch in master_goodhel to have that parameter as a float (but we should not wait for merging master_goodhel anyway).
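To illustrate the bool-to-float idea (this is only a sketch of one possible convention, not what master_goodhel implements): a limhel-like float passed down to the filtering code could act as a relative threshold for dropping helicities, instead of a simple on/off switch.

#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: keep the helicities whose largest
// contribution over the sampled events is above limhel times the overall maximum.
std::vector<int> filterGoodHelicities( const std::vector<double>& maxMEperHelicity, double limhel )
{
  double maxAll = 0;
  for( double me : maxMEperHelicity )
    if( me > maxAll ) maxAll = me;
  std::vector<int> goodHels;
  for( std::size_t ihel = 0; ihel < maxMEperHelicity.size(); ihel++ )
    if( maxMEperHelicity[ihel] > limhel * maxAll ) goodHels.push_back( (int)ihel );
  return goodHels;
}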
But can you comment on why some matrix.f files are modified? This is weird/bad.
(no issue here actually)
Cheers,
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (review comments resolved)
…ph5#960 (remove first cudacpp pass in helicity calculation)

On itgold91:
Code generation completed in 161 seconds
Code generation and additional checks completed in 246 seconds
…91, with 16384 vector_size, and the removal of cudacpp helicity pass madgraph5#960

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -fortran pp_dy3j.mad -togridpack
…ld91, vector_size=16384, overhead reduced PR madgraph5#960

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -nomakeclean -ALL pp_dy3j.mad -fromgridpack
./parseGridpackLogs.sh pp_dy3j.mad/

pp_dy3j.mad//fortran/output.txt
[GridPackCmd.launch] OVERALL TOTAL 443.0382 seconds
[madevent COUNTERS] PROGRAM TOTAL 438.801
[madevent COUNTERS] Fortran Overhead 131.826
[madevent COUNTERS] Fortran MEs 306.975
--------------------------------------------------------------------------------
pp_dy3j.mad//cppnone/output.txt
[GridPackCmd.launch] OVERALL TOTAL 443.1323 seconds
[madevent COUNTERS] PROGRAM TOTAL 438.864
[madevent COUNTERS] Fortran Overhead 131.804
[madevent COUNTERS] CudaCpp MEs 306.034
[madevent COUNTERS] CudaCpp HEL 1.025
--------------------------------------------------------------------------------
pp_dy3j.mad//cppsse4/output.txt
[GridPackCmd.launch] OVERALL TOTAL 290.4177 seconds
[madevent COUNTERS] PROGRAM TOTAL 286.159
[madevent COUNTERS] Fortran Overhead 131.795
[madevent COUNTERS] CudaCpp MEs 153.803
[madevent COUNTERS] CudaCpp HEL 0.5612
--------------------------------------------------------------------------------
pp_dy3j.mad//cppavx2/output.txt
[GridPackCmd.launch] OVERALL TOTAL 199.7083 seconds
[madevent COUNTERS] PROGRAM TOTAL 195.451
[madevent COUNTERS] Fortran Overhead 131.835
[madevent COUNTERS] CudaCpp MEs 63.3324
[madevent COUNTERS] CudaCpp HEL 0.2837
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512y/output.txt
[GridPackCmd.launch] OVERALL TOTAL 195.8398 seconds
[madevent COUNTERS] PROGRAM TOTAL 191.538
[madevent COUNTERS] Fortran Overhead 131.888
[madevent COUNTERS] CudaCpp MEs 59.3799
[madevent COUNTERS] CudaCpp HEL 0.2715
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512z/output.txt
[GridPackCmd.launch] OVERALL TOTAL 171.8862 seconds
[madevent COUNTERS] PROGRAM TOTAL 167.589
[madevent COUNTERS] Fortran Overhead 131.943
[madevent COUNTERS] CudaCpp MEs 35.4473
[madevent COUNTERS] CudaCpp HEL 0.1996
--------------------------------------------------------------------------------
pp_dy3j.mad//cuda/output.txt
File not found: SKIP backend cuda
--------------------------------------------------------------------------------
pp_dy3j.mad//hip/output.txt
File not found: SKIP backend hip
--------------------------------------------------------------------------------
Hi Olivier, thanks for looking at this. One point: I really think that we should try NOT to have too many master_xxx branches, and rather have everything in master; otherwise it becomes unmanageable. I work on master, and I also worked on master_june24 (to merge it into master, essentially ready), but I would avoid more branches. (What I try to do is to have several PRs on master in parallel; when I need one inside another I do include it, but always trying to make sure that they can be merged to master one after the other.) This specific PR makes sense against master, so I would put it on master. And it does NOT have much to do with the other helicity work you are doing in fortran (IIUC), because there you are reducing from two fortran helicity calculations to one. Here I am just modifying how many events are used in the cudacpp helicity computation; I am not changing one vs two computations in cudacpp, nor am I touching what is done in fortran. So I would really merge it in master.
Uh? This I really do not understand; I should see what you have done. But again, I'd prefer to merge this to master first and then look at the rest. All the tests I am doing for CMS use this now. Thanks :-)
Approve the merge of the CI, and of master_goodhel and ..., and they will be in master. The multiplication of master_xxx branches is just because I have parallel work and therefore multiple PRs (which for some weird reason take a while to be merged). While master_june24 was special, the others are quite "normal" branches (and the master_ prefix is just my typical naming of a branch, indicating which branch I am originating from).
No problem, I will then do a master_goodhel_limhel branch to show you what I want to do for it (but preferably we merge master_goodhel into master first --which should be uncontroversial-- so that I can do a master_limhel branch).
STARTED AT Thu Aug 8 07:40:53 PM CEST 2024
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Thu Aug 8 08:05:50 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Thu Aug 8 08:14:09 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Thu Aug 8 08:22:29 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu Aug 8 08:25:13 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu Aug 8 08:27:57 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common
ENDED(6) AT Thu Aug 8 08:30:45 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Thu Aug 8 08:42:09 PM CEST 2024 [Status=0]
…heft madgraph5#833, but gqttq madgraph5#845 crash is fixed)

STARTED AT Thu Aug 8 08:42:09 PM CEST 2024
(SM tests)
ENDED(1) AT Fri Aug 9 12:48:36 AM CEST 2024 [Status=0]
(BSM tests)
ENDED(1) AT Fri Aug 9 12:58:52 AM CEST 2024 [Status=0]

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
 1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
…an-14 on Mac github-hosted runners (fix madgraph5#971)
… merging
git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)
…MS nvcc without nvtx PR madgraph5#966) into hel
I have updated this PR, including:
This should be ready to merge once the CI tests have passed.
Hi @oliviermattelaer thanks again for the discussion yesterday. Ok, now I understand better what you mean. We could reuse the same parameter as a float in the following way:
Note also that I opened #975 about possibly adding channelid to helicity filtering (which is currently done on all channels).
The CI completed; there are the usual 3 expected failures. Hi @oliviermattelaer, as discussed yesterday, I now self-merge this. Some followup work remains to do in:
I also still need to review and merge your comprehensive helicity changes in fortran (which I think are #955 and related work). Thanks, Andrea
… mac madgraph5#974, nvcc madgraph5#966) into june24

Fix conflicts:
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/fbridge.cc
epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f
epochX/cudacpp/gg_tt.mad/SubProcesses/counters.cc
epochX/cudacpp/gg_tt.mad/SubProcesses/fbridge.cc

NB: here I essentially fixed gg_tt.mad, not CODEGEN, which will need to be adjusted a posteriori with a backport. In particular:
- Note1: patch.P1 is now taken from june24, but will need to be recomputed
  git checkout HEAD CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
- Note2: I need to manually port some upstream/master changes in auto_dsig1.f to smatrix_multi.f, which did not yet exist
… mac madgraph5#974, nvcc madgraph5#966) into pptt

Fix conflicts:
epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (take HEAD version, must recompute)
epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f (fix manually)
… mac madgraph5#974, nvcc madgraph5#966) into prof
… mac madgraph5#974, nvcc madgraph5#966) into grid
…dgraph5#960, mac madgraph5#974, nvcc madgraph5#966) into cmsdy

Fix conflict in tlau/fromgridpacks/parseGridpackLogs.sh (use the current cmsdy version: git checkout b125b65 tlau/fromgridpacks/parseGridpackLogs.sh)
…rge with hel madgraph5#960, mac madgraph5#974, nvcc madgraph5#966) into cmsdyps
This is a WIP PR to improve timers and the cudacpp helicity computation in madevent (#958 and #546).