
remove unnecessary first pass on all bridge events in cudacpp helicity calculation from madevent (and improve timers) #960

Merged: 23 commits merged into madgraph5:master on Aug 21, 2024

Conversation


@valassi valassi commented Aug 8, 2024

This is a WIP PR to improve the timers and the cudacpp helicity computation in madevent (#958 and #546)

Using
more /tmp/avalassi/input_ggttggg_test
256 1 1 ! Number of events and max and min iterations
0.000001 ! Accuracy (ignored because max iterations = min iterations)
0 ! Grid Adjustment 0=none, 2=adjust (NB if = 0, ftn26 will still be used if present)
1 ! Suppress Amplitude 1=yes (i.e. use MadEvent single-diagram enhancement)
0 ! Helicity Sum/event 0=exact
1 ! ICONFIG number (1-N) for single-diagram enhancement multi-channel (NB used even if suppress amplitude is 0!)

For
CUDACPP_RUNTIME_VECSIZEUSED=256 ./madevent_fortran < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    3.2359s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.0972s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.1387s for      256 events => throughput is 8.16E+01 events/s

For
CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    7.1173s
 [COUNTERS] Fortran Overhead ( 0 ) :    3.4293s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    3.6880s for      256 events => throughput is 6.94E+01 events/s

For
CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    1.5505s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.7714s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.7791s for      256 events => throughput is 3.29E+02 events/s
…1.f and counters.cc, remove "counters_smatrix1_" functions and calls, which are not used anywhere

There is a small but noticeable difference in ggttggg (probably much more in simpler processes?)

For
CUDACPP_RUNTIME_VECSIZEUSED=256 ./madevent_fortran < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    3.1335s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.0983s
 [COUNTERS] Fortran MEs      ( 1 ) :    3.0352s for      256 events => throughput is 8.43E+01 events/s
…CudaCpp helicities madgraph5#958

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    7.0962s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.0969s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    3.6843s for      256 events => throughput is 6.95E+01 events/s
 [COUNTERS] CudaCpp HEL      ( 3 ) :    3.3149s for      256 events => throughput is 7.72E+01 events/s

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    1.5576s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.1012s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.7721s for      256 events => throughput is 3.32E+02 events/s
 [COUNTERS] CudaCpp HEL      ( 3 ) :    0.6843s for      256 events => throughput is 3.74E+02 events/s
…udaCpp helicities madgraph5#958 (remove event count and throughput)
…e.inc interface, add parameter goodHelOnly as in Bridge to quit after few events in cudacpp helicity computation (fix madgraph5#958 aka madgraph5#546)

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    4.1082s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.0979s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    3.8176s for      256 events => throughput is 6.71E+01 events/s
 [COUNTERS] CudaCpp HEL      ( 3 ) :    0.1927s

CUDACPP_RUNTIME_VECSIZEUSED=256 ./build.512y_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggttggg_test
 [COUNTERS] PROGRAM TOTAL          :    0.9085s
 [COUNTERS] Fortran Overhead ( 0 ) :    0.0995s
 [COUNTERS] CudaCpp MEs      ( 2 ) :    0.7692s for      256 events => throughput is 3.33E+02 events/s
 [COUNTERS] CudaCpp HEL      ( 3 ) :    0.0398s

(Also fix clang formatting in counters)
…alformed patches

The only files that still need to be patched are
- 3 in patch.common: Source/makefile, Source/genps.inc, SubProcesses/makefile
- 3 in patch.P1: auto_dsig1.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
@valassi valassi changed the title WIP improve timers and cudacpp helicity computation in madevent remove unnecessary first pass on all bridge events in cudacpp helicity calculation from madevent (and improve timers) Aug 8, 2024
@valassi valassi marked this pull request as ready for review August 8, 2024 15:21

valassi commented Aug 8, 2024

Hi @oliviermattelaer this is essentially ready for review. Can you please review?
(But I will still launch some manual tests on all processes)

I renamed it as "remove unnecessary first pass on all bridge events in cudacpp helicity calculation from madevent (and improve timers)". Essentially what this does is:

  • I added a boolean parameter "goodhelonly" to the fortran-cpp interface of the bridge (this already existed in the cpp bridge but was not yet exposed to fortran)
  • in the madevent executable, during the first pass on the cudacpp bridge that is meant to compute only helicities, I call the bridge with this flag=true: this ensures that only a few events (chosen by cudacpp, typically 16) are used for the helicity computation
  • without this patch, with 16384 events in the bridge, the first pass was computing MEs for all 16k events, even though only 16 were used to compute helicities
  • note that the computation scales with SIMD, which explains why it was faster with AVX2 than with no SIMD
  • note also that this was adding a huge overhead to the cudacpp calculation with respect to fortran, because fortran does not have the extra calculation of these 16k events
  • and note also that the overhead was reduced when running with only 32 events in the grid, because then this first pass was only over 32 events and not 16k

So I would say that all is understood. This was a nasty performance overhead. Now cudacpp should look a bit better with respect to fortran, especially if only a few events are generated. (A minimal sketch of the idea follows below.)
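
To illustrate, here is a minimal hypothetical C++ sketch of the idea (bridgeComputeMEs and nGoodHelEvents are illustrative names, not the actual Bridge/fbridge API): when goodHelOnly is true the bridge processes only a small subset of events rather than the full event block.

```cpp
// Hypothetical sketch only, not the actual cudacpp Bridge code.
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t nGoodHelEvents = 16; // small subset used for helicity filtering

void bridgeComputeMEs( const std::vector<double>& momenta,
                       std::vector<double>& mes,
                       bool goodHelOnly )
{
  // Before this PR the full event block (e.g. 16384 events) was always processed,
  // even when the caller only needed the helicity filtering pass.
  const std::size_t nEvents = mes.size();
  const std::size_t nToProcess = goodHelOnly ? std::min( nGoodHelEvents, nEvents ) : nEvents;
  for( std::size_t ievt = 0; ievt < nToProcess; ++ievt )
    mes[ievt] = 0.; // placeholder for the actual ME kernel call on this event
  (void)momenta;    // the momenta would be consumed by the real ME kernel
}

int main()
{
  std::vector<double> momenta, mes( 16384 );
  bridgeComputeMEs( momenta, mes, /*goodHelOnly=*/true );  // first pass: helicity filtering only (cheap)
  bridgeComputeMEs( momenta, mes, /*goodHelOnly=*/false ); // normal pass: MEs for all events
  return 0;
}
```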


@oliviermattelaer oliviermattelaer left a comment


I guess it would make more sense to include this in the master_goodhel branch, since the two are related in a way.

This would allow moving HELONLY from a boolean to a float and passing limhel to the code, so that this helicity selection is done as it should be.

Or we can merge this one, and then I do the switch in master_goodhel to have that parameter as a float (but we should not wait for merging master_goodhel anyway).

But can you comment on why some files like matrix.f are modified? This is weird/bad.
(no issue here actually)

Cheers,

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 8, 2024
…ph5#960 (remove first cudacpp pass in helicity calculation)

On itgold91:
Code generation completed in 161 seconds
Code generation and additional checks completed in 246 seconds
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 8, 2024
…91, with 16384 vector_size, and the removal of cudacpp helicity pass madgraph5#960

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -fortran pp_dy3j.mad -togridpack
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 8, 2024
…ld91, vector_size=16384, overhead reduced PR madgraph5#960

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -nomakeclean -ALL pp_dy3j.mad -fromgridpack

./parseGridpackLogs.sh pp_dy3j.mad/
pp_dy3j.mad//fortran/output.txt
[GridPackCmd.launch] OVERALL TOTAL    443.0382 seconds
[madevent COUNTERS]  PROGRAM TOTAL    438.801
[madevent COUNTERS]  Fortran Overhead 131.826
[madevent COUNTERS]  Fortran MEs      306.975
--------------------------------------------------------------------------------
pp_dy3j.mad//cppnone/output.txt
[GridPackCmd.launch] OVERALL TOTAL    443.1323 seconds
[madevent COUNTERS]  PROGRAM TOTAL    438.864
[madevent COUNTERS]  Fortran Overhead 131.804
[madevent COUNTERS]  CudaCpp MEs      306.034
[madevent COUNTERS]  CudaCpp HEL      1.025
--------------------------------------------------------------------------------
pp_dy3j.mad//cppsse4/output.txt
[GridPackCmd.launch] OVERALL TOTAL    290.4177 seconds
[madevent COUNTERS]  PROGRAM TOTAL    286.159
[madevent COUNTERS]  Fortran Overhead 131.795
[madevent COUNTERS]  CudaCpp MEs      153.803
[madevent COUNTERS]  CudaCpp HEL      0.5612
--------------------------------------------------------------------------------
pp_dy3j.mad//cppavx2/output.txt
[GridPackCmd.launch] OVERALL TOTAL    199.7083 seconds
[madevent COUNTERS]  PROGRAM TOTAL    195.451
[madevent COUNTERS]  Fortran Overhead 131.835
[madevent COUNTERS]  CudaCpp MEs      63.3324
[madevent COUNTERS]  CudaCpp HEL      0.2837
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512y/output.txt
[GridPackCmd.launch] OVERALL TOTAL    195.8398 seconds
[madevent COUNTERS]  PROGRAM TOTAL    191.538
[madevent COUNTERS]  Fortran Overhead 131.888
[madevent COUNTERS]  CudaCpp MEs      59.3799
[madevent COUNTERS]  CudaCpp HEL      0.2715
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512z/output.txt
[GridPackCmd.launch] OVERALL TOTAL    171.8862 seconds
[madevent COUNTERS]  PROGRAM TOTAL    167.589
[madevent COUNTERS]  Fortran Overhead 131.943
[madevent COUNTERS]  CudaCpp MEs      35.4473
[madevent COUNTERS]  CudaCpp HEL      0.1996
--------------------------------------------------------------------------------
pp_dy3j.mad//cuda/output.txt
File not found: SKIP backend cuda
--------------------------------------------------------------------------------
pp_dy3j.mad//hip/output.txt
File not found: SKIP backend hip
--------------------------------------------------------------------------------

valassi commented Aug 8, 2024

I guess that this would make more sense to include this in the master_goodhel
branch since the two are in a way related.

Hi Olivier, thanks for looking at this.

One point: I really think that we should try NOT to have too many master_xxx branches, and rather have everything in master. Otherwise it becomes unmanageable. I work on master, and I also worked on master_june24 (to merge it in master, essentially ready), but I would avoid more branches. (What I try to do is to have several PRs on master in parallel, and in case I need one inside another then I do include them, but always trying to make sure that they can be merged to master one after the other.)

This specific PR makes sense against master; I would put it on master. And it does NOT have much to do with the other helicity work you are doing in fortran (IIUC), because you are reducing from two fortran helicity calculations to one. Here I am just modifying how many events are used in the cudacpp helicity computation; I am not changing one vs two computations in cudacpp, nor am I touching what is done in fortran. So I would really merge it in master.

This will allow to move the HELONLY from a boolean to a float

Uh? This I really do not understand, I should see what you have done. But again, I'd prefer to merge this to master first and then look at the rest. All the tests I am doing for CMS use this now.

Thanks :-)
Andrea

@oliviermattelaer

One point: I really think that we should try NOT to have too many master_xxx branches, and rather have everything in master.

Approve the merge of the CI, and master_goodhel, and ... then they will be in master.

The multiplication of master_xxx branches is just because I have parallel work and therefore multiple PRs (which for some weird reason take a while to be merged). While master_june24 was special, the others are quite "normal" branches (and the master_ prefix is just my typical naming of a branch, indicating which branch I'm starting from).

Uh? This I really do not understand, I should see what you have done. But again, I'd prefer to merge this to master first and then look at the rest. All the tests I am doing for CMS use this now.

No problem, I will then do a master_goodhel_limhel branch to show you what I want to do for it (but let us first merge master_goodhel into master --which should be uncontroversial-- so that I can then do a master_limhel branch).

STARTED  AT Thu Aug  8 07:40:53 PM CEST 2024
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Thu Aug  8 08:05:50 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Thu Aug  8 08:14:09 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Thu Aug  8 08:22:29 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Thu Aug  8 08:25:13 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Thu Aug  8 08:27:57 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common
ENDED(6) AT Thu Aug  8 08:30:45 PM CEST 2024 [Status=0]
./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
ENDED(7) AT Thu Aug  8 08:42:09 PM CEST 2024 [Status=0]
…heft madgraph5#833, but gqttq madgraph5#845 crash is fixed)

STARTED  AT Thu Aug  8 08:42:09 PM CEST 2024
(SM tests)
ENDED(1) AT Fri Aug  9 12:48:36 AM CEST 2024 [Status=0]
(BSM tests)
ENDED(1) AT Fri Aug  9 12:58:52 AM CEST 2024 [Status=0]

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt
1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
@valassi valassi requested a review from a team as a code owner August 21, 2024 09:13

valassi commented Aug 21, 2024

I have updated this PR including

This should be ready to merge once the CI tests have passed.


valassi commented Aug 21, 2024

This will allow to move the HELONLY from a boolean to a float

Uh? This I really do not understand, I should see what you have done. But again, I'd prefer to merge this to master first and then look at the rest. All the tests I am doing for CMS use this now.

No problem, I will do a master_goodhel_limhel branch then to show you what I want to do for it (but if we merge master_goodhel in master --which should be uncontroversial-- first such that I can do a master_limhel branch).

Hi @oliviermattelaer, thanks again for the discussion yesterday. Ok, now I understand better what you mean. We could reuse the same parameter as a float in the following way (a sketch follows after this list):

  • if it is >=0: consider this as a helicity filtering call (i.e. goodhelonly=true), with the float representing LIMHEL, i.e. start the cudacpp helicity filtering on a subset of events with that value of LIMHEL (see "Allow LIMHEL>0 in cudacpp (using the exact same algorithm as in fortran)" #564)
  • if it is <0 (e.g. -1 by convention, as you suggested), then consider this as a normal ME computation, which assumes that helicities have been precomputed.
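
A minimal hypothetical C++ sketch of this float convention (HelFilterMode and decodeHelArg are made-up names, not part of the actual fbridge interface; this only shows how a single float could encode both the goodHelOnly flag and LIMHEL):

```cpp
// Hypothetical sketch of the proposed convention, not the actual fbridge code.
#include <iostream>

struct HelFilterMode
{
  bool goodHelOnly; // true: helicity-filtering call on a few events only
  double limhel;    // LIMHEL threshold, only meaningful when goodHelOnly is true
};

// Decode the single float passed through the fortran-cpp interface:
// >=0 means "filter helicities using this value as LIMHEL",
// <0 (e.g. -1 by convention) means "normal ME computation, helicities precomputed".
HelFilterMode decodeHelArg( double helArg )
{
  if( helArg >= 0 ) return { true, helArg };
  return { false, 0. };
}

int main()
{
  const HelFilterMode pass1 = decodeHelArg( 1e-4 ); // filtering pass with LIMHEL=1e-4
  const HelFilterMode pass2 = decodeHelArg( -1. );  // normal ME computation
  std::cout << pass1.goodHelOnly << " " << pass1.limhel << std::endl;
  std::cout << pass2.goodHelOnly << std::endl;
  return 0;
}
```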

Note also that I opened #975 about possibly adding channelid to helicity filtering (which now instead is done on all channels).


valassi commented Aug 21, 2024

The CI has completed; there are the usual three expected failures.

Hi @oliviermattelaer, as discussed yesterday, I am now self-merging this. Some follow-up work remains to be done in:

I also still need to review and merge your comprehensive helicity changes in fortran (which I think are #955 and related work).

Thanks Andrea

@valassi valassi merged commit 3f69b26 into madgraph5:master Aug 21, 2024
166 of 169 checks passed
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 21, 2024
… mac madgraph5#974, nvcc madgraph5#966) into june24

Fix conflicts:
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/counters.cc
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/fbridge.cc
	epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f
	epochX/cudacpp/gg_tt.mad/SubProcesses/counters.cc
	epochX/cudacpp/gg_tt.mad/SubProcesses/fbridge.cc

NB: here I essentially fixed gg_tt.mad, not CODEGEN, which will need to be adjusted a posteriori with a backport

In particular:
- Note1: patch.P1 is now taken from june24, but will need to be recomputed
git checkout HEAD CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
- Note2: I need to manually port some upstream/master changes in auto_dsig1.f to smatrix_multi.f, which did not yet exist
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 21, 2024
… mac madgraph5#974, nvcc madgraph5#966) into pptt

Fix conflicts:
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (take HEAD version, must recompute)
	epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f (fix manually)
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 21, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 22, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 22, 2024
…dgraph5#960, mac madgraph5#974, nvcc madgraph5#966) into cmsdy

Fix conflict in tlau/fromgridpacks/parseGridpackLogs.sh
(use the current cmsdy version: git checkout b125b65 tlau/fromgridpacks/parseGridpackLogs.sh)
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Aug 22, 2024