Patches and test results over the latest PRs #780

valassi · 2023-10-28T14:46:55Z

Hi Olivier, Stefan, Stephan, this is a comprehensive patch including several changes over the latest PRs. This is based on the current master including all PRs in the last two months, with some fixes that should be non controversial (clang format, build warnings, build errors etc), and other changes that instead need closer review (e.g. they may break Olivier's Mac).

It also includes new test results for 'my standard processes', using a new Alma9 machine (including downfall mitigation, which seems to have no effect anyway), using the August codebase and the most recent codebase from this branch.

I try to give a summary below. Can you please review (I suggest that you focus on CODEGEN only and on the parts that are potentially controversial). Thanks!

Summary of changes (in addition to or including minor fixes already in PRs #778 and #779).

Non-controversial? changes:

fix build error for eemumu float by adding missing cxsmpl<float> operator*
fix build warning in counters.cc after Remove smatrix counters and regenerate all processes #762
fix clang format around Ccoeff following the fix for unary minus wrong type argument to unary minus (FFV functions should not use "-COUPs" as argument) #628
add copyright and license, add comments, remove too many empty lines in python etc

More controversial changes, can you please review

reenable lines commented out by Olivier in patchMad.sh: do this only in a special 'tmad mode' called from generateAndCompare.sh, and/or move the lines directly to the latter script (this in principle should also be non controversial and should not affect the full workflow?)
go back from a ln to a cp for counters.cc and ompthreads.cc (Stephan, maybe yours was better: I just wanted to minimise differences so I got back my working version... but if you prefer and it works, I will try the ln)
always dump stdout and stderr from subprocess.popen for patchMad.sh (eventually we may remove this maybe, but now I need this to more easily check the logs for errors and avoid silent failures... and personally I prefr to have it dumped anyway as it is a tricky part still)
in cudacpp.mk, remove 'include ../../Source/make_opts' and remove '-fopenmp' (Olivier this is for you, why do you need that? for me it works without it)
add a comment that in MatrixElementKernels.cc we should add the correct ifdef for Mac
in launch_plugin.py, do not raise an Exception in reset_simd, but just print a warning (otherwise my generation fails: by the way I still use _cudacpp, not simd and not cuda)
in patch.common, reenable some makefile fragments that had been commented out by Olivier (why? Mac?)

Coming to the performance and functionality tests.

I have the latest logs in these two commits:
ee83b62
and
536ad49

The first log has 'tput' (check.exe, gcheck.exe, runTests.exe) results. This is a summary

Functionality:
** all NaN's have disappeared everywhere!
-- eemumu, ggtt, ggttg, ggttgg, ggttggg: all ok
** gqttq: some failures in runTest.exe

Performance:
-- eemumu: cuda ~20% slower (and 20% increase in registers used)
** eemumu: simd a factor 2 (inl0) to a factor 4 (inl1) slower
-- ggtt: cuda up to 10% slower
-- ggtt: simd ~10-15% faster (inl0), similar (inl1)
-- ggttg, ggttgg, ggttggg: generally similar performance, some minor speedups

Rephasing:

GOOD: no NaNs anymore! (I would not be surprised if Three floating point exceptions in CPP launch of nobm_pp_ttW (FPE in COUP values in VVV1P0_1) #733 will also disappear)
TO DO: gqttq failures
(bad? or who cares?... eemumu performance is degraded, I think by using ixxx instead of imzx)

The second log has 'tmad' tests (madevent executable: check xsec and lhe files). This is a summary

Functionality:

ggttggg: madevent crashes
gqttq: xsec differs again

Performance (Fortran overhead):
-- all very similar

Performance (MEs):
-- eemumu: fortran 20% faster, cuda slightly slower, simd a factor 2 to 3 slower
-- ggtt: fortran and simd 20% faster, cuda similar
-- ggttg: fortran 10% faster, simd and cuda similar
-- ggttgg: all very similar

+ERROR! ' ./madevent_fortran < /tmp/avalassi/input_ggttggg_x1_fortran > /tmp/avalassi/output_ggttggg_x1_fortran' failed
+d R # 5 > -0.0 -0.0 -0.0 0.4 0.4
+d R # 6 > -0.0 -0.0 -0.0 -0.0 0.4
+s min # 3> 0.0119716.0 29929.0 29929.0 0.0
+s min # 4> 0.0 0.0 29929.0 29929.0 0.0
+s min # 5> 0.0 0.0 0.0 0.0 0.0
+s min # 6> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 3> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 4> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 5> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 6> 0.0 0.0 0.0 0.0 0.0

+ERROR! xsec from fortran (0.26050333309703716) and cpp (1.2757941949814184) differ by more than 2E-14 (3.8974198518457603)

Rephrasing:

performances are not surprising
(I was expecting to see the throughput increase reported by Stephan in PR Remove smatrix counters and regenerate all processes #762 ? also because I did have the impression I saw it at some point... or are we sure that this is not again a problem on one specific machine using system calls for timing as in High CPU use in clock_gettime (TSC clocksource unavailable on itscrd03) #116 and the TSC source?)
gqttq xsec is wrong again? I would therefore reopen xsec from fortran and cpp differ in gq_ttq tmad tests (wrong order of couplings) #748... but Stefan please check (Stefan note: I am kind of sure this had been fixed by PR WIP: if needed fix the order of couplings #757, but it was broken again by PR fix running couplings ordering #761?)
I think there is a new problem with ggttggg, I will open a new bug

Voila that's all. Thanks for reviewing!

…after hard reboot including downfall mitigation) New performance baseline (move from Alma8/itscrd80 to Alma9/itscrd90 on Silver4216+V100 and include downfall mitigation) Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200) STARTED AT Wed Oct 25 06:25:59 PM CEST 2023 ./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean ENDED(1) AT Wed Oct 25 06:50:29 PM CEST 2023 [Status=0] ./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean ENDED(2) AT Wed Oct 25 06:59:19 PM CEST 2023 [Status=0] ./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean ENDED(3) AT Wed Oct 25 07:08:28 PM CEST 2023 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst ENDED(4) AT Wed Oct 25 07:11:30 PM CEST 2023 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst ENDED(5) AT Wed Oct 25 07:14:30 PM CEST 2023 [Status=0] [avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release AlmaLinux release 9.2 (Turquoise Kodkod) [avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microcode' /proc/cpuinfo | sort -u microcode : 0x5003604 model : 85 model name : Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz stepping : 7

…after hard reboot including downfall mitigation) New performance baseline (move from Alma8/itscrd80 to Alma9/itscrd90 on Silver4216+V100 and include downfall mitigation) Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200) STARTED AT Wed Oct 25 07:17:34 PM CEST 2023 ENDED AT Wed Oct 25 11:35:50 PM CEST 2023 Status=0 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt [avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release AlmaLinux release 9.2 (Turquoise Kodkod) [avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microcode' /proc/cpuinfo | sort -u microcode : 0x5003604 model : 85 model name : Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz stepping : 7

… but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation) NB: 512z mode is faster than 512y on this Platinum node (2 FMA units) STARTED AT Wed Oct 25 18:03:19 CEST 2023 ./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean ENDED(1) AT Wed Oct 25 18:26:16 CEST 2023 [Status=0] ./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean ENDED(2) AT Wed Oct 25 18:36:50 CEST 2023 [Status=0] ./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean ENDED(3) AT Wed Oct 25 18:45:05 CEST 2023 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst ENDED(4) AT Wed Oct 25 18:46:35 CEST 2023 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst ENDED(5) AT Wed Oct 25 18:46:58 CEST 2023 [Status=0] [avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release Red Hat Enterprise Linux release 8.8 (Ootpa) [avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microcode' /proc/cpuinfo | sort -u microcode : 0xd0003a5 model : 106 model name : Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz stepping : 6

… but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation) NB: 512z mode is faster than 512y on this Platinum node (2 FMA units) STARTED AT Wed Oct 25 18:48:27 CEST 2023 ENDED AT Wed Oct 25 22:37:43 CEST 2023 Status=0 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt 20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt [avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release Red Hat Enterprise Linux release 8.8 (Ootpa) [avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microco$ microcode : 0xd0003a5 model : 106 model name : Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz stepping : 6

…MPORARY TESTS ON PLATINUM) Revert "[oct23av] (TEMPORARY TESTS ON PLATINUM) rerun 18 tmad alltees, all ok but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation)" This reverts commit 93f2978. Revert "[oct23av] (TEMPORARY TESTS ON PLATINUM) rerun 78 tput alltees, all ok but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation)" This reverts commit ed1cd75.

No real code differences - only a few irrelevant changes due to the new code generating environment on itscrd90 Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)

…- changes only in codegen logs Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)

…nly changes are in codegen logs Codebase includes merging commit a6731bd (Olivier Wed Aug 23 13:23:12 2023 +0200) This uses Olivier's 'fix_mirror' branch for PR madgraph5#754

…matting, and especially the build will fail. Codebase includes merging commit a6731bd (Olivier Wed Aug 23 13:23:12 2023 +0200) This uses Olivier's 'fix_mirror' branch for PR madgraph5#754 In particular a6731bd Olivier Mattelaer Wed Aug 23 13:23:12 2023 +0200 Merge branch 'fix_mirror' 2556cdd Olivier Mattelaer Wed Aug 23 09:27:38 2023 +0200 avoid that mirroring is reset by the plugin These lines fail the build (as well as clang formatting) [NOT OK] Check formatting in: pp_tt012j.mad/SubProcesses/P0_uux_ttx/CPPProcess.cc 786c786 < constexpr int helcolDenominators[1] = { 36,36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343) --- > constexpr int helcolDenominators[1] = { 36, 36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343) The same happens in each P subdirectory. Build errors: ccache /usr/local/cuda-12.0/bin/nvcc -Xcompiler -O3 -lineinfo -I. -I../../src -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17 -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o gCPPProcess.cu(779): error: static assertion failed with "Assume nprocesses == 1" gCPPProcess.cu(786): error: too many initializer values 2 errors detected in the compilation of "gCPPProcess.cu".

…, previous issues are now solved Codebase includes cherry-picking this commit (providing a fix for PR madgraph5#754) 5b22a92 Olivier Mattelaer Thu Aug 31 15:11:54 2023 +0200 fixing issue of stefan with nprocesses>1 Code changes are epochX/cudacpp/pp_tt012j.mad/SubProcesses/P0_gg_ttx/CPPProcess.cc < static_assert( nprocesses == 1, "Assume nprocesses == 1" ); < // process_id corresponds to the index of DSIG1 Fortran functions (must be 1 because cudacpp is unable to handle DSIG2) --- > static_assert( nprocesses == 1 || nprocesses == 2, "Assume nprocesses == 1 or 2" );

…rocesses (see PR madgraph5#754)

…ges (except for static assert allowing nprocesses==2) This completes the checks for PR madgraph5#754, all looks good

…ter merging Stefan's fix for coupling order in PR madgraph5#757 - code changes only in gq_ttq The new codebase includes in particular also 6bf4a65 oliviermattelaer Wed Aug 30 14:57:19 2023 +0200 Merge pull request madgraph5#754 from madgraph5/fix_mirror 0aba0c6 oliviermattelaer Wed Aug 30 14:53:17 2023 +0200 Merge pull request madgraph5#757 from roiser/fixcouplingordering 4f3cdd8 Stefan Roiser Tue Aug 29 17:43:25 2023 +0200 in PLUGIN_UFOModelConverter overwrite teh prepare_couplings function the additional functionality will re-order the dictionary keys such that they follow the order of the couplings in parameter wanted_couplings wanted_couplings contains the correct ordering as it is discovered in the OneProcessExporter class NB: generated code is identical in all processes except for gq_ttq NB: ee_mumu.mad code generation (and similary ee_mumu.sa) fails with Command "import /data/avalassi/GPU2023/madgraph4gpuX/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu.mg" interrupted in sub-command: "output madevent ../TMPOUT/CODEGEN_mad_ee_mumu --hel_recycling=False --vector_size=16384 --me_exporter=standalone_cudacpp" with error: KeyError : 'GC_3'

…qttq xsec is now correct! fixes high-priority issue madgraph5#748 These three tests now succeed (they used to fail) ./tmad/teeMadX.sh +10x -gqttq -makeclean ./tmad/teeMadX.sh +10x -gqttq -makeclean -fltonly ./tmad/teeMadX.sh +10x -gqttq -makeclean -mixonly NB: eemumu code generation remains to be fixed after PR madgraph5#757

…he ref needs updating after fixing the coupling order in PR madgraph5#757) ./tput/teeThroughputX.sh -mix -hrd -makej -gqttq -makeclean ./tput/teeThroughputX.sh -makej -gqttq -flt -bridge -makeclean

…coupling order in PR madgraph5#757 cp dump_SIGMA_SM_GUX_TTXUX_CPU_MadgraphTest.CompareMomentaAndME_0.txt ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/

…unTest.exe

…ref file for runTest ./tput/teeThroughputX.sh -mix -hrd -makej -gqttq -makeclean ./tput/teeThroughputX.sh -makej -gqttq -flt -bridge -makeclean

This adds only one commit d644235 (Olivier, Wed Aug 30 15:14:09 2023 +0200) stupid fix for a symlink/git issue This removes some Makefile symlinks in kokkos, it does not touch cudacpp

This adds only one commit 9fc9873 (Olivier, Thu Aug 31 11:56:40 2023 +0200) fix launch issue for cudacpp, run_card, some cleaning and compilation for mac This includes several changes - updates in Submodule MG5aMC/mg5amcnlo cb4771efc..8bfd1fde4: - several changes in CODEGEN (affecting SDE in runcard, makefiles for Mac) Note: code generation breaks clang format, and patching fails some hunks. One build seemed ok, I did not try runtime tests.

…cherry-picking Olivier's 9fc9873

…eep only the mg5amcnlo update) git checkout ee9cee387c0aa4117080ec312e3f37d32c670370 CODEGEN/

…- no significant changes in code (note that my new gqttq ref also affected pptt012j)

…g merges will create conflicts Revert "[oct23av] TEMPORARELY UNDO Olivier's changes to CODEGEN in 9fc9873 (keep only the mg5amcnlo update)" This reverts commit d883478.

This is a noop. It includes only one commit that I already cherry-picked before. (Olivier Thu Aug 31 15:11:54 2023 +0200) fixing issue of stefan with nprocesses>1

This adds only one commit 3fbf7b1 (Olivier Thu Aug 31 23:13:25 2023 +0200) fix symmetry factor + missing file

oliviermattelaer · 2023-11-02T18:32:26Z

I fix the last issue with the pointer. (it was brew that messed up with something, so I have removed fully brew from my system).

But I confirm that I do need to have make_opts to have it compiling on mac.
(More precisely, I need to have gfortran setup as FC which is one of the things done in make_opts).

So one hacky solution to have this PR pushed, is to hardcode FC=gfortran within your makefile (but this is obviously not sustainable). I have pushed that. and will open an issue to track andrea issue with make_opts

oliviermattelaer · 2023-11-02T18:38:20Z

So the make_opts issue is tracked here: #787

…rce exists (e.g it is missing in .sa directories) madgraph5#787

valassi · 2023-11-03T08:48:08Z

Ok I have the impression that we converged :-)
Thanks @oliviermattelaer!

There is one pending issue in #790 about reset_simd, which is not understood. I assign this to you.

About make_opts, I fixed #787.
Just one point on Mac and gfortran: GNU make assumes FC=f77 if FC is not defined. So indeed in my Linux environment I explicitly export FC=gfortran. But I think this is the correct way
https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html

As discussed, I then consider this as ready to go.
(@roiser and @hageboeck, last chance for comments!).
I will regenerate all processes and launch all tests first.
So I will merge this evening I think.

roiser

NB: I'm have not looked at makefile changes as those will probably be changed soon again

…anall' now fails because make_opts is included madgraph5#787 (*NB OpenMP is now disabled by default!*) This boils down to make -f cudacpp.mk cleanall OMPFLAGS= AVX=512y FPTYPE=d HELINL=0 HRDCOD=0 RNDGEN=hasCurand Building in BUILDDIR=. for tag=512y_d_inl0_hrd0_hasCurand (USEBUILDDIR is not set) make USEBUILDDIR=0 clean -f ../../Source/make_opts make[1]: Entering directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx' make[1]: *** No rule to make target 'clean'. Stop. make[1]: Leaving directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'

…w including make_opts causes issues here: the CUDACPP_MAKEFILE variable is corrupted madgraph5#787 make -f cudacpp.mk cleanall CUDACPP_MAKEFILE='cudacpp.mk' CUDACPP_MAKEFILE='../../Source/make_opts' cudacpp.mk:42: *** exit. Stop.

…ue with override, this is irrelevant! madgraph5#787 make -f cudacpp.mk cleanall CUDACPP_MAKEFILE='cudacpp.mk' CUDACPP_MAKEFILE='../../Source/make_opts' cudacpp.mk:42: *** exit. Stop.

…ly once at the beginning and avoid corruption when make_opts is included madgraph5#787 make -f cudacpp.mk cleanall CUDACPP_MAKEFILE='cudacpp.mk' CUDACPP_MAKEFILE='cudacpp.mk' cudacpp.mk:43: *** exit. Stop. However, this is still sensitive to variables coming from outside make -f cudacpp.mk cleanall CUDACPP_MAKEFILE=pippo CUDACPP_MAKEFILE='pippo' CUDACPP_MAKEFILE='pippo' cudacpp.mk:43: *** exit. Stop.

…re that CUDACPP_MAKEFILE is an internal variable that cannot be set externally madgraph5#787 make -f cudacpp.mk cleanall CUDACPP_MAKEFILE='cudacpp.mk' CUDACPP_MAKEFILE='cudacpp.mk' cudacpp.mk:43: *** exit. Stop. This is now insensitive to variables coming from outside make -f cudacpp.mk cleanall CUDACPP_MAKEFILE=pippo CUDACPP_MAKEFILE='cudacpp.mk' CUDACPP_MAKEFILE='cudacpp.mk' cudacpp.mk:43: *** exit. Stop.

…nall' now succeeds after including make_opts madgraph5#787 NB: I also checked that 'make cleanall; make' gives exactly the same output in my environment whether make_opts is included or not

…from ggtt.mad madgraph5#787

…files for Mac and for make_opts

…ble - usual failures in ggttg f/m and gqttq f (madgraph5#783), no change in performance (*NB OpenMP is now disabled by default!*) STARTED AT Fri Nov 3 10:06:44 AM CET 2023 ./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean ENDED(1) AT Fri Nov 3 01:30:11 PM CET 2023 [Status=2] ./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean ENDED(2) AT Fri Nov 3 01:55:47 PM CET 2023 [Status=0] ./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean ENDED(3) AT Fri Nov 3 02:05:25 PM CET 2023 [Status=2] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst ENDED(4) AT Fri Nov 3 02:08:40 PM CET 2023 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst ENDED(5) AT Fri Nov 3 02:11:53 PM CET 2023 [Status=0]

…abled), no change in functionality or performance (*NB OpenMP is now disabled by default!*) STARTED AT Fri Nov 3 02:15:11 PM CET 2023 ENDED AT Fri Nov 3 02:38:38 PM CET 2023 Status=0 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt 0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt

valassi · 2023-11-03T16:34:40Z

epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/cudacpp.mk

@@ -233,6 +233,8 @@ override OMPFLAGS = -fopenmp
 else ifneq ($(shell $(CXX) --version | egrep '^(Apple clang)'),)
 override OMPFLAGS = # AV disable OpenMP MT on Apple clang (builds fail in the CI #578)
 ###override OMPFLAGS = -fopenmp # OM reenable OpenMP MT on Apple clang? (AV Oct 2023: this still fails in the CI)
+else ifneq ($(UNAME), 'Darwin')


@oliviermattelaer I just realised there are two bugs here

It should be ifeq not ifneq

It should be UNAME_S not UNAME
That is it should read else ifeq ($(UNAME_S), 'Darwin')
The way it is done now disables OpenMP on Mac AND ALSO on Linux (except icpx and clang)
I will fix this

Ouf. There were three errors (I noticed because the CI for b647330 failed). No quotes needed. It should read else ifeq ($(UNAME_S),Darwin). Fixing this again

… the intent was to disable OpenMP on Mac, but it disabled it also on Linux, now fixed

…files to disable OpenMP on Mac but not on Linux

…c - testing this on Olivier's desktop Mac, looks ok now

…om ggttmad

…efiles to disable OpenMP on Mac but not on Linux

…ble - usual failures in ggttg f/m and gqttq f (madgraph5#783), no change in performance (*NB OpenMP is now again enabled by default*) (or maybe ~1-2% slower on average? anyway, keep OpenMP on as in the past)

… also tput with FPEs enabled), usual ggttggg failures, no change in functionality or performance (*NB OpenMP is now again enabled by default*) (or maybe ~1-2% slower on average? anyway, keep OpenMP on as in the past)

valassi · 2023-11-03T20:09:58Z

All tests in the CI including Mac are now good. Reviews addressesd. I am finally self merging.

valassi and others added 30 commits October 26, 2023 11:33

[oct23av] regenerate 8 processes mad (move from itscrd80 to itscrd90)

b20257a

No real code differences - only a few irrelevant changes due to the new code generating environment on itscrd90 Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)

[oct23av] regenerate 7 processes sa (move from itscrd80 to itscrd90) …

ae5498a

…- changes only in codegen logs Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)

Merge commit 'a6731bd9ead2f378b5e1a61624a55f5753a1139b' into oct23av

50ad9b7

[oct23av] regenerate 7 processes mad (all but pp012j) and all 7 sa, o…

2a67667

…nly changes are in codegen logs Codebase includes merging commit a6731bd (Olivier Wed Aug 23 13:23:12 2023 +0200) This uses Olivier's 'fix_mirror' branch for PR madgraph5#754

fixing issue of stefan with nprocesses>1

8641b29

[oct23av] in CODEGEN, improve comments to generated code for mirror p…

ec4c8b5

…rocesses (see PR madgraph5#754)

[oct23av] regenerate pp_tt012j.mad with my own changes to code comments

6e6dcbf

[oct23av] regenerate all other 7 processes mad and 7 sa, no code chan…

45ebd42

…ges (except for static assert allowing nprocesses==2) This completes the checks for PR madgraph5#754, all looks good

Merge commit '6bf4a658edbd9b3f3f0d3969ba38832e6e002d63' into oct23av

4597ecf

[oct23av] rerun 8 tput tests for gqttq - now runTest fails (I guess t…

34cf1c7

…he ref needs updating after fixing the coupling order in PR madgraph5#757) ./tput/teeThroughputX.sh -mix -hrd -makej -gqttq -makeclean ./tput/teeThroughputX.sh -makej -gqttq -flt -bridge -makeclean

[oct23av] in CODEGEN, update gqttq ref file for runTest after fixing …

7e57228

…coupling order in PR madgraph5#757 cp dump_SIGMA_SM_GUX_TTXUX_CPU_MadgraphTest.CompareMomentaAndME_0.txt ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/

[oct23av] regenerate gqttq sa and mad with the correct ref file for r…

4eaa091

…unTest.exe

[oct23av] rerun 8 tput tests for gqttq - now all ok after fixing the …

77cd619

…ref file for runTest ./tput/teeThroughputX.sh -mix -hrd -makej -gqttq -makeclean ./tput/teeThroughputX.sh -makej -gqttq -flt -bridge -makeclean

Merge commit 'd64423586' into oct23av

b9d336e

This adds only one commit d644235 (Olivier, Wed Aug 30 15:14:09 2023 +0200) stupid fix for a symlink/git issue This removes some Makefile symlinks in kokkos, it does not touch cudacpp

[oct23av] in CODEGEN MatrixElementKernels.cc, fix clang-format after …

c01ca6f

…cherry-picking Olivier's 9fc9873

[oct23av] TEMPORARELY UNDO Olivier's changes to CODEGEN in 9fc9873 (k…

d883478

…eep only the mg5amcnlo update) git checkout ee9cee387c0aa4117080ec312e3f37d32c670370 CODEGEN/

[oct23av] regenerate 7 processes mad and 6 sa (all except 2x eemumu) …

88f45f2

…- no significant changes in code (note that my new gqttq ref also affected pptt012j)

[oct23av] reapply previous patches by Olivier, otherwise the followin…

6d449e9

…g merges will create conflicts Revert "[oct23av] TEMPORARELY UNDO Olivier's changes to CODEGEN in 9fc9873 (keep only the mg5amcnlo update)" This reverts commit d883478.

Merge commit '5b22a9201' into oct23av

cdd0251

This is a noop. It includes only one commit that I already cherry-picked before. (Olivier Thu Aug 31 15:11:54 2023 +0200) fixing issue of stefan with nprocesses>1

Merge commit '3fbf7b10c' into oct23av

6fa765d

This adds only one commit 3fbf7b1 (Olivier Thu Aug 31 23:13:25 2023 +0200) fix symmetry factor + missing file

[oct23av] in CODEGEN cudacpp.mk, only include Source/make_opts if Sou…

d69bd0b

…rce exists (e.g it is missing in .sa directories) madgraph5#787

valassi force-pushed the oct23av branch from 2702ea1 to c65bba0 Compare November 3, 2023 07:23

valassi linked an issue Nov 3, 2023 that may be closed by this pull request

cudacpp.mk should include make_opts #787

Closed

oliviermattelaer mentioned this pull request Nov 3, 2023

allow to choose fptype via the run_card #788

Merged

valassi mentioned this pull request Nov 3, 2023

path is not defined in CPPRunCard when treatcards is called outside launch #790

Closed

roiser approved these changes Nov 3, 2023

View reviewed changes

valassi added 10 commits November 3, 2023 17:18

[oct23av] in ggtt.mad cudacpp.mk, try to fix the CUDACPP_MAKEFILE iss…

c92ec8e

…ue with override, this is irrelevant! madgraph5#787 make -f cudacpp.mk cleanall CUDACPP_MAKEFILE='cudacpp.mk' CUDACPP_MAKEFILE='../../Source/make_opts' cudacpp.mk:42: *** exit. Stop.

[oct23av] in ggtt.mad cudacpp.mk, remove debug printouts - 'make clea…

418c8b1

…nall' now succeeds after including make_opts madgraph5#787 NB: I also checked that 'make cleanall; make' gives exactly the same output in my environment whether make_opts is included or not

[oct23av] in CODEGEN, backport the fixes in cudacpp.mk for make_opts …

b13ae49

…from ggtt.mad madgraph5#787

[oct23av] regenerate all processes including the last changes to make…

6b39fcb

…files for Mac and for make_opts

valassi commented Nov 3, 2023

View reviewed changes

valassi added 2 commits November 3, 2023 17:43

[oct23av] in CODEGEN, fix BUG in Olivier's 4e1dccb for OpenMP on Mac:…

c492e2c

… the intent was to disable OpenMP on Mac, but it disabled it also on Linux, now fixed

[oct23av] regenerate all processes including the last changes to make…

b647330

…files to disable OpenMP on Mac but not on Linux

valassi force-pushed the oct23av branch from 8aa51bf to b647330 Compare November 3, 2023 16:47

valassi added 5 commits November 3, 2023 18:47

[oct23av] in ggttmad cudacpp.mk, fix one third error for openmp on Ma…

5418ed5

…c - testing this on Olivier's desktop Mac, looks ok now

[oct23av] in CODEGEN, backport the third bug fix in openmp for mac fr…

1b3ea52

…om ggttmad

[oct23av] regenerate all processes including the third bug fix in mak…

08a7a7d

…efiles to disable OpenMP on Mac but not on Linux

[oct23av] rerun 78 tput tests, with FPEs enabled in the check executa…

4351daa

…ble - usual failures in ggttg f/m and gqttq f (madgraph5#783), no change in performance (*NB OpenMP is now again enabled by default*) (or maybe ~1-2% slower on average? anyway, keep OpenMP on as in the past)

valassi merged commit 63ddd4d into madgraph5:master Nov 3, 2023
21 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Patches and test results over the latest PRs #780

Patches and test results over the latest PRs #780

valassi commented Oct 28, 2023

oliviermattelaer commented Nov 2, 2023

oliviermattelaer commented Nov 2, 2023

valassi commented Nov 3, 2023

roiser left a comment

valassi Nov 3, 2023

valassi Nov 3, 2023 •

edited

Loading

valassi commented Nov 3, 2023

Patches and test results over the latest PRs #780

Patches and test results over the latest PRs #780

Conversation

valassi commented Oct 28, 2023

oliviermattelaer commented Nov 2, 2023

oliviermattelaer commented Nov 2, 2023

valassi commented Nov 3, 2023

roiser left a comment

Choose a reason for hiding this comment

valassi Nov 3, 2023

Choose a reason for hiding this comment

valassi Nov 3, 2023 • edited Loading

Choose a reason for hiding this comment

valassi commented Nov 3, 2023

valassi Nov 3, 2023 •

edited

Loading