Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patches and test results over the latest PRs #780

Merged
merged 137 commits into from
Nov 3, 2023
Merged

Conversation

valassi
Copy link
Member

@valassi valassi commented Oct 28, 2023

Hi Olivier, Stefan, Stephan, this is a comprehensive patch including several changes over the latest PRs. This is based on the current master including all PRs in the last two months, with some fixes that should be non controversial (clang format, build warnings, build errors etc), and other changes that instead need closer review (e.g. they may break Olivier's Mac).

It also includes new test results for 'my standard processes', using a new Alma9 machine (including downfall mitigation, which seems to have no effect anyway), using the August codebase and the most recent codebase from this branch.

I try to give a summary below. Can you please review (I suggest that you focus on CODEGEN only and on the parts that are potentially controversial). Thanks!


Summary of changes (in addition to or including minor fixes already in PRs #778 and #779).

Non-controversial? changes:

More controversial changes, can you please review

  • reenable lines commented out by Olivier in patchMad.sh: do this only in a special 'tmad mode' called from generateAndCompare.sh, and/or move the lines directly to the latter script (this in principle should also be non controversial and should not affect the full workflow?)
  • go back from a ln to a cp for counters.cc and ompthreads.cc (Stephan, maybe yours was better: I just wanted to minimise differences so I got back my working version... but if you prefer and it works, I will try the ln)
  • always dump stdout and stderr from subprocess.popen for patchMad.sh (eventually we may remove this maybe, but now I need this to more easily check the logs for errors and avoid silent failures... and personally I prefr to have it dumped anyway as it is a tricky part still)
  • in cudacpp.mk, remove 'include ../../Source/make_opts' and remove '-fopenmp' (Olivier this is for you, why do you need that? for me it works without it)
  • add a comment that in MatrixElementKernels.cc we should add the correct ifdef for Mac
  • in launch_plugin.py, do not raise an Exception in reset_simd, but just print a warning (otherwise my generation fails: by the way I still use _cudacpp, not simd and not cuda)
  • in patch.common, reenable some makefile fragments that had been commented out by Olivier (why? Mac?)

Coming to the performance and functionality tests.

I have the latest logs in these two commits:
ee83b62
and
536ad49

  1. The first log has 'tput' (check.exe, gcheck.exe, runTests.exe) results. This is a summary

Functionality:
** all NaN's have disappeared everywhere!
-- eemumu, ggtt, ggttg, ggttgg, ggttggg: all ok
** gqttq: some failures in runTest.exe

Performance:
-- eemumu: cuda ~20% slower (and 20% increase in registers used)
** eemumu: simd a factor 2 (inl0) to a factor 4 (inl1) slower
-- ggtt: cuda up to 10% slower
-- ggtt: simd ~10-15% faster (inl0), similar (inl1)
-- ggttg, ggttgg, ggttggg: generally similar performance, some minor speedups

Rephasing:

  1. The second log has 'tmad' tests (madevent executable: check xsec and lhe files). This is a summary

Functionality:

  • ggttggg: madevent crashes
  • gqttq: xsec differs again

Performance (Fortran overhead):
-- all very similar

Performance (MEs):
-- eemumu: fortran 20% faster, cuda slightly slower, simd a factor 2 to 3 slower
-- ggtt: fortran and simd 20% faster, cuda similar
-- ggttg: fortran 10% faster, simd and cuda similar
-- ggttgg: all very similar

+ERROR! ' ./madevent_fortran < /tmp/avalassi/input_ggttggg_x1_fortran > /tmp/avalassi/output_ggttggg_x1_fortran' failed
+d R # 5 > -0.0 -0.0 -0.0 0.4 0.4
+d R # 6 > -0.0 -0.0 -0.0 -0.0 0.4
+s min # 3> 0.0119716.0 29929.0 29929.0 0.0
+s min # 4> 0.0 0.0 29929.0 29929.0 0.0
+s min # 5> 0.0 0.0 0.0 0.0 0.0
+s min # 6> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 3> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 4> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 5> 0.0 0.0 0.0 0.0 0.0
+xqcutij # 6> 0.0 0.0 0.0 0.0 0.0

+ERROR! xsec from fortran (0.26050333309703716) and cpp (1.2757941949814184) differ by more than 2E-14 (3.8974198518457603)

Rephrasing:


Voila that's all. Thanks for reviewing!

valassi and others added 30 commits October 26, 2023 11:33
…after hard reboot including downfall mitigation)

New performance baseline (move from Alma8/itscrd80 to Alma9/itscrd90 on Silver4216+V100 and include downfall mitigation)
Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)

STARTED  AT Wed Oct 25 06:25:59 PM CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed Oct 25 06:50:29 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed Oct 25 06:59:19 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed Oct 25 07:08:28 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed Oct 25 07:11:30 PM CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed Oct 25 07:14:30 PM CEST 2023 [Status=0]

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microcode' /proc/cpuinfo | sort -u
microcode       : 0x5003604
model           : 85
model name      : Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
stepping        : 7
…after hard reboot including downfall mitigation)

New performance baseline (move from Alma8/itscrd80 to Alma9/itscrd90 on Silver4216+V100 and include downfall mitigation)
Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)

STARTED AT Wed Oct 25 07:17:34 PM CEST 2023
ENDED   AT Wed Oct 25 11:35:50 PM CEST 2023

Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release
AlmaLinux release 9.2 (Turquoise Kodkod)

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microcode' /proc/cpuinfo | sort -u
microcode       : 0x5003604
model           : 85
model name      : Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
stepping        : 7
… but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation)

NB: 512z mode is faster than 512y on this Platinum node (2 FMA units)

STARTED  AT Wed Oct 25 18:03:19 CEST 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Wed Oct 25 18:26:16 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Wed Oct 25 18:36:50 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Wed Oct 25 18:45:05 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Wed Oct 25 18:46:35 CEST 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Wed Oct 25 18:46:58 CEST 2023 [Status=0]

[avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release
Red Hat Enterprise Linux release 8.8 (Ootpa)

[avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microcode' /proc/cpuinfo | sort -u
microcode       : 0xd0003a5
model           : 106
model name      : Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz
stepping        : 6
… but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation)

NB: 512z mode is faster than 512y on this Platinum node (2 FMA units)

STARTED AT Wed Oct 25 18:48:27 CEST 2023
ENDED   AT Wed Oct 25 22:37:43 CEST 2023

Status=0

20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
20 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt

[avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> cat /etc/redhat-release
Red Hat Enterprise Linux release 8.8 (Ootpa)

[avalassi@olgpu-03 gcc11.2/cvmfs] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep 'stepping\|model\|microco$
microcode       : 0xd0003a5
model           : 106
model name      : Intel(R) Xeon(R) Platinum 8362 CPU @ 2.80GHz
stepping        : 6
…MPORARY TESTS ON PLATINUM)

Revert "[oct23av] (TEMPORARY TESTS ON PLATINUM) rerun 18 tmad alltees, all ok but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation)"
This reverts commit 93f2978.

Revert "[oct23av] (TEMPORARY TESTS ON PLATINUM) rerun 78 tput alltees, all ok but no GPU (olgpu-03 Platinum8362 el8 including downfall mitigation)"
This reverts commit ed1cd75.
No real code differences - only a few irrelevant changes due to the new code generating environment on itscrd90

Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)
…- changes only in codegen logs

Using codebases as of commit bd255c0 (Wed Aug 16 15:05:27 2023 +0200)
…nly changes are in codegen logs

Codebase includes merging commit a6731bd (Olivier Wed Aug 23 13:23:12 2023 +0200)
This uses Olivier's 'fix_mirror' branch for PR madgraph5#754
…matting, and especially the build will fail.

Codebase includes merging commit a6731bd (Olivier Wed Aug 23 13:23:12 2023 +0200)
This uses Olivier's 'fix_mirror' branch for PR madgraph5#754

In particular
  a6731bd       Olivier Mattelaer       Wed Aug 23 13:23:12 2023 +0200  Merge branch 'fix_mirror'
  2556cdd       Olivier Mattelaer       Wed Aug 23 09:27:38 2023 +0200  avoid that mirroring is reset by the plugin

These lines fail the build (as well as clang formatting)
[NOT OK] Check formatting in: pp_tt012j.mad/SubProcesses/P0_uux_ttx/CPPProcess.cc
786c786
<     constexpr int helcolDenominators[1] = { 36,36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343)
---
>     constexpr int helcolDenominators[1] = { 36, 36 }; // assume nprocesses == 1 (madgraph5#272 and madgraph5#343)
The same happens in each P subdirectory.

Build errors:
  ccache /usr/local/cuda-12.0/bin/nvcc   -Xcompiler -O3 -lineinfo -I. -I../../src -I/usr/local/cuda-12.0/include/ -DUSE_NVTX -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -use_fast_math -std=c++17  -ccbin /usr/lib64/ccache/g++ -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_FPTYPE2_DOUBLE -Xcompiler -fPIC -c gCPPProcess.cu -o gCPPProcess.o
  gCPPProcess.cu(779): error: static assertion failed with "Assume nprocesses == 1"
  gCPPProcess.cu(786): error: too many initializer values
  2 errors detected in the compilation of "gCPPProcess.cu".
…, previous issues are now solved

Codebase includes cherry-picking this commit (providing a fix for PR madgraph5#754)
  5b22a92       Olivier Mattelaer       Thu Aug 31 15:11:54 2023 +0200  fixing issue of stefan with nprocesses>1

Code changes are
  epochX/cudacpp/pp_tt012j.mad/SubProcesses/P0_gg_ttx/CPPProcess.cc
  <       static_assert( nprocesses == 1, "Assume nprocesses == 1" );
  <       // process_id corresponds to the index of DSIG1 Fortran functions (must be 1 because cudacpp is unable to handle DSIG2)
  ---
  >       static_assert( nprocesses == 1 || nprocesses == 2, "Assume nprocesses == 1 or 2" );
…ges (except for static assert allowing nprocesses==2)

This completes the checks for PR madgraph5#754, all looks good
…ter merging Stefan's fix for coupling order in PR madgraph5#757 - code changes only in gq_ttq

The new codebase includes in particular also
  6bf4a65       oliviermattelaer        Wed Aug 30 14:57:19 2023 +0200  Merge pull request madgraph5#754 from madgraph5/fix_mirror
  0aba0c6       oliviermattelaer        Wed Aug 30 14:53:17 2023 +0200  Merge pull request madgraph5#757 from roiser/fixcouplingordering
  4f3cdd8       Stefan Roiser   Tue Aug 29 17:43:25 2023 +0200  in PLUGIN_UFOModelConverter overwrite teh prepare_couplings function the additional functionality will re-order the dictionary keys such that they follow the order of the couplings in parameter wanted_couplings wanted_couplings contains the correct ordering as it is discovered in the OneProcessExporter class

NB: generated code is identical in all processes except for gq_ttq

NB: ee_mumu.mad code generation (and similary ee_mumu.sa) fails with
  Command "import /data/avalassi/GPU2023/madgraph4gpuX/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu.mg" interrupted in sub-command:
  "output madevent ../TMPOUT/CODEGEN_mad_ee_mumu --hel_recycling=False --vector_size=16384 --me_exporter=standalone_cudacpp" with error:
  KeyError : 'GC_3'
…qttq xsec is now correct! fixes high-priority issue madgraph5#748

These three tests now succeed (they used to fail)
  ./tmad/teeMadX.sh +10x -gqttq -makeclean
  ./tmad/teeMadX.sh +10x -gqttq -makeclean -fltonly
  ./tmad/teeMadX.sh +10x -gqttq -makeclean -mixonly

NB: eemumu code generation remains to be fixed after PR madgraph5#757
…he ref needs updating after fixing the coupling order in PR madgraph5#757)

./tput/teeThroughputX.sh -mix -hrd -makej -gqttq -makeclean
./tput/teeThroughputX.sh -makej -gqttq -flt -bridge -makeclean
…coupling order in PR madgraph5#757

cp dump_SIGMA_SM_GUX_TTXUX_CPU_MadgraphTest.CompareMomentaAndME_0.txt ../../../CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/test/ref/
…ref file for runTest

./tput/teeThroughputX.sh -mix -hrd -makej -gqttq -makeclean
./tput/teeThroughputX.sh -makej -gqttq -flt -bridge -makeclean
This adds only one commit d644235
(Olivier, Wed Aug 30 15:14:09 2023 +0200)
stupid fix for a symlink/git issue

This removes some Makefile symlinks in kokkos, it does not touch cudacpp
This adds only one commit 9fc9873
(Olivier, Thu Aug 31 11:56:40 2023 +0200)
fix launch issue for cudacpp, run_card, some cleaning and compilation for mac

This includes several changes
- updates in Submodule MG5aMC/mg5amcnlo cb4771efc..8bfd1fde4:
- several changes in CODEGEN (affecting SDE in runcard, makefiles for Mac)
Note: code generation breaks clang format, and patching fails some hunks. One build seemed ok, I did not try runtime tests.
…eep only the mg5amcnlo update)

git checkout ee9cee387c0aa4117080ec312e3f37d32c670370 CODEGEN/
…- no significant changes in code

(note that my new gqttq ref also affected pptt012j)
…g merges will create conflicts

Revert "[oct23av] TEMPORARELY UNDO Olivier's changes to CODEGEN in 9fc9873 (keep only the mg5amcnlo update)"
This reverts commit d883478.
This is a noop. It includes only one commit that I already cherry-picked before.
(Olivier Thu Aug 31 15:11:54 2023 +0200)
fixing issue of stefan with nprocesses>1
This adds only one commit 3fbf7b1
(Olivier Thu Aug 31 23:13:25 2023 +0200)
fix symmetry factor + missing file
@oliviermattelaer
Copy link
Member

I fix the last issue with the pointer. (it was brew that messed up with something, so I have removed fully brew from my system).

But I confirm that I do need to have make_opts to have it compiling on mac.
(More precisely, I need to have gfortran setup as FC which is one of the things done in make_opts).

So one hacky solution to have this PR pushed, is to hardcode FC=gfortran within your makefile (but this is obviously not sustainable). I have pushed that. and will open an issue to track andrea issue with make_opts

@oliviermattelaer
Copy link
Member

So the make_opts issue is tracked here: #787

@valassi
Copy link
Member Author

valassi commented Nov 3, 2023

Ok I have the impression that we converged :-)
Thanks @oliviermattelaer!

There is one pending issue in #790 about reset_simd, which is not understood. I assign this to you.

About make_opts, I fixed #787.
Just one point on Mac and gfortran: GNU make assumes FC=f77 if FC is not defined. So indeed in my Linux environment I explicitly export FC=gfortran. But I think this is the correct way
https://www.gnu.org/software/make/manual/html_node/Implicit-Variables.html

As discussed, I then consider this as ready to go.
(@roiser and @hageboeck, last chance for comments!).
I will regenerate all processes and launch all tests first.
So I will merge this evening I think.

Copy link
Member

@roiser roiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: I'm have not looked at makefile changes as those will probably be changed soon again

…anall' now fails because make_opts is included madgraph5#787 (*NB OpenMP is now disabled by default!*)

This boils down to

make -f cudacpp.mk cleanall
  OMPFLAGS=
  AVX=512y
  FPTYPE=d
  HELINL=0
  HRDCOD=0
  RNDGEN=hasCurand
  Building in BUILDDIR=. for tag=512y_d_inl0_hrd0_hasCurand (USEBUILDDIR is not set)
  make USEBUILDDIR=0 clean -f ../../Source/make_opts
  make[1]: Entering directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'
  make[1]: *** No rule to make target 'clean'.  Stop.
  make[1]: Leaving directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'
…w including make_opts causes issues here: the CUDACPP_MAKEFILE variable is corrupted madgraph5#787

make -f cudacpp.mk cleanall
  CUDACPP_MAKEFILE='cudacpp.mk'
  CUDACPP_MAKEFILE='../../Source/make_opts'
  cudacpp.mk:42: *** exit.  Stop.
…ue with override, this is irrelevant! madgraph5#787

make -f cudacpp.mk cleanall
  CUDACPP_MAKEFILE='cudacpp.mk'
  CUDACPP_MAKEFILE='../../Source/make_opts'
  cudacpp.mk:42: *** exit.  Stop.
…ly once at the beginning and avoid corruption when make_opts is included madgraph5#787

make -f cudacpp.mk cleanall
  CUDACPP_MAKEFILE='cudacpp.mk'
  CUDACPP_MAKEFILE='cudacpp.mk'
  cudacpp.mk:43: *** exit.  Stop.

However, this is still sensitive to variables coming from outside

make -f cudacpp.mk cleanall CUDACPP_MAKEFILE=pippo
  CUDACPP_MAKEFILE='pippo'
  CUDACPP_MAKEFILE='pippo'
  cudacpp.mk:43: *** exit.  Stop.
…re that CUDACPP_MAKEFILE is an internal variable that cannot be set externally madgraph5#787

make -f cudacpp.mk cleanall
  CUDACPP_MAKEFILE='cudacpp.mk'
  CUDACPP_MAKEFILE='cudacpp.mk'
  cudacpp.mk:43: *** exit.  Stop.

This is now insensitive to variables coming from outside

make -f cudacpp.mk cleanall CUDACPP_MAKEFILE=pippo
  CUDACPP_MAKEFILE='cudacpp.mk'
  CUDACPP_MAKEFILE='cudacpp.mk'
  cudacpp.mk:43: *** exit.  Stop.
…nall' now succeeds after including make_opts madgraph5#787

NB: I also checked that 'make cleanall; make' gives exactly the same output in my environment whether make_opts is included or not
…ble - usual failures in ggttg f/m and gqttq f (madgraph5#783), no change in performance (*NB OpenMP is now disabled by default!*)

STARTED  AT Fri Nov  3 10:06:44 AM CET 2023
./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
ENDED(1) AT Fri Nov  3 01:30:11 PM CET 2023 [Status=2]
./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
ENDED(2) AT Fri Nov  3 01:55:47 PM CET 2023 [Status=0]
./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean
ENDED(3) AT Fri Nov  3 02:05:25 PM CET 2023 [Status=2]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst
ENDED(4) AT Fri Nov  3 02:08:40 PM CET 2023 [Status=0]
./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst
ENDED(5) AT Fri Nov  3 02:11:53 PM CET 2023 [Status=0]
…abled), no change in functionality or performance (*NB OpenMP is now disabled by default!*)

STARTED AT Fri Nov  3 02:15:11 PM CET 2023
ENDED   AT Fri Nov  3 02:38:38 PM CET 2023

Status=0

24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt
0 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt
24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt
@@ -233,6 +233,8 @@ override OMPFLAGS = -fopenmp
else ifneq ($(shell $(CXX) --version | egrep '^(Apple clang)'),)
override OMPFLAGS = # AV disable OpenMP MT on Apple clang (builds fail in the CI #578)
###override OMPFLAGS = -fopenmp # OM reenable OpenMP MT on Apple clang? (AV Oct 2023: this still fails in the CI)
else ifneq ($(UNAME), 'Darwin')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oliviermattelaer I just realised there are two bugs here

  • It should be ifeq not ifneq
  • It should be UNAME_S not UNAME
    That is it should read else ifeq ($(UNAME_S), 'Darwin')
    The way it is done now disables OpenMP on Mac AND ALSO on Linux (except icpx and clang)
    I will fix this

Copy link
Member Author

@valassi valassi Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouf. There were three errors (I noticed because the CI for b647330 failed). No quotes needed. It should read else ifeq ($(UNAME_S),Darwin). Fixing this again

… the intent was to disable OpenMP on Mac, but it disabled it also on Linux, now fixed
…files to disable OpenMP on Mac but not on Linux
…c - testing this on Olivier's desktop Mac, looks ok now
…efiles to disable OpenMP on Mac but not on Linux
…ble - usual failures in ggttg f/m and gqttq f (madgraph5#783), no change in performance (*NB OpenMP is now again enabled by default*)

(or maybe ~1-2% slower on average? anyway, keep OpenMP on as in the past)
… also tput with FPEs enabled), usual ggttggg failures, no change in functionality or performance (*NB OpenMP is now again enabled by default*)

(or maybe ~1-2% slower on average? anyway, keep OpenMP on as in the past)
@valassi
Copy link
Member Author

valassi commented Nov 3, 2023

All tests in the CI including Mac are now good. Reviews addressesd. I am finally self merging.

@valassi valassi merged commit 63ddd4d into madgraph5:master Nov 3, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cudacpp.mk should include make_opts
4 participants