WIP: studies on CMS DY with phase space optimizations #970

Draft: wants to merge 406 commits into base: master

@valassi (Member) commented Aug 15, 2024

This is a WIP PR extending the CMS DY studies in #946 (which itself includes bits and pieces of many other PRs).

In addition, it includes some studies on phase space sampling optimizations, related to #963, #967, #968 and #969.

Now "Fortran Other" becomes negative again, so there is again some double counting.

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7511s
 [COUNTERS] Fortran Other        (  0 ) :   -0.0373s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0168s for    16399 events => throughput is 1.02E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0965s for    32768 events => throughput is 2.94E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2598s for    16399 events => throughput is 1.58E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0008s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0868s for    16384 events => throughput is 5.30E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0670s
 [COUNTERS] PROGRAM sample_full  ( 11 ) :    0.6811s
 [COUNTERS] Fortran TEST         ( 12 ) :    0.0506s for    16384 events => throughput is 3.09E-06 events/s
 [COUNTERS] Fortran TEST2        ( 13 ) :    0.0099s for    16384 events => throughput is 6.01E-07 events/s
 [COUNTERS] Fortran TEST3        ( 14 ) :    0.0541s for    16384 events => throughput is 3.30E-06 events/s
 [COUNTERS] Fortran TEST5        ( 16 ) :    0.1462s for    16384 events => throughput is 8.93E-06 events/s
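
The printed values are consistent with "Fortran Other" being the remainder of PROGRAM TOTAL after subtracting every individually timed counter. This is an inference from the numbers, not something the log states. A minimal Python sketch of that arithmetic, with the enclosing PROGRAM sample_full counter left out on the assumption that it wraps the others:

```python
# Values copied from the log above (seconds). "Fortran Other" itself and the
# enclosing PROGRAM sample_full counter are deliberately left out.
program_total = 0.7511
counters = {
    "Fortran X2F": 0.0168,
    "Fortran PDF": 0.0965,
    "Fortran final_I/O": 0.2598,
    "CudaCpp HEL": 0.0008,
    "CudaCpp MEs": 0.0868,
    "Fortran initial_I/O": 0.0670,
    "Fortran TEST": 0.0506,
    "Fortran TEST2": 0.0099,
    "Fortran TEST3": 0.0541,
    "Fortran TEST5": 0.1462,
}

# Assumption: "Other" is whatever remains once every timed section is subtracted.
fortran_other = program_total - sum(counters.values())
print(f"Fortran Other = {fortran_other:+.4f}s")
```

The result agrees with the printed -0.0373s to within rounding of the four-digit inputs. A negative remainder means the timed sections add up to more than the wall-clock total, which is what double counting would produce.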
This makes it clearer that programtotal = samplefull + initialIO

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7554s
 [COUNTERS] Fortran Other        (  0 ) :   -0.0393s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0171s for    16399 events => throughput is 1.04E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0984s for    32768 events => throughput is 3.00E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2621s for    16399 events => throughput is 1.60E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0007s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0872s for    16384 events => throughput is 5.32E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0688s
 [COUNTERS] Fortran TEST         ( 12 ) :    0.0521s for    16384 events => throughput is 3.18E-06 events/s
 [COUNTERS] Fortran TEST2        ( 13 ) :    0.0100s for    16384 events => throughput is 6.08E-07 events/s
 [COUNTERS] Fortran TEST3        ( 14 ) :    0.0507s for    16384 events => throughput is 3.09E-06 events/s
 [COUNTERS] Fortran TEST5        ( 16 ) :    0.1478s for    16384 events => throughput is 9.02E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0688s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6838s
…grouping

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7428s
 [COUNTERS] Fortran Other        (  0 ) :   -0.0409s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0169s for    16399 events => throughput is 1.03E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0982s for    32768 events => throughput is 3.00E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2585s for    16399 events => throughput is 1.58E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0007s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0865s for    16384 events => throughput is 5.28E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0670s
 [COUNTERS] Fortran grouping     ( 12 ) :    0.0520s for    16384 events => throughput is 3.17E-06 events/s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0098s for    16384 events => throughput is 5.98E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0497s for    16384 events => throughput is 3.03E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1445s for    16384 events => throughput is 8.82E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0670s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6728s
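
A side note on units: in these early logs the figure labelled "throughput ... events/s" appears to equal the time divided by the number of events, i.e. seconds per event. Logs further down in this PR print values such as 3.36E+05 events/s, which do match events divided by time. A short Python check on three rows of the log above:

```python
# (seconds, events, figure printed as "throughput") copied from the log above.
rows = {
    "Fortran X2F": (0.0169, 16399, "1.03E-06"),
    "Fortran PDF": (0.0982, 32768, "3.00E-06"),
    "CudaCpp MEs": (0.0865, 16384, "5.28E-06"),
}
for name, (seconds, events, printed) in rows.items():
    seconds_per_event = seconds / events   # matches the printed figure
    events_per_second = events / seconds   # what "events/s" would normally mean
    print(f"{name:12s} printed={printed} s/event={seconds_per_event:.2E} "
          f"events/s={events_per_second:.2E}")
```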
…s, which was causing double counting and a negative Fortran Other

The problem is that select_grouping_choice calls dsigproc, which eventually calls dsig1, which includes the PDF profiling.
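
The effect of such nested timers can be illustrated with a toy model. The `Counters` class, the timer names and the timeline below are hypothetical and only mimic the situation described; they are not the code in counters.cc. When an inner timer runs inside an outer one, the inner interval is added to both totals, so the sum of counters can exceed the program total and the remainder goes negative:

```python
class Counters:
    """Toy named timers driven by explicit clock values (hypothetical, for illustration)."""

    def __init__(self):
        self.totals = {}
        self._started = {}

    def start(self, name, now):
        self._started[name] = now

    def stop(self, name, now):
        elapsed = now - self._started.pop(name)
        self.totals[name] = self.totals.get(name, 0.0) + elapsed


c = Counters()
# Timeline in arbitrary units: the program runs 0..10, "grouping" runs 1..9,
# and "pdf" runs 2..8 while the grouping timer is still active.
c.start("grouping", 1.0)
c.start("pdf", 2.0)
c.stop("pdf", 8.0)
c.stop("grouping", 9.0)

program_total = 10.0
other = program_total - sum(c.totals.values())
print(c.totals, "other =", other)  # the pdf interval is counted twice

# Dropping the enclosing timer removes the double counting.
other_without_outer = program_total - c.totals["pdf"]
print("other without the outer timer =", other_without_outer)
```

In this sketch the remainder is negative with both timers active and becomes positive once the enclosing timer is dropped, the same pattern as the change in "Fortran Other" between the logs before and after this commit.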

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7643s
 [COUNTERS] Fortran Other        (  0 ) :    0.0111s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0164s for    16399 events => throughput is 9.98E-07 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.1013s for    32768 events => throughput is 3.09E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2712s for    16399 events => throughput is 1.65E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0008s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0874s for    16384 events => throughput is 5.34E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0663s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0103s for    16384 events => throughput is 6.26E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0511s for    16384 events => throughput is 3.12E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1484s for    16384 events => throughput is 9.06E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0663s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6950s
…sig1 (not only dsig1_vec), but it does not show up! - will revert

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7479s
 [COUNTERS] Fortran Other        (  0 ) :    0.0122s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0166s for    16399 events => throughput is 1.01E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0974s for    32768 events => throughput is 2.97E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2625s for    16399 events => throughput is 1.60E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0007s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0873s for    16384 events => throughput is 5.33E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0657s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0102s for    16384 events => throughput is 6.21E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0494s for    16384 events => throughput is 3.01E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1459s for    16384 events => throughput is 8.90E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0657s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6793s
Revert "[prof] in gg_tt.mad auto_dsig1.f, add profiling for matrix1 also in dsig1 (not only dsig1_vec), but it does not show up! - will revert"
This reverts commit d3165cb.
…ble counting in x2f for instance) - small contribution, will revert

The contribution is small: adding the ranmar counter does not make Fortran Other decrease (while x2f increases due to profiling overhead).

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7472s
 [COUNTERS] Fortran Other        (  0 ) :    0.0105s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0212s for    16399 events => throughput is 1.29E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0943s for    32768 events => throughput is 2.88E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2576s for    16399 events => throughput is 1.57E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0007s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0860s for    16384 events => throughput is 5.25E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0638s
 [COUNTERS] Fortran ranmar       ( 12 ) :    0.0057s for   114719 events => throughput is 4.93E-08 events/s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0098s for    16384 events => throughput is 6.00E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0508s for    16384 events => throughput is 3.10E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1470s for    16384 events => throughput is 8.97E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0638s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6805s
Revert "[prof] in gg_tt.mad, profile ranmar (in ranmar.f: but this causes double counting in x2f for instance) - small contribution, will revert"
This reverts commit 59dbf04.
…st12 and test15): very large for cuda runs!

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    1.1881s
 [COUNTERS] Fortran Other        (  0 ) :    0.0114s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0171s for    16399 events => throughput is 1.04E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.1026s for    32768 events => throughput is 3.13E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.3368s for    16399 events => throughput is 2.05E-05 events/s
 [COUNTERS] CudaCpp HEL          (  5 ) :    0.0011s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0010s for    16384 events => throughput is 6.20E-08 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0783s
 [COUNTERS] Fortran TEST12       ( 12 ) :    0.0256s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0104s for    16384 events => throughput is 6.37E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0512s for    16384 events => throughput is 3.12E-06 events/s
 [COUNTERS] Fortran TEST15       ( 15 ) :    0.4023s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1503s for    16384 events => throughput is 9.18E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0783s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6814s
…on, helicity calculation) and finalise (bridge deletion) timers

Now "Fortran Other" is about 1% of the total; will stop here and clean up the rest.

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    1.1485s
 [COUNTERS] Fortran Other        (  0 ) :    0.0119s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0174s for    16399 events => throughput is 1.06E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.1005s for    32768 events => throughput is 3.07E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2783s for    16399 events => throughput is 1.70E-05 events/s
 [COUNTERS] CudaCpp initialise   (  5 ) :    0.4243s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0010s for    16384 events => throughput is 6.20E-08 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0733s
 [COUNTERS] CudaCpp finalise     (  8 ) :    0.0259s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0098s for    16384 events => throughput is 6.01E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0525s for    16384 events => throughput is 3.20E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1535s for    16384 events => throughput is 9.37E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0733s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6245s

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7643s
 [COUNTERS] Fortran Other        (  0 ) :    0.0102s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0167s for    16399 events => throughput is 1.02E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0983s for    32768 events => throughput is 3.00E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2644s for    16399 events => throughput is 1.61E-05 events/s
 [COUNTERS] CudaCpp initialise   (  5 ) :    0.0022s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0919s for    16384 events => throughput is 5.61E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0659s
 [COUNTERS] CudaCpp finalise     (  8 ) :    0.0002s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0100s for    16384 events => throughput is 6.11E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0527s for    16384 events => throughput is 3.22E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1518s for    16384 events => throughput is 9.26E-06 events/s
 [COUNTERS] PROGRAM initial_I/O  ( 19 ) :    0.0659s
 [COUNTERS] PROGRAM sample_full  ( 20 ) :    0.6949s
… timers and the three TEST timers

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    1.0922s
 [COUNTERS] Fortran Other        (  0 ) :    0.0113s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0168s for    16399 events => throughput is 1.02E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0947s for    32768 events => throughput is 2.89E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2625s for    16399 events => throughput is 1.60E-05 events/s
 [COUNTERS] CudaCpp initialise   (  5 ) :    0.4035s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0010s for    16384 events => throughput is 6.07E-08 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0703s
 [COUNTERS] CudaCpp finalise     (  8 ) :    0.0253s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0096s for    16384 events => throughput is 5.87E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0488s for    16384 events => throughput is 2.98E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1485s for    16384 events => throughput is 9.06E-06 events/s

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL               :    0.7471s
 [COUNTERS] Fortran Other        (  0 ) :    0.0098s
 [COUNTERS] Fortran X2F          (  1 ) :    0.0168s for    16399 events => throughput is 1.02E-06 events/s
 [COUNTERS] Fortran PDF          (  2 ) :    0.0966s for    32768 events => throughput is 2.95E-06 events/s
 [COUNTERS] Fortran final_I/O    (  3 ) :    0.2632s for    16399 events => throughput is 1.60E-05 events/s
 [COUNTERS] CudaCpp initialise   (  5 ) :    0.0023s
 [COUNTERS] CudaCpp MEs          (  6 ) :    0.0854s for    16384 events => throughput is 5.21E-06 events/s
 [COUNTERS] Fortran initial_I/O  (  7 ) :    0.0656s
 [COUNTERS] CudaCpp finalise     (  8 ) :    0.0002s
 [COUNTERS] Fortran scale        ( 13 ) :    0.0097s for    16384 events => throughput is 5.93E-07 events/s
 [COUNTERS] Fortran rewgt        ( 14 ) :    0.0497s for    16384 events => throughput is 3.03E-06 events/s
 [COUNTERS] Fortran unwgt        ( 16 ) :    0.1479s for    16384 events => throughput is 9.03E-06 events/s
…d order

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL                         :    1.1034s
 [COUNTERS] Fortran Other                  (  0 ) :    0.0111s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0716s
 [COUNTERS] Fortran Random2Momenta         (  3 ) :    0.0170s for    16399 events => throughput is 1.03E-06 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0989s for    32768 events => throughput is 3.02E-06 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.0102s for    16384 events => throughput is 6.20E-07 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0511s for    16384 events => throughput is 3.12E-06 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.1456s for    16384 events => throughput is 8.89E-06 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.2672s for    16399 events => throughput is 1.63E-05 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4048s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0250s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0010s for    16384 events => throughput is 6.07E-08 events/s

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL                         :    0.7943s
 [COUNTERS] Fortran Other                  (  0 ) :    0.0111s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0685s
 [COUNTERS] Fortran Random2Momenta         (  3 ) :    0.0171s for    16399 events => throughput is 1.04E-06 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.1047s for    32768 events => throughput is 3.20E-06 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.0105s for    16384 events => throughput is 6.39E-07 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0536s for    16384 events => throughput is 3.27E-06 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.1569s for    16384 events => throughput is 9.58E-06 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.2773s for    16399 events => throughput is 1.69E-05 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.0023s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0002s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0921s for    16384 events => throughput is 5.62E-06 events/s
…Es" counters

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL                         :    1.0988s
 [COUNTERS] Fortran Other                  (  0 ) :    0.0117s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0697s
 [COUNTERS] Fortran Random2Momenta         (  3 ) :    0.0167s for    16399 events => throughput is 1.02E-06 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0910s for    32768 events => throughput is 2.78E-06 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.0098s for    16384 events => throughput is 5.99E-07 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0473s for    16384 events => throughput is 2.89E-06 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.1488s for    16384 events => throughput is 9.08E-06 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.2702s for    16399 events => throughput is 1.65E-05 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4077s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0250s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0010s for    16384 events => throughput is 6.02E-08 events/s
 [COUNTERS] OVERALL NON-MEs                ( 21 ) :    1.0979s
 [COUNTERS] OVERALL MEs                    ( 22 ) :    0.0010s for    16384 events => throughput is 6.02E-08 events/s

./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL                         :    0.7378s
 [COUNTERS] Fortran Other                  (  0 ) :    0.0097s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0654s
 [COUNTERS] Fortran Random2Momenta         (  3 ) :    0.0166s for    16399 events => throughput is 1.01E-06 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0924s for    32768 events => throughput is 2.82E-06 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.0096s for    16384 events => throughput is 5.88E-07 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0465s for    16384 events => throughput is 2.84E-06 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.1475s for    16384 events => throughput is 9.00E-06 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.2621s for    16399 events => throughput is 1.60E-05 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.0022s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0002s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0857s for    16384 events => throughput is 5.23E-06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 21 ) :    0.6521s
 [COUNTERS] OVERALL MEs                    ( 22 ) :    0.0857s for    16384 events => throughput is 5.23E-06 events/s
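
The two OVERALL counters appear to partition PROGRAM TOTAL into the matrix-element part and everything else (inferred from the printed values, not stated in the log). A small Python sketch checks this for both runs above and shows how small the ME share becomes on the GPU:

```python
# (PROGRAM TOTAL, OVERALL MEs, printed OVERALL NON-MEs) in seconds, from the two logs above.
runs = {
    "cuda": (1.0988, 0.0010, 1.0979),
    "cpp (none)": (0.7378, 0.0857, 0.6521),
}
me_share = {}
for backend, (total, mes, printed_non_mes) in runs.items():
    non_mes = total - mes
    me_share[backend] = mes / total
    print(f"{backend:10s} NON-MEs={non_mes:.4f}s (printed {printed_non_mes:.4f}s), "
          f"MEs are {100 * mes / total:.1f}% of the total")
```

For this small gg_tt test the CUDA run spends far more time in "CudaCpp Initialise" than in the MEs themselves.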
…E_DISABLECOUNTERS to disable individual counters

CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL                         :    1.0898s

CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL                         :    0.7309s
Revert "[prof] in gg_tt.mad counters.cc, consider printing throughputs only for MEs counters - will revert"
This reverts commit 5b24462.
…ounters: must include again dsample.f and auto_dsig.f in patches

The only files that still need to be patched are:
- 4 in patch.common: Source/makefile, Source/genps.inc, Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
git checkout prof $(git ls-tree --name-only prof */CODEGEN*txt)
Revert "[cmsdy] regenerate all processes (ops this is not what I was normally doing in cmsdy... but ok)"
This reverts commit 7122b9c.
…cluding the latest counters

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -fortran pp_dy3j.mad -togridpack
…ld91

CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -nomakeclean -ALL pp_dy3j.mad -fromgridpack

pp_dy3j.mad//fortran/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL    447.7169 seconds
[madevent COUNTERS]  PROGRAM TOTAL 443.48
[madevent COUNTERS]  Fortran Other 6.5439
[madevent COUNTERS]  Fortran Initialise(I/O) 4.4648
[madevent COUNTERS]  Fortran Random2Momenta 93.2692
[madevent COUNTERS]  Fortran PDFs 8.2697
[madevent COUNTERS]  Fortran UpdateScaleCouplings 7.3142
[madevent COUNTERS]  Fortran Reweight 3.6975
[madevent COUNTERS]  Fortran Unweight(LHE-I/O) 4.8636
[madevent COUNTERS]  Fortran SamplePutPoint 8.3255
[madevent COUNTERS]  Fortran MEs 306.731
[madevent COUNTERS]  OVERALL NON-MEs 136.748
[madevent COUNTERS]  OVERALL MEs 306.731
--------------------------------------------------------------------------------
pp_dy3j.mad//cppnone/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL    448.1598 seconds
[madevent COUNTERS]  PROGRAM TOTAL 443.898
[madevent COUNTERS]  Fortran Other 6.5229
[madevent COUNTERS]  Fortran Initialise(I/O) 4.4882
[madevent COUNTERS]  Fortran Random2Momenta 93.1882
[madevent COUNTERS]  Fortran PDFs 8.2912
[madevent COUNTERS]  Fortran UpdateScaleCouplings 7.282
[madevent COUNTERS]  Fortran Reweight 3.703
[madevent COUNTERS]  Fortran Unweight(LHE-I/O) 4.8634
[madevent COUNTERS]  Fortran SamplePutPoint 8.2912
[madevent COUNTERS]  CudaCpp Initialise 1.1875
[madevent COUNTERS]  CudaCpp Finalise 0.0215
[madevent COUNTERS]  CudaCpp MEs 306.061
[madevent COUNTERS]  OVERALL NON-MEs 137.837
[madevent COUNTERS]  OVERALL MEs 306.061
--------------------------------------------------------------------------------
pp_dy3j.mad//cppsse4/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL    295.7847 seconds
[madevent COUNTERS]  PROGRAM TOTAL 291.523
[madevent COUNTERS]  Fortran Other 6.5192
[madevent COUNTERS]  Fortran Initialise(I/O) 4.4843
[madevent COUNTERS]  Fortran Random2Momenta 93.2004
[madevent COUNTERS]  Fortran PDFs 8.2944
[madevent COUNTERS]  Fortran UpdateScaleCouplings 7.2823
[madevent COUNTERS]  Fortran Reweight 3.7002
[madevent COUNTERS]  Fortran Unweight(LHE-I/O) 4.8692
[madevent COUNTERS]  Fortran SamplePutPoint 8.285
[madevent COUNTERS]  CudaCpp Initialise 0.7227
[madevent COUNTERS]  CudaCpp Finalise 0.0218
[madevent COUNTERS]  CudaCpp MEs 154.147
[madevent COUNTERS]  OVERALL NON-MEs 137.375
[madevent COUNTERS]  OVERALL MEs 154.147
--------------------------------------------------------------------------------
pp_dy3j.mad//cppavx2/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL    204.7001 seconds
[madevent COUNTERS]  PROGRAM TOTAL 200.453
[madevent COUNTERS]  Fortran Other 6.5698
[madevent COUNTERS]  Fortran Initialise(I/O) 4.4868
[madevent COUNTERS]  Fortran Random2Momenta 93.2293
[madevent COUNTERS]  Fortran PDFs 8.2934
[madevent COUNTERS]  Fortran UpdateScaleCouplings 7.3016
[madevent COUNTERS]  Fortran Reweight 3.7048
[madevent COUNTERS]  Fortran Unweight(LHE-I/O) 4.8693
[madevent COUNTERS]  Fortran SamplePutPoint 8.2838
[madevent COUNTERS]  CudaCpp Initialise 0.445
[madevent COUNTERS]  CudaCpp Finalise 0.0217
[madevent COUNTERS]  CudaCpp MEs 63.25
[madevent COUNTERS]  OVERALL NON-MEs 137.203
[madevent COUNTERS]  OVERALL MEs 63.25
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512y/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL    201.0406 seconds
[madevent COUNTERS]  PROGRAM TOTAL 196.745
[madevent COUNTERS]  Fortran Other 6.6031
[madevent COUNTERS]  Fortran Initialise(I/O) 4.4911
[madevent COUNTERS]  Fortran Random2Momenta 93.3414
[madevent COUNTERS]  Fortran PDFs 8.2981
[madevent COUNTERS]  Fortran UpdateScaleCouplings 7.2869
[madevent COUNTERS]  Fortran Reweight 3.7033
[madevent COUNTERS]  Fortran Unweight(LHE-I/O) 4.8707
[madevent COUNTERS]  Fortran SamplePutPoint 8.2976
[madevent COUNTERS]  CudaCpp Initialise 0.4341
[madevent COUNTERS]  CudaCpp Finalise 0.0217
[madevent COUNTERS]  CudaCpp MEs 59.3994
[madevent COUNTERS]  OVERALL NON-MEs 137.345
[madevent COUNTERS]  OVERALL MEs 59.3994
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512z/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL    176.8891 seconds
[madevent COUNTERS]  PROGRAM TOTAL 172.637
[madevent COUNTERS]  Fortran Other 6.5768
[madevent COUNTERS]  Fortran Initialise(I/O) 4.486
[madevent COUNTERS]  Fortran Random2Momenta 93.2907
[madevent COUNTERS]  Fortran PDFs 8.2998
[madevent COUNTERS]  Fortran UpdateScaleCouplings 7.2827
[madevent COUNTERS]  Fortran Reweight 3.7045
[madevent COUNTERS]  Fortran Unweight(LHE-I/O) 4.8719
[madevent COUNTERS]  Fortran SamplePutPoint 8.2892
[madevent COUNTERS]  CudaCpp Initialise 0.3619
[madevent COUNTERS]  CudaCpp Finalise 0.0221
[madevent COUNTERS]  CudaCpp MEs 35.4557
[madevent COUNTERS]  OVERALL NON-MEs 137.181
[madevent COUNTERS]  OVERALL MEs 35.4557
--------------------------------------------------------------------------------
pp_dy3j.mad//cuda/output.txt
File not found: SKIP backend cuda
--------------------------------------------------------------------------------
pp_dy3j.mad//hip/output.txt
File not found: SKIP backend hip
--------------------------------------------------------------------------------
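
To put the per-backend summaries above side by side, the following Python sketch computes the ME-only and whole-program speedups relative to the Fortran run. It also computes an Amdahl-style ceiling, i.e. the total speedup that would be reached if the MEs took no time at all. Only the numbers are taken from the logs; the script is an illustration and is not part of the tlau tooling.

```python
# (PROGRAM TOTAL, OVERALL MEs) in seconds, copied from the per-backend summaries above.
results = {
    "fortran": (443.48, 306.731),
    "cppnone": (443.898, 306.061),
    "cppsse4": (291.523, 154.147),
    "cppavx2": (200.453, 63.25),
    "cpp512y": (196.745, 59.3994),
    "cpp512z": (172.637, 35.4557),
}
ref_total, ref_mes = results["fortran"]
speedups = {}
for backend, (total, mes) in results.items():
    speedups[backend] = (ref_mes / mes, ref_total / total)
    print(f"{backend:8s} MEs x{ref_mes / mes:5.2f}   total x{ref_total / total:5.2f}")

# Amdahl-style ceiling: even with instantaneous MEs, the non-ME part remains.
ceiling = ref_total / (ref_total - ref_mes)
print(f"upper bound on the total speedup: x{ceiling:.2f}")
```

The non-ME part stays at about 137 s in every backend, and "Fortran Random2Momenta" alone accounts for about 93 s of it, so it becomes the dominant cost once the MEs are vectorized.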
…SpaceSampling

These are the first results where the timer overhead is removed. This looks nice,
but the overhead should be computed in the counters.cc calls rather than in the individual timers
(this would also make more sense with respect to timermap.h, where this will not be possible; rename the env variable, too).

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.4608s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1171s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0690s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.2317s for  1087437 events => throughput is 3.36E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0917s for    32768 events => throughput is 3.57E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1719s for    16384 events => throughput is 9.53E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0483s for    16384 events => throughput is 3.39E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0691s for    16384 events => throughput is 2.37E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1276s for  1087437 events => throughput is 8.52E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4718s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0269s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.3519s for 14136681 events => throughput is 6.01E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.4251s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    5.2204s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1550s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0697s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.9335s for  1087437 events => throughput is 2.76E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0924s for    32768 events => throughput is 3.55E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1722s for    16384 events => throughput is 9.52E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0487s for    16384 events => throughput is 3.36E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0689s for    16384 events => throughput is 2.38E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1401s for  1087437 events => throughput is 7.76E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4779s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0263s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0358s for    16384 events => throughput is 4.58E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.8064s for 14136681 events => throughput is 5.04E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    5.1846s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0358s for    16384 events => throughput is 4.58E+05 events/s

CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: RdtscTimer overhead :    0.0179s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    4.4668s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.2924s
 -------------------------------------------------------------
 [COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.1745s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1190s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0696s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    2.9612s for  1087437 events => throughput is 3.67E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0913s for    32768 events => throughput is 3.59E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1709s for    16384 events => throughput is 9.59E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0482s for    16384 events => throughput is 3.40E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0678s for    16384 events => throughput is 2.42E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1125s for  1087437 events => throughput is 9.67E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4716s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0266s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0358s for    16384 events => throughput is 4.58E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.0989s for 14136681 events => throughput is 6.74E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.1387s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0358s for    16384 events => throughput is 4.58E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: ChronoTimer overhead :    0.0489s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    5.2779s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.7998s
 -------------------------------------------------------------
 [COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.4781s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1570s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0669s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.2485s for  1087437 events => throughput is 3.35E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0930s for    32768 events => throughput is 3.52E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1716s for    16384 events => throughput is 9.55E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0474s for    16384 events => throughput is 3.46E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0681s for    16384 events => throughput is 2.41E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.0929s for  1087437 events => throughput is 1.17E+07 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4705s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0266s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.1629s for 14136681 events => throughput is 6.54E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.4424s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    3.8210s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.0000s
 -------------------------------------------------------------
 [COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.8210s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    3.8301s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.0000s
 -------------------------------------------------------------
 [COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.8301s
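
The three totals printed above look related by a simple subtraction: PROGRAM TOTAL appears to be PROGRAM TOTAL+COUNTEROVERHEAD minus PROGRAM COUNTEROVERHEAD. A minimal Python sketch that checks this against the printed values (this relation is inferred from the logs, and `net_total` is a hypothetical helper, not a function in counters.cc):

```python
# (total+overhead, overhead, reported total) in seconds, copied from the logs above.
RUNS = {
    "rdtsc": (4.4668, 0.2924, 4.1745),
    "chrono": (5.2779, 0.7998, 4.4781),
    "rdtsc, call timers disabled": (3.8210, 0.0000, 3.8210),
    "chrono, call timers disabled": (3.8301, 0.0000, 3.8301),
}

def net_total(total_plus_overhead, overhead):
    """Assumed relation: net program time = gross time - estimated counter overhead."""
    return total_plus_overhead - overhead

def is_consistent(gross, overhead, reported, tol=1.5e-4):
    """The log prints 4 decimals, so allow about one unit in the last place."""
    return abs(net_total(gross, overhead) - reported) < tol

for name, (gross, overhead, reported) in RUNS.items():
    status = "consistent" if is_consistent(gross, overhead, reported) else "INCONSISTENT"
    print(f"{name:30s} net={net_total(gross, overhead):.4f}s reported={reported:.4f}s {status}")
```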
…s: this will be moved to counters alone

Revert "[prof] in gux_taptamggux.mad timer.h, add instead a getTotalOverheadSeconds() call and go back to the old getTotalDurationSeconds"
This reverts commit ad9b747.

Revert "[prof] in gux_taptamggux.mad timer.h, add the option to remove overhead from getTotalDurationSeconds calls"
This reverts commit 5c0a2ed.
…unter overhead (remove it from timer.h: there will be none for timermap.h)

Rename the environment variable to CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD to make it clear that this belongs to the counters.cc infrastructure

These are the results

(1) keep overhead

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.5315s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1198s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0678s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.2691s for  1087437 events => throughput is 3.33E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.1044s for    32768 events => throughput is 3.14E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1757s for    16384 events => throughput is 9.33E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0543s for    16384 events => throughput is 3.02E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0731s for    16384 events => throughput is 2.24E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1322s for  1087437 events => throughput is 8.23E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4719s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0274s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0358s for    16384 events => throughput is 4.57E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.3686s for 14136681 events => throughput is 5.97E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.4957s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0358s for    16384 events => throughput is 4.57E+05 events/s
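
The [COUNTERS] lines above can be cross-checked mechanically. The sketch below parses the RDTSC log and tests two assumptions that the numbers seem to support but that the log itself does not state: that "Fortran Other" is the residual of PROGRAM TOTAL after subtracting counters 1 to 19, and that each throughput is simply events divided by seconds. The regular expression and `parse_counters` are hypothetical helpers written for this example.

```python
import re

LOG = """\
 [COUNTERS] PROGRAM TOTAL                         :    4.5315s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1198s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0678s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.2691s for  1087437 events => throughput is 3.33E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.1044s for    32768 events => throughput is 3.14E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1757s for    16384 events => throughput is 9.33E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0543s for    16384 events => throughput is 3.02E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0731s for    16384 events => throughput is 2.24E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1322s for  1087437 events => throughput is 8.23E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4719s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0274s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0358s for    16384 events => throughput is 4.57E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.3686s for 14136681 events => throughput is 5.97E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.4957s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0358s for    16384 events => throughput is 4.57E+05 events/s
"""

# name, optional numeric id in parentheses, seconds, optional event count
LINE = re.compile(
    r"\[COUNTERS\]\s+(?P<name>.+?)\s*(?:\(\s*(?P<id>\d+)\s*\)\s*)?:\s*"
    r"(?P<secs>-?\d+\.\d+)s(?:\s+for\s+(?P<events>\d+)\s+events)?"
)

def parse_counters(text):
    """Return one dict per [COUNTERS] line that carries a time in seconds."""
    rows = []
    for line in text.splitlines():
        m = LINE.search(line)
        if m:
            rows.append({
                "name": m["name"],
                "id": int(m["id"]) if m["id"] else None,
                "secs": float(m["secs"]),
                "events": int(m["events"]) if m["events"] else None,
            })
    return rows

rows = parse_counters(LOG)
total = next(r["secs"] for r in rows if r["name"] == "PROGRAM TOTAL")
other = next(r["secs"] for r in rows if r["id"] == 0)
# Assumption: counters 1..19 do not overlap; 21 (TEST) and 31/32 (OVERALL) are excluded.
exclusive = sum(r["secs"] for r in rows if r["id"] is not None and 1 <= r["id"] <= 19)
print(f"total - sum(counters 1..19) = {total - exclusive:.4f}s, 'Fortran Other' = {other:.4f}s")

for r in rows:
    if r["events"]:
        print(f"{r['name']:30s} {r['events'] / r['secs']:.2E} events/s")
```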

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    5.2048s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1559s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0673s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.9265s for  1087437 events => throughput is 2.77E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0993s for    32768 events => throughput is 3.30E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1648s for    16384 events => throughput is 9.94E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0514s for    16384 events => throughput is 3.19E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0700s for    16384 events => throughput is 2.34E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1365s for  1087437 events => throughput is 7.97E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4711s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0264s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.8006s for 14136681 events => throughput is 5.05E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    5.1691s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s

(2) remove overhead

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0331s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    4.5208s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.5413s
 -------------------------------------------------------------
 [COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.9795s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1548s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0670s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    2.7547s for  1087437 events => throughput is 3.95E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0988s for    32768 events => throughput is 3.32E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1639s for    16384 events => throughput is 1.00E+05 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0510s for    16384 events => throughput is 3.21E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0674s for    16384 events => throughput is 2.43E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.0898s for  1087437 events => throughput is 1.21E+07 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4700s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0266s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0356s for    16384 events => throughput is 4.60E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    1.8855s for 14136681 events => throughput is 7.50E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    3.9439s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0356s for    16384 events => throughput is 4.60E+05 events/s
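
The "COUNTERS overhead ... for 1M start/stop cycles" line suggests that the cost of a timer is calibrated by running many empty start/stop cycles and then charged to each counter in proportion to its number of calls. A rough Python analogue of that idea, for illustration only: the `Counter` class, `calibrate_overhead` and `overhead_seconds` are hypothetical names and do not reproduce the C++ code in counters.cc or timer.h.

```python
import time

class Counter:
    """Toy accumulating timer (hypothetical, for illustration)."""

    def __init__(self):
        self.total_ns = 0
        self.ncalls = 0
        self._t0 = 0

    def start(self):
        self._t0 = time.perf_counter_ns()

    def stop(self):
        self.total_ns += time.perf_counter_ns() - self._t0
        self.ncalls += 1

def calibrate_overhead(ncycles=100_000):
    """Return the wall-clock cost in seconds of one empty start/stop cycle."""
    probe = Counter()
    t0 = time.perf_counter_ns()
    for _ in range(ncycles):
        probe.start()
        probe.stop()
    elapsed_ns = time.perf_counter_ns() - t0
    return elapsed_ns * 1e-9 / ncycles

def overhead_seconds(counter, per_cycle):
    """Estimated overhead charged to a counter: number of calls times cost per cycle."""
    return counter.ncalls * per_cycle

per_cycle = calibrate_overhead()
print(f"overhead: {per_cycle * 1e6:.4f}s for 1M start/stop cycles")
```

The absolute numbers from this Python version are not comparable with the RDTSC or std::chrono figures in the logs; only the calibration pattern is the point.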

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0640s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    5.3491s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    1.0455s
 -------------------------------------------------------------
 [COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.3036s
 [COUNTERS] Fortran Other                  (  0 ) :    0.2216s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0692s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.0230s for  1087437 events => throughput is 3.60E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0992s for    32768 events => throughput is 3.30E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1652s for    16384 events => throughput is 9.92E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0504s for    16384 events => throughput is 3.25E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0684s for    16384 events => throughput is 2.39E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.0716s for  1087437 events => throughput is 1.52E+07 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4727s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0266s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    1.9427s for 14136681 events => throughput is 7.28E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.2679s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
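
One way to read the two overhead lines together: if PROGRAM COUNTEROVERHEAD is the per-cycle cost multiplied by the number of start/stop cycles (an assumption, not something the log states), then the implied number of cycles should be about the same for both timers. A short sketch using the values printed in case (2) above; `implied_cycles` is a hypothetical helper.

```python
def implied_cycles(counter_overhead_s, overhead_per_1m_s):
    """Number of start/stop cycles implied by the two printed overhead values."""
    return counter_overhead_s / overhead_per_1m_s * 1_000_000

# (PROGRAM COUNTEROVERHEAD, overhead for 1M cycles), copied from the logs above.
rdtsc = implied_cycles(0.5413, 0.0331)
chrono = implied_cycles(1.0455, 0.0640)

# Event counts printed for counters 3, 4, 5, 6, 7, 8, 19 and 21 in the same logs.
events = [1087437, 32768, 16384, 16384, 16384, 1087437, 16384, 14136681]

print(f"implied cycles (rdtsc) : {rdtsc / 1e6:.2f}M")
print(f"implied cycles (chrono): {chrono / 1e6:.2f}M")
print(f"sum of printed events  : {sum(events) / 1e6:.2f}M")
```

Both timers give roughly 16.3 to 16.4 million cycles, close to the sum of the printed event counts. The small difference may come from rounding of the 4-digit calibration value, or from counters that are started once per batch rather than once per event.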

(3) remove overhead, disable individual timers (so here the overhead is 0)

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0039s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    3.7998s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.0000s
 -------------------------------------------------------------
 [COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.7998s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0038s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    3.9067s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.0000s
 -------------------------------------------------------------
 [COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.9067s
…ter overhead

These are the results

(1) keep overhead

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.4766s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1202s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0685s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    3.2400s for  1087437 events => throughput is 3.36E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.1007s for    32768 events => throughput is 3.25E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1673s for    16384 events => throughput is 9.79E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0521s for    16384 events => throughput is 3.14E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0687s for    16384 events => throughput is 2.38E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1237s for  1087437 events => throughput is 8.79E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4728s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0269s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.3496s for 14136681 events => throughput is 6.02E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.4409s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    5.3144s
 [COUNTERS] Fortran Other                  (  0 ) :    0.1588s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0674s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    4.0191s for  1087437 events => throughput is 2.71E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0996s for    32768 events => throughput is 3.29E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1660s for    16384 events => throughput is 9.87E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0508s for    16384 events => throughput is 3.22E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0704s for    16384 events => throughput is 2.33E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.1482s for  1087437 events => throughput is 7.34E+06 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4718s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0267s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    2.8646s for 14136681 events => throughput is 4.94E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    5.2787s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s

(2) remove overhead

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0338s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    4.8244s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.8905s
 -------------------------------------------------------------
 [COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.9339s
 [COUNTERS] Fortran Other                  (  0 ) :    0.2954s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0674s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    2.7332s for  1087437 events => throughput is 3.98E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.1003s for    32768 events => throughput is 3.27E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1688s for    16384 events => throughput is 9.71E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0507s for    16384 events => throughput is 3.23E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0695s for    16384 events => throughput is 2.36E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.0924s for  1087437 events => throughput is 1.18E+07 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4692s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0263s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    1.8723s for 14136681 events => throughput is 7.55E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    3.8982s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0357s for    16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0637s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    5.8826s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    1.6786s
 -------------------------------------------------------------
 [COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    4.2040s
 [COUNTERS] Fortran Other                  (  0 ) :    0.4831s
 [COUNTERS] Fortran Initialise(I/O)        (  1 ) :    0.0691s
 [COUNTERS] Fortran PhaseSpaceSampling     (  3 ) :    2.9924s for  1087437 events => throughput is 3.63E+05 events/s
 [COUNTERS] Fortran PDFs                   (  4 ) :    0.0983s for    32768 events => throughput is 3.33E+05 events/s
 [COUNTERS] Fortran UpdateScaleCouplings   (  5 ) :    0.1669s for    16384 events => throughput is 9.81E+04 events/s
 [COUNTERS] Fortran Reweight               (  6 ) :    0.0506s for    16384 events => throughput is 3.24E+05 events/s
 [COUNTERS] Fortran Unweight(LHE-I/O)      (  7 ) :    0.0676s for    16384 events => throughput is 2.42E+05 events/s
 [COUNTERS] Fortran SamplePutPoint         (  8 ) :    0.0698s for  1087437 events => throughput is 1.56E+07 events/s
 [COUNTERS] CudaCpp Initialise             ( 11 ) :    0.4712s
 [COUNTERS] CudaCpp Finalise               ( 12 ) :    0.0267s
 [COUNTERS] CudaCpp MEs                    ( 19 ) :    0.0350s for    16384 events => throughput is 4.68E+05 events/s
 [COUNTERS] TEST    SampleGetX             ( 21 ) :    1.9227s for 14136681 events => throughput is 7.35E+06 events/s
 [COUNTERS] OVERALL NON-MEs                ( 31 ) :    4.1690s
 [COUNTERS] OVERALL MEs                    ( 32 ) :    0.0350s for    16384 events => throughput is 4.68E+05 events/s

(3) remove overhead, disable individual timers (so here the overhead is 0)

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0333s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    4.1897s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.3330s
 -------------------------------------------------------------
 [COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.8567s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 INFO: COUNTERS overhead :    0.0659s for 1M start/stop cycles
 [COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD         :    4.5119s
 [COUNTERS] PROGRAM COUNTEROVERHEAD               :    0.6594s
 -------------------------------------------------------------
 [COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.8525s

(4) do not remove overhead, disable individual timers (this also removes the cost of estimating the overhead itself)
(this test was done on another day on the same machine and build, but the results are compatible with the previous ones)

CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.8072s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
 [COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
 [COUNTERS] PROGRAM TOTAL                         :    3.8214s
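
Cases (1), (2) and (4) allow a cross-check of the overhead estimate: the difference between a run with call timers enabled and one with them disabled is a direct measurement of what the timers cost, and it can be compared with the net total printed in case (2). A small sketch with the PROGRAM TOTAL values above; these are single runs (case (4) on a different day), so the differences are only indicative, and the helper names are hypothetical.

```python
# PROGRAM TOTAL in seconds, copied from cases (1), (2) and (4) above.
TOTALS = {
    "rdtsc":  {"timers_on": 4.4766, "overhead_removed": 3.9339, "timers_off": 3.8072},
    "chrono": {"timers_on": 5.3144, "overhead_removed": 4.2040, "timers_off": 3.8214},
}

def measured_overhead(t):
    """Extra wall-clock time observed when per-call timers are enabled."""
    return t["timers_on"] - t["timers_off"]

def residual_after_removal(t):
    """Time still unaccounted for after subtracting the estimated overhead."""
    return t["overhead_removed"] - t["timers_off"]

for name, t in TOTALS.items():
    print(f"{name:6s} measured overhead = {measured_overhead(t):.4f}s, "
          f"residual after removal = {residual_after_removal(t):.4f}s")
```

For both timers the residual is positive and much smaller than the measured overhead, which would be consistent with the estimate capturing most, but not all, of the timer cost.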
…r merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…Source/makefile madgraph5#980) into prof

(Checked that regenerating gg_tt.mad is all ok)
…r merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…er merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…adgraph5#980) into cmsdy

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (remove Source/makefile)
- epochX/cudacpp/CODEGEN/allGenerateAndCompare.sh (add processes from both branches)

(Checked that regenerating gg_tt.mad is ok)
…rce/makefile madgraph5#980) into cmsdyps

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (remove Source/makefile)

(NB regenerating gg_tt.mad is not ok: the newranmar.o is now missing)
…(avoid including Source/makefile in patch.common)
…ivalent (and Source/makefile could still be avoided in patch.common)

(I checked that gg_tt.mad can now build successfully)
…ier merging

git checkout upstream/master $(git ls-tree --name-only HEAD tput/logs* tmad/logs*)
…nerated code except gg_tt.mad for easier merging

git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/SubProcesses/P*/auto_dsig1.f | grep -v ^gg_tt.mad)
…dhel, for360) into prof

Fix conflicts:
- epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f (use upstream/master, will add back all counters as in prof)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (use upstream/master, will regenerate this)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (use upstream/master, will regenerate this)
…f branch before merging upstream/master (fix conflicts)
…pstream/master including june24, goodhel, for360

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that gg_tt.mad can be regenerated ok)
…' (including june24, goodhel, for360) into prof

Also add to the repo a few missing files in gux_taptamggux.mad and nobm_pp_ttW.mad
…ging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…ated code except gg_tt.mad for easier merging

git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/Source/dsample.f | grep -v ^gg_tt.mad)
…also amd and v1.00.01 fixes) into prof

Fix conflicts (use upstream/master version): epochX/cudacpp/gg_tt.mad/Source/dsample.f

Will then regenerate patches from this gg_tt.mad
…/master including v1.00.00 and also amd and v1.00.01 fixes

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that regenerating gg_tt.mad gives no change)
… v1.00.00 and with AMD and v1.00.01 fixes) into cmsdy

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/recreateRefs.sh (use prof's version)
…est prof (with upstream/master v1.00.00 and AMD/v1.00.01 fixes) into cmsdy

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
… plus AMD/v1.00.01 fixes; NOT grid) into cmsdyps

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (manual attempt, will regenerate anyway)
- epochX/cudacpp/gg_tt.mad/CODEGEN_mad_gg_tt_log.txt (take cmsdy)
…atest cmsdy (including prof and master but not grid)

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f

Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
@valassi
Member Author

valassi commented Oct 5, 2024

Now including the latest cmsdy #946 which includes
