WIP: studies on CMS DY with phase space optimizations #970
Draft: valassi wants to merge 406 commits into madgraph5:master from valassi:cmsdyps
Now "Fortran Other" becomes negative again, there is again some double counting
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7511s
[COUNTERS] Fortran Other ( 0 ) : -0.0373s
[COUNTERS] Fortran X2F ( 1 ) : 0.0168s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0965s for 32768 events => throughput is 2.94E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2598s for 16399 events => throughput is 1.58E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0868s for 16384 events => throughput is 5.30E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0670s
[COUNTERS] PROGRAM sample_full ( 11 ) : 0.6811s
[COUNTERS] Fortran TEST ( 12 ) : 0.0506s for 16384 events => throughput is 3.09E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0099s for 16384 events => throughput is 6.01E-07 events/s
[COUNTERS] Fortran TEST3 ( 14 ) : 0.0541s for 16384 events => throughput is 3.30E-06 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1462s for 16384 events => throughput is 8.93E-06 events/s
This makes it clearer that programtotal = samplefull + initialIO
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7554s
[COUNTERS] Fortran Other ( 0 ) : -0.0393s
[COUNTERS] Fortran X2F ( 1 ) : 0.0171s for 16399 events => throughput is 1.04E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0984s for 32768 events => throughput is 3.00E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2621s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0872s for 16384 events => throughput is 5.32E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0688s
[COUNTERS] Fortran TEST ( 12 ) : 0.0521s for 16384 events => throughput is 3.18E-06 events/s
[COUNTERS] Fortran TEST2 ( 13 ) : 0.0100s for 16384 events => throughput is 6.08E-07 events/s
[COUNTERS] Fortran TEST3 ( 14 ) : 0.0507s for 16384 events => throughput is 3.09E-06 events/s
[COUNTERS] Fortran TEST5 ( 16 ) : 0.1478s for 16384 events => throughput is 9.02E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0688s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6838s
…grouping
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7428s
[COUNTERS] Fortran Other ( 0 ) : -0.0409s
[COUNTERS] Fortran X2F ( 1 ) : 0.0169s for 16399 events => throughput is 1.03E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0982s for 32768 events => throughput is 3.00E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2585s for 16399 events => throughput is 1.58E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0865s for 16384 events => throughput is 5.28E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0670s
[COUNTERS] Fortran grouping ( 12 ) : 0.0520s for 16384 events => throughput is 3.17E-06 events/s
[COUNTERS] Fortran scale ( 13 ) : 0.0098s for 16384 events => throughput is 5.98E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0497s for 16384 events => throughput is 3.03E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1445s for 16384 events => throughput is 8.82E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0670s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6728s
…s, which was causing double counting and a negative Fortran Other
The problem is that select_grouping_choice calls dsigproc, which eventually calls dsig1, which includes pdf profiling
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7643s
[COUNTERS] Fortran Other ( 0 ) : 0.0111s
[COUNTERS] Fortran X2F ( 1 ) : 0.0164s for 16399 events => throughput is 9.98E-07 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1013s for 32768 events => throughput is 3.09E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2712s for 16399 events => throughput is 1.65E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0008s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0874s for 16384 events => throughput is 5.34E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0663s
[COUNTERS] Fortran scale ( 13 ) : 0.0103s for 16384 events => throughput is 6.26E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0511s for 16384 events => throughput is 3.12E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1484s for 16384 events => throughput is 9.06E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0663s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6950s
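The sign flip of "Fortran Other" across these commits is consistent with it being a residual: PROGRAM TOTAL minus the sum of the individually timed Fortran and CudaCpp counters. That reading is inferred from the printed numbers only and has not been checked against counters.cc. Under that assumption, a timer nested inside another timed region (here, PDF profiling inside the timed select_grouping_choice) is subtracted twice and can push the residual below zero. A minimal Python sketch, with a hypothetical helper `residual_other` and times copied from the "grouping" log and from the log without the grouping timer:

```python
# Times in seconds, copied from two madevent_cpp logs in this conversation.
WITH_GROUPING_TIMER = {
    "PROGRAM TOTAL": 0.7428,
    "Fortran X2F": 0.0169, "Fortran PDF": 0.0982, "Fortran final_I/O": 0.2585,
    "CudaCpp HEL": 0.0007, "CudaCpp MEs": 0.0865, "Fortran initial_I/O": 0.0670,
    "Fortran grouping": 0.0520, "Fortran scale": 0.0098,
    "Fortran rewgt": 0.0497, "Fortran unwgt": 0.1445,
}
WITHOUT_GROUPING_TIMER = {
    "PROGRAM TOTAL": 0.7643,
    "Fortran X2F": 0.0164, "Fortran PDF": 0.1013, "Fortran final_I/O": 0.2712,
    "CudaCpp HEL": 0.0008, "CudaCpp MEs": 0.0874, "Fortran initial_I/O": 0.0663,
    "Fortran scale": 0.0103, "Fortran rewgt": 0.0511, "Fortran unwgt": 0.1484,
}


def residual_other(counters):
    """PROGRAM TOTAL minus every individually timed Fortran/CudaCpp counter.

    Assumption (not verified in counters.cc): this is how "Fortran Other"
    is derived. Overlapping timers are summed twice here, so the residual
    can become negative.
    """
    timed = sum(seconds for name, seconds in counters.items()
                if name.startswith(("Fortran ", "CudaCpp ")))
    return counters["PROGRAM TOTAL"] - timed


if __name__ == "__main__":
    print(f"with grouping timer:    {residual_other(WITH_GROUPING_TIMER):+.4f}s")
    print(f"without grouping timer: {residual_other(WITHOUT_GROUPING_TIMER):+.4f}s")
```

Both residuals agree with the logged "Fortran Other" values (-0.0409s and 0.0111s) to within the rounding of the four-decimal printout.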
…sig1 (not only dsig1_vec), but it does not show up! - will revert
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7479s
[COUNTERS] Fortran Other ( 0 ) : 0.0122s
[COUNTERS] Fortran X2F ( 1 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0974s for 32768 events => throughput is 2.97E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2625s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0873s for 16384 events => throughput is 5.33E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0657s
[COUNTERS] Fortran scale ( 13 ) : 0.0102s for 16384 events => throughput is 6.21E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0494s for 16384 events => throughput is 3.01E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1459s for 16384 events => throughput is 8.90E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0657s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6793s
Revert "[prof] in gg_tt.mad auto_dsig1.f, add profiling for matrix1 also in dsig1 (not only dsig1_vec), but it does not show up! - will revert"
This reverts commit d3165cb.
…ble counting in x2f for instance) - small contribution, will revert
The contribution is small because it does not make Fortran Other decrease... (while x2f increases due to profiling overhead)
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7472s
[COUNTERS] Fortran Other ( 0 ) : 0.0105s
[COUNTERS] Fortran X2F ( 1 ) : 0.0212s for 16399 events => throughput is 1.29E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0943s for 32768 events => throughput is 2.88E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2576s for 16399 events => throughput is 1.57E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0007s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0860s for 16384 events => throughput is 5.25E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0638s
[COUNTERS] Fortran ranmar ( 12 ) : 0.0057s for 114719 events => throughput is 4.93E-08 events/s
[COUNTERS] Fortran scale ( 13 ) : 0.0098s for 16384 events => throughput is 6.00E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0508s for 16384 events => throughput is 3.10E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1470s for 16384 events => throughput is 8.97E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0638s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6805s
Revert "[prof] in gg_tt.mad, profile ranmar (in ranmar.f: but this causes double counting in x2f for instance) - small contribution, will revert"
This reverts commit 59dbf04.
…st12 and test15): very large for cuda runs!
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.1881s
[COUNTERS] Fortran Other ( 0 ) : 0.0114s
[COUNTERS] Fortran X2F ( 1 ) : 0.0171s for 16399 events => throughput is 1.04E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1026s for 32768 events => throughput is 3.13E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.3368s for 16399 events => throughput is 2.05E-05 events/s
[COUNTERS] CudaCpp HEL ( 5 ) : 0.0011s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0010s for 16384 events => throughput is 6.20E-08 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0783s
[COUNTERS] Fortran TEST12 ( 12 ) : 0.0256s
[COUNTERS] Fortran scale ( 13 ) : 0.0104s for 16384 events => throughput is 6.37E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0512s for 16384 events => throughput is 3.12E-06 events/s
[COUNTERS] Fortran TEST15 ( 15 ) : 0.4023s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1503s for 16384 events => throughput is 9.18E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0783s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6814s
…on, helicity calculation) and finalise (bridge deletion) timers
Now "Fortran Other" is 1% of the total, will stop here and clean up the rest
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.1485s
[COUNTERS] Fortran Other ( 0 ) : 0.0119s
[COUNTERS] Fortran X2F ( 1 ) : 0.0174s for 16399 events => throughput is 1.06E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.1005s for 32768 events => throughput is 3.07E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2783s for 16399 events => throughput is 1.70E-05 events/s
[COUNTERS] CudaCpp initialise ( 5 ) : 0.4243s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0010s for 16384 events => throughput is 6.20E-08 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0733s
[COUNTERS] CudaCpp finalise ( 8 ) : 0.0259s
[COUNTERS] Fortran scale ( 13 ) : 0.0098s for 16384 events => throughput is 6.01E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0525s for 16384 events => throughput is 3.20E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1535s for 16384 events => throughput is 9.37E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0733s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6245s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7643s
[COUNTERS] Fortran Other ( 0 ) : 0.0102s
[COUNTERS] Fortran X2F ( 1 ) : 0.0167s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0983s for 32768 events => throughput is 3.00E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2644s for 16399 events => throughput is 1.61E-05 events/s
[COUNTERS] CudaCpp initialise ( 5 ) : 0.0022s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0919s for 16384 events => throughput is 5.61E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0659s
[COUNTERS] CudaCpp finalise ( 8 ) : 0.0002s
[COUNTERS] Fortran scale ( 13 ) : 0.0100s for 16384 events => throughput is 6.11E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0527s for 16384 events => throughput is 3.22E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1518s for 16384 events => throughput is 9.26E-06 events/s
[COUNTERS] PROGRAM initial_I/O ( 19 ) : 0.0659s
[COUNTERS] PROGRAM sample_full ( 20 ) : 0.6949s
… timers and the three TEST timers
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.0922s
[COUNTERS] Fortran Other ( 0 ) : 0.0113s
[COUNTERS] Fortran X2F ( 1 ) : 0.0168s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0947s for 32768 events => throughput is 2.89E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2625s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp initialise ( 5 ) : 0.4035s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0010s for 16384 events => throughput is 6.07E-08 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0703s
[COUNTERS] CudaCpp finalise ( 8 ) : 0.0253s
[COUNTERS] Fortran scale ( 13 ) : 0.0096s for 16384 events => throughput is 5.87E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0488s for 16384 events => throughput is 2.98E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1485s for 16384 events => throughput is 9.06E-06 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7471s
[COUNTERS] Fortran Other ( 0 ) : 0.0098s
[COUNTERS] Fortran X2F ( 1 ) : 0.0168s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDF ( 2 ) : 0.0966s for 32768 events => throughput is 2.95E-06 events/s
[COUNTERS] Fortran final_I/O ( 3 ) : 0.2632s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp initialise ( 5 ) : 0.0023s
[COUNTERS] CudaCpp MEs ( 6 ) : 0.0854s for 16384 events => throughput is 5.21E-06 events/s
[COUNTERS] Fortran initial_I/O ( 7 ) : 0.0656s
[COUNTERS] CudaCpp finalise ( 8 ) : 0.0002s
[COUNTERS] Fortran scale ( 13 ) : 0.0097s for 16384 events => throughput is 5.93E-07 events/s
[COUNTERS] Fortran rewgt ( 14 ) : 0.0497s for 16384 events => throughput is 3.03E-06 events/s
[COUNTERS] Fortran unwgt ( 16 ) : 0.1479s for 16384 events => throughput is 9.03E-06 events/s
…d order
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.1034s
[COUNTERS] Fortran Other ( 0 ) : 0.0111s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0716s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 0.0170s for 16399 events => throughput is 1.03E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0989s for 32768 events => throughput is 3.02E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.0102s for 16384 events => throughput is 6.20E-07 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0511s for 16384 events => throughput is 3.12E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.1456s for 16384 events => throughput is 8.89E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.2672s for 16399 events => throughput is 1.63E-05 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4048s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0250s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0010s for 16384 events => throughput is 6.07E-08 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7943s
[COUNTERS] Fortran Other ( 0 ) : 0.0111s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0685s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 0.0171s for 16399 events => throughput is 1.04E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1047s for 32768 events => throughput is 3.20E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.0105s for 16384 events => throughput is 6.39E-07 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0536s for 16384 events => throughput is 3.27E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.1569s for 16384 events => throughput is 9.58E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.2773s for 16399 events => throughput is 1.69E-05 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.0023s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0002s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0921s for 16384 events => throughput is 5.62E-06 events/s
…Es" counters
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.0988s
[COUNTERS] Fortran Other ( 0 ) : 0.0117s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0697s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 0.0167s for 16399 events => throughput is 1.02E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0910s for 32768 events => throughput is 2.78E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.0098s for 16384 events => throughput is 5.99E-07 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0473s for 16384 events => throughput is 2.89E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.1488s for 16384 events => throughput is 9.08E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.2702s for 16399 events => throughput is 1.65E-05 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4077s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0250s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0010s for 16384 events => throughput is 6.02E-08 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 1.0979s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0010s for 16384 events => throughput is 6.02E-08 events/s
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7378s
[COUNTERS] Fortran Other ( 0 ) : 0.0097s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0654s
[COUNTERS] Fortran Random2Momenta ( 3 ) : 0.0166s for 16399 events => throughput is 1.01E-06 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0924s for 32768 events => throughput is 2.82E-06 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.0096s for 16384 events => throughput is 5.88E-07 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0465s for 16384 events => throughput is 2.84E-06 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.1475s for 16384 events => throughput is 9.00E-06 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.2621s for 16399 events => throughput is 1.60E-05 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.0022s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0002s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0857s for 16384 events => throughput is 5.23E-06 events/s
[COUNTERS] OVERALL NON-MEs ( 21 ) : 0.6521s
[COUNTERS] OVERALL MEs ( 22 ) : 0.0857s for 16384 events => throughput is 5.23E-06 events/s
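The two OVERALL counters introduced here appear to split PROGRAM TOTAL into a matrix-element part and everything else. As a quick sanity check, the sketch below (Python, hypothetical helper names, numbers copied from the two logs of this commit) verifies that the two parts add up to the total within rounding and prints the ME share of the run for each build:

```python
# Times in seconds, copied from the cuda and cpp (none) logs of this commit.
RUNS = {
    "cuda": {"PROGRAM TOTAL": 1.0988, "OVERALL NON-MEs": 1.0979, "OVERALL MEs": 0.0010},
    "cpp (none)": {"PROGRAM TOTAL": 0.7378, "OVERALL NON-MEs": 0.6521, "OVERALL MEs": 0.0857},
}


def me_fraction(run):
    """Fraction of PROGRAM TOTAL spent in matrix elements."""
    return run["OVERALL MEs"] / run["PROGRAM TOTAL"]


def split_mismatch(run):
    """|TOTAL - (NON-MEs + MEs)| in seconds; expected to be at rounding level."""
    return abs(run["PROGRAM TOTAL"] - run["OVERALL NON-MEs"] - run["OVERALL MEs"])


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: MEs = {100 * me_fraction(run):.2f}% of total, "
              f"split mismatch = {split_mismatch(run):.4f}s")
```

For this small gg_tt test the ME share is below 0.1% of the run on cuda and roughly 12% on the scalar C++ build, so almost all of the wall time in these logs is non-ME work.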
…E_DISABLECOUNTERS to disable individual counters
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 1.0898s
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
CUDACPP_RUNTIME_DISABLECOUNTERS=1 ./build.none_d_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL : 0.7309s
…or MEs counters - will revert
Revert "[prof] in gg_tt.mad counters.cc, consider printing throughputs only for MEs counters - will revert"
This reverts commit 5b24462.
…ounters: must include again dsample.f and auto_dsig.f in patches
The only files that still need to be patched are
- 4 in patch.common: Source/makefile, Source/genps.inc, Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
git checkout prof $(git ls-tree --name-only prof */CODEGEN*txt)
… doing in cmsdy... but ok)
Revert "[cmsdy] regenerate all processes (ops this is not what I was normally doing in cmsdy... but ok)"
This reverts commit 7122b9c.
…cluding the latest counters
CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -fortran pp_dy3j.mad -togridpack
…to the new counters
…ld91
CUDACPP_RUNTIME_DISABLEFPE=1 ./tlau/lauX.sh -nomakeclean -ALL pp_dy3j.mad -fromgridpack
pp_dy3j.mad//fortran/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL 447.7169 seconds
[madevent COUNTERS] PROGRAM TOTAL 443.48
[madevent COUNTERS] Fortran Other 6.5439
[madevent COUNTERS] Fortran Initialise(I/O) 4.4648
[madevent COUNTERS] Fortran Random2Momenta 93.2692
[madevent COUNTERS] Fortran PDFs 8.2697
[madevent COUNTERS] Fortran UpdateScaleCouplings 7.3142
[madevent COUNTERS] Fortran Reweight 3.6975
[madevent COUNTERS] Fortran Unweight(LHE-I/O) 4.8636
[madevent COUNTERS] Fortran SamplePutPoint 8.3255
[madevent COUNTERS] Fortran MEs 306.731
[madevent COUNTERS] OVERALL NON-MEs 136.748
[madevent COUNTERS] OVERALL MEs 306.731
--------------------------------------------------------------------------------
pp_dy3j.mad//cppnone/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL 448.1598 seconds
[madevent COUNTERS] PROGRAM TOTAL 443.898
[madevent COUNTERS] Fortran Other 6.5229
[madevent COUNTERS] Fortran Initialise(I/O) 4.4882
[madevent COUNTERS] Fortran Random2Momenta 93.1882
[madevent COUNTERS] Fortran PDFs 8.2912
[madevent COUNTERS] Fortran UpdateScaleCouplings 7.282
[madevent COUNTERS] Fortran Reweight 3.703
[madevent COUNTERS] Fortran Unweight(LHE-I/O) 4.8634
[madevent COUNTERS] Fortran SamplePutPoint 8.2912
[madevent COUNTERS] CudaCpp Initialise 1.1875
[madevent COUNTERS] CudaCpp Finalise 0.0215
[madevent COUNTERS] CudaCpp MEs 306.061
[madevent COUNTERS] OVERALL NON-MEs 137.837
[madevent COUNTERS] OVERALL MEs 306.061
--------------------------------------------------------------------------------
pp_dy3j.mad//cppsse4/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL 295.7847 seconds
[madevent COUNTERS] PROGRAM TOTAL 291.523
[madevent COUNTERS] Fortran Other 6.5192
[madevent COUNTERS] Fortran Initialise(I/O) 4.4843
[madevent COUNTERS] Fortran Random2Momenta 93.2004
[madevent COUNTERS] Fortran PDFs 8.2944
[madevent COUNTERS] Fortran UpdateScaleCouplings 7.2823
[madevent COUNTERS] Fortran Reweight 3.7002
[madevent COUNTERS] Fortran Unweight(LHE-I/O) 4.8692
[madevent COUNTERS] Fortran SamplePutPoint 8.285
[madevent COUNTERS] CudaCpp Initialise 0.7227
[madevent COUNTERS] CudaCpp Finalise 0.0218
[madevent COUNTERS] CudaCpp MEs 154.147
[madevent COUNTERS] OVERALL NON-MEs 137.375
[madevent COUNTERS] OVERALL MEs 154.147
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
pp_dy3j.mad//cppavx2/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL 204.7001 seconds
[madevent COUNTERS] PROGRAM TOTAL 200.453
[madevent COUNTERS] Fortran Other 6.5698
[madevent COUNTERS] Fortran Initialise(I/O) 4.4868
[madevent COUNTERS] Fortran Random2Momenta 93.2293
[madevent COUNTERS] Fortran PDFs 8.2934
[madevent COUNTERS] Fortran UpdateScaleCouplings 7.3016
[madevent COUNTERS] Fortran Reweight 3.7048
[madevent COUNTERS] Fortran Unweight(LHE-I/O) 4.8693
[madevent COUNTERS] Fortran SamplePutPoint 8.2838
[madevent COUNTERS] CudaCpp Initialise 0.445
[madevent COUNTERS] CudaCpp Finalise 0.0217
[madevent COUNTERS] CudaCpp MEs 63.25
[madevent COUNTERS] OVERALL NON-MEs 137.203
[madevent COUNTERS] OVERALL MEs 63.25
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512y/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL 201.0406 seconds
[madevent COUNTERS] PROGRAM TOTAL 196.745
[madevent COUNTERS] Fortran Other 6.6031
[madevent COUNTERS] Fortran Initialise(I/O) 4.4911
[madevent COUNTERS] Fortran Random2Momenta 93.3414
[madevent COUNTERS] Fortran PDFs 8.2981
[madevent COUNTERS] Fortran UpdateScaleCouplings 7.2869
[madevent COUNTERS] Fortran Reweight 3.7033
[madevent COUNTERS] Fortran Unweight(LHE-I/O) 4.8707
[madevent COUNTERS] Fortran SamplePutPoint 8.2976
[madevent COUNTERS] CudaCpp Initialise 0.4341
[madevent COUNTERS] CudaCpp Finalise 0.0217
[madevent COUNTERS] CudaCpp MEs 59.3994
[madevent COUNTERS] OVERALL NON-MEs 137.345
[madevent COUNTERS] OVERALL MEs 59.3994
--------------------------------------------------------------------------------
pp_dy3j.mad//cpp512z/output.txt
[GridPackCmd.launch] GRIDPCK TOTAL 176.8891 seconds
[madevent COUNTERS] PROGRAM TOTAL 172.637
[madevent COUNTERS] Fortran Other 6.5768
[madevent COUNTERS] Fortran Initialise(I/O) 4.486
[madevent COUNTERS] Fortran Random2Momenta 93.2907
[madevent COUNTERS] Fortran PDFs 8.2998
[madevent COUNTERS] Fortran UpdateScaleCouplings 7.2827
[madevent COUNTERS] Fortran Reweight 3.7045
[madevent COUNTERS] Fortran Unweight(LHE-I/O) 4.8719
[madevent COUNTERS] Fortran SamplePutPoint 8.2892
[madevent COUNTERS] CudaCpp Initialise 0.3619
[madevent COUNTERS] CudaCpp Finalise 0.0221
[madevent COUNTERS] CudaCpp MEs 35.4557
[madevent COUNTERS] OVERALL NON-MEs 137.181
[madevent COUNTERS] OVERALL MEs 35.4557
--------------------------------------------------------------------------------
pp_dy3j.mad//cuda/output.txt
File not found: SKIP backend cuda
--------------------------------------------------------------------------------
pp_dy3j.mad//hip/output.txt
File not found: SKIP backend hip
--------------------------------------------------------------------------------
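The pp_dy3j summary above can be reduced to two numbers per backend: the speedup of the ME calculation alone and the speedup of the whole madevent run, both relative to the Fortran build. Because the non-ME time stays near 137 s in every build, Amdahl's law caps the overall speedup at PROGRAM TOTAL / OVERALL NON-MEs of the reference. A minimal Python sketch, with hypothetical helper names and values copied from the summary:

```python
# Seconds copied from the summary above:
# backend -> (PROGRAM TOTAL, OVERALL NON-MEs, OVERALL MEs)
SUMMARY = {
    "fortran": (443.48, 136.748, 306.731),
    "cppnone": (443.898, 137.837, 306.061),
    "cppsse4": (291.523, 137.375, 154.147),
    "cppavx2": (200.453, 137.203, 63.25),
    "cpp512y": (196.745, 137.345, 59.3994),
    "cpp512z": (172.637, 137.181, 35.4557),
}


def speedups(backend, reference="fortran"):
    """Return (ME-only speedup, overall speedup) of backend vs reference."""
    ref_total, _, ref_mes = SUMMARY[reference]
    total, _, mes = SUMMARY[backend]
    return ref_mes / mes, ref_total / total


def amdahl_limit(reference="fortran"):
    """Overall speedup if MEs took zero time and non-ME time stayed fixed."""
    total, non_mes, _ = SUMMARY[reference]
    return total / non_mes


if __name__ == "__main__":
    for backend in SUMMARY:
        me_speedup, overall = speedups(backend)
        print(f"{backend}: MEs x{me_speedup:.2f}, overall x{overall:.2f}")
    print(f"Amdahl limit with fixed non-ME time: x{amdahl_limit():.2f}")
```

With these numbers the 512z build speeds up the MEs by a factor of about 8.7 but the full run by only about 2.6, against a ceiling of about 3.2. The largest single non-ME counter is Fortran Random2Momenta at about 93 s.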
…txt file, and add pp_dy3j.mad/summary.txt madgraph5#943
…_dy3j.mad/summary.txt madgraph5#943
…SpaceSampling
These are the first results where timer overhead is removed: looks nice, but the overhead should be computed in the counters.cc calls rather than in the individual timers (this would also make more sense with respect to timermap.h where this will not be possible - rename the env, too)
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4608s
[COUNTERS] Fortran Other ( 0 ) : 0.1171s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0690s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2317s for 1087437 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0917s for 32768 events => throughput is 3.57E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1719s for 16384 events => throughput is 9.53E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0483s for 16384 events => throughput is 3.39E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0691s for 16384 events => throughput is 2.37E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1276s for 1087437 events => throughput is 8.52E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4718s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0269s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3519s for 14136681 events => throughput is 6.01E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4251s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.2204s
[COUNTERS] Fortran Other ( 0 ) : 0.1550s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0697s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.9335s for 1087437 events => throughput is 2.76E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0924s for 32768 events => throughput is 3.55E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1722s for 16384 events => throughput is 9.52E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0487s for 16384 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0689s for 16384 events => throughput is 2.38E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1401s for 1087437 events => throughput is 7.76E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4779s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8064s for 14136681 events => throughput is 5.04E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.1846s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s
CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: RdtscTimer overhead : 0.0179s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.4668s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.2924s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.1745s
[COUNTERS] Fortran Other ( 0 ) : 0.1190s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0696s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.9612s for 1087437 events => throughput is 3.67E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0913s for 32768 events => throughput is 3.59E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1709s for 16384 events => throughput is 9.59E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0482s for 16384 events => throughput is 3.40E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0678s for 16384 events => throughput is 2.42E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1125s for 1087437 events => throughput is 9.67E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4716s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.0989s for 14136681 events => throughput is 6.74E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.1387s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0358s for 16384 events => throughput is 4.58E+05 events/s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: ChronoTimer overhead : 0.0489s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.2779s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.7998s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4781s
[COUNTERS] Fortran Other ( 0 ) : 0.1570s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0669s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2485s for 1087437 events => throughput is 3.35E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0930s for 32768 events => throughput is 3.52E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1716s for 16384 events => throughput is 9.55E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0474s for 16384 events => throughput is 3.46E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0681s for 16384 events => throughput is 2.41E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0929s for 1087437 events => throughput is 1.17E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4705s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.1629s for 14136681 events => throughput is 6.54E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4424s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.8210s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8210s
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVETIMEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.8301s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8301s
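The reported PROGRAM COUNTEROVERHEAD can be cross-checked from the other numbers in these logs. The sketch below assumes (this is not confirmed from counters.cc) that each counted "event" of a call counter corresponds to one timer start/stop cycle, and multiplies the total number of cycles by the calibrated cost per cycle printed in the INFO line. All names are hypothetical; the values are copied from the logs of this commit.

```python
# Call counts per counter (identical in all four logs of this commit).
CALLS = {
    "Fortran PhaseSpaceSampling": 1087437,
    "Fortran PDFs": 32768,
    "Fortran UpdateScaleCouplings": 16384,
    "Fortran Reweight": 16384,
    "Fortran Unweight(LHE-I/O)": 16384,
    "Fortran SamplePutPoint": 1087437,
    "CudaCpp MEs": 16384,
    "TEST SampleGetX": 14136681,
}

# Calibration printed by the program: seconds per 1M start/stop cycles.
OVERHEAD_PER_MILLION = {"rdtsc": 0.0179, "chrono": 0.0489}

# PROGRAM COUNTEROVERHEAD and PROGRAM TOTAL+COUNTEROVERHEAD as logged.
REPORTED_OVERHEAD = {"rdtsc": 0.2924, "chrono": 0.7998}
REPORTED_TOTAL_PLUS_OVERHEAD = {"rdtsc": 4.4668, "chrono": 5.2779}


def estimated_overhead(timer):
    """Estimate counter overhead as (number of cycles) x (cost per cycle).

    Assumption: one start/stop cycle per counted event.
    """
    return sum(CALLS.values()) * OVERHEAD_PER_MILLION[timer] / 1e6


def corrected_total(timer):
    """PROGRAM TOTAL after subtracting the reported counter overhead."""
    return REPORTED_TOTAL_PLUS_OVERHEAD[timer] - REPORTED_OVERHEAD[timer]


if __name__ == "__main__":
    for timer in ("rdtsc", "chrono"):
        print(f"{timer}: estimated overhead {estimated_overhead(timer):.4f}s, "
              f"reported {REPORTED_OVERHEAD[timer]:.4f}s, "
              f"corrected total {corrected_total(timer):.4f}s")
```

Under that assumption the estimate lands within about 1% of the reported overhead for both timer types, and subtracting the reported overhead reproduces the logged PROGRAM TOTAL values (4.1745s and 4.4781s) to within rounding. About 16.4M start/stop cycles are involved, most of them from the TEST SampleGetX counter.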
…s: this will be moved to counters alone

Revert "[prof] in gux_taptamggux.mad timer.h, add instead a getTotalOverheadSeconds() call and go back to the old getTotalDurationSeconds"
This reverts commit ad9b747.

Revert "[prof] in gux_taptamggux.mad timer.h, add the option to remove overhead from getTotalDurationSeconds calls"
This reverts commit 5c0a2ed.
…unter overhead (remove it from timer.h: there will be none for timermap.h)

Rename the env as CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD to make it clear that this is in the counters.cc infrastructure

These are the results

(1) keep overhead

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.5315s
[COUNTERS] Fortran Other ( 0 ) : 0.1198s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0678s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2691s for 1087437 events => throughput is 3.33E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1044s for 32768 events => throughput is 3.14E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1757s for 16384 events => throughput is 9.33E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0543s for 16384 events => throughput is 3.02E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0731s for 16384 events => throughput is 2.24E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1322s for 1087437 events => throughput is 8.23E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4719s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0274s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0358s for 16384 events => throughput is 4.57E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3686s for 14136681 events => throughput is 5.97E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4957s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0358s for 16384 events => throughput is 4.57E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.2048s
[COUNTERS] Fortran Other ( 0 ) : 0.1559s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0673s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.9265s for 1087437 events => throughput is 2.77E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0993s for 32768 events => throughput is 3.30E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1648s for 16384 events => throughput is 9.94E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0514s for 16384 events => throughput is 3.19E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0700s for 16384 events => throughput is 2.34E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1365s for 1087437 events => throughput is 7.97E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4711s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0264s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8006s for 14136681 events => throughput is 5.05E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.1691s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

(2) remove overhead

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0331s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.5208s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.5413s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9795s
[COUNTERS] Fortran Other ( 0 ) : 0.1548s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0670s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.7547s for 1087437 events => throughput is 3.95E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0988s for 32768 events => throughput is 3.32E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1639s for 16384 events => throughput is 1.00E+05 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0510s for 16384 events => throughput is 3.21E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0674s for 16384 events => throughput is 2.43E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0898s for 1087437 events => throughput is 1.21E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4700s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0356s for 16384 events => throughput is 4.60E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.8855s for 14136681 events => throughput is 7.50E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 3.9439s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0356s for 16384 events => throughput is 4.60E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0640s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.3491s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 1.0455s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.3036s
[COUNTERS] Fortran Other ( 0 ) : 0.2216s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0692s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.0230s for 1087437 events => throughput is 3.60E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0992s for 32768 events => throughput is 3.30E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1652s for 16384 events => throughput is 9.92E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0504s for 16384 events => throughput is 3.25E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0684s for 16384 events => throughput is 2.39E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0716s for 1087437 events => throughput is 1.52E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4727s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0266s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.9427s for 14136681 events => throughput is 7.28E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.2679s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

(3) remove overhead, disable individual timers (so here the overhead is 0)

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0039s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.7998s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.7998s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0038s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 3.9067s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.0000s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9067s
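The logs above report a calibration ("overhead for 1M start/stop cycles") and a PROGRAM COUNTEROVERHEAD that is subtracted from the total. A minimal sketch of how such a correction could work is given below, assuming the overhead is simply the per-cycle calibration cost multiplied by the number of start/stop calls. This is a plausible reading of the numbers, not something the commit message confirms, and the function names are hypothetical (they are not the counters.cc API). The sketch uses std::chrono only; the RDTSC-based timer is not reproduced here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from counters.cc): time nCycles empty start/stop pairs.
// This mimics the "overhead for 1M start/stop cycles" calibration line in the logs.
double measureTimerOverheadSeconds( std::uint64_t nCycles )
{
  assert( nCycles > 0 );
  using clock = std::chrono::steady_clock;
  double sink = 0; // accumulate the measured intervals, as a real counter would
  const auto begin = clock::now();
  for( std::uint64_t i = 0; i < nCycles; ++i )
  {
    const auto t0 = clock::now(); // "start"
    const auto t1 = clock::now(); // "stop"
    sink += std::chrono::duration<double>( t1 - t0 ).count();
  }
  const auto end = clock::now();
  (void)sink; // the accumulated value itself is not needed here
  return std::chrono::duration<double>( end - begin ).count();
}

// Assumption: total overhead = (calibration time per cycle) * (number of start/stop calls).
double estimateOverheadSeconds( double calibrationSeconds,
                                std::uint64_t calibrationCycles,
                                std::uint64_t nStartStopCalls )
{
  assert( calibrationCycles > 0 );
  return calibrationSeconds / static_cast<double>( calibrationCycles ) * static_cast<double>( nStartStopCalls );
}

// PROGRAM TOTAL = (PROGRAM TOTAL+COUNTEROVERHEAD) - (PROGRAM COUNTEROVERHEAD)
double removeOverheadSeconds( double totalWithOverhead, double overhead )
{
  return totalWithOverhead - overhead;
}
```

As a rough cross-check against the RDTSC run in (2): taking one start/stop call per event in the three largest counters (1087437 + 1087437 + 14136681 calls, an assumption) and the 0.0331s calibration, `estimateOverheadSeconds` gives about 0.54s, close to the reported 0.5413s, and 4.5208s - 0.5413s reproduces the reported PROGRAM TOTAL of 3.9795s.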
…ter overhead

These are the results

(1) keep overhead

./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4766s
[COUNTERS] Fortran Other ( 0 ) : 0.1202s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0685s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2400s for 1087437 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1007s for 32768 events => throughput is 3.25E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1673s for 16384 events => throughput is 9.79E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0521s for 16384 events => throughput is 3.14E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0687s for 16384 events => throughput is 2.38E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1237s for 1087437 events => throughput is 8.79E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4728s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0269s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3496s for 14136681 events => throughput is 6.02E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4409s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.3144s
[COUNTERS] Fortran Other ( 0 ) : 0.1588s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 4.0191s for 1087437 events => throughput is 2.71E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0996s for 32768 events => throughput is 3.29E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1660s for 16384 events => throughput is 9.87E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0508s for 16384 events => throughput is 3.22E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0704s for 16384 events => throughput is 2.33E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1482s for 1087437 events => throughput is 7.34E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4718s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8646s for 14136681 events => throughput is 4.94E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.2787s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

(2) remove overhead

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0338s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.8244s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.8905s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9339s
[COUNTERS] Fortran Other ( 0 ) : 0.2954s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.7332s for 1087437 events => throughput is 3.98E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1003s for 32768 events => throughput is 3.27E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1688s for 16384 events => throughput is 9.71E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0507s for 16384 events => throughput is 3.23E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0695s for 16384 events => throughput is 2.36E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0924s for 1087437 events => throughput is 1.18E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4692s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.8723s for 14136681 events => throughput is 7.55E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 3.8982s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0637s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.8826s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 1.6786s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.2040s
[COUNTERS] Fortran Other ( 0 ) : 0.4831s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0691s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.9924s for 1087437 events => throughput is 3.63E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0983s for 32768 events => throughput is 3.33E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1669s for 16384 events => throughput is 9.81E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0506s for 16384 events => throughput is 3.24E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0676s for 16384 events => throughput is 2.42E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0698s for 1087437 events => throughput is 1.56E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4712s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.9227s for 14136681 events => throughput is 7.35E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.1690s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s

(3) remove overhead, disable individual timers (so here the overhead is 0)

CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0333s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.1897s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.3330s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8567s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0659s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.5119s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.6594s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8525s

(4) do not remove overhead, disable individual timers (remove also the overhead from the estimation of the overhead)
(this test was done on another day on the same machine and build, but the results are compatible with the previous ones)

CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8072s

CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8214s
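The four test configurations above are selected through three environment variables. The sketch below shows one way such flags could be read and mapped to the banner printed in the logs. The variable names and banner strings are taken from the logs; everything else (the `TimerConfig` struct, the helper names, and the rule that any non-empty value enables a flag) is an assumption for illustration and may differ from what counters.cc actually does.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the three runtime switches seen in the logs.
struct TimerConfig
{
  bool useChronoTimers;       // CUDACPP_RUNTIME_USECHRONOTIMERS
  bool removeCounterOverhead; // CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD
  bool disableCallTimers;     // CUDACPP_RUNTIME_DISABLECALLTIMERS
};

// Assumption: a flag is "on" if the variable is defined and non-empty.
bool envFlagIsSet( const char* name )
{
  const char* value = std::getenv( name );
  return value != nullptr && value[0] != '\0';
}

TimerConfig readTimerConfigFromEnv()
{
  return TimerConfig{ envFlagIsSet( "CUDACPP_RUNTIME_USECHRONOTIMERS" ),
                      envFlagIsSet( "CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD" ),
                      envFlagIsSet( "CUDACPP_RUNTIME_DISABLECALLTIMERS" ) };
}

// Build the banner line in the same format as the [COUNTERS] logs.
std::string timerBanner( const TimerConfig& cfg )
{
  std::string banner = "*** USING ";
  banner += cfg.useChronoTimers ? "STD::CHRONO" : "RDTSC-BASED";
  banner += " TIMERS (";
  banner += cfg.removeCounterOverhead ? "remove" : "do not remove";
  banner += " timer overhead) ***";
  return banner;
}
```

With no variables set, `timerBanner( readTimerConfigFromEnv() )` yields the RDTSC "do not remove timer overhead" banner of case (1); `disableCallTimers` does not change the banner, matching cases (3) and (4), where only the PROGRAM TOTAL line is printed.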
…r merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…Source/makefile madgraph5#980) into prof

(Checked that regenerating gg_tt.mad is all ok)
…r merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…Source/makefile madgraph5#980) into grid
…er merging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…adgraph5#980) into cmsdy

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (remove Source/makefile)
- epochX/cudacpp/CODEGEN/allGenerateAndCompare.sh (add processes from both branches)

(Checked that regenerating gg_tt.mad is ok)
…rce/makefile madgraph5#980) into cmsdyps

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (remove Source/makefile)

(NB regenerating gg_tt.mad is not ok: the newranmar.o is now missing)
…(avoid including Source/makefile in patch.common)
…ivalent (and Source/makefile could still be avoided in patch.common)

(I checked that gg_tt.mad can now build successfully)
…ier merging

git checkout upstream/master $(git ls-tree --name-only HEAD tput/logs* tmad/logs*)
…nerated code except gg_tt.mad for easier merging

git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/SubProcesses/P*/auto_dsig1.f | grep -v ^gg_tt.mad)
…dhel, for360) into prof

Fix conflicts:
- epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f (use upstream/master, will add back all counters as in prof)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (use upstream/master, will regenerate this)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (use upstream/master, will regenerate this)
…f branch before merging upstream/master (fix conflicts)
…pstream/master including june24, goodhel, for360

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that gg_tt.mad can be regenerated ok)
…' (including june24, goodhel, for360) into prof

Also add to the repo a few missing files in gux_taptamggux.mad and nobm_pp_ttW.mad
…ging

git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…ated code except gg_tt.mad for easier merging

git checkout upstream/master $(git ls-tree --name-only upstream/master *.mad/Source/dsample.f | grep -v ^gg_tt.mad)
…also amd and v1.00.01 fixes) into prof

Fix conflicts (use upstream/master version):
- epochX/cudacpp/gg_tt.mad/Source/dsample.f

Will then regenerate patches from this gg_tt.mad
…/master including v1.00.00 and also amd and v1.00.01 fixes

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad

(Later checked that regenerating gg_tt.mad gives no change)
…and also amd and v1.00.01 fixes)
… v1.00.00 and with AMD and v1.00.01 fixes) into cmsdy

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/recreateRefs.sh (use profs version)
…est prof (with upstream/master v1.00.00 and AMD/v1.00.01 fixes) into cmsdy

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
… plus AMD/v1.00.01 fixes; NOT grid) into cmsdyps

Fix conflicts:
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (manual attempt, will regenerate anyway)
- epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common (manual attempt, will regenerate anyway)
- epochX/cudacpp/gg_tt.mad/CODEGEN_mad_gg_tt_log.txt (take cmsdy)
…atest cmsdy (including prof and master but not grid)

The only files that still need to be patched are
- 2 in patch.common: Source/dsample.f, SubProcesses/makefile
- 4 in patch.P1: auto_dsig1.f, auto_dsig.f, driver.f, matrix1.f
Note: this is 3 files more than those needed in upstream/master (added Source/dsample.f, auto_dsig1.f, auto_dsig.f)

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile gg_tt.mad/Source/dsample.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
Now including the latest cmsdy #946 which includes
This is a WIP PR extending the CMS DY studies in #946 (which itself includes bits and pieces of many other PRs).
In addition, it includes some studies on phase space sampling optimizations, related to #963, #967, #968 and #969.