(3 in pipeline) - runcard fixes, multi-backend gridpacks, python/bash profiling in tlau #948
base: master
As suggested by @oliviermattelaer, the fix for #947 has been moved to mg5amcnlo/mg5amcnlo#125.
23ac026 to 43f80dd
(August 2024: Fix conflicts in epochX/cudacpp/tlau/lauX.sh after rebasing on upstream/master)
./tlau/lauX.sh -cuda gg_tt.mad
…_type' to 'cudacpp_fptype' for runcard options (madgraph5#700, madgraph5#788, madgraph5#835)
    (Older commit on gg_tt.mad, later cherry-picked fixing AVX conflicts, and later backported to CODEGEN too)
…acpp_backend in add_input_for_banner
    Then regenerate gg_tt.mad: this only affects the gg_tt.mad/bin/internal/plugin_run_card file
…acpp_backend in default_setup (will revert but keep the added comment)
    Then regenerate gg_tt.mad: this only adds "CUDACPP_BACKEND = 'cpp'" to run_card.inc
…dacpp_backend in default_setup
    Regenerate gg_tt.mad: this removes "CUDACPP_BACKEND = 'cpp'" from run_card.inc
…acpp_fptype in default_setup and comment out fct_mod (will revert but keep the added comment)
    Then regenerate gg_tt.mad: this only adds "CUDACPP_FPTYPE = 'm'" to run_card.inc (if fct_mod is commented out, else this fails codegen)
…dacpp_fptype in default_setup and uncomment out fct_mod
    Then regenerate gg_tt.mad: this removes "CUDACPP_FPTYPE = 'm'" from run_card.inc
…p_fptype
    Then regenerate gg_tt.mad: this gives a warning "run_card missed argument cudacpp_fptype. Takes default: m", but no code/card change?
…e runcard template, and keep hidden=False
    Then regenerate gg_tt.mad: a cudacpp_fptype line is added to the runcard and the warning "run_card missed argument cudacpp_fptype. Takes default: m" disappears
…nner (and reorder code)
    Then regenerate gg_tt.mad: this only affects the gg_tt.mad/bin/internal/plugin_run_card file
…ILDDIR=1 to make_opts (part of madgraph5#945)
    Then regenerate gg_tt.mad: this only affects launch_plugin.py itself
    Launching lauX.sh on ggtt fails as follows
        Error detected in "generate_events -f"
        write debug file /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/run_01_tag_1_debug.log
        If you need help with this issue please contact us on https://answers.launchpad.net/mg5amcnlo
        str : A compilation Error occurs when trying to compile /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx.
        The compilation fails with the following output message:
            make USEGTEST=0 BACKEND=cuda build.auto_m_inl0_hrd0/madevent_cuda
            make[1]: Entering directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'
            make[1]: *** No rule to make target 'build.auto_m_inl0_hrd0/madevent_cuda'. Stop.
            make[1]: Leaving directory '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx'
            make: *** [makefile:163: madevent_cuda_link] Error 2
        Please try to fix this compilations issue and retry.
        Help might be found at https://answers.launchpad.net/mg5amcnlo.
        If you think that this is a bug, you can report this at https://bugs.launchpad.net/mg5amcnlo
… madevent_xxx_link' in args[0][0] to support USEBUILDDIR=1 (part of madgraph5#945)
    Then regenerate gg_tt.mad
    However lauX.sh still fails
…][1] to 'make madevent_xxx_link' to support USEBUILDDIR=1 (part of madgraph5#945)
    Then regenerate gg_tt.mad
…two patches, will fix gg_tt.mad makefile instead
    Revert "[runcard] in CODEGEN launch_plugin.py, append 'BACKEND=xxx' as args[0][1] to 'make madevent_xxx_link' to support USEBUILDDIR=1 (part of madgraph5#945)"
    This reverts commit ce67fb5.
    Revert "[runcard] in CODEGEN launch_plugin.py, prepend 'BACKEND=xxx' to 'make madevent_xxx_link' in args[0][0] to support USEBUILDDIR=1 (part of madgraph5#945)"
    This reverts commit b51a59a.
…UILDDIR=1' (part of madgraph5#945)
…targets instead of having this in a python script (part of madgraph5#945)
…rgets make commands (eg propagate FPTYPE=m from make_opts)
…_opts configuration, this will be hardcoded in link targets (part of madgraph5#945)"
    Revert "[runcard] in CODEGEN launch_plugin.py, configure compile to add USEBUILDDIR=1 to make_opts (part of madgraph5#945)"
    This reverts commit 853a20e.
…(and propagate FPTYPE from make_opts) in _link targets for madgraph5#945
    The only files that still need to be patched are
    - 3 in patch.common: Source/makefile, Source/genps.inc, SubProcesses/makefile
    - 3 in patch.P1: auto_dsig1.f, driver.f, matrix1.f
    ./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
    git diff --no-ext-diff -R gg_tt.mad/Source/makefile gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
    git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
    git checkout gg_tt.mad
…o that they can be used when set in the runcards madgraph5#700
…st v1.00.00 and fixing SubProcesses/makefile madgraph5#700
    The only files that still need to be patched are
    - 1 in patch.common: Source/genps.inc, SubProcesses/makefile
    - 2 in patch.P1: driver.f, matrix1.f
    ./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
    git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
    git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
    git checkout gg_tt.mad
    (Later regenerated gg_tt.mad and checked that all is ok)
…rect and that runcard and launch_plugin are ok madgraph5#700
… runcards madgraph5#700 (SubProcesses/makefile, runcard, launch_plugin)
…h) after merging banner+makefiles - all ok
    ./tput/allTees.sh -hip
    Approximately:
    STARTED AT Sun 06 Oct 2024 02:20:00 PM EEST
    ENDED AT Sun 06 Oct 2024 03:12:00 PM EEST
… - all as expected (heft fail madgraph5#833, skip ggttggg madgraph5#933)
    ./tmad/allTees.sh -hip
    STARTED AT Sun 06 Oct 2024 03:12:36 PM EEST
    (SM tests) ENDED(1) AT Sun 06 Oct 2024 05:40:31 PM EEST [Status=0]
    (BSM tests) ENDED(1) AT Sun 06 Oct 2024 05:50:33 PM EEST [Status=0]
Revert "[grid] rerun 30 tmad tests on LUMI/HIP after merging banner+makefiles - all as expected (heft fail madgraph5#833, skip ggttggg madgraph5#933)"
    This reverts commit 0fe4544.
    Revert "[grid] rerun 96 tput builds and tests on LUMI worker node (small-g 72h) after merging banner+makefiles - all ok"
    This reverts commit e3ed264.
…makefiles - all ok
    STARTED AT Sun Oct 6 01:22:14 PM CEST 2024
    ./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean -cpponly
    ENDED(1) AT Sun Oct 6 01:35:24 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean -cpponly
    ENDED(2) AT Sun Oct 6 01:40:38 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean -cpponly
    ENDED(3) AT Sun Oct 6 01:45:35 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst -cpponly
    ENDED(4) AT Sun Oct 6 01:47:03 PM CEST 2024 [Status=0]
    SKIP './tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common -cpponly'
    ENDED(5) AT Sun Oct 6 01:47:03 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common -cpponly
    ENDED(6) AT Sun Oct 6 01:48:30 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean -cpponly
    ENDED(7) AT Sun Oct 6 01:53:13 PM CEST 2024 [Status=0]
    No errors found in logs
    No FPEs or '{ }' found in logs
… all as expected
    STARTED AT Sun Oct 6 01:53:13 PM CEST 2024
    (SM tests) ENDED(1) AT Sun Oct 6 04:50:19 PM CEST 2024 [Status=0]
    (BSM tests) ENDED(1) AT Sun Oct 6 04:55:35 PM CEST 2024 [Status=0]
…er+makefiles - all ok
    STARTED AT Sun Oct 6 01:19:50 PM CEST 2024
    ./tput/teeThroughputX.sh -dmf -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean
    ENDED(1) AT Sun Oct 6 01:44:11 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -d_f -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean
    ENDED(2) AT Sun Oct 6 01:52:30 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -d_f -bridge -makeclean
    ENDED(3) AT Sun Oct 6 02:01:40 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -rmbhst
    ENDED(4) AT Sun Oct 6 02:04:29 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -curhst
    ENDED(5) AT Sun Oct 6 02:07:14 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -d_f -common
    ENDED(6) AT Sun Oct 6 02:10:06 PM CEST 2024 [Status=0]
    ./tput/teeThroughputX.sh -dmf -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean
    ENDED(7) AT Sun Oct 6 02:19:40 PM CEST 2024 [Status=0]
    No errors found in logs
    No FPEs or '{ }' found in logs
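The test summaries above all follow the same `ENDED(n) AT <timestamp> [Status=s]` pattern, so a pass/fail check over such a log can be scripted. The following is only a minimal sketch, not the parsing scripts shipped with this PR: the names `ENDED_RE`, `parse_ended_lines` and `failed_steps` are hypothetical, and the timestamps are kept as plain strings because the logs mix two date layouts and timezone abbreviations.

```python
import re

# Matches entries such as "ENDED(3) AT Sun Oct 6 01:45:35 PM CEST 2024 [Status=0]".
# The pattern is inferred from the logs quoted in this PR only.
ENDED_RE = re.compile(r"ENDED\((\d+)\) AT (.+?) \[Status=(\d+)\]")


def parse_ended_lines(text):
    """Return a list of (step, timestamp_string, status) tuples found in text."""
    return [
        (int(step), stamp, int(status))
        for step, stamp, status in ENDED_RE.findall(text)
    ]


def failed_steps(text):
    """Return the step numbers whose reported status is not 0."""
    return [step for step, _, status in parse_ended_lines(text) if status != 0]
```

An empty list from `failed_steps` corresponds to the "all ok" verdict in the commit titles; it does not replace the separate checks for errors and FPEs in the logs.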
I removed the link to #947 (this was fixed in mg5amcnlo/mg5amcnlo#125, submodule was updated in PR #949)
… - all as expected (heft fail madgraph5#833)
…ranch grid
    This includes in summary
    - madgraph5#700 add cudacpp_helinl, cudacpp_hrdcod, rename cudacpp_fptype
    - madgraph5#945 add cudacpp_bldall for multi-backend gridpacks
    - madgraph5#957 enhance tlau tests: instrument python code for gridpack profiling, add test logs and scripts to parse them
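For readers unfamiliar with what "instrument python code for gridpack profiling" can look like, here is a generic sketch of wall-clock instrumentation with a context manager. It is not the instrumentation added in the tlau tests: `TIMINGS`, `profiled` and `report` are hypothetical names, and any step labels passed to them are purely illustrative.

```python
import time
from contextlib import contextmanager

# Hypothetical accumulator: label -> total wall-clock seconds.
TIMINGS = {}


@contextmanager
def profiled(label, clock=time.perf_counter):
    """Add the wall-clock time spent inside the with-block to TIMINGS[label]."""
    start = clock()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are still accounted for.
        TIMINGS[label] = TIMINGS.get(label, 0.0) + (clock() - start)


def report(timings=TIMINGS):
    """Return one formatted line per label, slowest first, with its share of the total."""
    total = sum(timings.values())
    lines = []
    for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        share = 100.0 * seconds / total if total else 0.0
        lines.append(f"{label:<20s} {seconds:10.3f}s {share:5.1f}%")
    return lines
```

Wrapping each orchestrator stage in `with profiled("stage name"):` and printing `report()` at the end gives a per-stage breakdown. The injectable `clock` argument is there only to make the helper testable.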
Hi @oliviermattelaer, as discussed via email, this is N=3 in the pipeline of PRs I would like to merge. I changed it to target the N=2 makefiles branch for easier reviewing, but I would then merge it into master once approved. Can you please tell me if this looks ok? Thanks
(Note: this also addresses #700 but I did not manage to link it yet)
Point by Point review
multi backend gridpacks (cudacpp_bldall runcard)
No problem on that one, but why include it by default in the run_card?
Would it make more sense to not display them at all (or in an independent hidden block "simd_advance")?
add cudacpp_helinl and cudacpp_hrdcod runcards for HELINL=1 and HRDCOD=1 modes
rename the FPTYPE runcard as cudacpp_fptype
This is delicate since this is going to break the retro-compatibility of the code, which we have to preserve.
instrument gridpacks with time profiling of the python/bash orchestrator (this is only in my "tlau" tests for the moment, eventually it can be moved in mg5amcnlo, but for the moment it is there so I can use it and further develop it)
I do not see any change in the codegen directory, so I guess that you edited a .tmad directory manually, which is fine since those directories need to be removed from the repo anyway.
Global Comment
Touching the patch should be avoided as much as possible. Do you want to add a new handle in the template?
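One common way to rename a card option without breaking existing cards is to keep accepting the old name as a deprecated alias. The sketch below illustrates that idea only; it is not how the plugin or mg5amcnlo handle it. It assumes that the legacy key is `floating_type` (the name that appears in the CHANGELOG conflict note elsewhere in this PR) and that the default is `m` (as in the "Takes default: m" warning quoted above). `LEGACY_ALIASES`, `DEFAULTS` and `normalize_runcard` are hypothetical names.

```python
import warnings

# Assumption: old option name -> new option name.
LEGACY_ALIASES = {"floating_type": "cudacpp_fptype"}
# Assumption: default taken from the "Takes default: m" warning in this PR.
DEFAULTS = {"cudacpp_fptype": "m"}


def normalize_runcard(card):
    """Return a copy of card with legacy keys renamed and defaults filled in."""
    out = dict(DEFAULTS)
    for key, value in card.items():
        new_key = LEGACY_ALIASES.get(key, key)
        if new_key != key:
            if new_key in card:
                # Both spellings present: refuse to guess which one wins.
                raise ValueError(f"both '{key}' and '{new_key}' are set")
            warnings.warn(
                f"'{key}' is deprecated, use '{new_key}' instead",
                DeprecationWarning,
            )
        out[new_key] = value
    return out
```

With this approach an old card keeps working and only triggers a warning, while a card that sets both names is rejected rather than silently resolved.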
Hi @oliviermattelaer, thanks :-) I will reply later. For the moment I moved this back to draft in any case, while working on the base PRs.
git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
…and makefiles) into grid
    Fix conflicts:
    - epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/CHANGELOG.md (fix comment about floating_type card)
    - epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 (will regenerate anyway)
…/master with updated mg5amcnlo
    Only the following files are needed to build the patch:
    - 1 in patch.common: SubProcesses/makefile
    - 2 in patch.P1: driver.f, matrix1.f
    ./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
    git diff --no-ext-diff -R gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common
    git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1
    git checkout gg_tt.mad
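The patch regeneration recipe above is repeated in several commits with a shrinking file list, so it may help to see the two `git diff` commands as a function of that list. This is a sketch under stated assumptions, not a script from the repository: `PATCH_DIR` is copied from the commands above, while `patch_commands` and its parameters are hypothetical. It only builds the command strings and does not run them.

```python
# Patch directory as it appears in the git diff commands in this PR.
PATCH_DIR = "CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD"


def patch_commands(proc_dir, common_files, p1_subdir, p1_files):
    """Build the two 'git diff -R' command lines that regenerate the patches.

    proc_dir     -- generated process directory, e.g. 'gg_tt.mad'
    common_files -- paths (relative to proc_dir) that go into patch.common
    p1_subdir    -- P1 subprocess directory relative to proc_dir
    p1_files     -- file names inside p1_subdir that go into patch.P1
    """
    common = [f"{proc_dir}/{name}" for name in common_files]
    p1 = [f"{proc_dir}/{p1_subdir}/{name}" for name in p1_files]
    return [
        f"git diff --no-ext-diff -R {' '.join(common)} > {PATCH_DIR}/patch.common",
        f"git diff --no-ext-diff -R {' '.join(p1)} > {PATCH_DIR}/patch.P1",
    ]
```

As in the recipe above, these commands are meant to run after `./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch` and before `git checkout gg_tt.mad`.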
Hi Olivier thanks for the comments :-) Maybe easiest to have a short zoom when you have time tomorrow or another time. In short
Apart from this:
Let's discuss the other points above by zoom tomorrow (or just reply here in the meantime). Thanks! :-)
This is a WIP PR with fixes and studies in generate_events
It includes a possible fix for #947