Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NOT TO BE MERGED - ("jthip24") Jorgen's earlier work on HIP before PR #774 and my fixes on that #800

Closed
wants to merge 546 commits into from

Conversation

valassi
Copy link
Member

@valassi valassi commented Jan 25, 2024

This is a temporary PR that I file here only for reference and to keep the relevant code. I will close it immediately after creating it.

This PR uses my "jthip24" branch, which is itself an addition over Jorgen's branch
https://github.com/Jooorgen/madgraph4gpu/tree/gpu_abstraction
Jorgen's "gpu_abstraction" includes HIP work but also CI/profiling work by Jorgen.
My "jthip24" branch includes the above, with the addition of some extra fixes which I was working on together with Jorgen in July-August (and some more recent ones).

Jorgen's "gpu_abstraction" branch never made it into a PR.
Instead, Jorgen stripped off a smaller/different "gpu_abstraction_only" branch which he used to open PR #774.
https://github.com/Jooorgen/madgraph4gpu/tree/gpu_abstraction_only

When I tried to work over PR #774, I had several issues from "gpu_abstraction_only", which I had already fixed with Jorgen in this "gpu_abstraction". Therefore I had to go back and forth between the two branches to compare and pick up the better code. This is in my "jt774" branch which I will use for an upcoming PR.

I file this PR (which I will immediately close) for two reasons

Note that I have merged jt774 into this jthip24, so now the two lines of developmente have converged again, just in case.

Note also that Jorgen's "gpu_abstraction" is now what he used for PR #718, which instead is based on his master.

valassi and others added 26 commits January 24, 2024 11:21
I am working now in branch valassi/jthip24.
This is my first commit over joorgen/gpu_abstraction branch (also known as valassi/jthip),
as of commit 229ffeb (Tue Aug 15 11:33:01 2023 +0200).

This branch contains fewer features than jooorgen/master (PR madgraph5#718),
but it is more advanced than joorgen/gpu_abstraction_only (PR madgraph5#774).
I will probably need some of the commits here to fix PR madgraph5#774 in branch valassi/jt774.

I regenerate the ten processes as follows:
  ./CODEGEN/generateAndCompare.sh ee_mumu --mad
  ./CODEGEN/generateAndCompare.sh gg_tt --mad
  ./CODEGEN/generateAndCompare.sh gg_ttg --mad
  ./CODEGEN/generateAndCompare.sh gg_ttgg --mad
  ./CODEGEN/generateAndCompare.sh gg_ttggg --mad
  ./CODEGEN/generateAndCompare.sh ee_mumu
  ./CODEGEN/generateAndCompare.sh gg_tt
  ./CODEGEN/generateAndCompare.sh gg_ttg
  ./CODEGEN/generateAndCompare.sh gg_ttgg
  ./CODEGEN/generateAndCompare.sh gg_ttggg
These are the five processes that would get conflicts when I merge upstream/master here,
so I guess that these are the only ten processes touched in this branch.
The fact that I can regenerate them and there are no real differences
(except for irrelevant stuff like me5_configuration.txt, aloha_file.inc, py3_model.pkl)
shows that ALL IMPORTANT CHANGES BY JORGEN HERE ARE IN THE CODEGEN.

I can therefore merge upstream/master, fix conflicts in CODEGEN and regenerate.
…else HIP else neither" in CODEGEN cudacpp.mk

(This is derived from this morning's commit d2e2f47 in branch jt774)
Manuall fix conflicts in CODEGEN and CI (reusing experience from branch jt774):
	.github/workflows/c-cpp.yml
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/Bridge.h
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/check_sa.cc
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/cudacpp.mk
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/cudacpp_src.mk
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/mgOnGpuConfig.h
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/output.py

Fix all other conflicts in ten code-generated code by using previous jthip24 versions:
  for p in ee_mumu gg_tt gg_ttg gg_ttgg gg_ttggg; do git checkout e3ca5d9 $p.sa ; done
  for p in ee_mumu gg_tt gg_ttg gg_ttgg gg_ttggg; do git checkout e3ca5d9 $p.mad ; done
…EGEN (code generation was failing clang formatting checks)
(NB: this code does not build instead in branch jt774 based on PR madgraph5#774)
…ors.h and process_matrix.inc as in branch jthip24

These are changes that in that branch I included in commitcommit 6e90139 (Tue Jul 18 18:25:34 2023 +0200),
which consisted in a backport to CODEGEN of earlier changes in ggttggg.mad.
…GONGPUCPP_GPUIMPL... not clear why this was not done yet

In branch jthip24, this is coming from Jorgen's commit 6741186 (Thu Jul 13 15:15:41 2023 +0200)
which includes many such changes
…ggttgg.mad on Tue Jul 18 18:11:04 2023 +0200)

Fix conflicts:
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/cpp_model_parameters_h.inc
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/check_sa.cc
	epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/mgOnGpuConfig.h

NB: this is very strange, because this same commit 1b5c0fd is already included in the jt774 branch earlier on...
…EGEN (code generation was failing clang formatting checks)

This is a cherry-pick of f44a9c7 from jthip24
Code generation is now succeeding (it was previously failing in clang-format)

git checkout f44a9c7 CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/check_sa.cc
…ip24

(I accidentally removed it in commit e32bc4e on Wed Jan 24 10:36:44 2024 +0100)
…t back to GpuAbstraction.h

Revert "[CODEGEN] Added HIP runtime include in mgOnGpuConfig.h in codegen"
This reverts Jorgen's commit 35913a3 (2023-07-13 15:15:41 +0200)
…t back to GpuAbstraction.h

Revert "[CODEGEN] Added HIP runtime include in mgOnGpuConfig.h in codegen"
This reverts Jorgen's commit 35913a3 (2023-07-13 15:15:41 +0200)

This is a cherry-pick in jt774 of d1131c1 from jthip24
…urrently af0f0d4)

This is meant to fix the build of the code generated ggtt.mad (it was failing before in jt774 while it succeeds in jthip24)
However code generation is now failing in clang formatting

git checkout af0f0d4 CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/cpp_model_parameters_h.inc
…revious commit

(undo the changes here from commit e32bc4e on Wed Jan 24 10:36:44 2024 +0100)

The code generated ggtt.mad is now finally succeeding in jt774, as it does in jthip24.
HOWEVER the two code branches are not identical yet, there is still a minor difference in makefiles
git checkout 4ba2133 CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/cudacpp.mk

NB: CODEGEN in jt774 is now identical to that in jthip24!
…(eg in pp_tt012j)

NB: Now all processes in the repo are the same as in jt774, except for codegen logs: will now copy over those codegen logs
…cesses - add to repo Gpu*.h when missing

*** NB Now all processes in the repo are the same as in jthip24 (including codegen logs as I copied those of jt774 to jthip24) ***

*** NB Now jthip24 is identical to jt774, except that jthip24 also contains extra files in .github/workflows and in tools for CI and profiling ***
…degen logs into jthip24 from jt774 (currently commit 464703b on Thu Jan 25 18:08:40 2024 +0100)

git checkout jt774 $(git ls-tree --name-only HEAD */CODEGEN*txt)

*** NB Now all processes in the repo are the same as in jt774 (including codegen logs) ***

*** NB Now jthip24 is identical to jt774, except that jthip24 also contains extra files in .github/workflows and in tools for CI and profiling ***

git diff jthip24 jt774 --name-only
.github/workflows/a100_profiler.yml
.github/workflows/c-cpp.yml
.github/workflows/mi250x_profiler.yml
.github/workflows/sycl.yml
.github/workflows/v100s_profiler.yml
tools/profiling/README.md
tools/profiling/buildCUDAProcess.sh
tools/profiling/buildSYCLProcess.sh
tools/profiling/container-README.md
tools/profiling/containerSetup.sh
tools/profiling/evaluation.py
tools/profiling/performanceProfiler.py
tools/profiling/profileconfig.ini
tools/profiling/sendData.py
This is essentially a no-op, as jthip24 is identical to jt774, except that jthip24 also contains extra changes in .github/workflows and in tools for CI and profiling
@valassi valassi self-assigned this Jan 25, 2024
@valassi
Copy link
Member Author

valassi commented Jan 25, 2024

As discussed, I am closing this immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants