Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with runcard includes: stability, dependencies #687

Open
valassi opened this issue Jun 9, 2023 · 12 comments
Open

Issues with runcard includes: stability, dependencies #687

valassi opened this issue Jun 9, 2023 · 12 comments
Assignees

Comments

@valassi
Copy link
Member

valassi commented Jun 9, 2023

Hi @oliviermattelaer @roiser

While integrating and extending Stefan's WIP MR #620 I noticed a couple of issues in the runcard handling, which I would say are issues in upstream.

Preliminary note: MR #520 adds three new parameters to the run_card.dat in gg_tt.mad. When rebuilding, this results in the following messages

treatcards run
Found unexpected entry in run_card: "exec_mode" with value "1 ".
  The type was assigned to int. 
  The definition of that variable will be automatically added to fortran file 
  The value of that variable will be passed to the fortran code via fortran file 
Found unexpected entry in run_card: "bridge_mode" with value "1 ".
  The type was assigned to int. 
  The definition of that variable will be automatically added to fortran file 
  The value of that variable will be passed to the fortran code via fortran file 
Found unexpected entry in run_card: "vector_size" with value "32 ".
  The type was assigned to int. 
  The definition of that variable will be automatically added to fortran file 
  The value of that variable will be passed to the fortran code via fortran file 

Now what happens here is that both run.inc and run_card.inc are modified

        modified:   ../../Source/run.inc
        modified:   ../../Source/run_card.inc

Issue 1, but this is about MR #520: as our run.inc and run_card.inc are in the repo, the new versions should be committed. I will fix this.

Issue 2 (or at least IMO this is an issue): while the changes to run_card.inc are minimal (just three lines with the new parameters are added), the changes to run.inc are massive. A lot of code that was previously lowercase is now uppercase. In addition, a lot of code that was previously longer than 72 characters is now fitting within 72 characters, with continuation characters added. And all these changes affect ALL parameters, not only the three newly added ones. I find this very annoying because it becomes impossible to understand what has changed. I may try to fix this by including run.inc that is already all uppercase? Not sure.

Issue 3. I modified the runcard.dat to remove the new parameters and rebuilt. The build fails, because the run.inc is NOT rebuilt. There is a missing dependency I think, only run_card.inc depends on run_card.dat, while also run.inc should depend on it.

Issue 4. I think that the "Found unexpected entry" are because in MR #520 one should also modify banner.py to make these parameters "default" parameters as opposed to new custom parameters. I will actually try to work around some of the previous problems this way. But some of them remain issues that should independently be fixed, probably...

@valassi
Copy link
Member Author

valassi commented Jun 9, 2023

About issue 3: no actually it is probably not a dpendency that is missing. Or maybe yes somehow, but it is more subtle... the failure is because now run.inc has some commons that should not be there in the first place

ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/gfortran -w -fPIC -O3 -ffast-math -fbounds-check -ffixed-line-length-132  -c -o setrun.o setrun.f
run.inc:119:50:

  119 |       COMMON/USER_CUSTOM_RUN/EXEC_MODE,BRIDGE_MODE,VECTOR_SIZE
      |                                                  1
Error: Symbol ‘bridge_mode’ at (1) has no IMPLICIT type
run.inc:119:38:

  119 |       COMMON/USER_CUSTOM_RUN/EXEC_MODE,BRIDGE_MODE,VECTOR_SIZE
      |                                      1
Error: Symbol ‘exec_mode’ at (1) has no IMPLICIT type
run.inc:119:62:

  119 |       COMMON/USER_CUSTOM_RUN/EXEC_MODE,BRIDGE_MODE,VECTOR_SIZE
      |                                                              1
Error: Symbol ‘vector_size’ at (1) has no IMPLICIT type

@valassi
Copy link
Member Author

valassi commented Jun 9, 2023

I guess that I can workaround this by doing two things:

  • add code to banner.py so that the parameters are known (needed? will prevent total rewrite of run.inc?)
  • modify run.inc so that these parameters are always there, they are not user parameters...

@valassi
Copy link
Member Author

valassi commented Jun 9, 2023

About issue 3: no actually it is probably not a dpendency that is missing. Or maybe yes somehow, but it is more subtle... the failure is because now run.inc has some commons that should not be there in the first place

ccache /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-ad950/x86_64-centos8/bin/gfortran -w -fPIC -O3 -ffast-math -fbounds-check -ffixed-line-length-132  -c -o setrun.o setrun.f
run.inc:119:50:

  119 |       COMMON/USER_CUSTOM_RUN/EXEC_MODE,BRIDGE_MODE,VECTOR_SIZE
      |                                                  1
Error: Symbol ‘bridge_mode’ at (1) has no IMPLICIT type
run.inc:119:38:

  119 |       COMMON/USER_CUSTOM_RUN/EXEC_MODE,BRIDGE_MODE,VECTOR_SIZE
      |                                      1
Error: Symbol ‘exec_mode’ at (1) has no IMPLICIT type
run.inc:119:62:

  119 |       COMMON/USER_CUSTOM_RUN/EXEC_MODE,BRIDGE_MODE,VECTOR_SIZE
      |                                                              1
Error: Symbol ‘vector_size’ at (1) has no IMPLICIT type

Yes the problem here is that if I remove the parameters, then the rewriting of run.inc removes their type declaration, but does not remove them from the common... this is a clear BUG.

______________________________________________________________________________
git diff /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/Source/run.inc
--> For patch diff: git diff --no-ext-diff [--no-color]
--> For tkdiff: git difftool
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
epochX/cudacpp/gg_tt.mad/Source/run.inc /tmp/hgteux_run.inc ce6aa8e65075a0f9f9294d5a3b6f826b0b3050c3 100644 epochX/cudacpp/gg_tt.mad/Source/run.inc 0000000000000000000000000000000000000000 100644
75c75
< C      subprocesses
---
> C     subprocesses
90c90
< C      procedure, i.e. what
---
> C     procedure, i.e. what
118,120d117
<       INTEGER EXEC_MODE  ! added by autodef
<       INTEGER BRIDGE_MODE  ! added by autodef
<       INTEGER VECTOR_SIZE  ! added by autodef
123a121
> 
______________________________________________________________________________
git diff /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/Source/run_card.inc
--> For patch diff: git diff --no-ext-diff [--no-color]
--> For tkdiff: git difftool
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
epochX/cudacpp/gg_tt.mad/Source/run_card.inc /tmp/1HdZYw_run_card.inc 35f3a084be089a43b7e5735c0759260d351f76e0 100644 epochX/cudacpp/gg_tt.mad/Source/run_card.inc 0000000000000000000000000000000000000000 100644
365,370d364
<       EXEC_MODE = 1
< 
<       BRIDGE_MODE = 1
< 
<       VECTOR_SIZE = 32
< 

@valassi
Copy link
Member Author

valassi commented Jun 9, 2023

Anyway: in pur specific case of these three parameters, I think that these SHOULD NOT BE IN THE INC in the first place!

  • exec mode is a pure shell option about what to build and run
  • the pther two parameters are something that we want to dump to the inpit file read by madevent (at least initially)

I guess the solution is then:

  • do nothing about run.inc
  • add the parameters to banner.py, as "no include" (and as "hidden"? well for cudacpp we always want themm there, they are not optional)

@valassi valassi self-assigned this Jun 9, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 9, 2023
…c and run_card.inc that result from the additions of 3 new parameters in run_card.inc (madgraph5#687)

treatcards run
Found unexpected entry in run_card: "exec_mode" with value "1 ".
  The type was assigned to int.
  The definition of that variable will be automatically added to fortran file
  The value of that variable will be passed to the fortran code via fortran file
Found unexpected entry in run_card: "bridge_mode" with value "1 ".
  The type was assigned to int.
  The definition of that variable will be automatically added to fortran file
  The value of that variable will be passed to the fortran code via fortran file
Found unexpected entry in run_card: "vector_size" with value "32 ".
  The type was assigned to int.
  The definition of that variable will be automatically added to fortran file
  The value of that variable will be passed to the fortran code via fortran file
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 9, 2023
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 9, 2023
…stead (madgraph5#687)

Revert "[runcard] in gg_tt.mad/Cards, comment out the 3 new parameters in run_card.dat"
This reverts commit 0b14451.

Revert "[runcard] in gg_tt.mad Source, commit the modified versions of run.inc and run_card.inc that result from the additions of 3 new parameters in run_card.inc (madgraph5#687)"
This reverts commit b5b1ffc.
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 9, 2023
madgraph5#687)

The run.inc and run_card.inc files now are no longer modified - which is ok/better as these variables are not needed there
@oliviermattelaer
Copy link
Member

I guess that we are not yet merged with 3.5.0 since I have implement a lot of new feature for the handling of entry in the run_card which are not expected by MG5aMC, you can for example specify in which .inc they have to be written and also specify if you want to add them in the run.inc or not

@valassi
Copy link
Member Author

valassi commented Jun 9, 2023

I guess that we are not yet merged with 3.5.0 since I have implement a lot of new feature for the handling of entry in the run_card which are not expected by MG5aMC, you can for example specify in which .inc they have to be written and also specify if you want to add them in the run.inc or not

Hi, if you are talking about "include=False, hidden=True", then no actually this is already available, and it is precisely what I used in my latest commit! It seems very useful!

The problem initially was that the 3 parameters added by Stefan were considered "user parameters" (COMMON/USER_CUSTOM_RUN/) because they were not included in banner.py. Now I have added them in banner.py, with include=False, so they do not show up in run.inc (where we do not need them anyway).

One missing piece I am looking at now is that I need to read those values from runcard.dat and somehow transform them to values in the input file (Stefan used env variables, but the env variables are not connected to the runcard values and this needs fixing)

@oliviermattelaer
Copy link
Member

oliviermattelaer commented Jun 9, 2023 via email

@valassi
Copy link
Member Author

valassi commented Jun 9, 2023

But if they are in the run_card, they should be put in a global and not be pass in the input_XXXX (which are files that are super complicated to handle since those are designed to include information which are different for each of the various instance of madevent (i.e. which are different for G1 compare to G2, ...) and/or different between the various step (survey/refine) So my advise would be to put them in a include file (maybe not in run.inc if that's a problem for you) and use it from there.

Thanks I see what you mean. These are parameters that are common to all of those G1/G2 runs, while the input file is meant for stuff that differs betwen G1 and G2 etc.

However my main problem (maybe it's a question of taste, but I think it's more than that) is that I want to change those parameters at runtime. I dont want to have to rebuild the code every time. Another alternative is setting an environment variable, but that's quite ugly.

(By the way, the question of runtime vs buildtime is really important for the parameter card... this is essentially #660... in cudacpp we read parameters at runtime, but in the fortran machinery in any case something is done at buildtime... I still need to understand if all is ok or not)

@oliviermattelaer
Copy link
Member

Sorry to be stubborn here, but we have to keep the approach of our code here.
This is indeed not the ideal way to work for an IT perspective but we have to think for the final user not for us.

And in practise, we should now move to let madgraph handles the input_xxx file
and be used to run everything from the top command and just do two successive run with changing the run_card in between the two run (which anyway will lead to some recompilation even if the variable change is not in the include file.

(By the way, the question of runtime vs buildtime is really important for the parameter card... this is essentially
#660... in cudacpp we read parameters at runtime, but in the
fortran machinery in any case something is done at buildtime... I still need to understand if all is ok or not)

Indeed we have to change the cudacpp to not read the param_card anymore since this is too problematic on a HPC environment (IO bottleneck) and unsafe from user's mistake. So we should do the same as from fortran:
writte the content of the file as a formatted file that is compiled with the code.

I know that changing the input is convenient for us, but adding new variable into those files is not something trivial and certainly not needed/worthed here.

@valassi
Copy link
Member Author

valassi commented Jun 10, 2023

Thanks Olivier, no you are perfectly right to insist - you know much better how the code is used :-)

Indeed I must get more into my head that a rebuild does happen all the time anyway when runcards are changed, and then I must understand how to fit my dev workflows around that. Anyway we should discuss it in person in August. Until then, I will just try to get a full thing working somehow...

For the param cards, this is even trickier and we also must discuss that, especially keeping in mind all the discussions about reweighting. Here I just have not understood that yet how our handling of parameters is done in practice now that we mix reading at runtime and param.inc (#660)... I must do tests to find this out.

@roiser
Copy link
Member

roiser commented Jun 17, 2023 via email

@valassi
Copy link
Member Author

valassi commented Jun 17, 2023

There is one point that was mentioned a bit further down in the thread, i.e. that two parameters (exec_mode and vector_size) are being used from the runcard but ALSO as environment variables. This is needed because those env variable are used when calling refine.sh.

Hi Stefan, thanks. Actually in the meantime I removed two parameters from the runcard and from the madevent input files (see #658, now completed). I moved them to driver.f as env variables for "develper mode" only (see #693, which I am actaually starting to think may not even be necessary). These are actually the vecsize and the bridge mode. The exec mode (now called cudacpp_backend) is instead fully in the runcard and needed only there.

I am running some routine integrated tests of "generate_events", with good results. I think we made good progress, I will give recipes to the experiments next week (based for the moment on the TWO github repos, and generateAndCompare script, which is now generalised to accept any process). See test results in #711 and MR #709

The only problems for the experiments is that we still have many process-specific issues, not only BSM/SUSY/EFT, but also standard model, like #701, #655, #630. But the workflow is usable, I think there is not much more that is strictly needed.

Maybe this #687 can also be closed, I will review it after merging #709

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants