generate_events fails for the FORTRAN backend #690

Closed
valassi opened this issue Jun 11, 2023 · 3 comments · Fixed by #688

@valassi
Member

valassi commented Jun 11, 2023

I am continuing to debug my MR #688 based on Stefan's #620.

For the cuda and cpp backends, generate_events works in tlau/lauX.sh. For the fortran backend it fails:

    INFO: Running Survey
    Creating Jobs
    Working on SubProcesses
    INFO:     P1_gg_ttx
    INFO: Building madevent in madevent_interface.py with 'FORTRAN' matrix elements
    INFO:  Idle: 1,  Running: 0,  Completed: 0 [ current time: 19h44 ]
    INFO:  Idle: 0,  Running: 0,  Completed: 1 [  0.041s  ]
    INFO:  Idle: 0,  Running: 0,  Completed: 1 [  0.041s  ]
    INFO: End survey
    refine 10000
    Creating Jobs
    INFO: Refine results to 10000
    INFO: Generating 10000.0 unweighted events.
    Error when reading /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/G1/results.dat
    Command "generate_events -f" interrupted with error:
    FileNotFoundError : [Errno 2] No such file or directory: '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/G1/results.dat'
    Please report this bug on https://bugs.launchpad.net/mg5amcnlo
    More information is found in '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/run_01_tag_1_debug.log'.
    Please attach this file to your report.
    quit
    INFO:

This needs more debugging than I expected, so I am opening a ticket.
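
The failure above is a missing results.dat in a channel directory. As a minimal diagnostic sketch (not part of the project), the helper below lists the channel directories that have no results.dat. The function name `find_missing_results` is hypothetical, and the `P*/G*` layout is only inferred from the path in the log.

```python
from pathlib import Path


def find_missing_results(subprocesses_dir):
    """Return the channel directories (G*) that contain no results.dat.

    Assumed layout, inferred from the log: SubProcesses/<P*>/<G*>/results.dat
    """
    root = Path(subprocesses_dir)
    missing = []
    for channel in sorted(root.glob("P*/G*")):
        # Only directories count as channels; skip stray files matching G*
        if channel.is_dir() and not (channel / "results.dat").is_file():
            missing.append(channel)
    return missing
```

Calling `find_missing_results("gg_tt.mad/SubProcesses")` after the survey step would, under these assumptions, show which channels did not write their results before the refine step tries to read them.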

@valassi valassi self-assigned this Jun 11, 2023
@valassi
Member Author

valassi commented Jun 11, 2023

I think I understand it: the 'bridge' mode must be set to 0 for fortran. I will change that.

@valassi
Member Author

valassi commented Jun 11, 2023

Essentially, bridge_mode should currently be 1 in cudacpp and 0 in fortran. This means that there is an interplay between the value of the cudacpp_backend (formerly exec_mode) card and the fbridgemode card, which is a problem.

In particular, these parameters must currently be set in two places:

  • gen_ximprove.py, where they are read from the runcards (so in principle fbridge mode can be adjusted there, because the backend is also known)
  • refine.sh, where @roiser told me that we should use env variables (the current version, which I took from his WIP PR "WIP: towards a workflow" #620, is hardcoded to 1)
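
The coupling between backend and bridge mode described above can be sketched as a single lookup. This is only an illustration under stated assumptions: the helper name `bridge_mode_for_backend` is hypothetical, and the backend spellings are taken from the log ('FORTRAN', 'CPP') and the text (cuda).

```python
def bridge_mode_for_backend(backend):
    """Return the bridge mode implied by the backend (hypothetical helper).

    Per the discussion: 0 for fortran, 1 for the cudacpp backends.
    """
    name = backend.strip().lower()
    if name == "fortran":
        return 0
    if name in ("cuda", "cpp"):
        return 1
    raise ValueError(f"unknown backend: {backend!r}")
```

Deriving the value in one place like this would avoid setting two parameters that must agree, which is the interplay flagged as a problem.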

As a very quick workaround, I will change driver.f to accept bridge mode = 1 also in fortran.

Eventually, as suggested by @oliviermattelaer, I should remove these extra parameters from the input file (#658). I can add them as env variables instead.

valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 12, 2023
…led as madgraph5#690)

INFO: Running Survey
Creating Jobs
Working on SubProcesses
INFO:     P1_gg_ttx
INFO: Building madevent in madevent_interface.py with 'FORTRAN' matrix elements
INFO:  Idle: 1,  Running: 0,  Completed: 0 [ current time: 19h44 ]
INFO:  Idle: 0,  Running: 0,  Completed: 1 [  0.041s  ]
INFO:  Idle: 0,  Running: 0,  Completed: 1 [  0.041s  ]
INFO: End survey
refine 10000
Creating Jobs
INFO: Refine results to 10000
INFO: Generating 10000.0 unweighted events.
Error when reading /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/G1/results.dat
Command "generate_events -f" interrupted with error:
FileNotFoundError : [Errno 2] No such file or directory: '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/G1/results.dat'
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/run_01_tag_1_debug.log'.
Please attach this file to your report.
quit
INFO:

For comparison, this was CPP

INFO: Running Survey
Creating Jobs
Working on SubProcesses
INFO:     P1_gg_ttx
INFO: Building madevent in madevent_interface.py with 'CPP' matrix elements
INFO:  Idle: 1,  Running: 0,  Completed: 0 [ current time: 19h40 ]
INFO:  Idle: 0,  Running: 0,  Completed: 1 [  0.48s  ]
INFO:  Idle: 0,  Running: 0,  Completed: 1 [  0.48s  ]
INFO: End survey
refine 10000
Creating Jobs
INFO: Refine results to 10000
INFO: Generating 10000.0 unweighted events.
sum of cpu time of last step: 1 seconds
INFO: Effective Luminosity 27.314506051301194 pb^-1
INFO: need to improve 2 channels
- Current estimate of cross-section: 439.327 +- 3.240257989049637
    P1_gg_ttx
Building madevent in madevent_interface.py with 'CPP' matrix elements
INFO:  Idle: 8,  Running: 5,  Completed: 0 [ current time: 19h40 ]
INFO:  Idle: 0,  Running: 0,  Completed: 13 [  1.6s  ]
INFO: Combining runs
sum of cpu time of last step: 11 seconds
INFO: finish refine
refine 10000 --treshold=0.9
No need for second refine due to stability of cross-section
INFO: Combining Events
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 12, 2023
…ace bridge_mode by 0 hardcoded to show that this fixes lauX.sh for fortran madgraph5#690 - will revert because this is a ugly hack
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 12, 2023
… and vecsizeused from the runcard and madevent input file

Revert "[runcard] in ggtt.mad refine.sh and gen_ximprove.py, TEMPORARILY replace bridge_mode by 0 hardcoded to show that this fixes lauX.sh for fortran madgraph5#690 - will revert because this is a ugly hack"
This reverts commit 5a3cc1a.
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 12, 2023
… fbridge_mode and vecsize_used and replace them by env variables (madgraph5#658, also related to madgraph5#690)

The two env variables are CUDACPP_RUNTIME_FBRIDGEMODE and CUDACPP_RUNTIME_VECSIZEUSED.
These are meant to be used only by expert developers, not for general users.
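
A minimal sketch of how such expert-only overrides could be read, assuming integer values and caller-supplied defaults. The two variable names come from the commit message; the helper names, the defaults, and the parsing rules are assumptions, since the issue does not show how the project reads them.

```python
import os


def read_int_env(name, default, environ=None):
    """Return the integer value of an env variable, or default if unset or empty."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def runtime_overrides(default_bridge_mode, default_vecsize, environ=None):
    """Collect both overrides; the defaults are placeholders chosen by the caller."""
    return {
        "fbridge_mode": read_int_env(
            "CUDACPP_RUNTIME_FBRIDGEMODE", default_bridge_mode, environ
        ),
        "vecsize_used": read_int_env(
            "CUDACPP_RUNTIME_VECSIZEUSED", default_vecsize, environ
        ),
    }
```

With neither variable set, the defaults pass through unchanged, so general users never need to know the variables exist.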
valassi added a commit to valassi/madgraph4gpu that referenced this issue Jun 12, 2023
@valassi valassi linked a pull request Jun 12, 2023 that will close this issue
@valassi
Member Author

valassi commented Jun 12, 2023

This is fixed by MR #688, where in the end I removed the two extra parameters (#658).

I am closing this.
