Add support for nprocesses>2 (i.e. beyond mirror processes) in cudacpp to speed up directory handling? #951
Comments
For SIMD, it should be easy to fix the issue, since the only thing needed is to be able to link to multiple cpp libraries simultaneously, and fortran can (already) switch automatically between those matrix elements. However, if you link to CUDA this will not work, since each of those calls will correspond to a kernel call, and therefore this method does not scale to CUDA. One method, discussed at the meeting today, was to change the way the Z interaction is handled (at the model level). Note that pushing this approach to the extreme (allowing for "zero" coupling in some of the matrix elements) could allow having exactly the same number of directories as fortran, still with only one matrix file. But this is not a super easy thing to implement (i.e. it would come after the release).
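As a rough illustration of the "link several cpp libraries and let the caller switch between matrix elements" idea for the CPU/SIMD case, here is a minimal sketch. Everything in it is an assumption made for illustration: the function names (`me_uux`, `me_uu`, `me_uc`), the `MatrixElementFn` signature and the dispatch table are hypothetical and are not the actual cudacpp or madevent interface. The bodies are placeholders, not real matrix elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical signature: one matrix-element entry point per linked library.
using MatrixElementFn = double (*)(const double* momenta, std::size_t n);

// Placeholder helper: sums the inputs so that each stand-in returns something.
inline double sumOf(const double* p, std::size_t n) {
  double s = 0.0;
  for (std::size_t i = 0; i < n; ++i) s += p[i];
  return s;
}

// Stand-ins for matrix elements that would come from separately built
// libraries (placeholder arithmetic only, not physics).
inline double me_uux(const double* p, std::size_t n) { return 1.0 * sumOf(p, n); }
inline double me_uu(const double* p, std::size_t n) { return 2.0 * sumOf(p, n); }
inline double me_uc(const double* p, std::size_t n) { return 3.0 * sumOf(p, n); }

// Dispatch table indexed by a (hypothetical) process index.
const std::array<MatrixElementFn, 3> kMatrixElements{{me_uux, me_uu, me_uc}};

// The caller picks the matrix element at run time through the process index.
double evaluate(std::size_t iproc, const double* momenta, std::size_t n) {
  assert(iproc < kMatrixElements.size());
  return kMatrixElements[iproc](momenta, n);
}
```

On the host this is just one indirect function call per evaluation. In the CUDA case described above, each entry would instead correspond to its own kernel call, which is the scaling concern raised in this comment.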
I have done some additional pre-investigation here. There are two reasons that I was missing yesterday
This is still a nice factor of two, but maybe not worth it. So if we go in that direction, we should also include interactions with zero coupling, such that we can merge processes like
Thanks Olivier!
Just one comment here. From my recent tests I do not have evidence yet that this causes runtime performance issues. The build is clearly slow, but the event generation seems to have its bottleneck inside madevent fortran (PDFs and random-to-momenta, I guess), not in the python combine events. (Well, for cuda the combine events step maybe does show up.) So, mainly build time for now?
This is a followup to the old and recent discussions about nprocesses==1 (or at most nprocesses==2 with mirror processes) in cudacpp.
So far cudacpp always treats one subprocess at a time and splits them explicitly: for instance, uux_xxx, uu_xxx and uc_xxx instead of a generic qq_xxx. This has so far allowed an implementation without arrays(nprocesses). From a functionality point of view, this works so far (modulo the few recent tweaks for mirror processes, e.g. #872, which are being sorted out).
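To make the arrays(nprocesses) idea a bit more concrete, the sketch below shows a single toy "matrix element" that serves several subprocesses through a per-process coupling array, with a zero coupling switching an interaction off (the zero-coupling idea raised in the comments). This is only a sketch under stated assumptions: the coupling values, the name `zCoupling`, and the function `toyMatrixElement` are hypothetical, and the arithmetic is a toy, not real physics or actual cudacpp code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical: three subprocesses merged into one directory.
constexpr std::size_t nprocesses = 3;

// Hypothetical per-process coupling for one interaction.
// A value of 0 means that interaction is absent for that subprocess.
constexpr std::array<double, nprocesses> zCoupling = {0.5, 0.25, 0.0};

// Toy stand-in for a matrix element shared by all subprocesses:
// one common contribution plus one contribution scaled by the
// per-process coupling, squared. NOT real physics.
double toyMatrixElement(std::size_t iproc, double ampA, double ampZ) {
  assert(iproc < nprocesses);
  const double total = ampA + zCoupling[iproc] * ampZ;
  return total * total;
}
```

For example, `toyMatrixElement(2, 1.0, 2.0)` uses a zero coupling, so only the common contribution survives. The point of the design is that one compiled function covers all `nprocesses` entries, at the cost of carrying the process index and the per-process arrays through the code.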
From a usability point of view, however, this is a nuisance.
This is a relatively big chunk of work that touches essentially all the code we have. I am opening this to have it on the todo list...
PS For the full list of related issues see https://github.com/madgraph5/madgraph4gpu/issues?q=nprocesses
This includes for instance #272, #343, #534, #635...