Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Fix 826" (actually: fix iconfig-channel mapping) #852

Merged
merged 8 commits into from
Jul 3, 2024
Merged

Conversation

oliviermattelaer
Copy link
Member

This is a proposal to fix the segfault issue for issue #826.
This will not fix the missmatch of cross-section (due to the coupling ordering missmatch) --but maybe we can open a different issue for that missmatch--.

Note that the issue was an out-of-bound access within the CPP part of the code. Related to the fact that the color information is designed to be for the channel information but that channelId contains the diagram number (which can be not identical).

Finally this require also a change in the handling of the second exporter on the mainstream madgraph to have an easy access to additional information to be sure that both cpp and madevent use the same convention.

@valassi
Copy link
Member

valassi commented May 31, 2024

Hi @oliviermattelaer thanks a lot, I am running a few tests.

I have a couple of commits to add for fixing clang formatting.

More importantly, the build fails for cuda

coloramps.h(14): error: dynamic initialization is not supported for a __device__ variable

I am having a look how to fix this too

valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 31, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 31, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 31, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request May 31, 2024
…raph5#826 is NOT fixed) after Olivier's PR madgraph5#852 patch

./tmad/teeMadX.sh -susyggt1t1 +10x -makeclean
Copy link
Member

@valassi valassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Olivier, thanks for asking the review. Various issues below.

Minor issue: clang formatting needed tweaks. Fixed in my #853.

Less minor issue: the std::map implementation does not work on cuda in device code. I tried to add constexpr but that is even worse. In the end I added a simple implementation based on arrays. Also in #853.

Major issue however: I still see no cross section. What is this PR fixing, exactly? Is there some tests that you and Stefan did which was fixed by this? Can you document it please? I see no improvement from this so far.

Related to that: I am puzzled by the naming, and by the code produced. The naming says "diag_to_channel", but actually the diag in the map goes up to 6, while there are only 6 diagrams. Is this Fortran numbering starting at 1? Do we need to remove 1? Specifically, your code gives
https://github.com/valassi/madgraph4gpu/blob/ade95e86ffe24c9adbb2a624b78f7ef6dca43bd1/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/coloramps.h

  __device__ std::map<int, int> diag_to_channel = {
    { 2, 0 },
    { 3, 1 },
    { 4, 2 },
    { 5, 3 },
    { 6, 4 }, // note: a trailing comma in the initializer list is allowed
  };

Note that my coe equivalently gives this
https://github.com/valassi/madgraph4gpu/blob/a04a900b0ae6c36a103c85f10cb53c4e3f2c4382/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/coloramps.h

  __device__ constexpr int diag_to_channel[7] = {
    -1, // 0 --> None
    -1, // 1 --> None
    +0, // 2 --> 0
    +1, // 3 --> 1
    +2, // 4 --> 2
    +3, // 5 --> 3
    +4  // 6 --> 4
  };

which fixes the std::map device, but gives wrong results. The fact that we need 7 entries looks weird to me.

Thanks!
Andrea

@oliviermattelaer
Copy link
Member Author

Thanks Andrea, (I thought that you were in Holliday)

Yes in my case in a haswell node, the code was compiling and producing (wrong) cross-section. (I did run it from the "madgraph" interface). The (wrong) cross-section was similar to the (wrong) cross-section on my mac that should be related to an ordering issue of the coupling that we still have to understand/fix.

The channelId indeed starts to count at 1 since 0 is used for "no multi-channel", so indeed if you want to use an array, you need to go up to seven. But in itself this is not directly related to fortran numbering scheme.

Now even if this does not merge #826 for you, I guess we should try to merge your #853 and then try to understand what it is still crashing for you?
In particular you/we should compile with the C equivalent of -fbounds-check which is super usefull to spot segfault who by definition are hardware specific.

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Thanks Andrea, (I thought that you were in Holliday)

Hi Olivier, thanks to you. No actually I am here also on Monday and probably Tuesday (but I would really prefer to get other things done rather than debug this).

Yes in my case in a haswell node, the code was compiling and producing (wrong) cross-section. (I did run it from the "madgraph" interface). The (wrong) cross-section was similar to the (wrong) cross-section on my mac that should be related to an ordering issue of the coupling that we still have to understand/fix.

If you have a test that fails, can you try to make a reproducer please? Otherwise it is really difficult to communicate. Tell me the exact sequence that fails and where. Then I can try to find a haswell node for instance if it only fails there.

(Similarly, it would be great if you could tell me if my test scripts fail for you and why, and then we open a ticket to debug. Anyway, it is now quite obvious for me why my script fails: I call it for channel 1, and CPPProcess has no enhancement of channel 1, see below. I would like to fix that first, before any sigfpe crashes, which may actually be related. This is to say: if you do not manage to run the tmad scripts now, not a big problem, but eventually we should try to fix these).

The channelId indeed starts to count at 1 since 0 is used for "no multi-channel", so indeed if you want to use an array, you need to go up to seven. But in itself this is not directly related to fortran numbering scheme.

Anyway, let's go to the heart of this. I think I understand the problems a bit better (but I am still not sure if your patch makes sense or not). In my opinion there is certainly also a problem with fortran numbering.

To get the complete picture, there are many arrays and indices and objects involved here. I am talking about susy gg to t1t1.

  • There are 6 Feynman diagrams (https://github.com/valassi/madgraph4gpu/blob/7739f13b6cae4c5ed1d92b7b67f67385c270bc3e/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/matrix1.pdf). Using Fortran/human numbering 1-6 as in the pdf and most code, diagram 1 is peculiar because it has a 4-particle vertex, so I imagine that this means that it does not qualify as a 'channel'.
  • There are 5 channels (because one of the six diagrams does not go through madevent single-diagram enhancement, probably the four-particle vertex diagram number 1). Throughout cudacpp code, variable channelId uses human/Fortran indexing 1-"N".
  • Then there is a variable channelIdC which uses C indexing 0-"N-1". This is what this line of code was doing, const unsigned int channelIdC = channelId - 1; // coloramps.h uses the C array indexing starting at 0, here
    const unsigned int channelIdC = channelId - 1; // coloramps.h uses the C array indexing starting at 0
    and this is what your patch changed. This is because channelIdC is then used to access C array icolamp, which uses C indexing.
  • The question here is: if there are 6 diagrams, and diagram 1 is not a channel, so there are 5 channels, are these channels (in Fortran/human/mg5amc indexing) called 1-5 (CASE1) or are they called 2-6 (CASE2)?. Note: a first bug in CPPProcess.cc is that it uses 2-6 in the calculation of the numerators
    if( channelId == 6 ) numerators_sv += cxabs2( amp_sv[0] );
    // Amplitude(s) for diagram number 6 [...] if( channelId == 6 ) numerators_sv += cxabs2( amp_sv[0] );, but then it has an icolamp with 5 elements only (i.e. an icolamp for channelId 1-5 which is channelIdC 0-4). Solution: either we must change the numbering in the numerators, or we must add one row in the icolamp array (or maybe use your diag_to_channel patch?), AND ALSO I have to stop using channel 1 in my tmad test (the original fundamental issue No cross section in SUSY gg_t1t1 log file #826) because channel 1 does not exist. (In that case I still have a question, would the G directoies in Fortran be called G1-G5 or G2-G6?).
  • To make things even more complex, there are 7 calculations of AMP (in Fortran) or amp_sv (aka amp_sv, in C++). The first diagram uses two of these AMP calculations. Anyway, by looking both at the Fortran and at cudacpp, this seems to look ok.

Anyway, as I read I kind of understand better your patch. It seems to work under CASE2. That is to say, channels are called 2-6 in Fortran and in input.txt, but then they must be translated to 0-4 in icolamp. @oliviermattelaer can you confirm that this is the case? Then in that case to solve my #826 I need to test channel 2 rather than channel 1. (Remember my tmad tests just run madevent with input.txt on one specific channel, and I have always used channel 1). (And then I also need to understand if G directories are called G1-G5 or G2-G6).

Now even if this does not merge #826 for you, I guess we should try to merge your #853

Before any merge, I'd like to understand the logic of the code and numbering, as above. Then coding it will become easier ;-)

By the way my PR 853 still has some CI failures, but maybe this is trivial clang formatting. I must have a look though.

and then try to understand what it is still crashing for you?

I repeat, I have two issues

  • 0 cross section, and this now I understand as caused by using channel 1, while I should use for instance 2
  • and then the crashes sigfpe, but I am not sure whether I still see some with your patch, I need to test again

In particular you/we should compile with the C equivalent of -fbounds-check which is super usefull to spot segfault who by definition are hardware specific.

Ah ok this is something else/new, interesting. Should we add this to all cuda/cpp compile options in your opionion?

Thanks again and sorry for the long post, but its'a c omplex issue with many variables.

Andrea

@valassi
Copy link
Member

valassi commented Jun 1, 2024

PS Looking more at the code in Fortran (this is master before any changes)

In https://github.com/madgraph5/madgraph4gpu/blob/aa297fe62c8d9d9026a8e2049b2b4e9a8840374d/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/maxamps.inc

      INTEGER    MAXAMPS, MAXFLOW, MAXPROC, MAXSPROC
      PARAMETER (MAXAMPS=6, MAXFLOW=2)
      PARAMETER (MAXPROC=1, MAXSPROC=1)

In https://github.com/madgraph5/madgraph4gpu/blob/aa297fe62c8d9d9026a8e2049b2b4e9a8840374d/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x/matrix1.f

This is AMP2(MAXAMPS)

DOUBLE PRECISION AMP2(MAXAMPS), JAMP2(0:MAXFLOW)

This is AMP(2-6) so channels seem to be 2-6 (CASE2), using AMP(3-7), remember AMP(1-2) are from the first diagram

This is the numerator which seems to use AMP2(CHANNEL), so I would say that channel is indeed 2-6 (CASE2)

So I would indeed conclude that case2 is right, and I ned to use channel>1 to solve #826

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Hi again, more debugging but it does not really advance.

I moved my tmad scripts from channel 1 to elsewhere

  • Now if I use channel 2 I get a crash (which before I was getting in channel 3?)
  • And if I use channel 3 I get no cross section in Fortran? Is it normal that madevent_fortran with input_app channel 3 gives 0 cross section?

I also realise that in any case the code must be better dcocumented in the diag_to_channel. With my current implementation, there are 7 array elements. Maybe better use only 6 (the number of diagrams), and keep the fact that fortran and c indexing are different. I mean this thing does two mappings: it skips one diagram with no channel, AND it also maps fortran to c indexing. Or maybe I have a bug there too, to be checked.

@valassi
Copy link
Member

valassi commented Jun 1, 2024

  • The question here is: if there are 6 diagrams, and diagram 1 is not a channel, so there are 5 channels, are these channels (in Fortran/human/mg5amc indexing) called 1-5 (CASE1) or are they called 2-6 (CASE2)?.

Ok I guessed another piece I think. The fortran part of mg5amc uses TWO DIFFERENT NOTATIONS internally

  • The channel number that gets sent to input_app.txt DOES always start with 1
  • Then the channelId used internally in the fortran can be two

Specific example here

  • My tmad tests were using "channel 1" in the file piped into madevent, for all processes
  • In almost all processes, this transaltes into fortran internals as channel=1
  • But in susy_gg_t1t1 (and I just found out also SM ggttgg!) this transaltes to channel 2

On current uupstream master

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp> grep ChannelId tmad/logs*/* | sort -u
tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 2
tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 2
tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 2
tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 2
tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 2
tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 2
tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt: [XSECTION] ChannelId = 1
tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt: [XSECTION] ChannelId = 1

The above is a printout which comes from fortran madevent

So it is neither case1 nor case2, it is something in between. And by the way ggttgg was working fine all the time

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Note, channel id printed in XSECTION by madX.sh is what auto_dsig1.f prints as CHANNEL_ID

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg> grep CHANNEL_ID *
auto_dsig1.f:        WRITE (*,*) 'CHANNEL_ID =', CHANNEL
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg> grep CHANNEL_ID ../../../tmad/madX.sh 
  chid=$(cat ${tmp} | grep --binary-files=text 'CHANNEL_ID =' | awk '{print $NF}')

This means that a 1 in the input.txt file may become a 2 in CHANNEL_ID in auto_dsig1.f:

  • in this case "1" means "the first valid diagram", not diagram 1
  • in this case "2" in CHANNEL_ID does mean "diagram 2"

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Ok so "1" in input.txt is iconfig. This is driver.f

      write(*,10) 'Enter Configuration Number: '
      read(*,*) dconfig
c     ncode is number of digits needed for the BW code
      ncode=int(dlog10(3d0)*(max_particles-3))+1
      iconfig = int(dconfig*(1+10**(-ncode)))
      write(*,12) 'Running Configuration Number: ',iconfig
      diag_number = iconfig

Notye: here it says diag_number. But this is NOT the diag number.
In the ggttgg and susyggt1t1 example, icnfig is 1, bt this is disagram 2...

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Anyway this diag_number only happens in addmothers.f, so nnot very relevant

grep -i diag_number ../../ -r
../../SubProcesses/P1_gg_ttxgg/driver.f:      integer ncall,itmax,itmin,iconfig, diag_number
../../SubProcesses/P1_gg_ttxgg/driver.f:      common/to_diag_number/diag_number
../../SubProcesses/P1_gg_ttxgg/driver.f:      diag_number = iconfig
../../SubProcesses/addmothers.f:      integer diag_number
../../SubProcesses/addmothers.f:      common/to_diag_number/diag_number
../../bin/internal/hel_recycle.py:    def good_helicity(cls, wavs, graph, diag_number=None, all_hel=[], bad_hel_amp=[]):
../../bin/internal/hel_recycle.py:        if diag_number and this_comb_good and cls.ext_deps:
../../bin/internal/hel_recycle.py:            if (hel_number,diag_number) in bad_hel_amp:        

@valassi
Copy link
Member

valassi commented Jun 1, 2024

And channel comes from subdiag, which is somehow related to inconfig

grep -i subdiag ../.. -r
../../SubProcesses/P1_gg_ttxgg/auto_dsig1.f:      INTEGER SUBDIAG(MAXSPROC),IB(2)
../../SubProcesses/P1_gg_ttxgg/auto_dsig1.f:      COMMON/TO_SUB_DIAG/SUBDIAG,IB
../../SubProcesses/P1_gg_ttxgg/auto_dsig1.f:      CHANNEL = SUBDIAG(1)
../../SubProcesses/P1_gg_ttxgg/auto_dsig1.f:      INTEGER SUBDIAG(MAXSPROC),IB(2)
../../SubProcesses/P1_gg_ttxgg/auto_dsig1.f:      COMMON/TO_SUB_DIAG/SUBDIAG,IB
../../SubProcesses/P1_gg_ttxgg/auto_dsig1.f:        CHANNEL = SUBDIAG(1)
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:C     SUBDIAG is vector of diagram numbers for this config
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:      INTEGER SUBDIAG(MAXSPROC),IB(2)
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:      COMMON/TO_SUB_DIAG/SUBDIAG,IB
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:        SUBDIAG(I) = CONFSUB(I,SYMCONF(ICONF))
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:C     SUBDIAG is vector of diagram numbers for this config
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:      INTEGER SUBDIAG(MAXSPROC),IB(2)
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:      COMMON/TO_SUB_DIAG/SUBDIAG,IB
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:          SUBDIAG(I) = CONFSUB(I,SYMCONF(ICONF))
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:C     SUBDIAG is vector of diagram numbers for this config
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:      INTEGER SUBDIAG(MAXSPROC),IB(2)
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:      COMMON/TO_SUB_DIAG/SUBDIAG,IB
../../SubProcesses/P1_gg_ttxgg/auto_dsig.f:          SUBDIAG(I) = CONFSUB(I,SYMCONF(ICONF))
../../SubProcesses/reweight.f:      INTEGER SUBDIAG(MAXSPROC),IB(2)
../../SubProcesses/reweight.f:      COMMON/TO_SUB_DIAG/SUBDIAG,IB

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Ok now I doscovered this one. Probably the relevant one (for ggttgg it is interesting but much longer)

[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/susy_gg_t1t1.mad/SubProcesses/P1_gg_t1t1x> grep -i confsub ../.. -r |& grep -v binary
../../SubProcesses/P1_gg_t1t1x/matrix1.f:      INTEGER CONFSUB(MAXSPROC,LMAXCONFIGS)
../../SubProcesses/P1_gg_t1t1x/matrix1.f:          J = CONFSUB(1, I)
../../SubProcesses/P1_gg_t1t1x/config_subproc_map.inc:      DATA (CONFSUB(I,1),I=1,1)/2/
../../SubProcesses/P1_gg_t1t1x/config_subproc_map.inc:      DATA (CONFSUB(I,2),I=1,1)/3/
../../SubProcesses/P1_gg_t1t1x/config_subproc_map.inc:      DATA (CONFSUB(I,3),I=1,1)/4/
../../SubProcesses/P1_gg_t1t1x/config_subproc_map.inc:      DATA (CONFSUB(I,4),I=1,1)/5/
../../SubProcesses/P1_gg_t1t1x/config_subproc_map.inc:      DATA (CONFSUB(I,5),I=1,1)/6/

The question is, is this something that we should mimick in cudacpp? @oliviermattelaer is this confsub what your patch is emulating in diag_to_channel?

Anyway: I think we really need to clarify which variables exist and what their meaning is, before going on... And we should try to use the same names in fortran and cudacpp...

@valassi
Copy link
Member

valassi commented Jun 1, 2024

Last comment for the evening. I updated the branch for #853.

  • I made it clear in madX.sh that the last line is ICONFIG, not CHANNEL_ID
  • As such, ICONFIG=1 is always valid, there is no need to use a different ICONFIG>1 for susy_gg_t1t1
  • And then, the fact that I still find a 0 cross section in cudacpp for ICONFIG=1 is indeeed still a bug (No cross section in SUSY gg_t1t1 log file #826)
  • And note also that Olivier's diag_to_channel patch may fix another bug, but probably not this: it only touches icolamp which determines the event-by-event color in LHE files, but should have no influence on cross sections...

@valassi
Copy link
Member

valassi commented Jun 2, 2024

I propose to clean up the code to document the mappings better as follows, see #853

In coloramps.h

namespace mgOnGpu /* clang-format off */
{
  // Summary of numbering and indexing conventions for the relevant concepts (see issue #826 and PR #852)
  // - Diagram number (no variable) in [1, N_diagrams]: all values are allowed (N_diagrams distinct values)
  //   => this number is displayed for information before each block of code in CPPProcess.cc
  // - Channel number ("channelId" in C, CHANNEL_ID in F) in [1, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
  //   => this number (with F indexing) is passed around as an API argument between cudacpp functions
  // - Channel number in C indexing: "channelIdC" = channelID - 1
  //   => this number (with C indexing) is used as the index of the channelIdC_to_iconfig array below
  // - Config number ("iconfig" in C, ICONFIG in F) in [1, N_config]: all values are allowed (N_config <= N_diagrams distinct values)
  // - Config number in C indexing: "iconfigC" = iconfig - 1
  
  // Map channelIdC (in C indexing, i.e. channelId-1) to iconfig (in F indexing)
  // This array has N_diagrams elements, but only N_config <= N_diagrams valid (non-zero) values
  __device__ constexpr int channelIdC_to_iconfig[6] = {
    0, // channelId=1 (diagram=1) i.e. channelIdC=0 --> iconfig=None
    1, // channelId=2 (diagram=2) i.e. channelIdC=1 --> iconfig=1
    2, // channelId=3 (diagram=3) i.e. channelIdC=2 --> iconfig=2
    2, // channelId=4 (diagram=4) i.e. channelIdC=3 --> iconfig=3
    3, // channelId=5 (diagram=5) i.e. channelIdC=4 --> iconfig=4
    4  // channelId=6 (diagram=6) i.e. channelIdC=5 --> iconfig=5
  };

  // Map iconfigC (in C indexing, i.e. iconfig-1) to the set of allowed colors
  // This array has N_config <= N_diagrams elements
  __device__ constexpr bool icolamp[5][2] = {
    {  true,  true }, // iconfig=1 i.e. iconfigC=0
    {  true,  true }, // iconfig=2 i.e. iconfigC=1
    {  true, false }, // iconfig=3 i.e. iconfigC=2
    {  true, false }, // iconfig=4 i.e. iconfigC=3
    { false,  true }  // iconfig=5 i.e. iconfigC=4
  };

} /* clang-format off */

In CPPProcess.cc for instance

#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
    // Event-by-event random choice of color #402
    if( channelId != 0 ) // no event-by-event choice of color if channelId == 0 (fix FPE #783)
    {
      const unsigned int channelIdC = channelId - 1; // channelIdC_to_iconfig in coloramps.h uses the C array indexing starting at 0
      const unsigned int iconfig = mgOnGpu::channelIdC_to_iconfig[channelIdC]; // map N_diagrams to N_config <= N_diagrams configs (see #826 and #852)
      const unsigned int iconfigC = iconfig - 1; // icolamp in coloramps.h uses the C array indexing starting at 0
      fptype targetamp[ncolor] = { 0 };
      for( int icolC = 0; icolC < ncolor; icolC++ )
      {
        if( icolC == 0 )
          targetamp[icolC] = 0;
        else
          targetamp[icolC] = targetamp[icolC - 1];
        if( mgOnGpu::icolamp[iconfigC][icolC] ) targetamp[icolC] += jamp2_sv[icolC];
      }
      //printf( "sigmaKin: ievt=%4d rndcol=%f\n", ievt, allrndcol[ievt] );
      for( int icolC = 0; icolC < ncolor; icolC++ )
      {
        if( allrndcol[ievt] < ( targetamp[icolC] / targetamp[ncolor - 1] ) )
        {
          allselcol[ievt] = icolC + 1; // NB Fortran [1,ncolor], cudacpp [0,ncolor-1]
          break;
        }
      }
    }
#endif

I think that this is needed and is in my understanding what @oliviermattelaer you were correctly fixing. Does th eabove make sense? (At least: did I get the meaning of the numbers right?) Thanks.

The problem is however that, in the specific case of susy_gg_t1t1, this makes no difference, and I still get a zero cross section with iconfig=1 (ie channelId=2), there must be something else to fix.

And I have not tried codegen backports so I am not sure what it will give to other processes

@valassi
Copy link
Member

valassi commented Jun 2, 2024

In particular you/we should compile with the C equivalent of -fbounds-check which is super usefull to spot segfault who by definition are hardware specific.

Ah ok this is something else/new, interesting. Should we add this to all cuda/cpp compile options in your opionion?

I had a quick look. I am not sure this is possible in C++ at compile time. But valgrind should be good enough for similar runtime checks. I have run some tests for this in #855.

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 2, 2024
…SIGFPE madgraph5#855, and add volatile in aloha_functions.f to try to fix it

The SIGFPE crash madgraph5#855 does seem to disappear in
  ./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
However, there is now a DIFFERENT issue, an lhe file mismatch between fortran and cpp (madgraph5#856)
This is probably due to the iconfig/channel mapping issue reported by Olivier in madgraph5#852
@oliviermattelaer
Copy link
Member Author

Hi Andrea,

Following our discussion, I have changed the name to match your convention and put your change in the auto-generated code mechanism.
So with this last push the code looks like this:

// Copyright (C) 2020-2024 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Dec 2022) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.

#ifndef COLORAMPS_H
#define COLORAMPS_H 1

#include <map>

namespace mgOnGpu
{
  // Summary of numbering and indexing conventions for the relevant concepts (see issue #826 and PR #852)
  // - Diagram number (no variable) in [1, N_diagrams]: all values are allowed (N_diagrams distinct values)
  //   => this number is displayed for information before each block of code in CPPProcess.cc
  // - Channel number (CHANNEL_ID) in [0, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
  //   => this number (with indexing like ps/pdf output) is passed around as an API argument between cudacpp functions
  //   0 is allowed to fallback to no multi-channel mode.
  // - Channel number in C indexing: "IconfiC", this is the equivalent of the Fortran iconfig
  //   iconfigC = iconfig -1
  //   provides a continuous index [0, N_config-1] for array
  //  iconfigC = ChannelId_to_iconfigC[channelId]
  //NOTE: All those ordering are event by event specific (with the intent to have those fix within a vector size/wrap

  // Map channelId to iconfigC
  // This array has N_diagrams+1 elements, but only N_config <= N_diagrams valid values
  // unvalid values are set to -1
  // The 0 entry is a fall back to still write events even if no multi-channel is setup (wrong color selected in that mode)
    __device__ constexpr int channelId_to_iconfigC[7] = {
     0, // channelId=0: This value means not multi-channel, color will be wrong anyway -> pick the first
     -1, // channelId=1 (diagram=1): Not consider as a channel of integration (presence of 4 point interaction?)
     0, // channelId=2 (diagram=2) i.e. channelIdC=0 --> iconfig=1
     1, // channelId=3 (diagram=3) i.e. channelIdC=1 --> iconfig=2
     2, // channelId=4 (diagram=4) i.e. channelIdC=2 --> iconfig=3
     3, // channelId=5 (diagram=5) i.e. channelIdC=3 --> iconfig=4
     4, // channelId=6 (diagram=6) i.e. channelIdC=4 --> iconfig=5
  };

  // Map iconfigC (in C indexing, i.e. iconfig-1) to the set of allowed colors
  // This array has N_config <= N_diagrams elements
    __device__ constexpr bool icolamp[5][2] = {
    {true, true}, // channelIdC=0
    {true, true}, // channelIdC=1
    {true, false}, // channelIdC=2
    {true, false}, // channelIdC=3
    {false, true}, // channelIdC=4
  };

}
#endif // COLORAMPS_H

With this change, the cross-section of G1 is still zero (same for G2) and the cross-section reported by G3 is wrong.
Those three issues should be look at when the issue with the coupling is understood

If possible, would you be able to comment on the above change? (In particular if you found them so bad that you do not want us to merge it in master.)

PS: I will check the testsuite to check the failure (I guess a lot of clang formatting will not go through)

@oliviermattelaer
Copy link
Member Author

I have tried to fix the clang formatting
and fix some more comment to the code (they were some of them that was not updated correctly in the text above.
Now the auto-generated code is like this:

// Copyright (C) 2020-2024 CERN and UCLouvain.
// Licensed under the GNU Lesser General Public License (version 3 or later).
// Created by: A. Valassi (Dec 2022) for the MG5aMC CUDACPP plugin.
// Further modified by: A. Valassi (2022-2024) for the MG5aMC CUDACPP plugin.

#ifndef COLORAMPS_H
#define COLORAMPS_H 1

#include <map>

namespace mgOnGpu
{
  // Summary of numbering and indexing conventions for the relevant concepts (see issue #826 and PR #852)
  // - Diagram number (no variable) in [1, N_diagrams]: all values are allowed (N_diagrams distinct values)
  //   => this number is displayed for information before each block of code in CPPProcess.cc
  // - Channel number (CHANNEL_ID) in [0, N_diagrams]: not all values are allowed (N_config <= N_diagrams distinct values)
  //   => this number (with indexing like ps/pdf output) is passed around as an API argument between cudacpp functions
  //   0 is allowed to fallback to no multi-channel mode.
  // - Channel number in C indexing: "IconfiC", this is the equivalent of the Fortran iconfig
  //   iconfigC = iconfig -1
  //   provides a continuous index [0, N_config-1] for array
  //  iconfigC = ChannelId_to_iconfigC[channelId]
  //NOTE: All those ordering are event by event specific (with the intent to have those fix within a vector size/wrap

  // Map channelId to iconfigC
  // This array has N_diagrams+1 elements, but only N_config <= N_diagrams valid values
  // unvalid values are set to -1
  // The 0 entry is a fall back to still write events even if no multi-channel is setup (wrong color selected in that mode)
    __device__ constexpr int channelId_to_iconfigC[7] = {
     0, // channelId=0: This value means not multi-channel, color will be wrong anyway -> pick the first
     -1, // channelId=1 (diagram=1): Not consider as a channel of integration (presence of 4 point interaction?)
     0, // channelId=2 (diagram=2) --> iconfig=1 (f77 conv) and iconfigC=0 (c conv)
     1, // channelId=3 (diagram=3) --> iconfig=2 (f77 conv) and iconfigC=1 (c conv)
     2, // channelId=4 (diagram=4) --> iconfig=3 (f77 conv) and iconfigC=2 (c conv)
     3, // channelId=5 (diagram=5) --> iconfig=4 (f77 conv) and iconfigC=3 (c conv)
     4, // channelId=6 (diagram=6) --> iconfig=5 (f77 conv) and iconfigC=4 (c conv)
  };

  // Map iconfigC (in C indexing, i.e. iconfig-1) to the set of allowed colors
  // This array has N_config <= N_diagrams elements
    __device__ constexpr bool icolamp[5][2] = {
    {true, true}, // iconfigC=0, diag=2
    {true, true}, // iconfigC=1, diag=3
    {true, false}, // iconfigC=2, diag=4
    {true, false}, // iconfigC=3, diag=5
    {false, true}, // iconfigC=4, diag=6
  };

}
#endif // COLORAMPS_H

@oliviermattelaer
Copy link
Member Author

Let me reply point by point to your "need change":

Less minor issue: the std::map implementation does not work on cuda in device code. I tried to add constexpr but that is even worse. In the end I added a simple implementation based on arrays. Also in #853.

Ok I have moved to the auto-generation generating the arrays mode

Major issue however: I still see no cross section. What is this PR fixing, exactly? Is there some tests that you and Stefan did which was fixed by this? Can you document it please? I see no improvement from this so far.

So, this does fix some SIGFPE (especially for G3) and likely fix some missmatch in the color of events that you report in other bug (but this needs to be confirmed).

Related to that: I am puzzled by the naming, and by the code produced. The naming says "diag_to_channel", but actually the diag in the map goes up to 6, while there are only 6 diagrams

Given our conversation today, I have changed the naming scheme to yours.
I have also added many comment information inside that file to make clear what is what.

The fact that we need 7 entries looks weird to me.

That's what i have kept since this is the range of channel_Id. And I would like to avoid to use a channel_IdC (as long as, they are no array associated to it) --which seems to be the case--

So I think that this is good for your to re-review it. (sorry to send so many answer to this thread in a row)

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 26, 2024
…g AS-IS Olivier's patches from the latest fix_826 branch for PR madgraph5#852

The gg_ttgg test still crashes (rotxxx madgraph5#855?)
./tmad/madX.sh -ggttgg -iconfig 104 -makeclean
  *** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
   0  0x7fce5ec23860 in ???
   1  0x7fce5ec22a05 in ???
   2  0x7fce5e854def in ???
   3  0x44b5ff in ???
   4  0x4087df in ???
   5  0x409848 in ???
   6  0x40bb83 in ???
   7  0x40d1a9 in ???
   8  0x45c804 in ???
   9  0x434269 in ???
   10  0x40371e in ???
   11  0x7fce5e83feaf in ???
   12  0x7fce5e83ff5f in ???
   13  0x403844 in ???
   14  0xffffffffffffffff in ???
  ./tmad/madX.sh: line 387: 3913008 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}

The susy_gg_t1t1 test also still crashes (see madgraph5#826?), this looks like the same crash as ggttgg above
./tmad/madX.sh -susyggt1t1 -iconfig 2 -makeclean
  *** (2-none) EXECUTE MADEVENT_CPP x1 (create events.lhe) ***
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
   0  0x7f9f03423860 in ???
   1  0x7f9f03422a05 in ???
   2  0x7f9f03054def in ???
   3  0x43809f in ???
   4  0x40581f in ???
   5  0x4067b1 in ???
   6  0x408c71 in ???
   7  0x40a0a9 in ???
   8  0x444fdf in ???
   9  0x42bb38 in ???
   10  0x40371e in ???
   11  0x7f9f0303feaf in ???
   12  0x7f9f0303ff5f in ???
   13  0x403844 in ???
   14  0xffffffffffffffff in ???
  ./tmad/madX.sh: line 387: 3907179 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}

The gqttq test also still crashes intermittently, i.e. only on the second execution (madgraph5#845?)
./tmad/teeMadX.sh -gqttq +10x -fltonly -makeclean
./tmad/teeMadX.sh -gqttq +10x -fltonly
  Executing ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp'
  Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
  Backtrace for this error:
   0  0x7fbafa623860 in ???
   1  0x7fbafa622a05 in ???
   2  0x7fbafa254def in ???
   3  0x7fbafad24034 in ???
   4  0x7fbafa9a1575 in ???
   5  0x7fbafad20c89 in ???
   6  0x7fbafad2abfd in ???
   7  0x7fbafad30491 in ???
   8  0x43008b in ???
   9  0x431c10 in ???
   10  0x432d47 in ???
   11  0x433b1e in ???
   12  0x44a921 in ???
   13  0x42ebbf in ???
   14  0x40371e in ???
   15  0x7fbafa23feaf in ???
   16  0x7fbafa23ff5f in ???
   17  0x403844 in ???
   18  0xffffffffffffffff in ???
  ./madX.sh: line 387: 3922797 Floating point exception(core dumped) $timecmd $cmd < ${tmpin} > ${tmp}
  ERROR! ' ./build.512z_f_inl0_hrd0/madevent_cpp < /tmp/avalassi/input_gqttq_x1_cudacpp > /tmp/avalassi/output_gqttq_x1_cudacpp' failed
Copy link
Member

@valassi valassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @oliviermattelaer thanks for asking the re-review :-)

I have done more tests, as discussed yesterday at the meeting.
The results are in PR #870 (which is NOT meant to be merged, it just includes code and test logs for reference).

The summary: NO GOOD.

Details in d8df7ce

My conclusion:

Thanks
Andrea

valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 28, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 28, 2024
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 28, 2024
…graph5#826 is NOT fixed) after Olivier's first two commits in PR madgraph5#852

./tmad/teeMadX.sh -susyggt1t1 +10x -makeclean

(PS: it is NOW quite obvious to me that PR madgraph5#852 should be tested against different issues like ggttgg madgraph5#856, NOT against susy madgraph5#826)
(PS: will now therefore slowly move to different tests - first, however, recover my coloramps.h proposal in susy and Olivier's later changes)
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 28, 2024
…n and tmad logs

Revert "[color] temporarely regenerate susy_gg_t1t1 again, just to check all is ok - will revert"
This reverts commit 28a0177.

Revert "[color] rerun tmad test for susy_gg_t1t1: still no cross section (madgraph5#826 is NOT fixed) after Olivier's first two commits in PR madgraph5#852"
This reverts commit 6baf004.
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jun 28, 2024
…Formatting.sh

** NB: the code is now at the same level as Olivier's PR madgraph5#852, but also includes my relevant commit history from PR madgraph5#853 **
** NB: in particular, susy_gg_t1t1.mad includes my proposal for the coloramps.h mappings **

Revert "[susy] in CODEGEN/checkFormatting.sh, add coloramps.h to the list of files checked with clang formatting"
This reverts commit 08f4ab2.
@valassi valassi changed the title Fix 826 "Fix 826" (actually: fix iconfig-channel mapping) Jun 28, 2024
@valassi
Copy link
Member

valassi commented Jun 28, 2024

Hi @oliviermattelaer,

I have now almost completed MR #873, which includes all of your commits here in PR #852, with additional modifcations and fixes. I suggest to close this one, or otherwise leave it open until we merge PR #873 (at that point, this one will normally be automatically closed... I am not 100% sure because the submodule may complicate this, butnormally I took care of this I hope).

I suggest we move the discussion to #873 instead (and mg5amcnlo/mg5amcnlo#115)

Thanks again for the help on this!
Andrea

PS I took the liberty of renaming this, because I think this will not fix 826. It will normally fix 856 (together with my other patches), instead.

@valassi
Copy link
Member

valassi commented Jul 3, 2024

I will finally merge this now at last. Thanks Olivier for the patch and for the patience discussing further changes to this.

Further fixes and cosmetic improvements to this patch are included in #877, such as

Note a couple of points

Thanks again Olivier

@valassi valassi merged commit 3ffb82a into master Jul 3, 2024
150 of 169 checks passed
valassi added a commit to valassi/madgraph4gpu that referenced this pull request Jul 3, 2024
@oliviermattelaer
Copy link
Member Author

Thanks Andrea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants