Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize arrays and scalars to signaling NaNs for Intel in debug mode #5801

Merged
merged 8 commits into from
Sep 26, 2023

Conversation

amametjanov
Copy link
Member

@amametjanov amametjanov commented Jul 11, 2023

Add -init=snan,arrays to Intel's debug-compile flags.
Also, initialize EAM's and ELM's uninitialized arrays, which were flagged by compiler's run-time checks.

[BFB]

@amametjanov amametjanov added enhancement Machine Files BFB PR leaves answers BFB intel compiler Intel compilers labels Jul 11, 2023
@amametjanov amametjanov self-assigned this Jul 11, 2023
@amametjanov
Copy link
Member Author

This passes e3sm_integration in baseline-comparison mode: https://my.cdash.org/viewTest.php?buildid=2369388 .

@ndkeen
Copy link
Contributor

ndkeen commented Jul 18, 2023

Fixes #5597

Also re-Fixes #5636

@@ -1757,6 +1757,8 @@ subroutine diag_phys_writeout(state, psl)

!! Boundary layer atmospheric stability, temperature, water vapor diagnostics

p_surf_t1 = 0._r8
p_surf_t2 = 0._r8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, @amametjanov . In the same spirit, we probably should also initialize p_surf_q1 and p_surf_q2. vertinterp calculations only loop over ncol. Just like p_surf_t1 and p_surf_t2, p_surf_q1 and p_surf_q2 could also be invoked in full array operation when certain output variable is specified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

@@ -1833,6 +1833,7 @@ subroutine clubb_tend_cam( &
! At each CLUBB call, initialize mean momentum and thermo CLUBB state
! from the CAM state

rvm = 0._r8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly an initialization of it has no harm. I do wonder if an error was detected with the new debug compilation options. Only 1 to ncol in rvm are used in calculation, and they always have proper values to begin with (from state1%q). If an invalid is reported without the rvm initialization, it seems an over-kill.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array accesses are over pverp also in rvm(1:ncol,1:pverp):

1944    do i=1,ncol   ! loop over columns
...
2047       do k=1,pverp
...
2060          rvm_in(k)     = real(rvm(i,pverp-k+1), kind = core_rknd)

This was needed for SMS_D_Ln5.ne4_oQU240.F2010.chrysalis_intel.eam-clubb_sp. Without this initialization, error is

16: forrtl: error (182): floating invalid - possible uninitialized real/complex variable.
16: Image              PC                Routine            Line        Source
16: libpnetcdf.so.3.0  000015555101868C  for__signal_handl     Unknown  Unknown
16: libpthread-2.28.s  000015554E03DB20  Unknown               Unknown  Unknown
16: e3sm.exe           0000000002EECC7C  clubb_intr_mp_clu        2060  clubb_intr.F90
16: e3sm.exe           0000000003546686  physpkg_mp_tphysb        2712  physpkg.F90
16: e3sm.exe           000000000350D8CF  physpkg_mp_phys_r        1150  physpkg.F90
16: libiomp5.so        00001555515583F3  __kmp_invoke_micr     Unknown  Unknown
16: libiomp5.so        00001555514DC273  Unknown               Unknown  Unknown

and rvm = 0._r8 was the minimal initialization to get past this issue.

sh_cldice(i,k)=sh_icwmr(i,k)* fice(i,k) *sh_frac(i,k)
endif
end do
end do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to monitor tests with COSP on, esp. with MMF, as icmwr has 0s for configurations that use CLUBB or SHOC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify.

  1. If fice has a NaN, then sh_cldliq was also get assigned a NaN (and now getting a 0 instead).
  2. If no NaNs in fice, then sh_cldliq was and will be getting a 0 whenever sh_icwmr has a 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wlin7 please see above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussion on telecon: integration passed and that includes COSP and MMF so this is ready.

@amametjanov amametjanov force-pushed the azamat/machines/add-init-snan-intel-debug branch from 8c657b4 to ba16f5b Compare September 22, 2023 02:59
amametjanov added a commit that referenced this pull request Sep 22, 2023
(PR #5801)

Initialize arrays and scalars to signaling NaNs for Intel in debug mode

Add `-init=snan,arrays` to Intel's debug-compile flags.
Also, initialize EAM's and ELM's uninitialized arrays, which
were flagged by compiler's run-time checks.

[BFB]
@amametjanov
Copy link
Member Author

Rebased onto today's master.
Fixed more uninitialized array references (new since July 10): in commit ba16f5b .
Re-checked with e3sm_integration: https://my.cdash.org/viewTest.php?buildid=2409146 .
Merged to next.

@amametjanov amametjanov merged commit 6174a70 into master Sep 26, 2023
2 of 3 checks passed
@amametjanov amametjanov deleted the azamat/machines/add-init-snan-intel-debug branch September 26, 2023 00:07
@amametjanov amametjanov restored the azamat/machines/add-init-snan-intel-debug branch September 26, 2023 19:19
amametjanov added a commit that referenced this pull request Sep 26, 2023
(PR #5801)

Initialize arrays and scalars to signaling NaNs for Intel in debug mode

Re-merge to bring the new commit since last merge.
@amametjanov amametjanov deleted the azamat/machines/add-init-snan-intel-debug branch September 26, 2023 19:28
@rljacob
Copy link
Member

rljacob commented Oct 12, 2023

Noticed something odd while using these new debugging options with the MOAB coupler on chrysalis.

The code runs fine without debugging but with it on, we got this stack trace:
1256 12: e3sm.exe 00000000009CDB47 seq_map_mod_mp_se 639 seq_map_mod.F90
1257 12: e3sm.exe 00000000006C04E5 prep_ocn_mod_mp_p 2770 prep_ocn_mod.F90
1258 12: e3sm.exe 000000000047E86A cime_comp_mod_mp_ 3014 cime_comp_mod.F90

line 639 in seq_map_mod.F90 is an executable statement but the other 2 files/lines are comments. Removing -init=snan,arrays let the code run to completion.

@amametjanov
Copy link
Member Author

Likely that some scalars or arrays at that line are uninitialized. When I added write prior to such lines, I always got a NaN confirming a missed initilization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants