invalid memory reference during builds for s390x #348

Open
skriesch opened this issue Apr 10, 2022 · 8 comments

@skriesch

Expected behavior

All builds for s390x should be successful for openSUSE Tumbleweed.

Actual behavior

arpack-ng packages for MPI are failing because of an invalid memory reference.

Error messages

arpack-ng:openmpi1

[   79s] 9/9 Test #9: issue46_tst ......................***Exception: SegFault  0.02 sec
[   79s] 
[   79s] Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
[   79s] 
[   79s] Backtrace for this error:
[   79s] #0  0x3ff9cfa378b in ???
[   79s] #1  0x3ff9cfa2719 in ???
[   79s] #2  0x3ff9e0fe487 in ???
[   79s] #3  0x3ff9c97fc08 in ???
[   79s] #4  0x3ff9c987391 in ???
[   79s] #5  0x3ff9c992ae5 in ???
[   79s] #6  0x3ff9c972197 in ???
[   79s] #7  0x3ff9ca4b41d in ???
[   79s] #8  0x3ff9ca699c7 in ???
[   79s] #9  0x3ff9cf5ac99 in ???
[   79s] #10  0x2aa2b1027ed in issue46
[   79s] 	at /home/abuild/rpmbuild/BUILD/arpack-ng-3.8.0/PARPACK/TESTS/MPI/issue46.f:15
[   79s] #11  0x2aa2b1013bf in main
[   79s] 	at /home/abuild/rpmbuild/BUILD/arpack-ng-3.8.0/PARPACK/TESTS/MPI/issue46.f:32

arpack-ng:openmpi2

[   73s] 8/9 Test #9: issue46_tst ......................***Failed    0.01 sec
[   73s] [s390zl28:02838] *** Process received signal ***
[   73s] [s390zl28:02838] Signal: Segmentation fault (11)
[   73s] [s390zl28:02838] Signal code: Address not mapped (1)
[   73s] [s390zl28:02838] Failing at address: 0xfffffffffffff000
[   73s] [s390zl28:02838] [ 0] linux-vdso64.so.1(__kernel_rt_sigreturn+0x0)[0x3ff9287e490]
[   73s] [s390zl28:02838] [ 1] /usr/lib64/mpi/gcc/openmpi2/lib64/libopen-pal.so.20(+0x8d408)[0x3ff9240d408]
[   73s] [s390zl28:02838] [ 2] /usr/lib64/mpi/gcc/openmpi2/lib64/libopen-pal.so.20(+0x94bd8)[0x3ff92414bd8]
[   73s] [s390zl28:02838] [ 3] /usr/lib64/mpi/gcc/openmpi2/lib64/libopen-pal.so.20(opal_hwloc1112_hwloc_topology_load+0xd6)[0x3ff92423d36]
[   73s] [s390zl28:02838] [ 4] /usr/lib64/mpi/gcc/openmpi2/lib64/libopen-pal.so.20(opal_hwloc_base_get_topology+0x78)[0x3ff923fe268]
[   73s] [s390zl28:02838] [ 5] /usr/lib64/mpi/gcc/openmpi2/lib64/openmpi/mca_ess_hnp.so(+0x5380)[0x3ff92205380]
[   73s] [s390zl28:02838] [ 6] /usr/lib64/mpi/gcc/openmpi2/lib64/libopen-rte.so.20(orte_init+0x25c)[0x3ff9271a29c]
[   73s] [s390zl28:02838] [ 7] /usr/lib64/mpi/gcc/openmpi2/lib64/libopen-rte.so.20(orte_daemon+0x1ce)[0x3ff92739ba6]
[   73s] [s390zl28:02838] [ 8] /lib64/libc.so.6(+0x33926)[0x3ff924b3926]
[   73s] [s390zl28:02838] [ 9] /lib64/libc.so.6(__libc_start_main+0xa0)[0x3ff924b3a08]
[   73s] [s390zl28:02838] [10] orted(+0x928)[0x2aa30b80928]
[   73s] [s390zl28:02838] *** End of error message ***
[   73s] [s390zl28:02836] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 582
[   73s] [s390zl28:02836] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166

arpack-ng:openmpi3

[  118s] 9/9 Test #9: issue46_tst ......................***Failed    0.17 sec
[  118s] [s390zp25:02863] *** Process received signal ***
[  118s] [s390zp25:02863] Signal: Segmentation fault (11)
[  118s] [s390zp25:02863] Signal code: Address not mapped (1)
[  118s] [s390zp25:02863] Failing at address: 0xfffffffffffff000
[  118s] [s390zp25:02863] [ 0] linux-vdso64.so.1(__kernel_rt_sigreturn+0x0)[0x3ffb38fe490]
[  118s] [s390zp25:02863] [ 1] /usr/lib64/mpi/gcc/openmpi3/lib64/libopen-pal.so.40(+0x98666)[0x3ffb3498666]
[  118s] [s390zp25:02863] [ 2] /usr/lib64/mpi/gcc/openmpi3/lib64/libopen-pal.so.40(+0xa051a)[0x3ffb34a051a]
[  118s] [s390zp25:02863] [ 3] /usr/lib64/mpi/gcc/openmpi3/lib64/libopen-pal.so.40(opal_hwloc1117_hwloc_topology_load+0xf8)[0x3ffb34b0640]
[  118s] [s390zp25:02863] [ 4] /usr/lib64/mpi/gcc/openmpi3/lib64/libopen-pal.so.40(opal_hwloc_base_get_topology+0x4c8)[0x3ffb348b438]
[  118s] [s390zp25:02863] [ 5] /usr/lib64/mpi/gcc/openmpi3/lib64/openmpi/mca_ess_hnp.so(+0x5af4)[0x3ffb3105af4]
[  118s] [s390zp25:02863] [ 6] /usr/lib64/mpi/gcc/openmpi3/lib64/libopen-rte.so.40(orte_init+0x2ce)[0x3ffb380f3ce]
[  118s] [s390zp25:02863] [ 7] /usr/lib64/mpi/gcc/openmpi3/lib64/libopen-rte.so.40(orte_daemon+0x25e)[0x3ffb37be476]
[  118s] [s390zp25:02863] [ 8] /lib64/libc.so.6(+0x33926)[0x3ffb3533926]
[  118s] [s390zp25:02863] [ 9] /lib64/libc.so.6(__libc_start_main+0xa0)[0x3ffb3533a08]
[  118s] [s390zp25:02863] [10] orted(+0x928)[0x2aa16900928]
[  118s] [s390zp25:02863] *** End of error message ***
[  118s] [s390zp25:02861] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 532
[  118s] [s390zp25:02861] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local node in file ess_singleton_module.c at line 166

Where/how to reproduce the problem

  • arpack-ng: 3.8.0

  • OS: openSUSE Tumbleweed for the architecture s390x

  • compiler: gcc-c++-11-6.1 openmpi3-3.1.6-4.1 cmake-3.23.0-1.1 gcc-fortran-11-6.1

  • environment:
    export 'FFLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC'
    export 'FCFLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC'
    export 'CFLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC'
    export 'CXXFLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC'
    export LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi3/lib64
    export CC=/usr/lib64/mpi/gcc/openmpi3/bin/mpicc
    export CXX=/usr/lib64/mpi/gcc/openmpi3/bin/mpic++
    export F77=/usr/lib64/mpi/gcc/openmpi3/bin/mpif77
    export MPIF77=/usr/lib64/mpi/gcc/openmpi3/bin/mpif77

  • configure: /usr/bin/cmake /home/abuild/rpmbuild/BUILD/arpack-ng-3.8.0/. '-GUnix Makefiles' -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DCMAKE_INSTALL_LIBDIR:PATH=lib64 -DCMAKE_INSTALL_LIBEXECDIR=/usr/libexec -DCMAKE_BUILD_TYPE=RelWithDebInfo '-DCMAKE_C_FLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC -DNDEBUG' '-DCMAKE_CXX_FLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC -DNDEBUG' '-DCMAKE_Fortran_FLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC -DNDEBUG' '-DCMAKE_EXE_LINKER_FLAGS=-flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now' '-DCMAKE_MODULE_LINKER_FLAGS=-flto=auto -Wl,--as-needed' '-DCMAKE_SHARED_LINKER_FLAGS=-flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-z,now' -DLIB_SUFFIX=64 -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DBUILD_SHARED_LIBS:BOOL=ON -DBUILD_STATIC_LIBS:BOOL=OFF -DCMAKE_COLOR_MAKEFILE:BOOL=OFF -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_MODULES_INSTALL_DIR=/usr/lib64/cmake/parpack-openmpi3 -DCMAKE_INSTALL_PREFIX:PATH=/usr/lib64/mpi/gcc/openmpi3 -DCMAKE_INSTALL_LIBDIR:PATH=/usr/lib64/mpi/gcc/openmpi3/lib64 -DCMAKE_SKIP_RPATH:BOOL=OFF -DCMAKE_SKIP_INSTALL_RPATH:BOOL=ON -DCMAKE_CXX_COMPILER_VERSION=11.2.1 -DMPI:BOOL=ON -DPYTHON3:BOOL=OFF

[   66s] -- Configuration summary for arpack-ng-3.8.0:
[   66s]    -- prefix: /usr/lib64/mpi/gcc/openmpi3
[   66s]    -- MPI: ON
[   66s]    -- ICB: OFF
[   66s]    -- INTERFACE64: 0
[   66s]    -- FC:      /usr/bin/gfortran
[   66s]    -- FCFLAGS: -O2 -g -DNDEBUG -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC -DNDEBUG -cpp -ffixed-line-length-none 
[   66s]    -- MPIFC:
[   66s]       -- compile: /usr/lib64/mpi/gcc/openmpi3/include
[   66s]       -- compile: /usr/lib64/mpi/gcc/openmpi3/lib64
[   66s]       -- link:    /usr/lib64/mpi/gcc/openmpi3/lib64/libmpi_usempif08.so
[   66s]       -- link:    /usr/lib64/mpi/gcc/openmpi3/lib64/libmpi_usempi_ignore_tkr.so
[   66s]       -- link:    /usr/lib64/mpi/gcc/openmpi3/lib64/libmpi_mpifh.so
[   66s]       -- link:    /usr/lib64/mpi/gcc/openmpi3/lib64/libmpi.so
[   66s]    -- BLAS:
[   66s]       -- link:    /usr/lib64/libopenblas.so
[   66s]    -- LAPACK:
[   66s]       -- link:    -lm
[   66s]       -- link:    -ldl
[   66s]       -- link:    BLAS::BLAS
[   66s] -- Configuring done

Steps to reproduce the problem

  • Build arpack-ng openmpi modules for s390x on openSUSE Tumbleweed
  • arpack-ng:openmpi1 through arpack-ng:openmpi4 are failing.
  • The reason is a segmentation fault caused by an invalid memory reference.

Error message

See the arpack-ng:openmpi3 log above; the output is identical.

@fghoussen (Collaborator) commented Apr 10, 2022

Feel free to propose a PR. We can merge it when CI is back (CI is down for now).
Back when CI was handled with Travis (51b299c), the openSUSE build was OK.
Is s390x this arch: https://en.wikipedia.org/wiki/IBM_System/390? If so, it seems to be a 32-bit arch, but you compile with -m64.

@skriesch (Author) commented Apr 11, 2022

IBM provides both s390 and s390x. s390 is 32-bit; s390x, the architecture chosen here, is 64-bit. We build only for the 64-bit mainframe architecture. This is a good explanation of the addressing modes: https://www.ibm.com/docs/en/cics-ts/5.6?topic=basics-24-bit-31-bit-64-bit-addressing

"64-bit architecture, uses 64-bit storage addresses and 64-bit integer arithmetic and logical instructions" follows from these slides: KIT Z Architecture lecture

That is a nice extension with deeper insight into the memory management. I will try to find the issue this week based on these two tutorials/lecture slides.
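
As a quick sanity check (my sketch, not from the original report): the distinction that matters below is that s390x has 64-bit addresses but a 32-bit default Fortran INTEGER. A tiny program makes both widths visible:

! Sketch: report the default INTEGER width and the address width.
! On s390x with gfortran this should print 32 and 64.
      program archcheck
      use iso_c_binding, only: c_intptr_t
      implicit none
      print *, 'default INTEGER bits:', bit_size(0)
      print *, 'address bits:', bit_size(0_c_intptr_t)
      end program archcheck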

@skriesch (Author)

linux-vdso64.so is used. The manpage says this about the s390x architecture:
The table below lists the symbols exported by the vDSO.

   symbol                   version
   ──────────────────────────────────────
   __kernel_clock_getres    LINUX_2.6.29
   __kernel_clock_gettime   LINUX_2.6.29
   __kernel_gettimeofday    LINUX_2.6.29

@fghoussen (Collaborator)

"64-bit architecture, uses 64-bit storage addresses and 64-bit integer arithmetic and logical instructions"

If s390x uses 64-bit integers, you may need to set INTERFACE64=1.
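
For the CMake build quoted above, that would be one extra option on the existing configure line (a sketch; the option name matches the "INTERFACE64" entry in the configuration summary):

    # appended to the cmake invocation shown under "configure:"
    -DINTERFACE64=ON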

@skriesch (Author)

I have set INTERFACE64=1 in our spec file.

Now I get a new error message:

[   21s] /home/abuild/rpmbuild/BUILD/arpack-ng-3.8.0/PARPACK/TESTS/MPI/issue46.f:16:26:
[   21s] 
[   21s]    16 |       call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
[   21s]       |                          1
[   21s] ......
[   21s]   113 |       call MPI_COMM_RANK( comm, myid, ierr )
[   21s]       |                          2
[   21s] Error: Type mismatch between actual argument at (1) and actual argument at (2) (INTEGER(8)/INTEGER(4)).
[   21s] /home/abuild/rpmbuild/BUILD/arpack-ng-3.8.0/PARPACK/TESTS/MPI/issue46.f:17:26:
[   21s] 
[   21s]    17 |       call MPI_COMM_SIZE( MPI_COMM_WORLD, nprocs, ierr )
[   21s]       |                          1
[   21s] ......
[   21s]   114 |       call MPI_COMM_SIZE( comm, nprocs, ierr )
[   21s]       |                          2
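
For context (my reading, not part of the build log): INTERFACE64=1 compiles the Fortran sources with 64-bit default integers (presumably via gfortran's -fdefault-integer-8), so a plain INTEGER argument becomes INTEGER(8), while anything declared with an explicit 4-byte kind stays INTEGER(4). Recent gfortran rejects two calls to the same implicitly-interfaced procedure with inconsistent argument kinds, which is what the message above flags for the two MPI_COMM_RANK calls. A minimal sketch that provokes the same diagnostic, assuming -fdefault-integer-8 is in effect (it is expected to fail to compile; that failure is the point):

c     Sketch: two calls to the same MPI routine with different
c     integer kinds. Compile with: mpif77 -fdefault-integer-8 kindmix.f
      program kindmix
      implicit none
      include 'mpif.h'
      integer ierr, myid
      integer*4 comm4, myid4, ierr4
      call MPI_INIT(ierr)
c     default INTEGER is 8 bytes here, so this call passes INTEGER(8)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
c     gfortran flags this call: INTEGER(8)/INTEGER(4) vs. the one above
      call MPI_COMM_RANK(comm4, myid4, ierr4)
      call MPI_FINALIZE(ierr)
      end

gfortran's -fallow-argument-mismatch should degrade this to a warning, but the real fix is consistent integer kinds in the test sources.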

@fghoussen (Collaborator)

Not even surprised! :D Years ago, I tried to PR a patch for this problem, but the CI was failing for reasons I never understood. I guess that, at the time, the CI boxes were old and shipped with an openmpi version whose mpi_f08 module didn't fully implement the 2008 Fortran standard...

I should retrieve the commit and PR it soon: I hope CI won't break this time, and it may fix this problem too...
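
For reference, a minimal sketch of the Fortran 2008 style (assuming, from the mpi_f08 mention above, that this is the direction of the patch): MPI handles become derived types with explicit interfaces, so kind mismatches like the one above are caught cleanly instead of slipping through an implicit interface.

! Sketch of mpi_f08 usage: the communicator is type(MPI_Comm), and
! every routine has an explicit interface that checks argument kinds.
program f08style
  use mpi_f08
  implicit none
  type(MPI_Comm) :: comm
  integer :: myid, nprocs, ierr
  call MPI_Init(ierr)
  comm = MPI_COMM_WORLD
  call MPI_Comm_rank(comm, myid, ierr)
  call MPI_Comm_size(comm, nprocs, ierr)
  print *, 'rank', myid, 'of', nprocs
  call MPI_Finalize(ierr)
end program f08style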

@fghoussen (Collaborator)

@skriesch: if #368 does not break the CI, try to check out the PR branch and test whether it fixes your issue.

@skriesch (Author)

The patch was not compatible with version 3.8.0, so I tested it against the master branch. We got new error messages (and more of them).
