MATRIX_MARKET tests failure with parallel make -jN check #439

Open
rathann opened this issue Oct 18, 2023 · 5 comments

@rathann

rathann commented Oct 18, 2023

Expected behavior

All tests complete successfully.

Actual behavior

Two of the three tests (arpackmm, issue215 and issue401) fail when run with make -j2 or higher.

Where/how to reproduce the problem

  • arpack-ng: 3.9.1
  • OS: Fedora rawhide (but reproducible on 38 and 39, too)
  • compiler: gcc version 13.2.1 20231011 (Red Hat 13.2.1-4) (GCC)
  • environment: FFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -I/usr/lib64/gfortran/modules '
  • configure: ./configure --build=x86_64-redhat-linux --host=x86_64-redhat-linux --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-static --with-blas=-lflexiblas --with-lapack=-lflexiblas --enable-eigen --enable-icb

Steps to reproduce the problem

cd EXAMPLES/MATRIX_MARKET
make check -j2
make check -j3

Error message

With make -j2, the issue215 test passes and the other two fail:

$ make check -j2
make  arpackmm \
  arpackmm.sh issue401.sh issue215.sh An.mtx As.mtx Az.mtx B.mtx Bz.mtx issue401.mtx issue215.mtx
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: 'arpackmm' is up to date.
make[1]: Nothing to be done for 'arpackmm.sh'.
make[1]: Nothing to be done for 'issue401.sh'.
make[1]: Nothing to be done for 'issue215.sh'.
make[1]: Nothing to be done for 'An.mtx'.
make[1]: Nothing to be done for 'As.mtx'.
make[1]: Nothing to be done for 'Az.mtx'.
make[1]: Nothing to be done for 'B.mtx'.
make[1]: Nothing to be done for 'Bz.mtx'.
make[1]: Nothing to be done for 'issue401.mtx'.
make[1]: Nothing to be done for 'issue215.mtx'.
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make  check-TESTS
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[2]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
FAIL: issue401.sh
FAIL: arpackmm.sh
PASS: issue215.sh
============================================================================
Testsuite summary for ARPACK-NG 3.9.1
============================================================================
# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0
============================================================================
See EXAMPLES/MATRIX_MARKET/test-suite.log
Please report to https://github.com/opencollab/arpack-ng/issues/
============================================================================
make[2]: *** [Makefile:741: test-suite.log] Error 1
make[2]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: *** [Makefile:849: check-TESTS] Error 2
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make: *** [Makefile:936: check-am] Error 2

With make -j3 or higher, the arpackmm test passes and the other two fail:

$ make check -j3
make  arpackmm \
  arpackmm.sh issue401.sh issue215.sh An.mtx As.mtx Az.mtx B.mtx Bz.mtx issue401.mtx issue215.mtx
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: 'arpackmm' is up to date.
make[1]: Nothing to be done for 'arpackmm.sh'.
make[1]: Nothing to be done for 'issue401.sh'.
make[1]: Nothing to be done for 'issue215.sh'.
make[1]: Nothing to be done for 'An.mtx'.
make[1]: Nothing to be done for 'As.mtx'.
make[1]: Nothing to be done for 'Az.mtx'.
make[1]: Nothing to be done for 'B.mtx'.
make[1]: Nothing to be done for 'Bz.mtx'.
make[1]: Nothing to be done for 'issue401.mtx'.
make[1]: Nothing to be done for 'issue215.mtx'.
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make  check-TESTS
make[1]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[2]: Entering directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
FAIL: issue215.sh
FAIL: issue401.sh
PASS: arpackmm.sh
============================================================================
Testsuite summary for ARPACK-NG 3.9.1
============================================================================
# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0
============================================================================
See EXAMPLES/MATRIX_MARKET/test-suite.log
Please report to https://github.com/opencollab/arpack-ng/issues/
============================================================================
make[2]: *** [Makefile:741: test-suite.log] Error 1
make[2]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make[1]: *** [Makefile:849: check-TESTS] Error 2
make[1]: Leaving directory '/builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET'
make: *** [Makefile:936: check-am] Error 2

Traces

make -j2

$ tail -n 300 /builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================
   ARPACK-NG 3.9.1: EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================

# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: arpackmm.sh
=================

./arpackmm --help

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150     --nbCV 6 --maxIt 200 --verbose 3 --debug 3

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150     --nbCV 6 --maxIt 200 --verbose 3 --debug 3 --restart

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150  --simplePrec   --nbCV 6 --maxIt 200 --verbose 3 --debug 3

========================================================================================

./arpackmm --A As.mtx      --slv BiCG --slvItrTol 1.e-06 --slvItrMaxIt 150  --simplePrec   --nbCV 6 --maxIt 200 --verbose 3 --debug 3 --restart
FAIL arpackmm.sh (exit status: 1)

FAIL: issue401.sh
=================

OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart no

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0 s

STAT: total number of user OP*x operation                         9
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             4
STAT: total number of it. refinement steps in reorthogonalization 8
STAT: total number of restart steps                               3
OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0 s

STAT: total number of user OP*x operation                         10
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             5
STAT: total number of it. refinement steps in reorthogonalization 10
STAT: total number of restart steps                               4
OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s
Error: bad dim - restart KO
Error: bad restart (resid)
Error: arpack solve KO
Error: solve KO
Error: arpack solve KO
FAIL issue401.sh (exit status: 1)

make -j3

$ tail -n 300 /builddir/build/BUILD/arpack-3.9.1/src/EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================
   ARPACK-NG 3.9.1: EXAMPLES/MATRIX_MARKET/test-suite.log
============================================================

# TOTAL: 3
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: issue401.sh
=================

OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart no

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0.001 s

STAT: total number of user OP*x operation                         9
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             4
STAT: total number of it. refinement steps in reorthogonalization 8
STAT: total number of restart steps                               3
OPT: A issue401.mtx, B N.A., dense no, nbEV 1, nbCV 5, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LA
OPT: shiftReal no, sigmaReal 0, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s
Error: bad dim - restart KO
Error: bad restart (resid)
Error: arpack solve KO
Error: solve KO
Error: arpack solve KO
FAIL issue401.sh (exit status: 1)

FAIL: issue215.sh
=================

OPT: A issue215.mtx, B N.A., dense no, nbEV 1, nbCV 4, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LM
OPT: shiftReal yes, sigmaReal 0.1, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart no

INP: create A 0 s

OUT: mode 1, nb EV found 1, nb iterations 1
OUT: init mode solver 0 s, RCI time 0 s
OUT: full time 0.001 s

STAT: total number of user OP*x operation                         6
STAT: total number of user  B*x operation                         0
STAT: total number of reorthogonalization steps taken             4
STAT: total number of it. refinement steps in reorthogonalization 6
STAT: total number of restart steps                               1
OPT: A issue215.mtx, B N.A., dense no, nbEV 1, nbCV 4, stdPb yes, symPb yes, cpxPb no, simplePrec no, mag LM
OPT: shiftReal yes, sigmaReal 0.1, shiftImag no, sigmaImag 0, invert no, tol 1e-06, maxIt 100, Ritz vectors
OPT: slv BiCG, slvItrPC Diag, slvItrTol 1e-06, slvItrMaxIt 100
OPT: check yes, verbose 0, debug 0, restart yes

INP: create A 0 s
Error: bad dim - restart KO
Error: bad restart (resid)
Error: arpack solve KO
Error: solve KO
Error: arpack solve KO
FAIL issue215.sh (exit status: 1)

Callstack

N/A

Notes, remarks

Running make -j1, or make with no -j option at all, works.

@sylvestre
Contributor

Is this a regression introduced in 3.9.1?

@rathann
Author

rathann commented Oct 18, 2023

These tests didn't exist in 3.9.0, so yes, it's new.

@fghoussen
Collaborator

These tests are meant to be run sequentially: restart information is stored in a file that does not support concurrent access.
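
If the shared restart file is the only point of contention, one possible workaround is to give each test its own scratch directory. The sketch below is only an illustration of that idea: `run_isolated`, `fake_test` and `state.dat` are made-up names, and `fake_test` merely stands in for a test that writes and re-reads a fixed-name file. The real test scripts reference `./arpackmm` and the `.mtx` files by relative path, so they would need absolute paths (or copies) before this could apply to them.

```shell
# Run a command inside its own scratch directory, so that files it writes
# under fixed names cannot collide with those of a concurrent job.
run_isolated() {
  scratch=$(mktemp -d) || return 1
  # Subshell: the cd does not leak into the caller.
  ( cd "$scratch" && "$@" )
  status=$?
  rm -rf "$scratch"
  return "$status"
}

# Hypothetical stand-in for a test: writes a fixed-name state file, waits,
# then checks that the file still holds what this job wrote.
fake_test() {
  echo "$1" > state.dat
  sleep 1
  [ "$(cat state.dat)" = "$1" ]
}

# Two jobs at once: each one sees only its own state.dat.
run_isolated fake_test A & pid_a=$!
run_isolated fake_test B & pid_b=$!
wait "$pid_a"; status_a=$?
wait "$pid_b"; status_b=$?
echo "A: $status_a, B: $status_b"
```

Run in a single shared directory instead, the two `fake_test` jobs would overwrite each other's `state.dat`, which is the same kind of collision described above.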

@rathann
Author

rathann commented Dec 7, 2023

Ok. Could only those tests be run sequentially? make has special markers for targets that require sequential handling.
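
For reference, the marker in question is GNU make's `.NOTPARALLEL` special target: when it appears with no prerequisites, every target in that makefile runs serially even under `-jN`, while other directories keep their parallelism. The throwaway demo below is a sketch, not the arpack-ng build: the targets `t1`, `t2`, `t3` and the file `order.txt` are made up. In an Automake tree the line would presumably go into the affected directory's `Makefile.am`, which is untested against arpack-ng.

```shell
# Throwaway demo makefile: NOT the arpack-ng one. t1/t2/t3 and order.txt
# are invented names used only to make the ordering visible.
demo_dir=$(mktemp -d)
cat > "$demo_dir/Makefile" <<'EOF'
# With no prerequisites, .NOTPARALLEL makes every target of this makefile
# run serially, even when make is invoked with -jN.
.NOTPARALLEL:

check: t1 t2 t3

# Each fake test logs its start and end (the "; recipe" form avoids tabs).
t1 t2 t3: ; @echo "start $@" >> order.txt; sleep 1; echo "end $@" >> order.txt

.PHONY: check t1 t2 t3
EOF

# Ask for three jobs; the log should still show no interleaving.
make -C "$demo_dir" -j3 check
cat "$demo_dir/order.txt"
```

With the `.NOTPARALLEL:` line in place, each `start` entry in `order.txt` should be followed directly by its matching `end`. Removing that line and rerunning with `-j3` lets the three fake tests overlap.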

@fghoussen
Collaborator

> Ok. Could only those tests be run sequentially?

Sure

> make has special markers for targets that require sequential handling.

No idea how
