The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures
The NPB's Fortran codes were carefully ported to C++ and are fully compliant with NPB version 3.4.1 (NPB official webpage). Our paper describes in detail how the porting was conducted. It also discusses the performance we obtained with NPB-CPP on different machines (Intel Xeon, AMD Epyc, and IBM Power8) and compilers (GCC, ICC, and Clang). The results show that NPB-CPP achieves performance similar to that of the original NPB. You can use our paper, along with the official reports, as a guide to assess performance using the NPB suite.
[DOI] J. Löff, D. Griebler, G. Mencagli et al., The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures, Future Generation Computer Systems (FGCS) (2021)
This repository provides parallel versions of the NAS Parallel Benchmarks (NPB) written with different C++ parallel programming APIs. Contributions are welcome: feel free to open issues and pull requests.
The conventions we used in our porting can be found here.
===================================================================
NAS Parallel Benchmarks in C++ using OpenMP, FastFlow and Intel TBB
This project was conducted in the Parallel Applications
Modelling Group (GMAP) at PUCRS - Brazil.
GMAP Research Group leader:
Luiz Gustavo Leão Fernandes
Code contributors:
Dalvan Griebler (PUCRS)
Gabriell Araujo (PUCRS)
Júnior Löff (PUCRS)
If you have questions or problems, please send us an e-mail:
[email protected]
[email protected]
[email protected]
We would like to thank the following researchers for the
fruitful discussions:
Gabriele Mencagli (UNIPI)
Massimo Torquati (UNIPI)
Marco Danelutto (UNIPI)
===================================================================
NPB-SER - This directory contains the sequential version.
NPB-OMP - This directory contains the parallel version implemented with OpenMP (based on the original NPB version).
NPB-TBB - This directory contains the parallel version implemented with Threading Building Blocks.
NPB-FF - This directory contains the parallel version implemented with FastFlow.
Each directory is independent and contains its own implemented version of the kernels and pseudo-applications:
EP - Embarrassingly Parallel, floating-point operation capacity
MG - Multi-Grid, non-local memory accesses, short- and long-distance communication
CG - Conjugate Gradient, irregular memory accesses and communication
FT - discrete 3D fast Fourier Transform, intensive long-distance communication
IS - Integer Sort, integer computation and communication
BT - Block Tri-diagonal solver
SP - Scalar Penta-diagonal solver
LU - Lower-Upper Gauss-Seidel solver
Tip: The pseudo-applications' performance is bound by the sequential partial differential equation (PDE) solver.
Warning: our tests were performed with GCC 9 and ICC 19.
Enter the directory of the desired version and execute:
$ make _BENCHMARK CLASS=_WORKLOAD
_BENCHMARKs are:
EP, CG, MG, IS, FT, BT, SP and LU
_WORKLOADs are:
Class S: small for quick test purposes
Class W: workstation size (a 1990s workstation; now likely too small)
Classes A, B, C: standard test problems; ~4X size increase going from one class to the next
Classes D, E, F: large test problems; ~16X size increase from each of the previous Classes
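Chaining the approximate factors stated above gives a rough idea of how the problem size grows relative to Class A. These are back-of-the-envelope figures derived only from the ~4X and ~16X ratios. They assume the ~16X factor applies between consecutive classes (C to D, D to E, E to F). Actual sizes differ from benchmark to benchmark.

```latex
\begin{aligned}
\text{size}(B) &\approx 4\,\text{size}(A), &
\text{size}(C) &\approx 4^2\,\text{size}(A) = 16\,\text{size}(A),\\
\text{size}(D) &\approx 16\,\text{size}(C) \approx 256\,\text{size}(A), &
\text{size}(E) &\approx 16^2\,\text{size}(C) \approx 4096\,\text{size}(A),\\
\text{size}(F) &\approx 16^3\,\text{size}(C) \approx 65536\,\text{size}(A).
\end{aligned}
```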
Command example:
$ make ep CLASS=A
Binaries are generated inside the bin folder
Command example:
$ ./bin/ep.A
Each folder contains a default compiler configuration in the config/make.def file.
Edit this file if you want to change the target compiler, compiler flags, or linked libraries used to build the applications.
The repository also includes a libs directory containing the FastFlow and Intel TBB libraries.
For TBB, you need to compile the library and load its environment variables. To do so, enter libs/tbb-2020.1 and execute the following command:
$ make
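This command generates a folder inside libs/tbb-2020.1/build. The folder name depends on your platform and on your compiler, libc, and kernel versions. Finally, load the TBB environment variables with the tbbvars.sh script in that folder. For example, execute the following command in your terminal: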
$ source libs/tbb-2020.1/build/linux_intel64_gcc_cc7.5.0_libc2.27_kernel4.15.0_release/tbbvars.sh
The degree of parallelism can be set using the *RUNTIME*_NUM_THREADS environment variable, that is, OMP_NUM_THREADS, TBB_NUM_THREADS, or FF_NUM_THREADS.
Command example:
$ export OMP_NUM_THREADS=32