- Compile & Run
- Read the installation documentation
- Compile and install
- Find optimal parameters for running
- Compare the performance of difference of libraries and find the best library
- Analysis of computing resource allocation efficiency
Number of cores -> Number of cores per node -> Number of nodes
- Application performance analysis
-
Intel Compiler analysis
-profile-functions
-profile-loops=all
-
Use Intel VTune to detect hotspots
Use it to get a performance snapshot
- MPI load balancing
- OpenMP load balancing
- Memory stalls
- FPU Utilization
- I/O
-
Turn off IPO and function inline while analyzing performance
- Compilation Optimization
Build without optimization -> General optimization(-o1 -o2 -o3
) -> Architecture related optimization -> Interprocedural Optimization -> Vectorization -> Automatic Parallelization
for (i=0; i<n; i++)
c[i] = a[i] + b[i];
c1=a1+b1
c2=a2+b2
...
- Advanced Optimization
- Read algorithm documentation
- Read the source code and find parallelizable areas
- Parallelize the hotspots areas according to performance analysis results
- OpenMP
- OpenACC
- CUDA
- Intel Cilk
- TBB
- Balance the workload of MPI processes and threads
Ubuntu 18.04.2 LTS, 4.18.0-15-generic, x86_64, VirtualBox, 2GB Memory, HPCG 3.1
sudo apt-get install -y mpich
cd ~
wget http://www.hpcg-benchmark.org/downloads/hpcg-3.1.tar.gz
gunzip hpcg-3.1.tar.gz; tar -xvf hpcg.tar
# Working Directory: hpcg-3.1/setup
cd hpcg-3.1/setup
cp Make.Linux_MPI Make.linux
# These Files will be Generated:
# hpcg-3.1/setup/Make.linux # Compilation Configuration File
MPdir = /usr/bin
MPinc = -I /usr/include/mpich/
MPlib = /usr/lib/x86_64-linux-gnu/libmpich.so
# Working Directory: hpcg-3.1/
mkdir build
cd build
../configure linux
make -j $(nproc)
# These Files will be Generated:
# hpcg-3.1/build/bin/hpcg.dat # Benchmark Configuration File
# hpcg-3.1/build/bin/xhpcg # Executable File
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
16 16 16
60
16 16 16
is problem size, limited by system Memory.60
is test time.60
for optimization,1860
for official performance test.
# Working Dir: hpcg-3.1/build/bin
mpirun -np 4 ./xhpcg
# These Files will be Generated:
# hpcg-3.1/build/bin/hpcg20191202T131917.txt
# hpcg-3.1/build/bin/HPCG-Benchmark_3.1_2019-12-02_13-20-15.txt
Final Summary::HPCG result is VALID with a GFLOP/s rating of=13.9936
Using Spack!
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
spack install mpileaks
spack install [email protected] %[email protected] +threads cppflags="-O3 -g3" target="skylake" ^mpich3.3 // customer. version, compiler, build options, compiler flags, target mircoarchitecture, dependencies
Write python code and yaml to build your own package.
- Get familiar with working environment
- Learn basic MPI, OpenMP
- Run benchmarks and choose one to optimize it.