GitHub - thu-scc/SC20-miniVite

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
FAQS		FAQS
LICENSE		LICENSE
Makefile		Makefile
README		README
README-SC20.md		README-SC20.md
dspl.hpp		dspl.hpp
graph.hpp		graph.hpp
main.cpp		main.cpp
utils.hpp		utils.hpp

Repository files navigation

************************
miniVite (/mini/ˈviːte/)
************************

*******
-------
ABOUT
-------
*******
miniVite[*] is a proxy app that implements a single phase of Louvain
method in distributed memory for graph community detection. Please
refer to the following paper for a detailed discussion on
distributed memory Louvain method implementation:
https://ieeexplore.ieee.org/abstract/document/8425242/

Apart from real world graphs, users can use specific options
to generate a Random Geometric Graph (RGG) in parallel.
RGGs have been known to have good community structure:
https://arxiv.org/pdf/1604.03993.pdf

The way we have implemented a parallel RGG generator, vertices
owned by a process will only have cross edges with its logical
neighboring processes (each process owning 1x1/p chunk of the
1x1 unit square). If MPI process mapping is such that consecutive
processes (for e.g., p and p+1) are physically close to each other,
then there is not much communication stress in the application.
Therefore, we allow an option to add extra edges between randomly
chosen vertices, whose owners may be physically far apart.

We require the total number of processes to be a power of 2 and
total number of vertices to be perfectly divisible by the number of
processes when parallel RGG generation options are used.
This constraint does not apply to real world graphs passed to miniVite.

We also allow users to pass any real world graph as input. However,
we expect an input graph to be in a certain binary format, which
we have observed to be more efficient than reading ASCII format
files. The code for binary conversion (from a variety of common
graph formats) is packaged separately with Vite, which is our
full implementation of Louvain method in distributed memory.
Please follow instructions in Vite README for binary file
conversion.

Vite could be downloaded from (please don't use the past
PNNL/PNL link to download Vite, the following GitHub
link is the correct one):
https://github.com/Exa-Graph/vite

Unlike Vite, we do not implement any heuristics to improve the
performance of Louvain method. miniVite is a baseline parallel
version, implementing only the first phase of Louvain method.

This code requires an MPI library (preferably MPI-3 compatible)
and C++11 compliant compiler for building.

Please contact the following for any queries or support:

Sayan Ghosh, PNNL (sg0 at pnnl dot gov)
Mahantesh Halappanavar, PNNL (hala at pnnl dot gov)

[*] Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman
A, Gebremedhin AH. miniVite: A graph analytics benchmarking tool
for massively parallel systems. In 2018 IEEE/ACM Performance Modeling,
Benchmarking and Simulation of High Performance Computer Systems
(PMBS) 2018 Nov 12 (pp. 51-56). IEEE.

Please '*' this repository on GitHub if the code is useful to you.

*************
-------------
COMPILATION
-------------
*************
Please update the Makefile with compiler flags and use a C++11 compliant
compiler of your choice. Invoke `make clean; make` after setting paths
to MPI for generating the binary. Use `mpirun` or `mpiexec` or `srun`
to execute the code with specific runtime arguments mentioned in the
next section.

Pass -DPRINT_DIST_STATS for printing distributed graph
characteristics.

Pass -DDEBUG_PRINTF if detailed diagonostics is required along
program run. This program requires OpenMP and C++11 support,
so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/
-std=c++0x.

Pass -DUSE_32_BIT_GRAPH if number of nodes in the graph are
within 32-bit range (2 x 10^9), else 64-bit range is assumed.

Pass -DOMP_SCHEDULE_RUNTIME if you want to set OMP_SCHEDULE
for all parallel regions at runtime. If -DOMP_SCHEDULE_RUNTIME
is passed, and OMP_SCHEDULE is not set, then the default schedule will
be chosen (which is most probably "static" or "guided" for most of
the OpenMP regions).

Communicating vertex-community information (per iteration)
is the most expensive step of our distributed Louvain
implementation. We use the one of the following MPI communication
primitives for communicating vertex-community during a Louvain
iteration, that could be enabled by passing predefined
macros at compile time:

1. MPI Collectives: -DUSE_MPI_COLLECTIVES
2. MPI Send-Receive: -DUSE_MPI_SENDRECV
3. MPI RMA: -DUSE_MPI_RMA (using -DUSE_MPI_ACCUMULATE
additionally ensures atomic put)
4. Default: Uses MPI point-to-point nonblocking API in communication
intensive parts..

Apart from these, we use MPI (blocking) collectives, mostly
MPI_Alltoall.

There are other predefined macros in the code as well for printing
intermediate results or checking correctness or using a particular
C++ data structure.

***********************
-----------------------
EXECUTING THE PROGRAM
-----------------------
***********************

E.g.:
mpiexec -n 2 bin/./minivite -f karate.bin
mpiexec -n 2 bin/./minivite -l -n 100
mpiexec -n 2 bin/./minivite -n 100
mpiexec -n 2 bin/./minivite -p 2 -n 100

Possible options (can be combined):

1. -f <bin-file> : Specify input binary file after this argument.
2. -b : Only valid for real-world inputs. Attempts to distribute approximately
equal number of edges among processes. Irregular number of vertices
owned by a particular process. Increases the distributed graph creation
time due to serial overheads, but may improve overall execution time.
3. -n <vertices> : Only valid for synthetically generated inputs. Pass total number of
vertices of the generated graph.
4. -l : Use distributed LCG for randomly choosing edges. If this option
is not used, we will use C++ random number generator (using
std::default_random_engine).
5. -p <percent> : Only valid for synthetically generated inputs. Specify percent of overall
edges to be randomly generated between processes.
6. -t <threshold> : Specify threshold quantity (default: 1.0E-06) used to determine the
exit criteria in an iteration of Louvain method.
7. -w : Only valid for synthetically generated inputs. Use Euclidean distance as edge weight.
If this option is not used, edge weights are considered as 1.0. Generate
edge weight uniformly between (0,1) if Euclidean distance is not available.
8. -r <nranks> : This is used to control the number of aggregators in MPI I/O and is
meaningful when an input binary graph file is passed with option "-f".
naggr := (nranks > 1) ? (nprocs/nranks) : nranks;
9. -s : Print graph data (edge list along with weights).