NeutronBench is a GNN system evaluation framework
built on NeutronStar.
Dependencies
- cmake (>=3.14.2).
- mpich (>=3.3.3) for inter-process communication.
- libnuma for NUMA-aware memory allocation.
- cub for GPU-based graph propagation.
- libtorch (>=1.7) with GPU support for NN computation.
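As a rough sketch, on an Ubuntu-like system the first three dependencies might be installed as below (package names are assumptions and vary by distribution); libtorch is downloaded separately, and cub ships with CUDA 11+:
sudo apt-get install cmake mpich libnuma-dev
# libtorch: download a GPU build from https://pytorch.org/ (the exact
# archive depends on your CUDA version) and unpack it, e.g.:
unzip libtorch-*.zip -d /opt/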
Building
First clone the repository and initialize the submodule:
git clone https://github.com/iDC-NEU/NeutronBench.git
cd NeutronBench
git submodule update --init --recursive
# or, equivalently, in a single command:
git clone --recurse-submodules https://github.com/iDC-NEU/NeutronBench.git
To build:
mkdir build && cd build
cmake ..
make -j 10
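If cmake fails to locate libtorch, pointing CMAKE_PREFIX_PATH at the unpacked libtorch directory usually resolves it (the path below is an example, not a repository requirement):
cmake -DCMAKE_PREFIX_PATH=/opt/libtorch ..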
To run:
# An example run (you need to prepare a dataset first; see the dataset section below).
./run_nts.sh 1 ./cfgs/gcn_sample_demo.cfg
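Here the first argument appears to be the number of worker processes and the second the config file (treat this reading of run_nts.sh as an assumption). A hypothetical multi-worker run:
# hypothetical 4-worker run; assumes the first argument is the process count
./run_nts.sh 4 ./cfgs/gcn_sample_demo.cfg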
All datasets we used:
Dataset | #Nodes | #Edges | #Features | #Classes | #Hidden |
---|---|---|---|---|---|
Reddit | 232.96K | 114.85M | 602 | 41 | 128 |
OGB-Arxiv | 169.34K | 2.48M | 128 | 40 | 128 |
OGB-Products | 2.45M | 126.17M | 100 | 47 | 128 |
OGB-Papers | 111.06M | 1.6B | 128 | 172 | 128 |
Amazon | 1.57M | 264.34M | 200 | 107 | 128 |
LiveJournal | 4.85M | 90.55M | 600 | 60 | 128 |
Lj-large | 7.49M | 232.1M | 600 | 60 | 128 |
Lj-links | 5.2M | 205.25M | 600 | 60 | 128 |
Enwiki-links | 13.59M | 1.37B | 600 | 60 | 128 |
We provide a Python script to generate the data files:
# create a Python environment
conda create -n neutronbench python=3.9 -y
conda activate neutronbench
# install Python dependencies
pip install -r ./data/requirements.txt
# process the dataset
python ./data/generate_nts_dataset.py --dataset ogbn-arxiv
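The same script should also handle the other OGB datasets from the table above, for example (assuming the --dataset flag accepts standard OGB names):
python ./data/generate_nts_dataset.py --dataset ogbn-products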
For graph datasets that lack ground-truth attributes, we randomly generate features and labels, and split the data into training (65%), validation (25%), and testing (10%) sets.
We provide Google Drive links for downloading the Amazon, LiveJournal, Lj-large, Lj-links, and Enwiki-links datasets.
Data partitioning experiments
# partitioning
python ./exp/exp-partition/exp-partition.py
Batch preparation experiments
# batch size
python ./exp/exp-batch-size/exp-batch-size.py
# sample rate
python ./exp/exp-sample-rate/sample-rate.py
Data transferring experiments
# data partitioning
python ./exp/exp-partition/exp-partition.py
# batch size
python ./exp/exp-batch-size/exp-batch-size.py
# different optimization
python ./exp/exp-diff-optim/exp-diff-optim.py
# hybrid transfer
python ./exp/exp-hybrid-trans/exp-hybrid-trans.py
# pipeline
python ./exp/exp-diff-optim/exp-diff-pipe.py
# gpu cache
python ./exp/exp-gpu-cache/exp-gpu-cache.py
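To run the full suite in one pass, the scripts above can be chained; a convenience sketch (not part of the repository):
for s in exp-partition/exp-partition.py \
         exp-batch-size/exp-batch-size.py \
         exp-sample-rate/sample-rate.py \
         exp-diff-optim/exp-diff-optim.py \
         exp-hybrid-trans/exp-hybrid-trans.py \
         exp-diff-optim/exp-diff-pipe.py \
         exp-gpu-cache/exp-gpu-cache.py; do
    python ./exp/$s
done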
If you find NeutronBench useful or relevant to your research, please cite our paper as follows:
@article{yuan2024comprehensive,
author = {Hao Yuan and Yajiong Liu and Yanfeng Zhang and Xin Ai and Qiange Wang and Chaoyi Chen and Yu Gu and Ge Yu},
title = {Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective},
journal = {Proc. VLDB Endow.},
volume = {17},
number = {6},
pages = {1241--1254},
year = {2024},
url = {https://www.vldb.org/pvldb/vol17/p1241-yuan.pdf},
}
For any questions or feedback, feel free to contact Hao Yuan or create an issue in this repository.