This repo provides a reference implementation of SI-HDGNN, as described in the paper:
Heterogeneous Dynamical Academic Network for Learning Scientific Impact Propagation
Xovee Xu, Ting Zhong, Ce Li, Goce Trajcevski, and Fan Zhou
Knowledge-Based Systems, vol. 238, pp. 107839, Feb 2022
The code was tested with Python 3.7, tensorflow-gpu 2.4, torch 1.8.1, cudnn 8.0, and cudatoolkit 11.0.
Install the dependencies via Anaconda:
# create conda virtual environment
conda create --name si-hdgnn -c conda-forge cudatoolkit=11.0 cudnn=8.0
# activate environment
conda activate si-hdgnn
# install other dependencies
pip install -r requirements.txt
Hint: pay attention to the versions of cudatoolkit and cudnn, because tensorflow and torch each rely on specific versions of them for GPU acceleration.
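To make the version check above concrete, the following is a minimal sketch that reports which versions of the relevant packages are installed in the active environment. The helper names `installed_version` and `report` are hypothetical and not part of this repo; the package names are taken from the list of tested dependencies.

```python
import sys

try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:  # Python 3.7 has no importlib.metadata
    importlib_metadata = None


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(package)
        except importlib_metadata.PackageNotFoundError:
            return None
    # Fallback for Python 3.7: pkg_resources ships with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def report(packages=("tensorflow-gpu", "tensorflow", "torch")):
    """Build one human-readable line per package, plus the Python version."""
    lines = ["python " + sys.version.split()[0]]
    for name in packages:
        version = installed_version(name) or "not installed"
        lines.append("{} {}".format(name, version))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Compare the printed versions with the tested ones (tensorflow-gpu 2.4, torch 1.8.1). Whether the GPU is actually visible can then be checked from within each framework, for example with `torch.cuda.is_available()` or `tf.config.list_physical_devices("GPU")`.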
The APS dataset and its preprocessed data can be downloaded from Google Drive.
You can access the original APS dataset here. (Released by the American Physical Society, obtained on Jan 17, 2019)
Alternatively, DBLP-Citation-network V10 and ACM-Citation-network V9 are available here. (Released by AMiner)
For a given scientific dataset, you should:
- Construct a heterogeneous graph
- Get node embeddings
- Generate scientific information cascades
- Train and evaluate the models
Detailed information about the preprocessed files can be found here.
This stage may require a large amount of RAM (~64 GB for a graph with millions of nodes/edges).
# build a heterogeneous graph
> python codes/gnn_pre/graph_sample.py
# save and run heterogeneous neighboring node sampling
> python codes/gnn_pre/save_rwr.py
> python codes/gnn_pre/run_rwr.py
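The script names suggest that neighbor sampling uses random walk with restart (RWR). As an illustration only, here is a toy sketch of RWR-based heterogeneous neighbor sampling. The function `rwr_neighbors`, its parameters, and the tiny graph are all assumptions made for this example and do not reflect the actual logic or defaults of `save_rwr.py` and `run_rwr.py`.

```python
import random
from collections import Counter


def rwr_neighbors(graph, node_type, start, walk_len=100, restart_p=0.5,
                  top_k=2, seed=0):
    """Sample neighbors of `start` by random walk with restart.

    Returns a dict mapping each node type to the (at most) `top_k`
    most frequently visited nodes of that type.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_len):
        neighbors = graph.get(current, [])
        # Jump back to the start node on a dead end or with probability restart_p.
        if not neighbors or rng.random() < restart_p:
            current = start
            continue
        current = rng.choice(neighbors)
        if current != start:
            visits[current] += 1
    # Group the visited nodes by type, most frequently visited first.
    by_type = {}
    for node, _count in visits.most_common():
        bucket = by_type.setdefault(node_type[node], [])
        if len(bucket) < top_k:
            bucket.append(node)
    return by_type


# Toy heterogeneous graph with papers (p), authors (a), and a venue (v).
graph = {
    "p1": ["a1", "a2", "v1"],
    "p2": ["a1", "v1"],
    "a1": ["p1", "p2"],
    "a2": ["p1"],
    "v1": ["p1", "p2"],
}
node_type = {"p1": "paper", "p2": "paper",
             "a1": "author", "a2": "author", "v1": "venue"}

if __name__ == "__main__":
    print(rwr_neighbors(graph, node_type, "p1"))
```

Grouping the visited nodes by type gives every node a bounded set of neighbors per type, which is a common way to prepare input for a heterogeneous graph neural network.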
After graph construction, we now learn node embeddings via a heterogeneous graph neural network.
> python codes/gnn_train/pre_train_files.py
> python codes/gnn_train/gene_node_embeddings.py --train_iter_n 30
Once we have the node embeddings, we can generate cascades and the corresponding training/validation/test data.
> python codes/predict_paper/1_load_emb.py
> python codes/predict_paper/2_construct_cascade.py
> python codes/predict_author/1_load_emb.py
> python codes/predict_author/2_x_y.py
> python codes/predict_paper/paper_prediction.py
> python codes/predict_author/author_prediction.py
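The commands above can also be chained in a small driver script. The sketch below assumes that every script is run from the repository root in the order listed and that each stage only depends on the files written by the earlier ones. The names `STAGES`, `build_commands`, and `run_pipeline` are hypothetical and not part of this repo.

```python
import subprocess
import sys

# (description, script path and arguments), in the order given in this README.
STAGES = [
    ("build heterogeneous graph", ["codes/gnn_pre/graph_sample.py"]),
    ("save neighbor sampling", ["codes/gnn_pre/save_rwr.py"]),
    ("run neighbor sampling", ["codes/gnn_pre/run_rwr.py"]),
    ("prepare GNN training files", ["codes/gnn_train/pre_train_files.py"]),
    ("learn node embeddings",
     ["codes/gnn_train/gene_node_embeddings.py", "--train_iter_n", "30"]),
    ("load embeddings (paper)", ["codes/predict_paper/1_load_emb.py"]),
    ("construct cascades (paper)", ["codes/predict_paper/2_construct_cascade.py"]),
    ("load embeddings (author)", ["codes/predict_author/1_load_emb.py"]),
    ("build x/y data (author)", ["codes/predict_author/2_x_y.py"]),
    ("paper prediction", ["codes/predict_paper/paper_prediction.py"]),
    ("author prediction", ["codes/predict_author/author_prediction.py"]),
]


def build_commands(python=sys.executable):
    """Return the full command line for every stage."""
    return [[python] + args for _name, args in STAGES]


def run_pipeline(dry_run=True):
    """Print each stage and, unless dry_run is set, execute it."""
    for (name, _args), cmd in zip(STAGES, build_commands()):
        print("[{}] {}".format(name, " ".join(cmd)))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With `dry_run=True` the script only prints the commands. Set it to `False` to execute them, keeping in mind the RAM requirements of the graph construction stage.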
You may change the model settings in config.py or directly in the code.
If you find SI-HDGNN useful for your research, please consider citing us 😘 :)
@article{xu2021heterogeneous,
title = {Heterogeneous Dynamical Academic Network for Learning Scientific Impact Propagation},
author = {Xovee Xu and Ting Zhong and Ce Li and Goce Trajcevski and Fan Zhou},
journal = {Knowledge-Based Systems},
year = {2022},
numpages = {20},
volume = {238},
pages = {107839},
}
If you have any questions, feel free to contact us by email: [email protected] or [email protected].