ELTRA: An Embedding Method based on Learning-to-Rank to Preserve Asymmetric Information in Directed Graphs
This repository provides (1) the reference implementation of ELTRA, (2) the Python implementation of CRW, (3) the data, and (4) the proofs of the CRW score properties.
ELTRA is a novel similarity-based double-vector Embedding method based on listwise Learning-To-Rank (LTR) that preserves Asymmetric information in directed graphs. It is a straightforward embedding method implemented by a simple deep neural network consisting of only a projection layer and an output layer (a rough sketch of this architecture is given after the requirements below). To run ELTRA, the following packages are required:
Python >= 3.8
tensorflow >= 2.2
networkx = 2.6.*
numpy = 1.21.*
scipy = 1.7.*
scikit-learn = 1.0.*
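As a rough illustration of the two-layer architecture mentioned above (a projection layer followed by an output layer), here is a minimal tf.keras sketch. The number of nodes, the activations, the ListNet-style listwise loss, and the optimizer settings are assumptions for illustration only; this is not the reference implementation.

# Illustrative sketch of a projection-layer + output-layer network.
# Assumptions (not taken from the reference implementation): num_nodes,
# the listwise loss formulation, and the optimizer settings.
import tensorflow as tf

num_nodes = 10000   # assumed number of nodes in the input graph
dim = 128           # embedding dimension (--dim)

model = tf.keras.Sequential([
    # projection layer: maps a per-node score row to its latent embedding
    tf.keras.layers.Dense(dim, use_bias=False, input_shape=(num_nodes,), name="projection"),
    # output layer: produces ranking scores over all nodes
    tf.keras.layers.Dense(num_nodes, name="output"),
])

# A listwise LTR objective can be approximated ListNet-style: softmax
# cross-entropy between the target score distribution and the predicted scores.
def listwise_loss(y_true, y_pred):
    return tf.keras.losses.categorical_crossentropy(
        tf.nn.softmax(y_true), y_pred, from_logits=True)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0025), loss=listwise_loss)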
ELTRA can be run directly from the command line or from within your favorite IDE.
A graph must be represented as a text file in the edge-list format, where each line corresponds to an edge in the graph, a tab separates the two nodes, and node indices start from 0.
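For reference, an edge list in this format can be loaded with networkx as sketched below. The file path is taken from the example commands further down; ELTRA's internal loader may differ.

# Minimal sketch: loading a tab-separated edge list with 0-based integer node ids.
import networkx as nx

G = nx.read_edgelist("data/DBLP/DBLP_directed_graph.txt",
                     delimiter="\t", nodetype=int,
                     create_using=nx.DiGraph)
print(G.number_of_nodes(), G.number_of_edges())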
Although ELTRA is originally a double-vector embedding method for directed graphs, it is applicable to undirected graphs as well. In this case:
(1) The symmetric scores are computed as follows:
(2) The top-t closest nodes are selected based on the above symmetric scores without applying the
(3) Each node is represented by two latent vectors (i.e., source and target).
ELTRA has the following parameters:
--graph: Input graph
--dataset_name: Dataset name
--result_dir: Destination to save the embedding result, default is "output/" in the root directory
--dim: The embedding dimension, default is 128
--topk: Number of nodes in topK; default is -1 (i.e., topK is computed automatically as explained in the paper); otherwise, set the value manually
--itr: Number of iterations to compute AP-Scores, default is 3
--epc: Number of Epochs for training, default is 150
--bch: Batch size; default is -1 (i.e., it is computed automatically as explained in the paper); otherwise, set the value manually
--lr: Learning rate, default is 0.0025
--reg: Regularization parameter which is suggested to be 0.001 and 0.0001 with directed and undirected graphs, respectively; default is 0.001
--early_stop: Flag indicating whether to stop training if the loss stops improving, default is True
--wait_thr: Number of epochs with no loss improvement after which the training will be stopped, default is 20
--gpu: Flag indicating whether to run ELTRA on GPU, default is True
--graph_type: Indicates the graph type (directed or undirected), default is directed
--impr: Importance factor of out-links over in-links to compute AP-Scores, default is 0.6
1- It is worth noting that topk plays an important role in ELTRA; therefore, to obtain the best effectiveness on any dataset, it is recommended to tune topk appropriately as explained in Section 4.3.1 of the paper.
2- It is recommended to set the regularization parameter as 0.001 and 0.0001 with directed and undirected graphs, respectively.
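To picture how these flags fit together, here is an illustrative argparse sketch mirroring the parameters and defaults listed above. The flags and defaults come from that list; the parsing code itself is an assumption about how ELTRA.py could read its arguments, not the actual implementation.

# Illustrative argparse sketch of the command-line interface described above (an assumption).
import argparse

def str2bool(v):
    # argparse's type=bool treats any non-empty string as True, so parse explicitly.
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(description="ELTRA")
parser.add_argument("--graph", required=True, help="input graph (edge list)")
parser.add_argument("--dataset_name", required=True, help="dataset name")
parser.add_argument("--result_dir", default="output/", help="destination for the embedding result")
parser.add_argument("--dim", type=int, default=128, help="embedding dimension")
parser.add_argument("--topk", type=int, default=-1, help="-1 = computed automatically")
parser.add_argument("--itr", type=int, default=3, help="iterations to compute AP-Scores")
parser.add_argument("--epc", type=int, default=150, help="training epochs")
parser.add_argument("--bch", type=int, default=-1, help="-1 = computed automatically")
parser.add_argument("--lr", type=float, default=0.0025, help="learning rate")
parser.add_argument("--reg", type=float, default=0.001, help="regularization parameter")
parser.add_argument("--early_stop", type=str2bool, default=True, help="stop if the loss stops improving")
parser.add_argument("--wait_thr", type=int, default=20, help="epochs to wait without loss improvement")
parser.add_argument("--gpu", type=str2bool, default=True, help="run ELTRA on GPU")
parser.add_argument("--graph_type", default="directed", choices=["directed", "undirected"])
parser.add_argument("--impr", type=float, default=0.6, help="importance of out-links over in-links")
args = parser.parse_args()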
- topk and bch parameters are set automatically
python ELTRA.py --graph data/DBLP/DBLP_directed_graph.txt --dataset_name DBLP
- topk and bch parameters are set manually
python ELTRA.py --graph data/DBLP/DBLP_directed_graph.txt --dataset_name DBLP --topk 50 --bch 1024
In this Appendix, we prove that the CRW scores are asymmetric, bounded, and monotonic, and that they always exist and are unique.
(1) Asymmetry: for every node-pair
Proof: According to Equation (2), if
(2) Bounding: for all
Proof: As mentioned in Section 3.1.2, if
also, according to the assumption, we have
since
(3) Monotonicity: for every node-pair
Proof: If
according to the assumptions,
(4) Existence: the fixed point
Proof: By the bounding and monotonicity properties, for any node-pairs
(5) Uniqueness: the solution for the fixed-point
Proof: Suppose that
thus,
Since
Masoud Reyhani Hamedani, Jin-Su Ryu, and Sang-Wook Kim. 2023. An Embedding Method based on Learning-to-Rank to Preserve Asymmetric Information in Directed Graphs. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), October 2023, Pages xx–xx. https://doi.org/10.1145/3583780.3614862