This repository extends the implementation of the paper:
GAP: Differentially Private Graph Neural Networks with Aggregation Perturbation, developed as my course project for DS603 (Advanced ML). You can find the PDF report here.
This code is implemented in Python 3.9 using PyTorch-Geometric 2.1.0 and PyTorch 1.12.1. Refer to requirements.txt for the full list of dependencies.
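Assuming a standard pip-based setup, the dependencies listed there can typically be installed with:
pip install -r requirements.txt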
- The code includes a custom C++ operator for faster edge sampling, which is required for the node-level DP methods. PyTorch builds the C++ code automatically at runtime, but you need a C++ compiler installed (this is usually handled automatically if you use conda). A minimal sketch of this runtime-compilation mechanism is shown after this list.
- We use Weights & Biases (WandB) to track training progress and log experiment results. To replicate the paper's results as described below, you need a WandB account. If you just want to train and evaluate the models, no WandB account is required.
- We use Dask to parallelize running multiple experiments on high-performance computing clusters (e.g., SGE, SLURM). If you don't have access to a cluster, you can simply run the experiments sequentially on your machine (see the usage section below).
- The code requires autodp version 0.2.1b or later. You can install the latest version directly from the GitHub repository using:
pip install git+https://github.com/yuxiangw/autodp
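As a rough illustration of what autodp provides here, the snippet below composes a Gaussian mechanism and converts the result into an (ε, δ) guarantee. It is only a hedged sketch of the library's accounting API; the noise multiplier, number of compositions, and δ are placeholder values, not the settings used in the experiments.

```python
from autodp.mechanism_zoo import GaussianMechanism
from autodp.transformer_zoo import Composition

sigma = 2.0      # placeholder noise multiplier
num_steps = 3    # placeholder number of perturbed aggregation steps
delta = 1e-5     # placeholder delta

# One Gaussian mechanism per perturbed aggregation, composed num_steps times.
mech = GaussianMechanism(sigma=sigma)
composed = Composition()([mech], [num_steps])

# Convert the composed guarantee into an epsilon at the chosen delta.
epsilon = composed.get_approxDP(delta)
print(f"Composed guarantee: ({epsilon:.2f}, {delta})-DP")
```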
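For the custom C++ edge-sampling operator mentioned above, PyTorch's runtime build step is generally handled by torch.utils.cpp_extension. The following is a generic sketch of that mechanism only; the module name, source path, and function are hypothetical and do not refer to the repository's actual files.

```python
from torch.utils.cpp_extension import load

# Hypothetical example of JIT-compiling a C++ extension the first time it is needed.
# PyTorch caches the build, so subsequent runs reuse the compiled module.
sampler = load(
    name="edge_sampler",                # arbitrary module name (hypothetical)
    sources=["csrc/edge_sampler.cpp"],  # hypothetical path to the C++ source
    verbose=True,                       # print compiler output, useful for debugging builds
)

# Functions exposed via pybind11 in the C++ file become attributes of the module,
# e.g. sampler.sample(...) if such a function were defined there.
```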
To reproduce the paper's results, please follow the steps below:
- Set your WandB username in wandb.yaml (line 7). This is required to log the results to your WandB account.
- Execute the following Python script:
python experiments.py --generate
This creates the file "jobs/experiments.sh" containing the commands to run all the experiments.
- If you want to run the experiments on your own machine, run:
sh jobs/experiments.sh
This trains all the models required for the experiments one by one. Otherwise, if you have access to a supported HPC cluster, first configure your cluster settings (~/.config/dask/jobqueue.yaml) according to Dask-Jobqueue's documentation. Then, run the following command:
python experiments.py --run --scheduler <scheduler>
where <scheduler> is the name of your scheduler (e.g., sge, slurm). This command submits all the jobs to your cluster and runs them in parallel (see the Dask sketch after these steps).
- Use the results.ipynb notebook to visualize the results as shown in the paper. Note that we used the Linux Libertine font in the figures, so you either need to have this font installed or change the font in the notebook.
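For reference, Dask-based cluster execution typically follows the pattern sketched below. This is a hedged illustration of the dask_jobqueue/dask.distributed API under assumed resource settings (and an assumed SLURM scheduler), not the repository's actual experiments.py logic.

```python
import subprocess
from dask.distributed import Client
from dask_jobqueue import SLURMCluster  # SGECluster, PBSCluster, etc. work the same way

# Placeholder resources; in practice these come from ~/.config/dask/jobqueue.yaml.
cluster = SLURMCluster(cores=4, memory="16GB", walltime="02:00:00")
cluster.scale(jobs=10)  # ask the scheduler for 10 worker jobs
client = Client(cluster)

# Hypothetical driver: run each generated training command as a separate cluster task.
commands = open("jobs/experiments.sh").read().splitlines()
futures = client.map(lambda cmd: subprocess.run(cmd, shell=True).returncode, commands)
print(client.gather(futures))  # wait for all experiments and collect exit codes
```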
Run the following command to see the list of available options for training individual models:
python train.py --help
| Privacy Level | Method | ε | Facebook | Reddit | Amazon |
|:-:|:-:|:-:|:-:|:-:|:-:|
| None | GAP-∞ | ∞ | 80.0 | 99.4 | 91.2 |
| None | SAGE-∞ | ∞ | 83.2 | 99.1 | 92.7 |
| Edge | GAP-EDP | 4 | 76.3 | 98.7 | 83.8 |
| Edge | SAGE-EDP | 4 | 50.4 | 84.6 | 68.3 |
| Edge | MLP | 0 | 50.8 | 82.4 | 71.1 |
| Node | GAP-NDP | 8 | 63.2 | 94.0 | 77.4 |
| Node | SAGE-NDP | 8 | 37.2 | 60.5 | 27.5 |
| Node | MLP-DP | 8 | 50.2 | 81.5 | 73.6 |
@inproceedings{sajadmanesh2023gap,
title = {GAP: Differentially Private Graph Neural Networks with Aggregation Perturbation},
author={Sajadmanesh, Sina and Shamsabadi, Ali Shahin and Bellet, Aur{\'e}lien and Gatica-Perez, Daniel},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
address = {Anaheim, CA},
publisher = {USENIX Association},
month = aug,
}