On Minimizing the Impact of Dataset Shifts on Actionable Explanations

Code for On Minimizing the Impact of Dataset Shifts on Actionable Explanations, UAI 2023, by Anna P. Meyer, Dan Ley, Suraj Srinivas, and Himabindu Lakkaraju


Our preprocessed data is located in the data folder.

The code assumes that the files <file_base>_train.csv and <file_base>_test.csv exist, where <file_base> is the second command-line argument. For example, <file_base> can be data/adult if the files data/adult_train.csv and data/adult_test.csv exist.
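As a minimal sketch of this naming convention (not the repository's actual loader), the following shows how a <file_base> could be expanded into the two CSV paths and loaded with pandas. The helper name load_split is hypothetical; the label column name mirrors the documented --label_col default.

```python
import pandas as pd


def load_split(file_base, label_col="label"):
    """Load <file_base>_train.csv and <file_base>_test.csv (hypothetical helper)."""
    splits = {}
    for split in ("train", "test"):
        df = pd.read_csv(f"{file_base}_{split}.csv")
        # Separate the feature columns from the output column.
        splits[split] = (df.drop(columns=[label_col]), df[label_col])
    return splits
```

For example, load_split("data/adult") would read data/adult_train.csv and data/adult_test.csv.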


Our code relies on standard Python libraries like NumPy, pandas, and PyTorch. We also use the Captum library to compute explanations.

To replicate our results:

For the WHO results, run the corresponding script. CSV files will be saved as ft_res_who.csv for fine-tuning (Sec. 4.1) and rt_res_who for retraining (Sec. 4.2).

For Adult and HELOC, similarly, run the respective scripts for a subset of the experiments (all experiments with added synthetic noise with a standard deviation of 0.001).

To run custom experiments:

The code is broken into a few pieces, depending on your goals.

Real-world distribution shifts

The code assumes that there will be two data files, named filename_orig_train.csv and filename_shift_train.csv (and likewise for the test sets).
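The naming scheme above can be checked before launching a long run. The sketch below is illustrative only: shift_paths and missing_files are hypothetical helpers, and the test-set names (<filename>_orig_test.csv and <filename>_shift_test.csv) are an assumption read from "likewise for the test sets".

```python
from pathlib import Path


def shift_paths(file_base):
    """Return the four CSV paths expected for a real-world shift (hypothetical helper)."""
    return {
        (version, split): Path(f"{file_base}_{version}_{split}.csv")
        for version in ("orig", "shift")
        for split in ("train", "test")
    }


def missing_files(file_base):
    """List the expected files that do not exist, so a run can fail early."""
    return [str(p) for p in shift_paths(file_base).values() if not p.exists()]
```

An empty list from missing_files means all four files are in place.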

Retraining models

Use this section as a guide if you want to compare models that were retrained from scratch.

Start with the training script to generate the models, predictions, and test-set gradients. This file can be run as follows:

python3 <dataset> <file_base> <run_id> --dataset_shift=1

<dataset> is the name of the dataset (e.g., whobin, adult, or heloc). <file_base> is the filename up through _train.csv. <run_id> is any unique string that identifies this set of experiments.

These other parameters can be changed from the defaults, if desired:


  • --output_dir default . (the current directory). Where to save output files
  • --label_col default label. Column name of the output variable in the dataset
  • --variations default 10. How many trials to execute
  • --fixed_seed default false. If true, use the same random model initialization for the original and shifted models

Training parameters

  • --lr default 0.2. Learning rate
  • --lr_decay default 0.8. Learning rate decay
  • --epochs default [20]. List of epochs (in ascending order) at which to calculate model explanations. The final (maximum) value is the total number of training epochs
  • --batch_size default 128
  • --weight_decay default 0
  • --dropout default 0
  • --optimizer default none, corresponding to SGD. Other options are amsgrad (Adam with amsgrad) or adam

Model architecture

  • --nodes_per_layer default 50. Nodes per hidden layer
  • --num_layers default 5. Number of hidden layers
  • --activation default relu. Activation function; use leak for leaky ReLU and soft for softplus
  • --beta default 5. If using softplus, beta parameter

Extra

These parameters can also be used, but are not used for any of the experiments in the paper.

  • --adversarial default false. If true, use adversarial training while training the models
  • --epsilon (default 0.5, not used). Epsilon for constructing adversarial examples
  • --linear default false. If true, train a linear model instead of a neural network
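To make the options above concrete, here is a rough argparse sketch that mirrors the documented flags and defaults. It is not the repository's parser: the argument types, the integer encoding of boolean flags (as in --fixed_seed=1), the string "none" for the optimizer default, and the default of 0 for --dataset_shift are all assumptions.

```python
import argparse


def build_parser():
    """Sketch of a parser for the documented options (types are assumed)."""
    p = argparse.ArgumentParser(description="Sketch of the documented options")
    p.add_argument("dataset")    # e.g., whobin, adult, or heloc
    p.add_argument("file_base")  # filename up through _train.csv
    p.add_argument("run_id")     # unique string for this set of experiments
    p.add_argument("--dataset_shift", type=int, default=0)  # default assumed
    p.add_argument("--output_dir", default=".")
    p.add_argument("--label_col", default="label")
    p.add_argument("--variations", type=int, default=10)
    p.add_argument("--fixed_seed", type=int, default=0)
    # Training parameters
    p.add_argument("--lr", type=float, default=0.2)
    p.add_argument("--lr_decay", type=float, default=0.8)
    p.add_argument("--epochs", type=int, nargs="+", default=[20])
    p.add_argument("--batch_size", type=int, default=128)
    p.add_argument("--weight_decay", type=float, default=0.0)
    p.add_argument("--dropout", type=float, default=0.0)
    p.add_argument("--optimizer", default="none",
                   choices=["none", "adam", "amsgrad"])  # "none" means SGD
    # Model architecture
    p.add_argument("--nodes_per_layer", type=int, default=50)
    p.add_argument("--num_layers", type=int, default=5)
    p.add_argument("--activation", default="relu")  # relu, leak, or soft
    p.add_argument("--beta", type=float, default=5.0)
    # Extra (not used in the paper's experiments)
    p.add_argument("--adversarial", type=int, default=0)
    p.add_argument("--epsilon", type=float, default=0.5)
    p.add_argument("--linear", type=int, default=0)
    return p
```

Parsing ["adult", "data/adult", "run1", "--epochs", "10", "20"] with this sketch yields epochs == [10, 20], so explanations would be computed at epochs 10 and 20, with 20 training epochs in total.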

A number of files will be saved after running: model parameters (.pt files), accuracy and loss (.npy files), and gradients (.npy files) for various attribution techniques. Several files compare different data (original vs. shifted), as follows:

  • Files with shiftcompare in their titles compare the original and updated models directly
    • Files with shiftcompare and full in the title compare the original and updated models as evaluated on the updated test dataset
    • Files without full compare the models as evaluated on the original test dataset
  • Files without shiftcompare are the standard comparison of behavior across multiple random seeds
    • Files with shift in the title use the updated test data to evaluate the models trained on the updated training data
    • Files with orig in the title use the original test data to evaluate the models trained on the original training data
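The rules in the list above can be summarized as a small lookup. This is a sketch only: describe_output_file is a hypothetical helper, and it assumes the keywords appear as plain substrings of the filename; the example filename in the usage note is made up.

```python
def describe_output_file(filename):
    """Map an output filename to the comparison it holds (substring matching is assumed)."""
    # Check "shiftcompare" first, because it also contains the substring "shift".
    if "shiftcompare" in filename:
        test_set = "updated" if "full" in filename else "original"
        return f"original vs. updated model, evaluated on the {test_set} test set"
    if "shift" in filename:
        return "updated models across random seeds, evaluated on the updated test set"
    if "orig" in filename:
        return "original models across random seeds, evaluated on the original test set"
    return "unknown"
```

For instance, a hypothetical file named run1_shiftcompare_full.npy would be described as a comparison of the original and updated models on the updated test set.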

To post-process these files into useful data, run the post-processing script as follows:

python3 <files_location> <output_file> --run_id <run_id1> <run_id2> <run_idn> --epochs <e1> <e2>

files_location is where all of the .npy files live, i.e., the output_dir parameter from the training run (default .). output_file is the name of the CSV file in which to store the results. --run_id takes a list of run IDs from potentially multiple training runs with different settings (however, all trials must have the same dataset_shift value and the same fixed_seed value). --epochs is a list of epochs at which data was recorded (in ascending order). The script will save a CSV file containing aggregate information about explanation robustness.

Fine-tuning models

Use the fine-tuning script to run fine-tuning experiments on real-world data shifts.

python3 <dataset> <file_base> <run_id>

The same command-line parameters as for retraining can be used, and they have the same defaults, except for --epochs, whose default is 1000. There is one additional command-line parameter, --finetune_epochs (default 250), which is the number of additional epochs for fine-tuning.

To post-process the raw output, run, e.g.,

python3 <files_location> <output_file> --run_id <run_id1> <run_id2> <run_idn> --epochs <e1> --finetune_epochs <f1>

Note that you must specify the epoch(s) at which the data was measured.

Synthetic dataset shift (Gaussian noise)

Retraining models

To compare models that are retrained from scratch, follow the same steps as for retrained models under real-world dataset shift (i.e., run as described above, but omit the dataset_shift command-line parameter). Make sure to specify --fixed_seed=1 as a command-line parameter; otherwise, results will vary a lot because different random seeds are used to train the networks.

These additional command-line parameters will be useful.

  • --threshold default 0. Standard deviation of Gaussian noise to add (for continuous features), or probability of modifying each feature (for binary features)
  • --base_repeats default 10. How many "base models" to use. Exact implementation depends on whether fixed_seed is True (it is True in all of the paper experiments), as described below.
    • For example, if fixed_seed is false, --variations=5 and --base_repeats=10, we will train 10 base models and 5 modified models, for a total of 15 models. We compare each of the modified models with each of the base models. All 15 models that we train will use different random initializations.
    • For example, if --fixed_seed=1, --variations=5, and --base_repeats=10, we will train 10 base models and 5 modified models for each base model, for a total of 60 models. Each base model will be trained using a different model initialization, and all of the 5 modified models corresponding to it will use the same model initialization.
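The two examples above follow a simple counting rule, sketched here for quick sanity checks. count_models is a hypothetical helper, not part of the repository.

```python
def count_models(variations, base_repeats, fixed_seed):
    """Return (base_models, modified_models, total) for the documented settings."""
    base = base_repeats
    # With a fixed seed, every base model gets its own set of modified models
    # that share its initialization; otherwise the modified models are trained once.
    modified = base_repeats * variations if fixed_seed else variations
    return base, modified, base + modified


print(count_models(variations=5, base_repeats=10, fixed_seed=False))  # (10, 5, 15)
print(count_models(variations=5, base_repeats=10, fixed_seed=True))   # (10, 50, 60)
```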

Note that our experiments only vary base_repeats. However, we leave variations in the code as a way to run more comparisons more quickly in the future.
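For intuition about what --threshold controls, the following NumPy sketch perturbs a feature matrix in the way the option is described: Gaussian noise on continuous features and random modification of binary features. It is an illustration, not the repository's implementation. The helper perturb, its binary_mask argument, and the choice to "modify" a binary feature by flipping it are assumptions.

```python
import numpy as np


def perturb(X, binary_mask, threshold, seed=0):
    """Return a noisy copy of X (rows are samples, columns are features)."""
    rng = np.random.default_rng(seed)
    X_new = np.asarray(X, dtype=float).copy()
    binary_mask = np.asarray(binary_mask, dtype=bool)
    cont = ~binary_mask
    # Continuous features: add zero-mean Gaussian noise with std = threshold.
    X_new[:, cont] += rng.normal(0.0, threshold, size=X_new[:, cont].shape)
    # Binary features: flip each entry independently with probability = threshold
    # (flipping is an assumed reading of "modifying").
    flips = rng.random(X_new[:, binary_mask].shape) < threshold
    X_new[:, binary_mask] = np.where(
        flips, 1 - X_new[:, binary_mask], X_new[:, binary_mask]
    )
    return X_new
```

With threshold set to 0 (the documented default), the data is returned unchanged.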

Fine-tuning models

For fine-tuning experiments on synthetic data shifts, use the fine-tuning script to run the experiments and the corresponding post-processing script to post-process the results. E.g.,

heloc <path_to_data> <run_id> --threshold 0.1

python . <output_file> --run_id <run_id> --epochs <e1> --finetune_epochs <f1>