2011 census microdata play #68

edwardchalstrey1 · 2021-07-14T15:24:29Z

As part of the Synthetic Data and Privacy Preservation - Turing/ONS partnership project 3, we're trying out the QUIPP pipeline on this dataset.

Note: we may never need to merge this; I'm just putting it up so @ots22 can easily pull the branch.

@ots22 I've attempted to modify the existing examples to run the different synth-method choices with stock parameters, changing only the parts that refer to column names. Example 4, the SGF one, ran without any errors (I've set this one to enabled: true). If you pull the branch and set enabled: false for any of the others, you should hopefully get the errors I got for those.
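When reproducing this, it helps to check which examples will actually run by listing the run-input files whose `enabled` flag is set. This is a minimal sketch, assuming each run-input is a `*.json` file in a single directory with a top-level boolean `enabled` key; the helper name `enabled_run_inputs` is hypothetical and not part of QUIPP.

```python
import json
from pathlib import Path


def enabled_run_inputs(directory):
    """Return the file names of run-input JSON files whose top-level "enabled" flag is true.

    Hypothetical helper: assumes each run-input is a *.json file with a
    boolean "enabled" key at the top level.
    """
    enabled = []
    for path in sorted(Path(directory).glob("*.json")):
        with open(path) as f:
            config = json.load(f)
        # A missing or non-boolean flag is treated as disabled rather than guessed.
        if config.get("enabled") is True:
            enabled.append(path.name)
    return enabled
```

Calling `enabled_run_inputs("run-inputs")` returns a sorted list of file names, which makes it easy to confirm the enabled/disabled state before kicking off a long run.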

On the SGF one, it seems to have generated a synthetic dataset! However, there are no values in the 2nd column (I may have wrongly chosen the categorical type for that column in the dataset JSON here; not sure).
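A quick way to confirm the missing-values symptom, independently of the dataset JSON, is to scan the synthetic output for columns that never receive a value. A minimal sketch, assuming the synthetic data is written as a CSV file with a header row; `empty_columns` is a hypothetical helper, and the path you pass it depends on where the pipeline writes its output.

```python
import csv


def empty_columns(csv_path):
    """Return the header names of columns that contain no non-blank values."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        seen_value = {name: False for name in columns}
        for row in reader:
            for name in columns:
                value = row.get(name)
                # Short rows yield None; whitespace-only cells count as empty.
                if value is not None and value.strip() != "":
                    seen_value[name] = True
    return [name for name in columns if not seen_value[name]]
```

If the 2nd column's name comes back from this check, the problem is in the synthesis itself rather than in how the output is being viewed.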

Also, I created issue #67 for the error I got with the CTGAN one, as I noticed the same error when I tried to run the existing CTGAN example from run-inputs.

ots22 · 2021-07-15T10:37:12Z

From our discussion in-person just now:

- we're planning to drop CTGAN for now
- we fixed a few errors in the synthpop parameters, and a 'bootstrap' synthesis now works
- the classifiers run for a long time (to investigate)

gmingas · 2021-07-15T11:09:02Z

I think the classifiers run for a long time when no specific classifier with specific hyperparameters is passed in the run-inputs file. In that case, a number of classifiers are tested, each with many combinations of hyperparameters. I recommend using something like this to reduce the time; it uses only logistic regression with defined parameters.
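To see why an unrestricted search is slow: an exhaustive grid search fits one model per hyperparameter combination per cross-validation fold, for every classifier tried. The grids and the 5-fold setting below are hypothetical, chosen only to illustrate the arithmetic; they are not QUIPP's actual defaults.

```python
from itertools import product
from math import prod


def count_fits(param_grid, n_folds):
    """Number of model fits an exhaustive grid search performs with k-fold cross-validation."""
    n_combinations = prod(len(values) for values in param_grid.values())
    return n_combinations * n_folds


def expand_grid(param_grid):
    """Yield every hyperparameter combination as a dict (what a grid search iterates over)."""
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical search space over several classifiers, for illustration only.
search_space = {
    "LogisticRegression": {"C": [0.01, 0.1, 1, 10, 100], "max_iter": [100, 1000]},
    "RandomForestClassifier": {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10, 20]},
    "KNeighborsClassifier": {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]},
}
# A single classifier with fixed hyperparameters: one combination only.
fixed = {"LogisticRegression": {"C": [1.0], "max_iter": [1000]}}

full = sum(count_fits(grid, n_folds=5) for grid in search_space.values())
single = sum(count_fits(grid, n_folds=5) for grid in fixed.values())
print(full, single)  # prints: 150 5
```

Even with these small grids, pinning one classifier and its parameters cuts the work by a factor of 30, and real grids are usually larger.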


edwardchalstrey1 added 3 commits July 14, 2021 16:09

census microdata run inputs (86366c2)
add census microdata dataset (fda7a59)
get synthpop working (3c5ff84)

edwardchalstrey1 added 5 commits July 15, 2021 13:38

fix microdata (934f444)
specify classifier (dc3f9a5)
fix column error (493eff6)
small dataset (9cb2bd7)
hist plots notebook (dfc0cd5)

edwardchalstrey1 and others added 19 commits July 15, 2021 15:24

use small dataset census synthpop (34ffb30)
synthpop with cart multiple synth datasets (62baa6c)
categorical columns (925c2b6)
heatmaps (cbee886)
add categorical columns explanation (8fe9d51)
some edits (c9167e6)
fix ctgan import (1a1dd4d)
make examples to run false by default (27f6021)
standardise census dataset example runs (d558f05)
notebook comparing methods (3d3c735)
add disclosure risk comparison (86f81cf)
utility comparison (782da0d)
notes (a2098ae)
update columns (12e4743)
Add helper 'run' targets to Makefile (e6a1809)
Fix to Makefile run dependencies (987929a)
Adjust dependencies of Makefile 'run' target (0ed0b5a)
Switch to git+https protocol for DataSynthesizer requirement (to work around limitation on JADE) (57d0693)
change utility metric and try variants census ds (c3885a5)

edwardchalstrey1 and others added 8 commits August 19, 2021 15:45

extra input columns utility classifier (cde10bc)
Add a 'privacy metric' that produces synthetic data with leaked records (7f48fc4)
Makefile rules for leaky output (b2b72c7)
Update Makefile all target (7cfff55)
Fix typo in Makefile (c4dd3a9)
Add introductory notebook (a940ddb)
Move census notebooks (258eb4c)
Add Sharepoint links (9eb7492)

ots22 marked this pull request as ready for review September 2, 2021 17:17

ots22 and others added 6 commits September 6, 2021 10:56

Fix Sharepoint links (2e766e1)
Update Overview.ipynb (9f23739)
Add notebook example (a451f90)
add some explainers (e616777)
additional explanation about file naming (a7e4348)
Merge pull request #74 from callummole/2011-census-microdata: 2011 census microdata (396590d)
