This repo contains demonstration resources for common campaign-related data science tasks. Much of the code relies on Civis Platform, a research infrastructure built on AWS that is often used by political campaigns. The SQL code in the repo is Redshift SQL, and many of the modeling functions use the Civis API. Several of the scripts also reference data available through the DNC Phoenix database, supplementary geotables (available here), and proprietary campaign data. For this reason most of the code will require some modification, but it can serve as a useful blueprint for setting up a data science capability on a campaign.
Here is a summary of some of the demonstration code available in this repo:
- SQL to build a covariate matrix with over 500 ML-ready predictors
- SQL to build an analysis table that applies demographic and socioeconomic labels to all individuals in the Phoenix database
- Python notebooks to build ML models in Civis Platform and generate predictions
- SQL to organize campaign data into a voter contact universe table
- A Python notebook to assign travel times to individuals who live within a fixed distance of campaign-related events
Note: none of the above code is actively maintained, and it will not run out of the box due to database dependencies. Please leave feedback via issues and we will do our best to respond.
- This script creates a modeling frame consisting of over 500 covariates at the individual and small area level. The build process assumes that your campaign has access to DNC Phoenix, but you may need to adjust it for your schema naming conventions or for revisions to the source tables. The SQL script uses AWS Redshift syntax.
- The table build process also pulls in geotables from public data sources which you can download from this link.
- These dev scripts were used to clean and impute missing data in the geotables.
- This script creates an analytics frame and labels individual-level data into common socioeconomic and demographic classifications. Again, this script requires access to DNC Phoenix, which may be subject to changes in schema naming conventions and revisions to source tables.
- See this data dictionary for descriptions related to the modeling frame and analytics frame.
- The modeling folder contains a few scripts to train models in Civis Platform using the CivisML API. This Python notebook contains a demonstration workflow that trains several models on a series of dependent variables, builds a table of validation metrics, and generates predictions in a model output table with percentiled scores partitioned on states.
- Visit the universes folder for info about the `base_universe` table for cutting GOTV lists and volunteer / event recruitment.
- To enable location-based targeting, here is a workflow that calculates the travel time from each voter to each point of interest within a given commuting radius.
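
As a rough illustration of the radius step in the location-based targeting workflow, the sketch below pairs voters with points of interest that fall within a fixed straight-line distance. It is not the notebook's implementation: the record fields (`voter_id`, `event_id`, `lat`, `lon`), the function names, and the sample coordinates are all hypothetical, and great-circle distance is only a pre-filter. Actual travel times would still need to come from a routing service or similar source.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def pairs_within_radius(voters, events, radius_miles):
    """Return (voter_id, event_id, distance) for every pair within the radius.

    `voters` and `events` are lists of dicts with hypothetical keys:
    an id field plus `lat` and `lon` in decimal degrees.
    """
    pairs = []
    for v in voters:
        for e in events:
            d = haversine_miles(v["lat"], v["lon"], e["lat"], e["lon"])
            if d <= radius_miles:
                pairs.append((v["voter_id"], e["event_id"], round(d, 2)))
    return pairs


# Made-up sample records for illustration only.
voters = [
    {"voter_id": "A", "lat": 41.88, "lon": -87.63},
    {"voter_id": "B", "lat": 40.00, "lon": -80.00},
]
events = [{"event_id": "E1", "lat": 41.88, "lon": -87.63}]

candidates = pairs_within_radius(voters, events, radius_miles=25)
```

The nested loop is fine for a small demo, but at voter-file scale a spatial index or a bounding-box pre-filter in the database would likely be needed before computing distances pair by pair.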
Organizing Analytics Team. Nicholas Marchio, Data Science Engineer.
Please feel free to open an issue to get in touch about getting involved, improving this demonstration codebase, or other open source political data projects.