Commute partitioning automation script

About

This set of scripts simplifies the process of running the Combo community-detection algorithm (developed by Sobolevsky et al.) using commuter-flow data from the American Community Survey (ACS) in order to produce algorithmically-evaluated regionalizations of the United States. It is based on the process used by Nelson and Rae in the 2016 article "A New Economic Geography of the United States: From Commutes to Megaregions".

Selecting extracts and running

The source file data-src/commutes.csv is a CSV file containing commuter flows between US Census Tracts derived from the ACS. You will need to download this file separately and place it in data-src, since it is too large for GitHub. This file has been scrubbed to remove ultra-long-distance commutes, commutes with origins or destinations outside of the Lower 48 states, and commutes for which the origin and destination lie in the same Census tract. Tracts are identified by their 10-digit FIPS code.
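The scrubbing described above can be sketched as a simple filter. This is an illustrative reconstruction, not the repository's actual preprocessing code; the tuple layout `(origin_fips, dest_fips, flow)` and the use of two-digit state prefixes are assumptions.

```python
def scrub_commutes(rows, lower48_prefixes):
    """Yield only the commute flows the analysis keeps.

    `rows` is an iterable of (origin_fips, dest_fips, flow) tuples --
    an assumed layout for illustration. A flow is dropped when origin
    and destination are the same tract, or when either endpoint's
    two-digit state FIPS prefix falls outside the Lower 48.
    """
    for origin, dest, flow in rows:
        if origin == dest:
            continue  # intra-tract commute
        if origin[:2] not in lower48_prefixes or dest[:2] not in lower48_prefixes:
            continue  # origin or destination outside the contiguous US
        yield (origin, dest, flow)
```

(The published file has already been scrubbed this way, so you should not need to repeat this step yourself.)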

1. Create a list of selected FIPS codes.

The file data-src/subselection.txt is a listing of Census tracts, one per line, by FIPS code. Modify this file to include the FIPS codes of the Census tracts which you wish to analyze. The example selection is all of the Census tracts in the states of New Hampshire and Vermont. The Census Bureau has gazetteer files for 2010 Census tracts available here.

Note 1: if the file data-src/subselection.txt is absent, the script will operate on all available tract-to-tract commute data.

Note 2: if you produce this file by exporting from Excel for Mac to CSV, beware a common pitfall: Excel for Mac writes files with \r (carriage-return) line endings instead of \n. You'll need to find-and-replace \r with \n before the scripts can read the file.
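One way to fix the line endings, as a small stand-alone Python sketch (the function name is mine, not part of the repository):

```python
from pathlib import Path

def fix_mac_newlines(path):
    """Normalize \r\n and bare \r line endings to \n, in place.

    Working in bytes avoids Python's own newline translation.
    """
    raw = Path(path).read_bytes()
    Path(path).write_bytes(raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n"))
```

Any text editor's find-and-replace, or a tool like `dos2unix`/`tr`, works equally well.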

2. Run the preprocessor script.

$ python combo-preprocessor.py

The result of this script is two files: data-stage1/commutes.net, a Pajek network file suitable for input into Combo, and data-stage1/fips_table.csv, a lookup table which will be used later to match FIPS codes to the serialized id numbers used by Combo.
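To make the two outputs concrete, here is a minimal sketch of writing a weighted network in Pajek .net format while recording the FIPS-to-serial-id mapping. This is an illustration of the file format, not the preprocessor's actual code; the function name and the 1-based first-seen id scheme are assumptions.

```python
def write_pajek(edges, out):
    """Write a weighted directed network in Pajek .net format.

    `edges` maps (origin_fips, dest_fips) -> flow. Vertices are
    assigned serial ids 1..N in first-seen order (an assumed
    convention); the fips -> id table is returned so it can be
    saved as a lookup file alongside the network.
    """
    ids = {}
    for origin, dest in edges:
        for fips in (origin, dest):
            if fips not in ids:
                ids[fips] = len(ids) + 1
    out.write(f"*Vertices {len(ids)}\n")
    for fips, i in ids.items():
        out.write(f'{i} "{fips}"\n')
    out.write("*Arcs\n")
    for (origin, dest), weight in edges.items():
        out.write(f"{ids[origin]} {ids[dest]} {weight}\n")
    return ids
```

The returned table plays the role of data-stage1/fips_table.csv: Combo only ever sees the serial ids, so the mapping is needed later to get back to FIPS codes.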

3. Run Combo.

$ ./comboCPP ./data-stage1/commutes.net [max-communities]

The build of Combo included here is compiled for OS X. If it does not run correctly, you will have to download the Combo source code and compile it yourself.

The argument [max-communities] may be omitted; if provided, Combo will limit its output to the given number of detected communities.

The result of running Combo is a file, data-stage1/commutes_comm_comboC++.txt, with community assignments whose line numbers match the serialized ids in the Pajek file. Combo also writes the "modularity score" of the partitioning process to stdout.
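Because line numbers in the output file correspond to serial ids, reading it back is just an enumeration. A minimal sketch (function name mine):

```python
def read_communities(lines):
    """Map 1-based Pajek serial ids to community ids.

    Line n of Combo's output file holds the community assignment
    for the vertex with serial id n.
    """
    return {i: int(line) for i, line in enumerate(lines, start=1)}
```

This is essentially the join that the postprocessor in the next step performs against fips_table.csv.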

Note 1: This operation is computationally expensive; for computations of over 5,000 tracts, you may need a high-performance computer.

4. Reassemble the tracts lookup table.

$ python combo-postprocessor.py

The result of this script is data-final/fips_table_with_community_assignments.csv, a CSV file containing the FIPS codes of the input Census tracts, the serial id produced for Combo (useful only for debugging), and, most usefully, the detected-community id (community numbering begins at 0).

You can now take this CSV file and join it to a spatial file of Census tracts or the point-to-point flows Shapefile used in our mapping.
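For the join, a FIPS-to-community lookup built from the final CSV is often all you need; you can then attach communities to any tract-level spatial file (e.g. via a geopandas merge on the FIPS column). The column names below are assumptions based on the description above, not the script's actual headers.

```python
import csv

def community_lookup(csv_lines):
    """Build a FIPS -> community dict from the final output CSV.

    The header names "fips" and "community" are assumed for
    illustration; check the real file's header row.
    """
    reader = csv.DictReader(csv_lines)
    return {row["fips"]: int(row["community"]) for row in reader}
```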

Contact information

Garrett Dash Nelson — [email protected] · @en_dash · http://people.matinic.us/garrett

Alasdair Rae — [email protected] · @undertheraedar · http://statsmapsnpix.com
