Identify the core datasets that we want to include in the inventory #1

waldoj · 2015-01-07T20:48:52Z

No description provided.

waldoj · 2015-01-12T19:19:17Z

At U.S. Open Data, we've identified five core datasets that we advise every state government to publish:

registered corporations
legislation and legislators
laws and regulations
address points
campaign finance

Although each of these datasets is of varying value on its own, what makes each of them valuable is that it makes it possible to connect other datasets together, substantially increasing their value. For instance, without a list of registered corporations (and data about each of those corporations), it's impossible to know that a campaign contribution by Acme, LLC actually came from John Smith of Springfield (who owns 100% of the shares in Acme). And without data about every law, it's impossible to know what it means that a court ruling struck down §1.23-45. And so on. See the linked U.S. Open Data page for more about this.
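The Acme example can be made concrete with a minimal sketch of the join a corporate register enables. It assumes a hypothetical register that maps each company to its owners and their ownership shares; the `Contribution` type, the `attribute_contribution` helper, and the $500 amount are illustrative only, not part of any real dataset or API.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    donor: str      # name as reported in the (hypothetical) campaign finance filing
    amount: float   # contribution amount in dollars


# Hypothetical corporate register: company name -> {owner: ownership share (0..1)}.
CORPORATE_REGISTER = {
    "Acme, LLC": {"John Smith of Springfield": 1.0},
}


def attribute_contribution(contribution, register):
    """Attribute a contribution to the owners of the donor, in proportion to their shares.

    If the donor is not found in the register, the contribution stays with the donor as reported.
    """
    owners = register.get(contribution.donor)
    if owners is None:
        return [(contribution.donor, contribution.amount)]
    return [(owner, contribution.amount * share) for owner, share in owners.items()]


# Without the register, only "Acme, LLC" would be visible in the filing.
print(attribute_contribution(Contribution("Acme, LLC", 500.0), CORPORATE_REGISTER))
# → [('John Smith of Springfield', 500.0)]
```

If the register is missing (or incomplete), the lookup falls back to the reported donor, which is exactly the gap described above.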

So, those are five recommendations for datasets to include in this census.

waldoj · 2015-02-03T16:40:00Z

I don't know what to do with transportation (transit and otherwise) data. Some states simply have no public transit (beyond a couple of cities with buses), and most states have nothing in the way of commuter rail. The states that do have such services generally run them at the municipal level rather than as a state program, which is to say that whether that data exists is out of the states' hands. There are a lot of important transportation datasets, so I think it's going to be complicated to figure out which ones to look for, how to score them, etc.

Solvable, but a challenge.

waldoj · 2015-02-03T17:14:43Z

I'm interested in nuts-and-bolts data, like the state budget, agency checkbooks, the org chart of all agencies and employees, the state blue book, all boards and their members, FOIA requests/responses, all state websites, attorney general opinions, etc. I don't think that any of these warrant being their own category (I could be persuaded otherwise re: budget/checkbook), but I think there's sense in having a category that lumps all of these together. A lot of work with open data is impossible or impractical without having this kind of nuts-and-bolts data to connect it to.

emily878 · 2015-02-06T19:43:11Z

States are a lot closer to countries than cities are, so we could draw more on the G8 National Action Plan definition of "high value datasets":

| Data Category | Example datasets |
| --- | --- |
| Companies | Company/business register |
| Crime and Justice | Crime statistics, safety |
| Earth observation | Meteorological/weather, agriculture, forestry, fishing, and hunting |
| Education | List of schools; performance of schools, digital skills |
| Energy and Environment | Pollution levels, energy consumption |
| Finance and contracts | Transaction spend, contracts let, call for tender, future tenders, local budget, national budget (planned and spent) |
| Geospatial | Topography, postcodes, national maps, local maps |
| Global Development | Aid, food security, extractives, land |
| Government Accountability and Democracy | Government contact points, election results, legislation and statutes, salaries (pay scales), hospitality/gifts |
| Health | Prescription data, performance data |
| Science and Research | Genome data, research and educational activity, experiment results |
| Statistics | National Statistics, Census, infrastructure, wealth, skills |
| Social mobility and welfare | Housing, health insurance and unemployment benefits |
| Transport and Infrastructure | Public transport timetables, access points broadband penetration |

emily878 · 2015-02-06T19:45:53Z

If we went with the full G8 list, then we could do a test with a data-friendly state to find out what kind of specific datasets we could, under ideal conditions, currently expect to get from these categories.

waldoj · 2015-02-06T19:49:10Z

I hadn't realized that the core state-level datasets that I've identified are all found within the G8 list, at least as subsets of them. 👍

rebeccawilliams · 2015-02-13T23:17:19Z

Yes, a State Census!

Looking at this list, I think you'll need a ton of experts and/or 14 different Censuses. Modifying the Census for nesting I think would be useful overall and is something @ondrae and I had discussed back in the day, maybe there could be a joint effort? Or perhaps the Local Censuses could get the upgraded Global Open Data Index design treatment.

In addition to the G8 Categories, were other categories explored? Can data that sits squarely in between the US City Open Data Census and Global Open Data Index datasets be prioritized for review?

For Companies, Open Corporates already collects a lot of state info, can new information be suggested to them rather than having a separate place for review?

I think you two both have these notes, but if helpful, some of this would apply to states as well:

Geospatial: an expert from NSGIC would be good.
~~Legislative~~ Between Open States and the State Decoded work, y'all have this.
Transit: perhaps APTA would be a good partner? @dsmorgan77, thoughts? See "Find subject-matter experts to offer guidance" #7.

waldoj · 2015-02-18T02:53:57Z

Modifying the Census for nesting I think would be useful overall and is something @ondrae and I had discussed back in the day, maybe there could be a joint effort?

That's out of the scope of what we're doing in this initial effort but, yeah, that sure would be helpful. I've been frustrated by the limitations of the Open Data Census, but that's probably just because I'm trying to do things with it that it wasn't meant for. :) At this point, I think we're going to have to muddle through with the existing architecture.

In addition to the G8 Categories, were other categories explored?

Just what you see above in this thread.

Can data that sits squarely in between the US City Open Data Census and Global Open Data Index datasets be prioritized for review?

I'm not sure what you mean—could you explain?

For Companies, Open Corporates already collects a lot of state info, can new information be suggested to them rather than having a separate place for review?

It'd be impossible for us to provide comprehensive scores for each state without including business data. (We can't just say "see this other ranking.") But it seems to me that we'd want to just incorporate Open Corporates' metrics, rather than trying to do our own thing there.

dsmorgan77 · 2015-02-18T03:12:25Z

Doing my part to be an expert in transportation and infrastructure here... the issue you raise about "public transport timetables" is interesting. The thing is, in the United States, transit is very rarely a state government matter. According to the National Transit Database (http://www.ntdprogram.gov/ntdprogram/datbase/2013_database/2013%20Agency%20Information.xls), only 20 of the 857 reporting transit agencies are state governments.

The top two types of transit agencies are city/county/local government (430) and independent authority (255). It breaks down really quickly after that: non-profit corporations (32), metropolitan planning organizations & councils of government (30), private for-profit corporations (25), and state governments (20).
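The counts quoted above can be turned into shares with a few lines of arithmetic. The figures are the ones cited from the 2013 National Transit Database agency file; the "other" bucket is just whatever the listed types leave unaccounted for out of the 857 reporting agencies, not a category taken from the NTD itself.

```python
# Agency counts by type, as quoted from the 2013 National Transit Database agency file.
TOTAL_AGENCIES = 857
COUNTS = {
    "city/county/local government": 430,
    "independent authority": 255,
    "non-profit corporation": 32,
    "MPO / council of government": 30,
    "private for-profit corporation": 25,
    "state government": 20,
}

# Agencies whose type is not among those listed in the comment.
other = TOTAL_AGENCIES - sum(COUNTS.values())

# Share of all reporting agencies for each listed type.
shares = {kind: n / TOTAL_AGENCIES for kind, n in COUNTS.items()}

for kind, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind:32s} {share:6.1%}")
print(f"{'other (not listed)':32s} {other / TOTAL_AGENCIES:6.1%}")
```

State governments come out at roughly 2.3% of reporting agencies, which is the basis for the recommendation that follows.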

Given this, I would recommend against including transit in a state-level census.

waldoj · 2015-02-18T03:16:00Z

👍 That was my gut feeling on transit, but having hard numbers is wonderful. Thank you!

rebeccawilliams · 2015-02-18T04:44:21Z

Can data that sits squarely in between the US City Open Data Census and Global Open Data Index datasets be prioritized for review?

I'm not sure what you mean—could you explain?

With this list, there are 134 discrete types of data you are looking to evaluate (and a lot of those could be broken down further, e.g. weather). I am assuming that these won't all happen in the same pass. With that in mind, I think it'd be really useful to prioritize the areas that were covered by the Global Open Data Index and the US City Open Data Census, because perhaps we'd see whole verticals of open data emerge.

I made a table, with total overlap in bold and some overlap in italics:

| US Index | US State Census Possibilities | US City Census |
| --- | --- | --- |
|  | Government Accountability and Democracy: asset disclosure | Asset Disclosure |
| Government Budget | Finance and contracts: add budget to the list | Budget |
| Company Register | Companies / see Open Corporates | Business Listings |
|  | Government Accountability and Democracy: campaign finance / see Open Secrets | Campaign Finance Contributions |
|  |  | Code Enforcement Violations |
|  |  | Construction Permits |
|  | Crime and Justice: crime statistics | Crime |
|  | ~~Earth observation~~ |  |
|  | ~~Economic Development~~ |  |
|  | ~~Education~~ |  |
| Election Results | Government Accountability and Democracy: election results, see Open Elections |  |
|  | ~~Health~~ |  |
| Legislation | Government Accountability and Democracy: legislation / see Open States |  |
|  | Government Accountability and Democracy: lobbyist activity | Lobbyist Activity |
| National Map | Geospatial: maps you list |  |
| National Statistics | Statistics |  |
|  | Geospatial: parcel maps | Parcels (shapefiles) |
| Pollutant Emissions | Energy and Environment: air quality? |  |
| Postcodes / Zipcodes | Geospatial: postcodes |  |
|  | Finance and contracts: contracts | Procurement Contracts |
|  |  | Property Assessment |
|  |  | Property Deeds |
|  |  | Public Buildings |
|  |  | Restaurant Inspections |
|  | ~~Science and Research~~ |  |
|  |  | Service Requests (311) |
|  | ~~Social mobility and welfare~~ |  |
| Government Spending | Finance and contracts: expenditures / see OSPIRG | Spending |
| Transport Timetables | Transport and Infrastructure (though this gets regional fast) | Transit |
|  |  | Zoning (shapefiles) |
|  |  | Web Analytics |

So support/coordinate with:

Open Corporates in company data assessment
Open Elections in election result data assessment
Open Secrets in state campaign finance data assessment (is this complete?) @boblannon?
Open States in legislative data assessment (would anything be additive here?)
OSPIRG in spending data/checkbook assessment

Honestly, after writing those all down, creating a site that points to all of those open state data efforts would be useful in and of itself I think as a form of Open Everything Advocacy.

Based on overlap, it'd be great to prioritize:

open budget data, under Finance and contracts (it's not currently listed?). Do most states have it?
Transport and Infrastructure -- @dsmorgan77 is right, though I worry the regional authorities will never be assessed FWIW

Then:

Crime and Justice: crime statistics
Energy and Environment: air quality
Finance and contracts:
- contracts
- procurement processes
Geospatial [w/ NSGIC and/or @sbma44]:
- parcels
- post codes
- etc
Government Accountability and Democracy:
- asset disclosure
- lobbyist activity
Statistics - though I'm not sure this one is clear

And very last, not because they aren't important, but because there is less open data assessment overlap currently, these categories:

Earth observation
Economic Development
Education
Health
Science and Research
Social mobility and welfare

dsmorgan77 · 2015-02-18T13:16:21Z

We should talk to NSGIC about overlap between some of the transport datasets we have and their efforts with Transportation for the Nation (http://www.nsgic.org/transportation-for-the-nation). Some, but not all, of the transportation datasets in this Census are geospatial and should rightly be expected to be published via State geospatial portals.

The work we've done to get full roadway centerline information from the states via the Highway Performance Monitoring System has been pretty helpful here (@sbma44 should know, cuz MapBox has written about it).

As for partners in getting the transportation bits of the Census done, I'd recommend AASHTO (http://www.transportation.org/Pages/Default.aspx), the association of all the State DOTs. They have a bunch of committees, like GIS-T (http://www.gis-t.org/) and I know NSGIC coordinates with them.

As for regional (transit) authorities, this is super complicated. There is plenty of regional transit that crosses state boundaries (and here I'm limiting the discussion to fixed-route service; human services transportation, especially on-demand service, also crosses state boundaries). Obviously, we know WMATA/VRE, but there's also PATCO/SEPTA (greater Philly; PA/NJ), PATH (greater NYC; NY/NJ), MBTA (greater Boston, including bits of NH and RI), etc.

Then there are independent transit agencies that don't cross state lines, and each of their charters is different. For instance, the Chicago Transit Authority was created by Illinois state law. Similarly, Atlanta's MARTA was created by state legislation and approved by the counties that were affected. Where an independent transit authority was established by State law, I could see reasons for including it in the Census. Nevertheless, a State may have multiple transit authorities, and States don't always have to pass laws to form independent authorities: Sound Transit in Seattle was formed by the Snohomish, King, and Pierce County Councils.

All that to say this: there's no one-size-fits-all approach to this issue, but it's clearly not just a State census issue. We could have a lot of fun investigating the various quasi-governmental entities, but I think that's an entirely different Census and set of issues.

rebeccawilliams · 2015-02-18T16:37:14Z

Thanks @dsmorgan77! Maybe @kpwebb & friends will have the motivation for a transit deep dive that addresses all the forms of authority some day. Some folks assessed some regional transit on the US City Census, but, yes, it was all over the place, and the authority matters for advocacy.

waldoj · 2015-02-18T19:48:00Z

Honestly, after writing those all down, creating a site that points to all of those open state data efforts would be useful in and of itself I think as a form of Open Everything Advocacy.

I think you're right. It does look awfully useful. :) I imagine that was a lot of work!

We're definitely going for an iterative process here. That is, our aspirations are high, we'll start with something meh, and we'll improve from there. So maybe, at first, we identify just one core dataset for each area—the one dataset that defines it. Or maybe we change the scoring criteria, so that we instead score on the basis of whether a series of datasets exist. I don't know. But it seems best to start with something limited, and then work up to something complete!
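The two scoring options floated above can be compared with a minimal sketch. The category contents and the set of datasets a state publishes are hypothetical; the point is only the difference between scoring a category on its one defining dataset and scoring it on the fraction of a series of datasets that exist.

```python
def score_single(core_dataset: str, published: set) -> float:
    """Option 1: a category scores 1 if its one defining dataset exists, else 0."""
    return 1.0 if core_dataset in published else 0.0


def score_series(datasets: list, published: set) -> float:
    """Option 2: a category scores the fraction of its listed datasets that exist."""
    if not datasets:
        return 0.0
    return sum(1 for d in datasets if d in published) / len(datasets)


# Hypothetical "Finance and contracts" category and a hypothetical state's holdings.
finance = ["budget", "contracts", "expenditures"]
published = {"budget", "expenditures"}

print(score_single("budget", published))   # 1.0
print(score_series(finance, published))    # 0.6666666666666666
```

The series approach rewards partial progress, at the cost of having to agree on which datasets belong to each category.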

waldoj · 2015-03-12T18:56:13Z

We've broken this up into a bunch of different issues now, so I'm closing this as a duplicate.

waldoj closed this as completed Mar 12, 2015

rebeccawilliams mentioned this issue Apr 29, 2015

Revamp Local Government Topic to feature national data initiatives GSA/datagov-wptheme#626 (open)
