Identify the core datasets that we want to include in the inventory #1
Comments
At U.S. Open Data, we've identified five core datasets that we advise every state government to publish:
Although each of these datasets has some value on its own, what makes them especially valuable is that they make it possible to connect other datasets together, substantially increasing the value of those other datasets. For instance, without a list of registered corporations (and data about each of those corporations), it's impossible to know that a campaign contribution by Acme, LLC actually came from John Smith of Springfield (who owns 100% of the shares in Acme). And without data about every law, it's impossible to know what it means that a court ruling struck down §1.23-45. And so on. See the linked U.S. Open Data page for more about this. So, those are five recommendations for datasets to include in this census. |
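To make the "connecting datasets" point concrete, here is a minimal sketch of the Acme example, assuming a corporate registry keyed by company name. The field names, the recipient, and the dollar amount are all hypothetical and only there for illustration; real registries and campaign finance files will look different.

```python
# Hypothetical records: field names, recipient, and amount are illustrative only.
contributions = [
    {"donor": "Acme, LLC", "recipient": "Candidate A", "amount": 5000},
]

# Hypothetical corporate registry, keyed by registered company name.
corporations = {
    "Acme, LLC": {
        "owners": [{"name": "John Smith", "city": "Springfield", "share": 1.0}],
    },
}


def trace_contributions(contributions, corporations):
    """Attribute each corporate contribution to the owners listed in the registry."""
    traced = []
    for c in contributions:
        entry = corporations.get(c["donor"])
        if entry is None:
            # Without registry data, the person behind the donor stays unknown.
            traced.append({"person": None, "via": c["donor"], "amount": c["amount"]})
            continue
        for owner in entry["owners"]:
            traced.append({
                "person": owner["name"],
                "via": c["donor"],
                # Split the amount by ownership share.
                "amount": c["amount"] * owner["share"],
            })
    return traced


traced = trace_contributions(contributions, corporations)
```

With an empty registry, the same contribution comes back with `person` set to `None`, which is the "impossible to know" case described above.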
I don't know what to do with transportation (transit and otherwise) data. Some states simply have no public transit (beyond a couple of cities with buses), and most states have nothing in the way of commuter rail, and those states that do have such services generally run them on a municipal level, rather than as a state program (which is to say that it's out of states' hands as to whether that data exists). There are a lot of transportation data sets that are important, so I think it's going to be complicated to figure out which ones to look for, how to score, etc. Solvable, but a challenge. |
I'm interested in nuts-and-bolts data, like the state budget, agency checkbooks, the org chart of all agencies and employees, their blue book, all boards and their members, FOIA requests/responses, all state websites, attorney general opinions, etc. I don't think that any of these warrant being their own category (I could be persuaded otherwise re: budget/checkbook), but I think there's sense in having a category that lumps all of these together. A lot of work with open data is impossible or impractical without having this kind of nuts-and-bolts data to connect it to. |
States are a lot closer to countries than cities are, so we could use more of the G8 National Action Plan definition of "high value datasets":
|
If we went with the full G8 list, then we could do a test with a data-friendly state to find out what kind of specific datasets we could, under ideal conditions, currently expect to get from these categories. |
I hadn't realized that the core state-level datasets that I've identified are all found within the G8 list, at least as subsets of them. 👍 |
Yes, a State Census! Looking at this list, I think you'll need a ton of experts and/or 14 different Censuses. Modifying the Census for nesting would, I think, be useful overall, and it's something @ondrae and I had discussed back in the day. Maybe there could be a joint effort? Or perhaps the Local Censuses could get the upgraded Global Open Data Index design treatment. In addition to the G8 categories, were other categories explored? Can data that sits squarely in between the US City Open Data Census and Global Open Data Index datasets be prioritized for review? For Companies, Open Corporates already collects a lot of state info. Can new information be suggested to them rather than having a separate place for review? I think you two both have these notes, but if helpful, some of this would apply to states as well:
|
That's out of the scope of what we're doing in this initial effort but, yeah, that sure would be helpful. I've been frustrated by the limitations of Open Data Census, but that's probably just because I'm trying to do things with it that it wasn't meant for. :) At this point, I think we're going to have to try to muddle by with the existing architecture.
Just what you see above in this thread.
I'm not sure what you mean—could you explain?
It'd be impossible for us to provide comprehensive scores for each state without including business data. (We can't just say "see this other ranking") But it seems to me that we'd want to just incorporate Open Corporates' metrics, rather than trying to do our own thing there. |
Doing my part to be an expert in transportation and infrastructure here...the issue you raise about "public transport timetables" is interesting. The thing is, in the United States, transit is very rarely a state government matter. According to the National Transit Database (http://www.ntdprogram.gov/ntdprogram/datbase/2013_database/2013%20Agency%20Information.xls), only 20 of the 857 reporting transit agencies are state governments. The top two types of transit agencies are city/county/local government (430) and independent authority (255). It drops off really quickly after that: non-profit corporations (32), metropolitan planning organizations & councils of government (30), private for-profit corporations (25), and state governments (20). Given this, I would recommend against including transit in a state-level census. |
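For reference, the shares implied by those counts can be worked out with a quick sketch. The counts are the ones quoted above; the dictionary keys are just shorthand labels, and the six listed types don't add up to 857, so the remainder is lumped together as "unlisted" here.

```python
# Counts as quoted in the comment above (National Transit Database, 2013).
total_agencies = 857
counts = {
    "city/county/local government": 430,
    "independent authority": 255,
    "non-profit corporation": 32,
    "MPO / council of governments": 30,
    "private for-profit corporation": 25,
    "state government": 20,
}

# Percentage of all reporting agencies, rounded to one decimal place.
shares = {kind: round(100 * n / total_agencies, 1) for kind, n in counts.items()}

# Agencies of types not broken out in the comment.
unlisted = total_agencies - sum(counts.values())

for kind, pct in shares.items():
    print(f"{kind}: {pct}%")
print(f"unlisted types: {unlisted} agencies")
```

By this arithmetic, state governments run roughly 2% of reporting agencies, while local governments and independent authorities together account for about 80%.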
👍 That was my gut feeling on transit, but having hard numbers is wonderful. Thank you! |
With this list, there are 134 discrete types of data you are looking to evaluate (and a lot of those could be broken down further, e.g. weather). I am assuming that these won't happen in the same pass. With that in mind, I think it'd be really useful to prioritize the areas that were covered by the Global Open Data Index and the US City Open Data Census because perhaps we'd see whole verticals of open data emerge. I made a table, with total overlap in bold and some overlap in italics:
So support/coordinate with:
Honestly, after writing those all down, creating a site that points to all of those open state data efforts would be useful in and of itself I think as a form of Open Everything Advocacy. Based on overlap, it'd be great to prioritize:
Then:
And very last, not because they aren't important, but because there is less open data assessment overlap currently, these categories:
|
We should talk to NSGIC about overlap between some of the transport datasets we have and their efforts with Transportation for the Nation (http://www.nsgic.org/transportation-for-the-nation). Some, but not all, of the transportation data sets in this Census are geospatial and should rightly be expected to be published via State geospatial portals. The work we've done to get full roadway centerline information from the states via the Highway Performance Monitoring System has been pretty helpful here (@sbma44 should know, cuz MapBox has written about it). As for partners in getting the transportation bits of the Census done, I'd recommend AASHTO (http://www.transportation.org/Pages/Default.aspx), the association of all the State DOTs. They have a bunch of committees, like GIS-T (http://www.gis-t.org/), and I know NSGIC coordinates with them. As for regional (transit) authorities, this is super complicated. There is plenty of regional transit that crosses state boundaries (and here I'm limiting the discussion to fixed-route service ... human services transportation, especially on-demand services, does cross state boundaries). Obviously, we know WMATA/VRE, but there's also PATCO/SEPTA (greater Philly; PA/NJ), PATH (greater NYC; NY/NJ), MBTA (greater Boston, including bits of NH, RI), etc. Then there are independent transit agencies that don't cross state lines, and each of their charters is different. For instance, Chicago Transit Authority is created by Illinois state law. Similarly, Atlanta's MARTA was created by state legislation and approved by the counties that were impacted. For situations where an independent transit authority was established by State law, I could see reasons for including them in the Census. Nevertheless, a State may have multiple transit authorities, and States don't always have to pass laws to form independent authorities. Sound Transit in Seattle was formed by the Snohomish, King, and Pierce County Councils.
All that to say this: there's no one-size-fits-all approach to this issue, but it's clearly not just a State census issue. We could have a lot of fun investigating the various quasi-governmental entities, but I think that's an entirely different Census and set of issues. |
Thanks @dsmorgan77! Maybe @kpwebb & friends will have the motivation for a transit deep dive that addresses all the forms of authority some day. Some folks assessed some regional transit on the US City Census, but, yes, it was all over the place, and the authority matters for advocacy. |
I think you're right. It does look awfully useful. :) I imagine that was a lot of work! We're definitely going for an iterative process here. That is, our aspirations are high, we'll start with something meh, and we'll improve from there. So maybe, at first, we identify just one core dataset for each area—the one dataset that defines it. Or maybe we change the scoring criteria, so that we instead score on the basis of whether a series of datasets exist. I don't know. But it seems best to start with something limited, and then work up to something complete! |
We've broken this up into a bunch of different issues now; closing as a duplicate. |