Identify our criteria for grading each dataset #2

Open
waldoj opened this issue Jan 7, 2015 · 9 comments

@waldoj
Contributor

waldoj commented Jan 7, 2015

And, relatedly, whether it's possible for us to use our own criteria. CKAN's platform may not support using anything other than their own criteria.

@waldoj
Contributor Author

waldoj commented Jan 7, 2015

I'd like to make the verifiability of datasets a small grading factor. If there is some mechanism to determine that a copy of a dataset is the same as the original (e.g., a hash), and SSL is in place to ensure that it's not tampered with in transit, that's got to be worth something.

@waldoj
Contributor Author

waldoj commented Jan 12, 2015

This is the scoring methodology used by the OKFN's global census:

| Question | Details | Weighting |
| --- | --- | --- |
| Does the data exist? | Does the data exist at all? The data can be in any form (paper or digital, offline or online etc). If it is not, then all the other questions are not answered. | 5 |
| Is data in digital form? | This question addresses whether the data is in digital form (stored on computers or digital storage) or if it only in e.g. paper form. | 5 |
| Publicly available? | This question addresses whether the data is “public”. This does not require it to be freely available, but does require that someone outside of the government can access it in some form (examples include if the data is available for purchase, if it exist as PDFs on a website that you can access, if you can get it in paper form - then it is public). If a freedom of information request or similar is needed to access the data, it is not considered public. | 5 |
| Is the data available for free? | This question addresses whether the data is available for free or if there is a charge. If there is a charge, then that is stated in the comments section. | 15 |
| Is the data available online? | This question addresses whether the data is available online from an official source. In the cases that this is answered with a ‘yes’, then the link is put in the URL field below. | 5 |
| Is the data machine readable? | Data is machine readable if it is in a format that can be easily structured by a computer. Data can be digital but not machine readable. For example, consider a PDF document containing tables of data. These are definitely digital but are not machine-readable because a computer would struggle to access the tabular information (even though they are very human readable!). The equivalent tables in a format such as a spreadsheet would be machine readable. Note: The appropriate machine readable format may vary by type of data – so, for example, machine readable formats for geographic data may be different than for tabular data. In general, HTML and PDF are not machine-readable. | 15 |
| Available in bulk? | Data is available in bulk if the whole dataset can be downloaded or accessed easily. Conversely it is considered non-bulk if the citizens are limited to just getting parts of the dataset (for example, if restricted to querying a web form and retrieving a few results at a time from a very large database). | 10 |
| Openly licensed? | This question addresses whether the dataset is open as per http://opendefinition.org. It needs to state the terms of use or license that allow anyone to freely use, reuse or redistribute the data (subject at most to attribution or sharealike requirements). It is vital that a licence is available (if there’s no licence, the data is not openly licensed). Open licences which meet the requirements of the Open Definition are listed at http://opendefinition.org/licenses/. | 30 |
| Is the data provided on a timely and up to date basis? | This question addresses whether the data is up to date and timely - or long delayed. For example, for election data that it is made available immediately or soon after the election or if it is only available many years later. Any comments around uncertainty are put in the comments field. | 10 |

The Local Open Data Census does not have a "Methodology" section, at least that I can find, but it seems to use the same criteria. I do not know if it uses the same weighting, and I do not know if it is possible for us to use different criteria or different weighting.
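
To make the census weighting quoted above concrete, here is a minimal sketch of how those nine weights would add up for one dataset. The weights come from the table; the short key names and the `score_dataset` function are hypothetical, and this is not how the census software itself computes scores.

```python
# Weights taken from the OKFN table quoted above.
# The short key names are made up for this sketch.
WEIGHTS = {
    "exists": 5,
    "digital": 5,
    "public": 5,
    "free": 15,
    "online": 5,
    "machine_readable": 15,
    "bulk": 10,
    "open_license": 30,
    "timely": 10,
}


def score_dataset(answers):
    """Return the sum of the weights for every criterion answered True.

    `answers` maps criterion names to booleans; missing keys count as False.
    If the data does not exist, no other question is answered, so the
    score is 0.
    """
    if not answers.get("exists", False):
        return 0
    return sum(weight for key, weight in WEIGHTS.items() if answers.get(key, False))


# Example: a dataset that meets every criterion except open licensing.
example = {key: True for key in WEIGHTS}
example["open_license"] = False
print(score_dataset(example))
```

Keeping the weights in one dictionary means that trying a different weighting, or adding a criterion, is a one-line change.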

@waldoj
Contributor Author

waldoj commented Jan 23, 2015

I worry about scoring. At first blush, I'm not sure that a stated open license is really twice as important as being machine readable. I don't know if charging for data is necessarily a binary thing (yes, 0 points, no, 15 points)—if data costs $1M, that seems like it'd be worth 0 points, but if it costs 5¢, that's not great, but a different level of not-great than $1M.

It seems like it'd be good to actually gather some data, test out the scoring, see how the results look, and then fiddle with how things are weighted. But I don't think it looks great right now.
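
One way to experiment with a non-binary price criterion is a sliding scale. The sketch below is purely illustrative: the logarithmic curve, the `zero_at_usd` cutoff, and the function name `price_score` are all assumptions, not anything the census defines.

```python
import math


def price_score(price_usd, max_points=15, zero_at_usd=1000.0):
    """Score a dataset's price on a sliding scale instead of yes/no.

    Free data earns `max_points`. The score then falls off on a log
    scale and reaches 0 at `zero_at_usd` (a hypothetical cutoff) and above.
    """
    if price_usd <= 0:
        return float(max_points)
    if price_usd >= zero_at_usd:
        return 0.0
    # log10(1 + price) runs from 0 (free) up to log10(1 + zero_at_usd).
    fraction = math.log10(1 + price_usd) / math.log10(1 + zero_at_usd)
    return max_points * (1 - fraction)


# Compare a free dataset, a 5-cent dataset, and a $1M dataset.
for price in (0, 0.05, 1_000_000):
    print(price, round(price_score(price), 2))
```

With this curve a 5¢ charge loses only a sliver of the 15 points while a $1M charge earns none. The cutoff and the shape of the curve are exactly the kind of thing to tune once real data has been gathered.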

@waldoj
Contributor Author

waldoj commented Jan 23, 2015

Here are the two additional metrics that I'd like to score against:

  • Is there a mechanism in place to ensure that the data is not tampered with during transfer (e.g., SSL)?
  • Is it possible for somebody to validate their copy of a dataset against a master?

I don't think these are terribly important (yet), so I imagine I'd award just 5 points apiece.
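
As a rough sketch of how these two checks could be automated: the code below treats an `https://` URL as evidence of protection in transit and compares a local copy against a published SHA-256 checksum. The function names, the choice of SHA-256, and the assumption that the publisher posts a checksum at all are hypothetical; only the 5 points apiece comes from the comment above.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verification_points(url, local_path, published_sha256, points_each=5):
    """Award points for the two verifiability metrics.

    - Transit: the download URL uses HTTPS (a proxy for SSL being in place).
    - Validation: the publisher provides a checksum and the local copy
      matches it. Pass None when no checksum is published.
    """
    points = 0
    if url.lower().startswith("https://"):
        points += points_each
    if published_sha256:
        if sha256_of_file(local_path) == published_sha256.strip().lower():
            points += points_each
    return points
```

A URL-scheme check only shows that HTTPS is offered, not that the certificate is valid, so a fuller version would attempt the download and let the TLS handshake succeed or fail.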

@waldoj
Contributor Author

waldoj commented Feb 6, 2015

Another one: Is it available in their central repository?

@emily878
Contributor

This is definitely flexible - see how Code for America has modified the questions for their Local Digital Services Census: https://service-census.herokuapp.com/ (rolled out for CodeAcross 2015.)

@waldoj
Contributor Author

waldoj commented Feb 11, 2015

Over on issue #5, I've come to the same conclusion. As soon as OKFN gives our account the thumbs-up, I think we can start entering any criteria that we like.

@emily878
Contributor

emily878 commented Mar 6, 2015

Does it include key elements? (Version of: is it complete?)

@emily878
Contributor

emily878 commented Mar 6, 2015

For future: does it adhere to the accepted schema?
