Configure API to import data from another portal - phase 1 #69

Open
bgajdero opened this issue Nov 2, 2023 · 1 comment
bgajdero commented Nov 2, 2023

This project is Phase 1 of the API import project.
It will result in an API-Configuration file that can be customized for any end-point.
It will reuse bulk upload and import script code.

  1. define API for importing from another CKAN instance
  2. define mappings of fields
  3. define access auth requirements
  4. Import-start function: the first version will be a button on the Config page. Eventually we will use a timer to trigger it automatically.
  5. define end-point fields:
    • url
    • architecture (CKAN, Socrata, Dataverse, etc.)
    • auth tokens required
    • metadata mapping file
    • catalogue selection criteria (either a list of catalogue entries or one config per entry)
    • auxiliary data from an additional end-point, such as quality metrics
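
As a rough illustration of what one entry in the API-Configuration file could hold, here is a minimal sketch in Python. The class name `EndpointConfig`, its attribute names, and the `validate` helper are all hypothetical; only the list of fields comes from item 5 above, and the set of architectures is just the three named there.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Architectures named in the issue; the real list may be longer ("etc.").
SUPPORTED_ARCHITECTURES = {"CKAN", "Socrata", "Dataverse"}


@dataclass
class EndpointConfig:
    """Hypothetical container for the end-point fields listed in item 5."""
    url: str
    architecture: str
    auth_token: Optional[str] = None             # None when no auth is required
    metadata_mapping_file: Optional[str] = None  # path to the field-mapping file
    catalogue_entries: List[str] = field(default_factory=list)
    auxiliary_sources: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        if not self.url.startswith(("http://", "https://")):
            problems.append("url must be an absolute http(s) URL")
        if self.architecture not in SUPPORTED_ARCHITECTURES:
            problems.append(f"unsupported architecture: {self.architecture}")
        return problems


# Example entry for the City of Toronto portal discussed in this issue.
toronto = EndpointConfig(
    url="https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/",
    architecture="CKAN",
    catalogue_entries=["catalogue-quality-scores"],
)
```

Calling `toronto.validate()` before an import starts would let the Config page report a bad entry without contacting the remote portal.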

Start with City of Toronto Open Data Portal:
There are a lot of ways to configure these datastore_search calls – more info here: https://docs.ckan.org/en/2.9/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_search

There are lots more API endpoints you can call that are documented here: https://docs.ckan.org/en/2.9/api/

API END POINTS

List packages
To get a list of all package names from our CKAN instance:
https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_list
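
A minimal sketch of calling this from Python with only the standard library. The helper names `unwrap_result`, `call_action`, and `list_packages` are hypothetical; the sketch assumes the usual CKAN action response envelope, a JSON object with `success` and `result` keys.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/"


def unwrap_result(payload: dict):
    """Return the 'result' part of a CKAN action response, or raise on failure."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action failed: {payload.get('error')}")
    return payload["result"]


def call_action(action: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """GET an action URL (for example 'package_list') and unwrap the response."""
    with urlopen(base_url + action, timeout=timeout) as response:
        return unwrap_result(json.load(response))


def list_packages():
    """Return the list of package names on the portal (performs a network call)."""
    return call_action("package_list")
```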

List Resources

(for reference, https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/ is the base URL for 99% of the API endpoints you’ll hit on our portal)

Show Package
package_show returns a JSON object containing high-level information about the data on a dataset page (the data owner, the last refreshed date, associated topics and civic issues, etc.). This JSON also contains high-level information for each “resource” on that page. A “resource” is one concrete data thing (like a file or a database table), and its object in the API response contains the information you’ll need to grab its contents.
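
To make the shape of that response more concrete, here is a hedged sketch that builds a package_show URL and pulls a short summary of each resource out of an already-fetched `result` object. The function names are hypothetical, and the keys read from each resource (`id`, `name`, `format`, `datastore_active`) are assumed to be present as on a typical CKAN instance; `.get` is used so that a missing key does not raise.

```python
from urllib.parse import urlencode

BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/"


def package_show_url(package_id: str, base_url: str = BASE_URL) -> str:
    """Build the package_show URL for one package name or id."""
    return f"{base_url}package_show?{urlencode({'id': package_id})}"


def summarize_resources(package: dict) -> list:
    """Reduce each resource in a package_show result to a few key fields."""
    return [
        {
            "id": resource.get("id"),
            "name": resource.get("name"),
            "format": resource.get("format"),
            # Assumed flag: True when the rows can be read via datastore_search.
            "datastore_active": resource.get("datastore_active", False),
        }
        for resource in package.get("resources", [])
    ]
```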

Accessing Aux Data
To get records from a dataset, like the data quality scoring dataset:

  1. get the package metadata:
    https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show?id=catalogue-quality-scores

  2. get the “id” from the resource named quality-scores-explanation-codes-and-scores and plug it into the datastore_search API call below:
    https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/datastore_search?id=6d999ad7-d83c-4515-afc7-cae7ea85a1a8

  3. Each record in the “records” sub-object in the response should be a row in the spreadsheet I showed you today. You can match its package_name and resource_name attributes to a package and resource from a package_show call.
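
The three steps above could be sketched roughly as follows. All function names are hypothetical. The sketch passes the resource id as `id`, as in the URL in step 2, and adds `limit` and `offset` for paging; the fetching itself is left out so the helpers stay testable offline.

```python
from urllib.parse import urlencode

BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/"
SCORES_RESOURCE_NAME = "quality-scores-explanation-codes-and-scores"


def find_resource_id(package: dict, resource_name: str) -> str:
    """Step 2: look up a resource id by name in a package_show result."""
    for resource in package.get("resources", []):
        if resource.get("name") == resource_name:
            return resource["id"]
    raise KeyError(f"no resource named {resource_name!r}")


def datastore_search_url(resource_id: str, limit: int = 100, offset: int = 0,
                         base_url: str = BASE_URL) -> str:
    """Step 2: build the datastore_search URL for one page of records."""
    query = urlencode({"id": resource_id, "limit": limit, "offset": offset})
    return f"{base_url}datastore_search?{query}"


def index_scores(records: list) -> dict:
    """Step 3: key each record by (package_name, resource_name) for matching."""
    return {(r["package_name"], r["resource_name"]): r for r in records}
```

With the index built, the quality record for a given package and resource from a package_show call is a single dictionary lookup, for example `index_scores(records).get((package["name"], resource["name"]))`.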

bgajdero commented

For custom metadata processing, add a graph mapping configuration function.
