A centralized repo for handling data updates and publishing between Labs and EDM. We diff the files daily, and an issue is created automatically for every dataset with pending updates to facilitate and record the update process.
Usage:
./run.sh [install, show, publish, delete, diff, diff_list, list]
Commands:
install: Install minio and configure host -- spaces
show: show available versions and files e.g. ./run.sh show <dataset> --production|--staging
publish: publish a given dataset from a given candidate version (default candidate is "staging")
delete: delete a version; by default, production and staging cannot be deleted
diff: detect any file differences between production and staging, e.g. ./run.sh diff <dataset>
diff_list: list all dataset names that are out of sync
list: list all dataset names
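To illustrate what "out of sync" means for diff and diff_list, here is a minimal sketch that compares two local directories. It is not the implementation in run.sh: the function name folders_differ is hypothetical, and it assumes the staging and production copies of a dataset are available as local folders, whereas the real command works against the remote storage configured by install.

```shell
#!/bin/sh
# Hypothetical helper (not part of run.sh): report whether two local
# folders hold the same files with the same contents.
# Assumption: staging and production copies exist as local directories.
folders_differ() {
    staging_dir="$1"
    production_dir="$2"
    # diff -rq exits 0 only when both trees are identical; any non-zero
    # exit (differences or an error) is reported as "out of sync".
    if diff -rq "$staging_dir" "$production_dir" >/dev/null 2>&1; then
        echo "in sync"
    else
        echo "out of sync"
    fi
}
```

Calling folders_differ with two directory paths prints a single status line, which is the kind of per-dataset answer diff_list would aggregate.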
If a change is detected between the datasets in the staging and production folders, an issue is automatically opened in this repo. Since data updates often happen in groups (as defined in metadata.json), you can filter by label to get all datasets that fall under a category. Then you can select the datasets to update and apply the publish label to trigger a bulk update for all selected datasets.
Note that this method is currently supported but not preferred, and we will eventually deprecate it.
If a change is detected between the datasets in the staging and production folders, an issue is automatically opened in this repo.
Review the files in the staging environment. If the files pass your review, post [publish] as a comment on the issue.
The comment triggers a GitHub Action that moves the staging files to production. Then, close the issue.
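The comment check can be pictured as a simple text match. The sketch below is an assumption about how such a trigger could work, not the actual GitHub Action in this repo: the function name is_publish_comment is hypothetical, and it assumes a comment counts if it contains the literal text [publish] anywhere in its body.

```shell
#!/bin/sh
# Hypothetical helper (not the repo's workflow): succeed when a comment
# body contains the literal marker "[publish]".
# Assumption: a substring match is enough to trigger a publish.
is_publish_comment() {
    case "$1" in
        # The quotes make the brackets literal instead of a character class.
        *"[publish]"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_publish_comment "Reviewed, looks good. [publish]" succeeds, while a comment that only says "publish" without the brackets does not.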
Staging applications point to datasets in the staging folder, which are synced with the general Carto instance, while production applications point to datasets in the production folder, which are synced with the Planning Labs Carto instance. Carto syncs are scheduled to run daily; therefore, it may take up to 24 hours for a dataset in the production folder to be synced with Carto and go live in the production application. To make sure that a dataset is reflected in the application right after being updated, you can trigger a manual sync in Carto by clicking on the dataset and then clicking "Sync now."
We diff the files daily, and an issue is created automatically for every dataset with pending updates to facilitate and record the update process. If you would like to trigger a diffing workflow manually, navigate to the Actions tab, select Production Diff Staging, then click "Run Workflow" from the main branch.
You can use this table to see which applications use which data tables. After an update, make sure that 1) the dataset appears in the application and 2) in some cases (e.g. PLUTO) that it is the latest dataset, by spot-checking some values.