edm-data-operations

A centralized repo for handling data updates and publishing between Labs and EDM. We check (diff) daily for file differences, and an issue is created automatically for every dataset with pending updates to facilitate and record the update process.

CLI Instructions

Usage:
./run.sh [install, show, publish, delete, diff, diff_list, list]

Commands:
   install:   install minio and configure host -- spaces
   show:      show available versions and files, e.g. ./run.sh show <dataset> --production|--staging
   publish:   publish a given dataset from a given candidate version (default candidate is "staging")
   delete:    delete a version; by default, production and staging cannot be deleted
   diff:      detect any file difference between production and staging, e.g. ./run.sh diff <dataset>
   diff_list: list all dataset names that are out of sync
   list:      list all dataset names

Publishing workflow

Publish by Labeling an Issue

If a change is detected between the datasets in the staging and production folders, an issue is automatically opened in this repo. Since data updates often happen in groups (as defined in metadata.json), you can filter by label to get all datasets that fall under a category. You can then select the datasets to update and apply the publish label to trigger a bulk update for all selected datasets.
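The selection step can be pictured with a small sketch. Everything here is illustrative: the issue records are plain dictionaries rather than real GitHub API responses, the category labels and dataset names are invented, and only the `publish` label name comes from the text above.

```python
from __future__ import annotations

PUBLISH_LABEL = "publish"  # label name taken from the text above


def issues_in_category(issues: list[dict], category: str) -> list[dict]:
    """Keep the issues that carry the given category label."""
    return [issue for issue in issues if category in issue["labels"]]


def datasets_to_publish(issues: list[dict]) -> list[str]:
    """Return the datasets whose issue has been given the publish label."""
    return [issue["dataset"] for issue in issues if PUBLISH_LABEL in issue["labels"]]


# Hypothetical issues: dataset names and category labels are made up.
issues = [
    {"dataset": "dataset_a", "labels": ["zoning", "publish"]},
    {"dataset": "dataset_b", "labels": ["zoning"]},
    {"dataset": "dataset_c", "labels": ["facilities", "publish"]},
]

zoning = issues_in_category(issues, "zoning")
print([issue["dataset"] for issue in zoning])  # ['dataset_a', 'dataset_b']
print(datasets_to_publish(zoning))             # ['dataset_a']
```

Filtering by category first and by the publish label second mirrors the order described above: narrow to a group, then act only on the datasets you selected.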

Publish via Issue Comments

Note that this method is currently supported but not preferred, and we will deprecate it eventually.

If a change is detected between the datasets in the staging and production folders, an issue is automatically opened in this repo.

Review the files in the staging environment. If the files pass your review, post [publish] as a comment on the issue.

The comment triggers a GitHub Action to move the staging files to production. Then, close the issue.
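The trigger condition described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual GitHub Action: the function name `is_publish_comment` is hypothetical, and it assumes the marker is the literal text `[publish]` appearing anywhere in the comment body.

```python
PUBLISH_MARKER = "[publish]"  # marker text taken from the instructions above


def is_publish_comment(comment_body: str) -> bool:
    """Return True if an issue comment contains the publish marker.

    Hypothetical helper: assumes a plain, case-sensitive substring match.
    """
    return PUBLISH_MARKER in comment_body


print(is_publish_comment("Files look good. [publish]"))  # True
print(is_publish_comment("Still reviewing staging"))     # False
```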

Staging applications point to datasets in the staging folder, which are synced with the general Carto instance, while production applications point to datasets in the production folder, which are synced with the Planning Labs Carto instance. Carto syncs are scheduled to run daily, so it may take up to 24 hours for a dataset in the production folder to be synced with Carto and go live in the production application. To make sure a dataset is reflected in the application right after an update, trigger a manual sync in Carto by clicking on the dataset and then clicking "Sync now."

Diffing workflow

We check (diff) daily for file differences, and an issue is created automatically for every dataset with pending updates to facilitate and record the update process. To trigger a diffing workflow manually, navigate to the Actions tab, select Production Diff Staging, then click "Run Workflow" from the main branch.
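As a rough illustration of what a staging-versus-production diff involves, the sketch below compares two file manifests. It is not the logic in `run.sh`; the manifest shape (file name mapped to a checksum string) and the function names `diff_manifests` and `out_of_sync` are assumptions made for the example.

```python
from __future__ import annotations


def diff_manifests(production: dict[str, str], staging: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {file name: checksum} manifests (hypothetical shape)."""
    prod_files, stag_files = set(production), set(staging)
    return {
        # Files present only in staging would be new after a publish.
        "added": sorted(stag_files - prod_files),
        # Files present only in production would disappear after a publish.
        "removed": sorted(prod_files - stag_files),
        # Files present in both places whose checksums differ.
        "changed": sorted(
            name for name in prod_files & stag_files
            if production[name] != staging[name]
        ),
    }


def out_of_sync(production: dict[str, str], staging: dict[str, str]) -> bool:
    """A dataset is out of sync if any kind of difference is non-empty."""
    return any(diff_manifests(production, staging).values())


# Made-up manifests for one dataset.
production = {"data.csv": "aaa", "README.txt": "111"}
staging = {"data.csv": "bbb", "README.txt": "111", "extra.csv": "ccc"}

print(diff_manifests(production, staging))
# {'added': ['extra.csv'], 'removed': [], 'changed': ['data.csv']}
print(out_of_sync(production, staging))  # True
```

Comparing checksums rather than only file names catches the common case where a file keeps its name but its contents change.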

Review workflow

Use this table to see which applications use which data tables. Make sure that 1) the dataset appears and 2) in some cases (e.g. PLUTO) it is the latest dataset, by spot-checking some values.
