
A centralized repo to handle data update/publishing between Labs and EDM


NYCPlanning/edm-data-operations


edm-data-operations

A centralized repo to handle data update/publishing between Labs and EDM. Note that we check (diff) daily for file differences, and an issue is created automatically for every dataset pending an update to facilitate and record the update process.
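Because the diff runs every day, the issue-creation step has to avoid opening a second issue for a dataset that is already waiting to be published. The sketch below shows one way to keep that step idempotent. It is not the repo's actual implementation: the issue title format and the `issue_title` / `datasets_needing_issues` helpers are hypothetical, and the dataset names are placeholders.

```python
from typing import List, Set


def issue_title(dataset: str) -> str:
    """Hypothetical title format for the issue tracking a pending update."""
    return f"Update pending: {dataset}"


def datasets_needing_issues(out_of_sync: List[str], open_issue_titles: Set[str]) -> List[str]:
    """Return out-of-sync datasets that do not have an open tracking issue yet.

    Skipping datasets whose issue is already open keeps the daily run idempotent:
    repeated diffs of the same pending change do not create duplicate issues.
    """
    return [d for d in sorted(set(out_of_sync)) if issue_title(d) not in open_issue_titles]


if __name__ == "__main__":
    pending = ["dataset_a", "dataset_b", "dataset_a"]  # placeholder names
    already_open = {issue_title("dataset_b")}
    print(datasets_needing_issues(pending, already_open))  # ['dataset_a']
```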

CLI Instructions

Usage:
./run.sh [install, show, publish, delete, diff, diff_list, list]

Commands:
   install:   install minio and configure the host (spaces)
   show:      show available versions and files, e.g. ./run.sh show <dataset> --production|--staging
   publish:   publish a given dataset from a given candidate version (default candidate is "staging")
   delete:    delete a version; by default, production and staging cannot be deleted
   diff:      detect whether any files differ between production and staging, e.g. ./run.sh diff <dataset>
   diff_list: list all dataset names that are out of sync
   list:      list all dataset names
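The `diff` and `diff_list` commands compare the production and staging versions of each dataset. The following is a minimal sketch of that comparison. It assumes each version has already been listed as a mapping from file path to a content hash (for example an object ETag). `run.sh` may detect differences another way, and the function names here are illustrative.

```python
from typing import Dict, List, Tuple

# Assumption: a listing maps a file path (relative to the version folder)
# to a content hash such as an object ETag.
Listing = Dict[str, str]


def diff(production: Listing, staging: Listing) -> Dict[str, List[str]]:
    """Report files added, removed, or changed between production and staging."""
    prod_files, stg_files = set(production), set(staging)
    return {
        "only_in_staging": sorted(stg_files - prod_files),
        "only_in_production": sorted(prod_files - stg_files),
        "changed": sorted(f for f in prod_files & stg_files if production[f] != staging[f]),
    }


def is_out_of_sync(production: Listing, staging: Listing) -> bool:
    """True if any of the three difference lists is non-empty."""
    return any(diff(production, staging).values())


def diff_list(datasets: Dict[str, Tuple[Listing, Listing]]) -> List[str]:
    """Names of datasets whose (production, staging) listings differ."""
    return sorted(name for name, (prod, stg) in datasets.items() if is_out_of_sync(prod, stg))


if __name__ == "__main__":
    prod = {"data.csv": "abc", "meta.json": "111"}
    stg = {"data.csv": "abd", "meta.json": "111", "data.zip": "999"}
    print(diff(prod, stg))
    print(diff_list({"dataset_a": (prod, stg), "dataset_b": (prod, dict(prod))}))  # ['dataset_a']
```

Comparing content hashes rather than modification times means that re-uploading identical files to staging would not be flagged as a pending update.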

Publishing workflow

Publish by Labeling an Issue

If a change is detected between the datasets in the staging and production folders, an issue is automatically opened in this repo. Since data updates often happen in groups (as defined in metadata.json), you can filter by label to get all datasets that fall under a category. Then select the datasets to update and apply the publish label to trigger a bulk update for all selected datasets.
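The grouping and bulk-publish selection can be sketched as follows. This is an illustration only. The metadata.json layout shown (`{"<dataset>": {"group": "<category>"}}`), the category names, and the shape of the issue records are all assumptions and may not match the real file or the GitHub Action. Only the `publish` label name comes from the workflow described above.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical metadata.json layout; the real file may be structured differently.
EXAMPLE_METADATA = """
{
  "dataset_a": {"group": "zoning"},
  "dataset_b": {"group": "zoning"},
  "dataset_c": {"group": "facilities"}
}
"""


def group_datasets(metadata: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each category to the sorted list of datasets that belong to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, info in metadata.items():
        groups[info["group"]].append(name)
    return {group: sorted(names) for group, names in groups.items()}


def datasets_to_publish(issues: List[dict], publish_label: str = "publish") -> List[str]:
    """Datasets whose update issue carries the publish label.

    Each issue is assumed to look like {"dataset": "<name>", "labels": [...]}.
    """
    return sorted(i["dataset"] for i in issues if publish_label in i["labels"])


if __name__ == "__main__":
    print(group_datasets(json.loads(EXAMPLE_METADATA))["zoning"])  # ['dataset_a', 'dataset_b']
    issues = [
        {"dataset": "dataset_a", "labels": ["zoning", "publish"]},
        {"dataset": "dataset_b", "labels": ["zoning"]},
    ]
    print(datasets_to_publish(issues))  # ['dataset_a']
```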

Publish via Issue Comments

Note that this method is still supported but not preferred, and it will eventually be deprecated.

If a change is detected between the datasets in the staging and production folders, an issue is automatically opened in this repo.

Review the files in the staging environment. If the files pass your review, add the comment [publish] to the issue.

The comment triggers a GitHub Action that moves the staging files to production. Then close the issue.
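The comment trigger boils down to two steps: recognizing the `[publish]` command in a comment body, and mapping each staging file to its production location. The sketch below assumes a `<dataset>/<version>/<file>` folder layout, and its `wants_publish` and `publish_plan` helpers are hypothetical. It also deliberately accepts `[publish]` only on a line by itself, and the real Action may match the command differently.

```python
import re
from typing import List, Tuple

# Assumption: the command must appear on its own line (surrounding spaces allowed).
PUBLISH_COMMAND = re.compile(r"^\s*\[publish\]\s*$", re.MULTILINE)


def wants_publish(comment_body: str) -> bool:
    """True if the issue comment contains the [publish] command."""
    return PUBLISH_COMMAND.search(comment_body) is not None


def publish_plan(dataset: str, files: List[str], candidate: str = "staging") -> List[Tuple[str, str]]:
    """Pair each candidate file with its production destination.

    Assumes a hypothetical "<dataset>/<version>/<file>" layout; "staging" is the
    default candidate, as with ./run.sh publish.
    """
    return [(f"{dataset}/{candidate}/{f}", f"{dataset}/production/{f}") for f in files]


if __name__ == "__main__":
    print(wants_publish("Looks good to me\n[publish]\n"))  # True
    print(publish_plan("dataset_a", ["data.csv"]))
```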

Staging applications point to datasets in the staging folder, which are synced with the general Carto instance, while production applications point to datasets in the production folder, which are synced with the Planning Labs Carto instance. Carto syncs are scheduled to run daily, so it may take up to 24 hours for a dataset in the production folder to sync with Carto and go live in the production application. To have an update reflected in the application right away, trigger a manual sync in Carto by clicking on the dataset and then clicking "Sync now."
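The "up to 24 hours" figure follows from the daily schedule. A file published just after a sync has to wait for the next one. The sketch below estimates when a published dataset will go live. It assumes a single sync per day at a fixed time of day, which is a hypothetical parameter since the actual Carto schedule is not specified here, and that a sync picks up any file published before it starts.

```python
from datetime import datetime, time, timedelta


def next_daily_sync(published_at: datetime, sync_time: time) -> datetime:
    """Return the first scheduled daily sync strictly after `published_at`.

    `sync_time` is an assumed time of day for the once-daily Carto sync.
    """
    candidate = datetime.combine(published_at.date(), sync_time)
    if candidate <= published_at:  # today's sync already ran (or is starting now)
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    published = datetime(2021, 4, 21, 10, 0)
    go_live = next_daily_sync(published, time(3, 0))
    print(go_live, go_live - published)  # 2021-04-22 03:00:00 17:00:00
```

The wait is always greater than zero and at most 24 hours, which matches the worst case described above. A manual "Sync now" skips this wait entirely.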


Diffing workflow

We check (diff) daily for file differences, and an issue is created automatically for every dataset pending an update to facilitate and record the update process. To trigger the diffing workflow manually, go to the Actions tab, select Production Diff Staging, and click "Run Workflow" on the main branch.
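The "Run Workflow" button uses GitHub's `workflow_dispatch` event, which can also be triggered through the REST API endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`. The sketch below builds that request with the standard library. The workflow file name `production_diff_staging.yml` is a placeholder, not the repo's actual file name. Substitute the file under `.github/workflows/` that defines Production Diff Staging. The token must have permission to run Actions on the repo.

```python
import json
import urllib.request


def build_dispatch_request(
    token: str,
    owner: str = "NYCPlanning",
    repo: str = "edm-data-operations",
    workflow_file: str = "production_diff_staging.yml",  # placeholder file name
    ref: str = "main",
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the diff workflow."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it with `urllib.request.urlopen(build_dispatch_request(token))` should return HTTP 204 on success. The run then appears in the Actions tab just like one started from the button.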

Review workflow

You can use this table to see which applications use which data tables. When reviewing, make sure that 1) the dataset appears and 2) in some cases (e.g. PLUTO), it is the latest version of the dataset, by spot-checking some values.
