GIS needs POPS as zipped shapefile and csv #1231

Open
damonmcc opened this issue Nov 6, 2024 · 6 comments

Comments

@damonmcc
Member

damonmcc commented Nov 6, 2024

a GIS script pulls POPS data from edm-recipes/datasets/dcp_pop/latest/ and expects to find dcp_pops.shp.zip and dcp_pops.csv.zip

however, we stopped archiving source data in those formats, so that folder currently contains only dcp_pops.csv, dcp_pops.parquet, and dcp_pops.sql. the last time we archived a POPS shapefile was version 20240131
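
To make the mismatch concrete, here is a minimal sketch of the check, assuming the folder listing is available as a list of object keys. The function name `missing_files` and the key prefixes are hypothetical and are not taken from the GIS script:

```python
from pathlib import PurePosixPath

# File names the GIS script expects, as described in this issue.
EXPECTED = ("dcp_pops.shp.zip", "dcp_pops.csv.zip")


def missing_files(available_keys, expected=EXPECTED):
    """Return the expected file names that are absent from a folder listing."""
    present = {PurePosixPath(key).name for key in available_keys}
    return [name for name in expected if name not in present]


# Folder contents as described in this issue (key prefix is illustrative).
listing = [
    "datasets/dcp_pop/latest/dcp_pops.csv",
    "datasets/dcp_pop/latest/dcp_pops.parquet",
    "datasets/dcp_pop/latest/dcp_pops.sql",
]
print(missing_files(listing))
```

With the listing above, both zipped files are reported as missing, which matches what the GIS script runs into.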

@fvankrieken
Contributor

Is there a reason they need both the csv and the shapefile?

Also, is this need driven by ArcPy, or more that this actual shapefile needs to be distributed? If the former, could arcpy just pull in the geoparquet file instead?

@damonmcc
Member Author

damonmcc commented Nov 6, 2024

@caseysmithpgh I used the csv from dcp_pop/20240814/ to generate dcp_pops.shp.zip and dcp_pops.csv.zip and put them in both dcp_pop/20240814/ and dcp_pop/latest/. I ran the code below in a jupyter notebook to create the shapefile:

import geopandas as gpd
gdf = gpd.read_file(r'/Users/damonmccullough/Downloads/dcp_pops.csv')
gdf.to_file("dcp_pops.shp/dcp_pops.shp")

If either of those files is giving your scripts issues, happy to troubleshoot
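
The zipping step isn't shown above. A rough sketch of one way to package the two archives with the standard library follows; the helper names `zip_files` and `package_outputs` are hypothetical, and this is an assumption about the packaging, not a record of how the uploaded files were actually made:

```python
import zipfile
from pathlib import Path


def zip_files(paths, zip_path):
    """Write the given files into one flat zip archive and return its path."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(p) for p in paths):
            # Store only the file name so the archive has no nested folders.
            zf.write(path, arcname=path.name)
    return zip_path


def package_outputs(shp_dir, csv_path, out_dir):
    """Create <stem>.shp.zip from a shapefile folder and <stem>.csv.zip from a csv."""
    shp_dir, csv_path, out_dir = Path(shp_dir), Path(csv_path), Path(out_dir)
    stem = csv_path.stem
    # A shapefile is several sidecar files (.shp, .shx, .dbf, ...) sharing one stem.
    parts = [p for p in shp_dir.iterdir() if p.is_file() and p.stem == stem]
    shp_zip = zip_files(parts, out_dir / f"{stem}.shp.zip")
    csv_zip = zip_files([csv_path], out_dir / f"{stem}.csv.zip")
    return shp_zip, csv_zip
```

Calling `package_outputs("dcp_pops.shp", "dcp_pops.csv", ".")` would then produce dcp_pops.shp.zip and dcp_pops.csv.zip in the working directory.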

@caseysmithpgh

I need to take a closer look at the utils script this is referencing--but tagging @jackrosacker just for an FYI

@damonmcc
Member Author

damonmcc commented Nov 6, 2024

@fvankrieken
> Is there a reason they need both the csv and the shapefile?
>
> Also, is this need driven by ArcPy, or more that this actual shapefile needs to be distributed? If the former, could arcpy just pull in the geoparquet file instead?

I'm not sure, haven't seen their script(s) yet, but would love to change this dependency on a recipes shapefile. seems like two long-term options are:

  1. DE publishes POPS to edm-publishing/ for GIS scripts to get a "transformed" version of POPS
  2. GIS script pulls in dcp_pops.csv from edm-recipes and generates the shapefile

damonmcc self-assigned this Nov 6, 2024

@jackrosacker

jackrosacker commented Nov 6, 2024

Our script expects to take a csv and shapefile, and output a csv, shapefile, and geodatabase for publication. The code as written takes the shapefile, converts to feature class, changes dataset schema, and then exports a fresh shapefile. This results in an unchanged csv, and a feature class and shapefile each with the new dataset schema.

If I'm remembering correctly, we chose this route to avoid using geopandas, as there are occasionally some conda environment issues on our end when we introduce geopandas. We can happily work around that now/soon.

For this release, I'm hoping that changing the zipped shapefile on your end solves the immediate problem so that we can publish this release without changing code. Then for next release we can use geopandas to ingest csv/geoparquet > apply schema changes to the geodataframe > write to shapefile and feature class.
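
As a rough illustration of the "apply schema changes" step in that plan, here is a minimal sketch on plain csv records rather than a geodataframe. The rename mapping, field names, and sample row are hypothetical; the real schema changes live in our script. The one hard constraint it encodes is that shapefile (.dbf) field names are limited to 10 characters:

```python
import csv
import io

# Hypothetical mapping, for illustration only.
RENAMES = {"building_address": "bldg_addr"}
MAX_DBF_FIELD = 10  # shapefile (.dbf) field names are limited to 10 characters


def apply_schema(rows, renames):
    """Rename fields per `renames`, rejecting names too long for a shapefile."""
    out = []
    for row in rows:
        new_row = {renames.get(key, key): value for key, value in row.items()}
        too_long = [key for key in new_row if len(key) > MAX_DBF_FIELD]
        if too_long:
            raise ValueError(f"field names exceed {MAX_DBF_FIELD} chars: {too_long}")
        out.append(new_row)
    return out


# Made-up sample record standing in for one row of dcp_pops.csv.
sample = io.StringIO("pops_id,building_address\nM010001,1 Example Plaza\n")
rows = apply_schema(csv.DictReader(sample), RENAMES)
print(rows)
```

Failing loudly on over-long names is a design choice here: otherwise the shapefile writer would have to shorten them itself, and the published schema could drift from the intended one.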

@fvankrieken
Contributor

Regenerated zipped csv and shapefile using data library to ensure access level is correct

Projects
Status: 📬 Next
4 participants