Support for geo data #91
Comments
Thanks for raising this, Robin. Integration with the duckdb spatial extension would be a really cool feature, but also a lot of work. Do we need to figure out how to translate sf data frames into something that the duckdb spatial extension understands, and vice versa? Adding support for functions is then "only" a matter of diligence: https://github.com/duckdblabs/duckplyr/pull/179/files#diff-a202cfba76540d6822868ac7755edd4945b6344057d78e0092f4836e33c0d4eaR11 . |
I imagine so, and given that everything other than the geometry column is already sorted, it's just the geometry that needs converting (safe to assume just 1 geometry column in 99% of use cases I think). |
Seems like DuckDB -> sf has been implemented here: https://github.com/cboettig/duckdbfs/blob/main/R/to_sf.R Not sure how hard the other way would be, let alone how to make it fast. |
The duckdb -> sf conversion there is mostly solid, but could be a bit better. Currently there are a couple of different ways in which geospatial data is stored in duckdb:
Re sf -> duckdb, I don't think this is much of an issue, though there are various ways to do it depending on precisely what you mean by "to duckdb". Specifically, I think the best thing to do is simply have sf write out a geoparquet file to disk (this assumes sf is built with a recent gdal that has arrow support, of course!). Since this use case presumably means the data is small enough to fit in RAM, writing out as, say, a geodatabase is probably just as good (maybe better given the issue noted above), and then have duckdb read that in. It is possible to write to duckdb's native database format with DBI instead (i.e. with the WKB binary column), and then you'd need the extra coercion once in duckdb to turn it into duckdb's internal spatial type, but I don't see the use for that. (For most users I think it's actually better to pretend that duckdb's native database doesn't exist and work directly against flat files.) Sorry, long story short, I think duckdbfs should handle both cases (simply noting that sf should serialize to disk in any standard spatial format), modulo this edge case about geoparquet. |
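To make the flat-file route concrete, here is a minimal sketch of the round trip sf -> file on disk -> duckdb -> sf. It is only an illustration under some assumptions, not how duckdbfs does it: it uses a GeoPackage rather than geoparquet to avoid depending on GDAL's arrow support, it assumes the duckdb spatial extension can be installed (network access on first use), and it assumes the geometry column exposed by ST_Read() is called geom for this file. The geometry is converted to WKT inside duckdb so that only plain columns are returned to R.

```r
library(sf)
library(DBI)
library(duckdb)

# Example data shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# sf -> disk: any standard spatial format that GDAL can write
path <- tempfile(fileext = ".gpkg")
st_write(nc, path, quiet = TRUE)

con <- dbConnect(duckdb())
dbExecute(con, "INSTALL spatial")  # downloads the extension on first use
dbExecute(con, "LOAD spatial")

# disk -> duckdb: ST_Read() exposes the file as a table with a GEOMETRY column.
# The column name "geom" is an assumption about this particular file.
query <- sprintf(
  "SELECT NAME, ST_Area(geom) AS area, ST_AsText(geom) AS wkt
   FROM ST_Read('%s')
   ORDER BY area DESC
   LIMIT 5",
  path
)
top5 <- dbGetQuery(con, query)
dbDisconnect(con, shutdown = TRUE)

# duckdb -> sf: parse the WKT column and re-attach the CRS by hand,
# since the CRS does not travel with the text representation
top5_sf <- st_set_crs(st_as_sf(top5, wkt = "wkt"), st_crs(nc))
```

Going through WKT is the simplest thing that works for a small result; for larger results a binary route (ST_AsWKB plus a WKB parser on the R side) is likely the better choice.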
I would use {wk} for the sf <-> wkb <-> blob conversion; it already supports a wide range of other conversions (not terra::vect, sadly). Should the BLOB type already be supported? I see:
## wget https://data.source.coop/fused/overture/2024-02-15-alpha-0/theme=admins/type=administrativeBoundary/0.parquet
duckplyr::duckplyr_df_from_parquet("0.parquet")
Error: rel_to_altrep: Unknown column type for altrep: BLOB
This would otherwise look like this:
arrow::open_dataset("0.parquet") |> dplyr::select(geometry) |> dplyr::collect() |> dplyr::mutate(geometry = wk::wkb(geometry))
# A tibble: 2,587 × 1
geometry
<wk_wkb>
1 <LINESTRING (-175.3083 -21.12098, -175.3094 -21.12427, -175.3098 -21.12571, …
2 <LINESTRING (-175.2667 -21.14462, -175.2673 -21.14619, -175.2681 -21.14822, …
3 <LINESTRING (-175.2686 -21.12686, -175.2684 -21.12997, -175.2692 -21.13471, …
I don't think any of the spatial stuff belongs here, unless an import of wk is welcome ... I suggest returning the binary as-is, or as {blob}. For sf itself it has For general read via GDAL, I would look at the vector support in {gdalraster} and (we can do it!) work on a lazy vctrs form for the OGR pointer type; alternatively, GDAL can provide geos pointers directly to {geos}. sf doesn't have any capacity for these lazy or alternative/intermediate forms for the geometry from general sources, so I don't think it's a good thing to always focus on (it's well supported by conversions already). |
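For reference, the sf <-> wkb <-> blob conversion via {wk} mentioned above could look roughly like the sketch below. The data are made up, and the "blob" here is simply a plain list of raw vectors standing in for what a BLOB column would hold once returned to R; that representation is an assumption for illustration, not something duckplyr currently returns. The CRS is dropped and restored by hand because a bare binary column carries no CRS metadata.

```r
library(sf)
library(wk)

# A tiny sf data frame with made-up points (illustration only)
pts <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(1, 2)), st_point(c(3, 4)), crs = 4326)
)

# sf -> WKB: wk provides an as_wkb() method for sfc columns
geom_wkb <- as_wkb(st_geometry(pts))

# WKB -> plain list of raw vectors, standing in for a BLOB column
geom_blob <- unclass(geom_wkb)
attr(geom_blob, "crs") <- NULL  # a bare binary column has no CRS attached

# blob -> WKB: re-tag the raw bytes, as in the mutate() call above
geom_wkb_back <- wkb(geom_blob)

# WKB -> sf: convert to an sfc column and restore the CRS by hand
geom_back <- st_set_crs(st_as_sfc(geom_wkb_back), 4326)
pts_back <- st_sf(id = pts$id, geometry = geom_back)
```

The only spatial dependency needed on the reading side is the wkb() call; everything before it is just moving raw bytes around, which is why returning the binary as-is keeps the spatial logic out of the package.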
It seems the goal for duckplyr on the spatial side should be to expose duckdb's spatial abilities directly to the R user. The |
Cool stuff, keeping a beady eye on this conversation, thanks for keeping it rolling forward. |
This looks great! One feature request I have in mind is support for spatial data (either via new functions/functionality, or via documentation if it works out of the box). See this by @cboettig for inspiration: https://github.com/cboettig/duckdbfs#spatial-data
Another potential source of inspiration is sf's support for tidy operations; it's great how summarise() and other functions 'just work' with tidy verbs: https://r-spatial.github.io/sf/reference/tidyverse.html
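As a small illustration of what "just work" means for sf, the sketch below groups the example dataset shipped with sf and summarises it with ordinary dplyr verbs. The grouping variable (large) is made up for the example. The result is still an sf object, with the geometries of each group combined by sf's summarise() method.

```r
library(sf)
library(dplyr)

# Example data shipped with sf: 100 North Carolina counties
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Ordinary dplyr verbs; "large" is a made-up grouping for illustration
by_size <- nc |>
  mutate(large = AREA > median(AREA)) |>
  group_by(large) |>
  summarise(total_births = sum(BIR74), n_counties = n())

# by_size has one row per group and keeps a geometry column,
# with each group's county polygons combined into one geometry
class(by_size)
```

This is the kind of behaviour the feature request has in mind: the attribute columns are handled by the usual verbs while the geometry column is carried along and aggregated.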