-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add geo-traits crate #1157
Add geo-traits crate #1157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for driving this initiative @kylebarron! I'm excited to use these traits in my own projects.
This pull request looks mergeable to me! With the understanding it is not yet ready to be published. I want to hear from @michaelkirk and @urschrei before merging though.
In the medium term (in 2024) I would love to add 3d support to geoarrow, so I want to keep thinking about xyz geometry traits, but I'm pretty happy with these for 2d geometries. They've worked well so far in geoarrow. |
Thanks for keeping this up to date WRT to main @kylebarron! I said in #1011 (and still believe) that having some examples of algorithms that use these trait definitions would be key to evaluating them. Are there examples anywhere that utilize these traits? @frewsxcv - have you tried, or would you be willing to see how this code works in your project? |
Are the usages in geoarrow-rs sufficient? Or are you looking for something else? Also would love to hear why you find this a blocker to merging these in to the repository. |
In rgis I would like to be able to read geospatial data and perform operations on the underlying data without having to convert through intermediary representations like |
Examples of how they're used today in geoarrow-rs:
@JosiahParry has also implemented these traits in serde_esri, and I believe he's tested against geoarrow. |
Thanks for the examples @kylebarron! I'll take a look. And relevant to @frewsxcv question:
I continue to think it's reasonable that code in main is intended to be useful, and that seeing code actually used in pursuit of the problem it purports to solve seems like a reasonable bar. I look forward to looking at @kylebarron's examples to that end.
I understand that's the theoretical goal, and a worthy one. I'm asking to see if this actually works towards that goal. Care to give it a shot? Is there something that keeps you from building against this branch, for example? |
@JosiahParry - can you elaborate on what these traits enable you to do? Presumably it's useful to you or you wouldn't have done it 😉 |
Nope! There is nothing blocking me from building on-top off a branch. The point of merging the code into |
@michaelkirk Correct me if this is wrong, but it sounds like you are skeptical of the usefulness of these traits without actually saying that. Am I mind reading correctly? I feel like there's an unusual and surprising amount of resistance to merging this in. |
I feel a bit attacked! But it's true - I am skeptical of the traits code. Truly though, I am skeptical of all code. I am admittedly bit extra skeptical of the geo-traits code in that it's not just a single isolated algorithm, it's a cross cutting concern. I feel like I've said similar things before, but maybe not clearly enough. I appreciate @kylebarron patience with me.
This implementation looks good to me! You're using the trait accessors to implement a useful algorithm and it doesn't seem overly ceremonious compared to the implementation on geo-types.
Ugh, these ones on the other hand, seem a bit unfortunate. Is there a way to make these look more like the BoundingRect one? Or is it reasonable to assume that most algorithms implemented on geo-traits will need to have a separate trait definition ( I haven't yet grokked why these implementations can't look more like the BoundingRect one - something about having a dedicated struct for the algorithm (i.e. Also, am I reading it correctly that the "trait based" implementation for FrechetDistance is simply converting to a geo-types and then calling FrechetDistance on that? Like we're not actually implementing the algorithm using the trait here in any meaningful way - or am I misunderstanding? Is there something preventing a geo-types free implementation? All the conversion API's seem reasonable at first blush, but can you connect the dots for me a little bit? let line_string_1_wkb: &[u8] = "..."
let line_string_bbox = ???
let line_string_2_wkb: &[u8] = "..."
let frechet_distance = ??? |
All algorithm traits from geo need a separate implementation in geoarrow because geo's traits always return scalar objects while geoarrow's traits need to return arrays or chunked arrays. I would love to have only one algorithm trait, e.g. just
In BoundingRect I'm reimplementing the core algorithm in a native way on the traits. I do this for The copy to a
I added a test for the first one in this example PR: geoarrow/geoarrow-rs#542 // A builder for a columnar WKB arrays
let mut wkb_builder = WKBBuilder::<i32>::new();
// Add a geo polygon to the WKB array
// This uses geo-traits to serialize to WKB and adds the binary to the array
wkb_builder.push_polygon(Some(&p0()));
// Finish the builder, creating an array of logical length 1.
let wkb_arr = wkb_builder.finish();
// Access the WKB scalar at position 0
// This is a reference onto the array. At this point the WKB is just a "blob" with no other
// information.
let wkb_scalar = wkb_arr.value(0);
// This is a "parsed" WKB object. The [WKBGeometry] type is an enum over each geometry
// type. WKBGeometry itself implements GeometryTrait but we need to unpack this to a
// WKBPolygon to access the object that has the PolygonTrait impl
let wkb_object = wkb_scalar.to_wkb_object();
// This is a WKBPolygon. It's already been scanned to know where each ring starts and ends,
// so it's O(1) from this point to access any single coordinate.
let wkb_polygon = match wkb_object {
WKBGeometry::Polygon(wkb_polygon) => wkb_polygon,
_ => unreachable!(),
};
// Add this wkb object directly into the BoundingRect
let mut bounding_rect = BoundingRect::new();
bounding_rect.add_polygon(&wkb_polygon);
assert_eq!(bounding_rect.minx, -111.);
assert_eq!(bounding_rect.miny, 41.);
assert_eq!(bounding_rect.maxx, -104.);
assert_eq!(bounding_rect.maxy, 45.); Run with
|
I think part of why these implementations look the way they do is because I want to have compile-time safety across geometry types. E.g. I don't want to have (The chunked array abstraction, where you might have 10 million geometries in 100 separate chunks where each of those chunks is a contiguous buffer holding 100,000 geometries, is quite useful and is currently my unit of "automatic parallelism". I.e. a rayon But the largest area of complexity in geoarrow-rs is handling these combinations. I think it's useful for UX to be able to call an operation like |
I would quite like this! I think it would be really neat if the geo-type traits also had a Take this super derived example (i adjusted @kylebarron's geo-traits/src/line_string.rs file) https://gist.github.com/JosiahParry/f3469f8df4706a4d36f8a47db9ae3e09) trait StartEnd where Self: LineStringTrait {
fn start_end(&self) -> Self::Owned {
let start = self.coord(0).unwrap();
let end = self.coord(self.num_coords() - 1).unwrap();
Self::new(vec![start, end])
}
}
impl<T: CoordNum> StartEnd for LineString<T> {}
#[test]
fn test_start_end() {
let coords = vec![
Coord {x: 1.0, y: 2.0},
Coord {x: 3.0, y: 4.0},
Coord {x: 5.0, y: 6.0},
];
let line_string = LineString::new(coords);
let start_end = line_string.start_end();
println!("{:?}", start_end);
assert_eq!(start_end.num_coords(), 2);
}
|
I think that trait definition has the same downsides as mentioned elsewhere, where you'll need a different trait definition for each geometry type. (You'd need a I also think that it's just extra unnecessary complexity, because I argue that the implementation of the algorithm should choose the most appropriate return type, as long as the geometry trait is also implemented on that return type. That means, for any fn convert_to_my_repr(geom: &LineStringTrait) -> MyLineStringStruct {} But for other libraries, especially in the context of libraries like geoarrow where you want to optimize the vectorized execution and not the scalar execution, it's much better to defer the data transformation until you have an array of geometries. |
I need something like this for Geodesy, but obviously only need the First, given your comments, @kylebarron, about wanting to extend towards 3D, I find the name of the Both because for my TeX-eyes, x_y looks like x with subscript y, and because with 3D, 4D and/or Measure extensions, it will be too much of an eyecatcher to spot Second, I find it reasonable to extract the Also, if I understand correctly, most of the reservations expressed above would not apply to Third, as a general remark of the Georust naming conventions Geodetically speaking, coordinates are ordered, and can be referred to as 1st, 2nd, 3rd and 4th, and as you never know what is the CRS specific convention regarding (N,E) vs (E,N), any kind of implied (mis)understanding is unfortunate. I suggest to let the trait implementer provide x(), y(), z(), t(), and let the trait autoprovide first(), second(), third(), fourth() (or preferably the other way round), and then just postulate that the naming is an internal convention, not to be confused with any external conventions. Evidently, trying to extend this to the entire georust ecosystem would neither be valuable or reasonable - wherever the culture descends from an "earth-is-flat-as-my-monitor" computer graphics world, x and y are obviously the more useful convention. But in a EDIT 2: On second thoughts, I actually think the Third item can be ignored: It doesn't matter how the "this is the internal convention" is communicated, and first()...fourth() is probably a unnecessary complication, that impedes communication. Fourth, I have implemented an extended version of As you can see, I took the liberty to extend to 4D+M, but probably not in a very elegant way. Also, I need the I would find it awesome to be able to implement a tighter integration between Geodesy and the Georust universe. And a somewhat geodesy-aware I know, however, that I am stomping on your grass, but please consider my suggestions - and I would be very happy for any comments regarding how I am wrong, and how I could work further towards a Georust<->Geodesy bridge. |
I'm interested in multi-dimensional geometry support, and think that handling it in traits could be a good way to incrementally add support. But I think it would probably be ideal if the dimension could be a generic, so that you'd statically know what type of coordinate you have, and which type of operations could be done on it. I'm happy to switch
geo has had some discussion about whether to merge Point and Coord, and so it's not 100% clear that we do want both |
That would require a good deal more genericity than just the dimension. In PROJ, we check the pipelines upon construction, such that the output type of step That's why I entirely dropped the "different coordinate types" in Rust Geodesy, and only support different dimensionalities: It just doesn't make sense - coordinates are interpreted by operators which expect a specific input type, and the gamut of possible types is infinite. No reason to try to make the type system reject having me feeding a utm zone 32 coordinate into a pipeline expecting some other type of projected coordinate (or even geographical or geocentric cartesians): The output becomes garbage, but its consistent garbage :-) and there are just too many kinds of parametrizations, each of which would lead to principally different gamuts for valid operations. So it's a tough job! |
@kylebarron My current, and much improved version, takes off from a much less modified version of your original pub trait CoordTrait {
type T: CoordNum;
const DIMENSION: usize;
const MEASURE: bool;
/// Accessors for the coordinate tuple components
fn x(&self) -> Self::T;
fn y(&self) -> Self::T;
fn z(&self) -> Self::T;
fn t(&self) -> Self::T;
fn m(&self) -> Self::T;
/// Returns a tuple that contains the two first components of the coord.
fn x_y(&self) -> (Self::T, Self::T) {
(self.x(), self.y())
}
} |
I had been thinking more along the lines of the proposal in this PR, where the struct is generic over a In particular, when Or maybe the better way of looking at this is that |
Essentially, I believe it is up to the person implementing the trait for a concrete data type to decide: It's your trait, but their data! In Geodesy, I use NaN for t and 0 for z, because that fits well with geodetical reasoning: If you have no height information, your coordinate is probably assumed to be placed directly on the ellipsoid, whereas if you have no time coordinate, but accidentally call a time dependent operation on the coordinate, you actually want the NaN value to spill out all over your coordinate, to indicate its invalidity ("stomping on it with the x-large NaN boots"). In a sense, NaN is the IEEE 754 equivalent of |
In general, I disagree because I think it's important to have a well-specified data contract for these traits. If
In this case I think it would be better to have |
I appreciate the detailed discussion and the unique perspectives each of you brings to the table—it’s clear that everyone’s expertise has been invaluable in shaping this proposal. I understand that it’s frustrating not getting everything perfect the first time, but I’m concerned we might be overthinking this for a 0.1 release. The current implementation is strong and would provide a lot of value, even if it doesn’t cover every use case perfectly. I suggest we proceed with merging now and open a follow-up ticket to continue discussing and refining this further. What do you all think? I’m happy to assist with any follow-up work after the initial release. |
I'm ok with merging but I'd also be happy to update with this latest proposal now while we're talking about it. I'd like to get to it later this afternoon |
Thanks for trying to move things forward @frewsxcv. I appreciate that the perfect can be the enemy of the good and that we can iterate. A long time has passed since this PR opened, but really not that much time has passed since the major design decision to let points be empty was proposed, so I think we're making progress, not just moving laterally. I'd be interested (excited even?) to implement some On the other hand, if we were to merge Maybe I have my blinders on, but I think that we're really close to having something useful for geo here. |
Did we decide whether |
Litmus test: Since that's valid, I think that implies edit: Wait, is that valid? I don't actually know. Let me look at the spec... |
Updated georust/wkt#123 for commit |
I think my litmus test was valid, but I didn't actually read the test results. 😆 From what I can tell a MULTIPOINT cannot contain an empty point? |
I'm working on updating geoarrow-rs to this latest commit in geoarrow/geoarrow-rs#839. So far I really like the clarity of knowing a |
regarding: FWIW shapely explodes when you try to put an empty point into a MultiPoint:
So that's just one (albeit important) implementation, but if it's true in general that a MultiPoint cannot contain a I don't really have time to dig into the implications of this further, but I am OK with either approach. Unlike the "should we have a As an example: whereas I think that:
behave not entirely identically, but pretty similarly. |
I agree, it feels like there's a lot less at stake, and feels a lot easier to change down the road. |
FWIW the same limitation exists in the R ecosystem’s {sf} package. pnt <- st_point()
st_multipoint(list(pnt)) this returns an error complaining that the point is not numeric |
Should we update this |
Dealers choice - maybe someone else has an opinion. |
/// Access the n'th (0-based) element of the CoordinateTuple. | ||
/// May panic if n >= DIMENSION. | ||
/// See also [`nth()`](Self::nth). | ||
fn nth_unchecked(&self, n: usize) -> Self::T; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first read this, I thought this would behave similarly to Vec#get_unchecked
which is an unsafe
method which does not do a bounds check, hence it being marked as unsafe
. In fact, it looks like all functions named *_unchecked
Rust's standard library are marked as unsafe
, which to me implies they skip the bounds check.
My first though was, what if we had a naming convention like this:
fn nth_checked(n) -> Option<T> {}
fn nth(n) -> T {} // Panics
unsafe fn nth_unchecked(n) -> T
But that may psychologically push people to use the panic'ing version.
I don't know what the right answer is, just wanted to call out that I was surprised when implementing CoordTrait
that nth_unchecked
was not marked with unsafe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option 🤷🏻
fn nth(n) -> Option<T> {}
fn nth_or_panic(n) -> T {} // Panics
unsafe fn nth_unchecked(n) -> T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we also definitely don't need to have an unsafe
method, in fact probably better to leave it out for this initial implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn nth(n) -> Option<T> {}
fn nth_or_panic(n) -> T {} // Panics
That's nice. 👏
- coord.nth_unchecked(3);
+ coord.nth(3).unwrap();
We could also just leave out the panic
flavor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I didn't find it in the stdlib, apparently there's quite a bit of precedent for _or_panic
in 3rd party crates
https://github.com/search?q=or_panic+language%3ARust&type=code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open to suggestions here. I suppose it's worthwhile having a variant that is actually unsafe (e.g. doesn't even do the bounds check), for situations where you know you're in bounds (like the iterator)
Co-authored-by: Corey Farwell <[email protected]>
I'm happy with all the big structural pieces. I'm not too worried about the naming, since I feel like it should be straight forward to revisit, and easy to do proper deprecation with at any point. |
The Any other blockers before merging? If not, I can press the button tomorrow! |
Here we go! |
CHANGES.md
if knowledge of this change could be valuable to users.Updated version of #1019 that's clean against the latest
main
. Aside from having no extra diff frommain
, it also has:Sized
required bound, but means that all sources get iterators out of the box with minimal required impls.