Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new xtypes: range, hms, dms, uri, uuid #24

Merged
merged 9 commits into from
Jul 5, 2023
Merged

Conversation

pdowler
Copy link
Collaborator

@pdowler pdowler commented May 29, 2023

removed multipolygon from shape
added range to shape for SIA and SODA compat

removed multipolygon from shape
added range to shape for SIA and SODA compat
This was referenced May 29, 2023
@pdowler pdowler changed the title new xtypes: range, hms, dms new xtypes: range, hms, dms, uri, uuid May 29, 2023
@pdowler pdowler marked this pull request as draft May 29, 2023 20:33
@pdowler
Copy link
Collaborator Author

pdowler commented May 29, 2023

still need to get the RFC 4122 citation included

This was referenced May 29, 2023
DALI.tex Outdated Show resolved Hide resolved
DALI.tex Outdated Show resolved Hide resolved
DALI.tex Outdated
Comment on lines 764 to 768
values and seconds is a real value. All hours must fall within [0,23], degrees
must fall within [0,359], minutes must fall within [0,59], and seconds
must fall within [0,59]; the minimum values are thus 0:0:0 (hms and dms) and the maximum
values are ~23:59:59.9 (hms) and ~359:59:59.9 (dms) (decimal values may have
additional precision).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limits certainly need some changes: degrees in the range [-90,90] at least have to be permitted. But I don't know whether we should be a bit less prescriptive in general: do people use negative hours in hms sometimes? or dms for longitude values?

Copy link
Collaborator Author

@pdowler pdowler May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, dms should go from -90:0:0 to 90:0:0; good catch.

In other coordinates we limited the range of valid values so that providers were responsible for range reduction and I haven't heard any complaints about that, but here it's more markup for existing widespread use so I'm a lot less sure,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised that in the other coordinate validity specs we just said [-90,90] and [0,360] without worrying to much about the endpoint wrap around (which could have been [0,360) if we wanted to be really careful). Do you think the more relaxed spec like [0,360] is enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since as you say this is mostly about marking up existing content I'd incline to being more permissive here. But we don't have [0,360], we have [-90,90], and both extrema have to be allowed in that case anyway.

Another thing: I suggest also clarifying that an optional explicit "+" sign on dms is allowed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional "+" at the front, because "-" is also a possibility? and we don't need that for hms because they are only positive anyway

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added optional + for dms

DALI.tex Outdated Show resolved Hide resolved
@@ -800,6 +816,36 @@ \subsubsection{Circle}
of a table), but specific services may define something that is applicable in a
more limited context.

\subsubsection{Range}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no discussion in this section of ranges that span the antimeridian. I think something should be said: either you can't do that using range, or you can do it by providing reversed longitude max/min values (or in some other way).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, SIA and SODA also do not discuss this. with the text as written, I would interpret that
359 1 -1 1 spans the meridian and is small (2x2 deg) and technically 1 359 -1 1 would be a large band 358 deg x 2 deg... that's what you meant by "reversed longitude"? I've rewritten it this way.

that probably means I have to be careful with using the word "interval" since 359,1 would not be a valid interval (it wasn't intended to be, but I'll fix that anyway).

The same thing kind of comes up in polygons, where they look pretty odd to the eye in this case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think the rewrite is OK.

With reference to polygons, STC 1.33 sec 4.5.1.4 resolves this by saying

In order to avoid ambiguities in direction, vertices need to be less than 180° apart in both coordinates.

Should we explicitly defer to STC concerning polygons? But this is probably wider than DALI - see Alberto's recent Interop talk.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent was that we were referring to the STC definition in it's entirety, but the text was too brief and gave a different impression. That's more or less a bug... erratum?

Now that STC-1.33 is really old and more or less obsoleted by Coords and Meas, I wonder if we should fork the complete definition and include it in DALI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that STC-1.33 is really old and more or less obsoleted by Coords and Meas, I wonder if we should fork the complete definition and include it in DALI.

You'd have my vote for it.

For Range (if we want it), I think the only sane convention is to let RA be in (-360,360] or in [0, 720] or something comparable. Otherwise, you'll get insane trying to define the behaviour along the stiching line.

The value is a polymorphic shape made up of a type label (equivalent to an existing simple
geometric xtype and the string serialisation of the value as described above.

The allowed shapes are: \verb|circle|, \verb|range|, \verb|polygon|. For example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multipolygon has been removed in this commit. Is the intention that a shape can contain multiple elements (in which case multipolygon can be represented by several polygon entries - but that possibility is not currently documented here) or are we deciding that shape should not be multipolygon-capable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, shape contains exactly one label and associated numbers. The primary use case is the POS param of SIA (+DAP) and SODA; that does not require multipolygon.

The secondary use case would be values in ObsCore s_region column and as the string value in the ADQL region() function (to replace non-normative adql:REGION from TAP-1.0). In that context including multipolygon in shape is potentially useful.

The concern that Markus raised is that input to a service (upload tables, POS param) could include "shape" and if that includes multipolygon that is more work. I personally don't think it's a huge amount of extra work once you accept shape. Do you think adding it to shape in a later version is going to be painful? We need but don't have a way for services to list xtypes they understand (for input), so right now we would have to rely on good error messages in which case changing the definition of shape wouldn't be much worse... but if we did have a way to list supported xtypes then changing the definition of shape would be a subtle way to break stuff. Still more to discuss; I might post this topic to the DAL list for wider audience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That secondary s_region case was what I had in mind. Services like ESO TAP_OBS have more complicated s_regions, e.g. UNION(POLYGON..POLYGON...). But I suppose this is not intended to be a drop-in replacement for ADQL:region.

I don't think that later addition of e.g. multipolygon would be particularly painful, but if people are going to want it, it may be better to include it from the start. However since I'm not involved in either implementing parsers or authoring regions, I don't have a strong opinion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think shape is intended to standardise the minimal part of adql region that are justified by use cases. I'd like to go as far as having ADQL-2.1 (already in RFC) say that the region() function takes a DALI shape as the string argument so that adql region from TAP-1.0 can finally go away. Not exactly a drop-in replacement because we decided that including coordsys in the literal values was a bad idea (it should be field metadata and query writers should do the work, like they have to do with units in every other numeric column), but still a replacement.

If there are use cases for multipolygon being included in shape, then that would justify including it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two additional questions here:

  1. coming back on what @mbtaylor said, how ESO would be able to describe their complicated s_regions (e.g. UNION(POLYGON..POLYGON...))? With a polygon? A multipolygon? It seems not very trivial to do. Maybe @almicol has an idea?
  2. Would it be possible to add moc in the list of allowed shapes? I think it would be a useful way to describe the region covered by an observation in ObsCore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may require an update to ObsCore to permit it, but the idea would be that ObsCore implementations could declare the most appropriate xtype for the s_region column, probably allowing any of the defined geometric types (circle, polygon, shape, multipolygon). Although allowed, range is generally not a good description of observation bounds so I doubt it would get much use (as a column type or within shape columns) but I'd probably not disallow it.

With the current state of the PR, ESO would have to use multipolygon (and simple polygons are valid multipolygons). We (via CAOM model) have circles and polygons so would probably use shape, but I'll note that the polygons are generally outer simple polygons and we also have a more detailed multipolygon that sometimes differs from the simple outer polygon... the latter is in a different column so we'd have a column with xtype="multipolygon" in CAOM TAP services. We would map ObsCore.s_region to the shape column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for shape also including moc, I have not implemented that and so while it sounds logical I can't say what's involved in doing a good job of that. I do think moc and other shapes are for different purposes so it's not clear if there is a compelling use case at this time.

If shape did include moc then an upload table could include circles, polygons, and mocs and a service would be trying to put those all into a column in the database to support spatial queries... that sounds kind of tricky to me and I don't know how I'd do that. I think it also likely that someone who doesn't support moc would have "partial shape support" and since shape is also the xtype to describe the SIA and SODA POS param that would probably introduce confusion and pain. So I would not want to add moc to shape right now and we'll have to figure out if/how to add things to such a polymorphic xtype and what that means. We need polymorphism, but it's hard and we shouldn't go overboard :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly an aside:

At the top of this thread, @mbtaylor pointed out that this PR removes multipolygon from the list of shapes. This was to lessen the burden of implementing table upload (shape column) and to avoid subtly extending the POS param in SIA and SODA (if/when someone uses xtype="shape" for POS). I have implemented a queryable shape column in the CAOM database so I know how to di it and make it work, but I haven't added support to our tap upload yet.

However, I will point out that if shape did include multipolygon, then ESO could use xtype="shape" for their ObsCore.s_region and not have to chose between multipolygon and polymorphism. I don't know how they would implement that with the spatial support in their database or whether they also need the polymorphism (for some circles), but we do need both polymorphism and multipolygon in CAOM and the model could have been simpler if shape included multipolygon directly. In detail, there are pros and cons to both choices.

also added some clarification to xtype=polygon about implicit closing of loop
@pdowler pdowler marked this pull request as ready for review May 31, 2023 18:41
@msdemlei
Copy link
Contributor

msdemlei commented Jun 7, 2023 via email

@pdowler
Copy link
Collaborator Author

pdowler commented Jun 7, 2023

use case for shape as input:

SIA and SODA POS param.

ADQL REGION function string arg should refer to DALI shape.

People can upload table of shapes now in TAP, which would just be a string, but once xtype="shape" is defined in DALI they have some expectation that it will behave like a geometry (in query). Presumably we have to add something to one or more registry extensions so services can say which xtypes they support (or require specific error messages for unsupported input data types)... so that applies to TAP at a minimum.

It was always the plan to add table upload support to SIA (now DAP), where one could have a column of geometries and I expect refer to the column with a query param (details TBD). The POS param is xtype="shape" (as currently specified in this PR and with the addition of range) so that would immediately lead to an expectation of upload tables with a shape column in there. Right now, with multiple occurrences of POS param, implementations can plausibly just do that inline in the SQL but the whole point of upload table would be to scale up the list such that inline wasn't the right solution (my opinion).

In general, not everyone uses postgresql and pgsphere so we can't hang everything on moc (or any other specific type) and we have to allow service implementers to say "that bit is to hard to do in my org-mandated db server". Maybe we should specify a standard string to put in error messages, eg

unsupported-xtype: multipolygon [optional extra detail here]

so clients can reasonably figure out what to look for.

@almicol
Copy link

almicol commented Jun 7, 2023 via email

@msdemlei
Copy link
Contributor

msdemlei commented Jun 13, 2023 via email

@pdowler
Copy link
Collaborator Author

pdowler commented Jun 13, 2023

I think this is now "done" and completes what we need for DALI-1.2

The extra section about unsupported-xtype error messages indicates the way I envision implementers (client and service) deal with any xtype(s) we might add in future or how we might change xtype="shape" by adding more subtypes. I could see that happening, and maybe soon if @almicol's ESO use cases require that we add multipolygon now rather than later. Basically, anyone should be able to round-trip values even when they don't grok the xtype (that was always true) and only have to fail with this error message when the usage requires it.

I would like to get this to RFC soon as I think the ADQL would benefit from saying that the string value in the REGION function (constructor) are specified by DALI "shape". I still think we can add that in ADQL-2.1 with minimal delay. Right now, REGION has no normative spec so this can finally fix that with minimal actual change in usage.

@msdemlei
Copy link
Contributor

msdemlei commented Jun 16, 2023 via email

@pdowler
Copy link
Collaborator Author

pdowler commented Jun 16, 2023

Well, there is nothing in DALI that says anything about how column types and values with xtypes are handled through an input + output cycle. That would be up to a service to specify (TAP, DAP, etc). Here we just define them.

I haven't thought through all the pros and cons, but my very initial instinct is that TAP could potentially have more specific requirements than a more abstract protocol like DAP (SIA), if the latter were to support tabular input. There are details of this sort to work out for WD-TAP-1.2 (because TAP_UPLOAD and the proposed user-created tables do mostly the same work and can use the same code) and that's the place to figure out these details.

@pdowler
Copy link
Collaborator Author

pdowler commented Jul 4, 2023

Hi all,

It has been silent here but I would like to merge this and have the WD published to the doc repo so we can move forward toward RFC. I have services that will make use of most of the new xtypes that I'll tweak to do so once the WD is published... probably the same is true for other early implementers so we need to take that step and move on to wider community feedback.

If someone could check the technical stuff (build, references, etc) and merhe, that would be great.

Copy link
Contributor

@msdemlei msdemlei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reserve the right to quibble about the various geometry types (and in particular being allowed to turn them into MOCs when round-tripping) during RFC, but that discussion perhaps isn't for a github issue in the first place; I'd much rather have it somewhere where we can control its preservation for posterity.

So, you have my blessing.


right ascension: \verb|datatype="char"| \verb|arraysize="*"| \verb|xtype="hms"|

declination: \verb|datatype="char"| \verb|arraysize="*"| \verb|xtype="dms"|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be dot points? They look a bit strange in the PDF as paras

(latitude) must fall within [-90,90], minutes must fall within [0,60), and seconds
must fall within [0,60). Valid values for \verb|xtype="hms"| are from 0:0:0 to 24:0:0.
Valid values for \verb|xtype="dms"| are from -90:0:0 to 90:0:0; an optional + sign at
the start is allowed (e.g. +10:20:30) but not required. Since the upper bound on minutes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since here makes the sentence feel unfinished.

@pdowler pdowler merged commit 404882a into ivoa-std:master Jul 5, 2023
1 check passed
@pdowler
Copy link
Collaborator Author

pdowler commented Jul 5, 2023

will fix the above in a separate PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants