identifier optionally including time #17
Comments
I think the general idea is very powerful. I don't think we ought to target just a time, but rather some conventions for adding attributes to the identifier for start and end time, data version, etc. I'm not sure To sum up, don't do this just for a time value, but instead write a whole section on adding attributes. |
Agree, a more general scheme would be a good idea. I can also imagine using this for time ranges, like in data requests. Also maybe for identifying things like derived, synthetic or other non-data but still "channel like" objects? One advantage of following the style of HTTP query params is that there are existing parsing systems that do not have to be reinvented. Having the system follow the Question: is this issue important enough to resolve before submission of the review for approval, or should it be delayed until a future revision? |
Probably the best approach would be to add a section describing the basics of a pattern for adding attributes to an SID at a high level and note that the WG may wish to consider refinement of the details/conventions. |
This seems like a great idea to me, but beyond my knowledge of URIs! |
I did some reading of the URN spec, and they have 3 types, r-components,
so if we want to follow the proposed URN spec, I think there is the problem of deciding whether two identifiers are the same if they contain fragments (f-components). That said, I feel the practical use of this should be OK, and that a fragment with some limited rules could do this. Here is a start. I would suggest added to the

Fragment identifiers

An FDSN Source Identifier may contain an optional fragment identifier, as defined by RFC 3986. The start of the fragment is indicated by a hash or number sign ("#", ASCII 35) character and terminated by the end of the URI. The form of the fragment within an FDSN Source Identifier is a sequence of key=value pairs, with each key separated from its value by the equals sign ("=", ASCII 61) and each subsequent pair separated by an ampersand ("&", ASCII 38), following all character escaping rules for URIs in general. Keys may be any valid combination of characters, but keys composed of only capital letters, ASCII 65 to 90,

An FDSN Source Identifier that contains a fragment identifier implies a relationship to the source identifier without the fragment, but in the absence of external information about the keys and values in the fragment, does not imply equivalence. Moreover, the order of the key-value pairs within the fragment, and any meaning thereof, is undefined. Equivalence between identifiers, with or without fragments, is only guaranteed in the case of an exact string match for the entire URI without reordering. Existence of a fragment may or may not imply numeric differences in recorded data values from the source with the fragment removed.

Sources identified with fragments should respect all other rules relating to data source naming, including band, source and subsource codes. In particular, a fragment should NOT be used to create a derived source that is fundamentally different from the original source.

For example the latency of a seismometer channel should NOT be formed by appending a fragment of

Possible uses of the fragment could be to identify subsets of data from a source, e.g. by time range, to identify derived or processed versions of data from a source, or to indicate levels of quality control. Example:
could imply a data source that is somehow derived from or related to the FDSN:IU_COLA_00_B_H_Z data source. |
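As a rough illustration of the fragment rules proposed in this comment, here is a minimal Python sketch. It assumes the draft wording as written (fragment starts at "#", pairs separated by "&", key and value separated by "="). The function names and the fragment keys `start` and `version` are hypothetical and are not taken from the proposal or any FDSN specification.

```python
from urllib.parse import unquote


def split_sid_fragment(sid):
    """Split an identifier into (base, [(key, value), ...]).

    Hypothetical helper: pairs are returned in the order they appear,
    because the draft leaves the meaning of ordering undefined.
    """
    base, sep, fragment = sid.partition("#")
    pairs = []
    if sep and fragment:
        for item in fragment.split("&"):
            key, _, value = item.partition("=")
            # Undo URI percent-escaping on each side of the "=".
            pairs.append((unquote(key), unquote(value)))
    return base, pairs


def same_identifier(a, b):
    """Equivalence per the draft: exact string match only, no reordering."""
    return a == b


# Hypothetical keys, for illustration only.
sid = "FDSN:IU_COLA_00_B_H_Z#start=2019-01-01&version=2"
base, pairs = split_sid_fragment(sid)
print(base, pairs)
```

Under the exact-match rule, the same pairs in a different order count as a different identifier, even though a parser would recover the same keys and values from both.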
I was not thinking anything derived or synthetic, as that is a pretty limited way to denote data processing or generating provenance, which is a big can of worms to open. Instead I suggest that we limit this, for now, to attributes that further define, aka narrow the scope, of the data source identified. As in time range, data version, and other characteristics perhaps quality related.
Seems OK in our case if we interpret the URN-equivalence not to mean the exact same data; I don't think we can get to that point without including time and version as fundamental components anyway. So I'm thinking things like: Here's an alternate idea to include the fundamentals in the URI in a comparable way: add any "defining" characteristics to the path portion, e.g.:
Then the strict URN part uniquely identifies data. Hmm, not totally convinced we need this and it's pretty rigid. |
After sleeping on this, I am having doubts about this idea. I think your can of worms thought is right. The reason I put that in was just to be very general wrt the future uses of the fragment, and also because I always feel a tiny bit of guilt whenever I apply some processing step to waveform data and then write it back out using the original channel code. But you are right that the source identifier is not the right place for provenance. The time range "subset" idea feels the most natural, and so may be worth doing, but I worry about uses like in a future miniseed3 or stationxml where the time is provided elsewhere. If someone sets the channel id to be
but the header says it is data actually from 2019, then we have a problem. I suppose we could say that fragments are not allowed in places where the time is available via other means, but perhaps just not having it at all is safer. Similar issue of course if the fragment says Still not sure how I feel about this... |
You raise a good point, we wouldn't want information in these attributes that overlaps whatever is in formats where they are used (StationXML, mseed3, web service requests, etc). Which raises the question of where these would be used at all. The use case we have internally at our data center is for inter-service communication, so one service can provide a parsable "token" to another service, an identifier for some data with enough information/context needed to do some work. The example of Small detail: it was pointed out that using a |
Just my $0.02, but when embedding one URI, or really a string of any kind, in another URI, you had better do a %-escape on reserved chars or you are just asking for things to blow up in your face. So I don't buy the "# is a bad char" argument. That said, I, too, am struggling to find the use case that really motivates this. Unless there is one, I would defer this whole idea, I think. |
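To make the %-escape point concrete, here is a small sketch using Python's standard `urllib.parse`. The service URL, the `sid` parameter name and the fragment key `start` are all hypothetical, chosen only for illustration.

```python
from urllib.parse import quote, urlsplit, parse_qs

# Hypothetical identifier with a fragment (the key "start" is illustrative).
sid = "FDSN:IU_COLA_00_B_H_Z#start=2019-01-01"

# Escape every reserved character before embedding in another URL.
encoded = quote(sid, safe="")
url = "https://example.org/query?sid=" + encoded  # hypothetical endpoint

# The receiving side recovers the identifier unchanged.
recovered = parse_qs(urlsplit(url).query)["sid"][0]

# Without escaping, the "#" ends the query and starts the outer URL's fragment.
broken = urlsplit("https://example.org/query?sid=" + sid)

print(encoded)
print(recovered == sid)
print(broken.query, "|", broken.fragment)
```

In the unescaped case the outer URL's query carries only the part before "#", so the embedded fragment never reaches the server as part of the `sid` value.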
related to #16 , would it be useful to allow the addition of a time to form a URI that actually uniquely identifies a source? I think in most cases this would not be used as the time would be implicit, but might be useful in some cases where identifying an actual channel is important, perhaps in forming requests or connections between channel and something else?
A separate delimiter is probably needed. Maybe use the '@' symbol and an ISO time, something like:
which would mean the BHZ active at that time? Or perhaps the time should always be the start time?
If #11, then networks and stations could also have the time addition, like:
Alternatively, using more HTTP-style URL query params is possible, like:
This gets a bit more verbose, but also allows for other sub-identifying information in a future version of the spec.
As I said, not really sure this is needed or even a good idea at this point, but thought it was worth thinking about.
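For comparison, the two shapes floated in this post (an '@' delimiter followed by an ISO time, and HTTP-style query params) could be parsed roughly as follows. This is only a sketch: the exact strings, the `time` key and both function names are hypothetical stand-ins, not the examples from the original post and not part of any spec.

```python
from datetime import datetime
from urllib.parse import parse_qsl


def parse_at_form(text):
    """Parse a hypothetical '<identifier>@<ISO time>' string."""
    base, sep, stamp = text.partition("@")
    when = datetime.fromisoformat(stamp) if sep else None
    return base, when


def parse_query_form(text):
    """Parse a hypothetical '<identifier>?key=value&...' string."""
    base, _, query = text.partition("?")
    return base, dict(parse_qsl(query))


# Illustrative inputs only.
print(parse_at_form("FDSN:IU_COLA_00_B_H_Z@2019-01-01T00:00:00"))
print(parse_query_form("FDSN:IU_COLA_00_B_H_Z?time=2019-01-01T00:00:00"))
```

The '@' form is shorter but carries exactly one value, while the query form is more verbose and leaves room for additional keys, which matches the trade-off described in the post.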