Implementation notes #113

DylanVanAssche · 2024-03-05T10:02:59Z

When fixing RML test-cases across modules, we noticed that most modules need some additional notes regarding implementation details. The modules properly describe what something is and what it looks like. However, they do not tell implementations how to use it. Examples of implementation details we can describe in a separate note:

Error codes to generate when a mapping is invalid, data has errors, etc.
How reference formulations can be processed
Natural Data Type mapping
...

CC: @pmaria @chrdebru

Actions:

Create a separate repository for this Note
Add Respec for the Note
Publish Note and add to Portal

Let us know what you think!

pmaria · 2024-03-05T10:07:18Z

+1
IMO we need a note per reference formulation, or at least grouped per type like SQL or SPARQL.
This can then also include the specification of the natural (datatype) mapping and other details that are reference formulation specific.

chrdebru · 2024-03-07T20:49:34Z

I support this, yes.

DylanVanAssche · 2024-03-08T14:52:10Z

kg-construct/rml-io#41 is again such a specific thing, looks more to me like an implementation note than an actual test-case?

chrdebru · 2024-03-09T13:26:21Z

I agree. Or only rely on Postgres for that test case

andimou · 2024-03-17T10:19:22Z

Given that we are a CG, everything is a draft. That being said, I do not disagree with specifying a few reference formulations in more detail, but these can be just examples of what potential reference formulations may look like.

pmaria · 2024-03-19T07:35:29Z

Given that we are a CG, everything is a draft. That being said, I do not disagree with specifying a few reference formulations in more detail, but these can be just examples of what potential reference formulations may look like.

I don't agree that the description of reference formulations should be just examples. I think that clearly defining the reference formulations is essential. In r2rml the only reference formulation was SQL, and r2rml defined several aspects of it.

How to generate rows to be mapped
How to access column values
A natural mapping of values

These aspects should be clearly defined for any reference formulation that is introduced for RML. IMHO it would therefore be best to have a note per reference formulation where these can be described.
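To make the three aspects concrete, here is a minimal sketch of what a per-reference-formulation contract could look like. Everything in it is an assumption made for illustration: the `ReferenceFormulation` interface, its method names, and the toy `CsvFormulation` class are hypothetical and do not come from R2RML, the RML specs, or any existing implementation.

```python
import csv
import io
from typing import Any, Iterator, List, Optional, Protocol


class ReferenceFormulation(Protocol):
    """Hypothetical contract: one method per aspect listed above."""

    def iterate(self, source: str) -> Iterator[Any]:
        """Generate the records (rows, in R2RML terms) to be mapped."""
        ...

    def resolve(self, record: Any, reference: str) -> List[Any]:
        """Access the value(s) that a reference selects within one record."""
        ...

    def natural_datatype(self, value: Any) -> Optional[str]:
        """Return a datatype IRI for a value, or None for a plain literal."""
        ...


class CsvFormulation:
    """Toy CSV instance: one record per row, references are column names."""

    def iterate(self, source: str) -> Iterator[dict]:
        # csv.DictReader uses the first line as the column names.
        return csv.DictReader(io.StringIO(source))

    def resolve(self, record: dict, reference: str) -> List[Any]:
        # A CSV cell holds at most one value; an unknown column yields nothing.
        value = record.get(reference)
        return [] if value is None else [value]

    def natural_datatype(self, value: Any) -> Optional[str]:
        # Assumption of this sketch: CSV cells are untyped strings.
        return None
```

A note per reference formulation would then amount to writing down, in prose, what each of these three methods has to do for that formulation.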

andimou · 2024-03-19T10:22:42Z

Well, we can decide with the entire CG if we agree on limiting the Reference Formulations that RML can accept.

In my opinion and how RML was designed so far: RML deliberately left the Reference Formulations unspecified so anyone can define their own Reference Formulation. In this sense, if we specify now some Reference Formulations, it should be examples of such Reference Formulations and RML should not be restricted to these Reference Formulations. One should be able to define any Reference Formulation desired.

pmaria · 2024-03-19T11:16:44Z

Well, we can decide with the entire CG if we agree on limiting the Reference Formulations that RML can accept.

I think there is a bit of a misunderstanding. There is no intention to limit the accepted reference formulations. Only to clearly define those that are already mentioned in the specs and test cases, and for which we have definitions in the ontology.

In my opinion and how RML was designed so far: RML deliberately left the Reference Formulations unspecified so anyone can define their own Reference Formulation. In this sense, if we specify now some Reference Formulations, it should be examples of such Reference Formulations and RML should not be restricted to these Reference Formulations. One should be able to define any Reference Formulation desired.

Agreed. This issue has no intention to change that. The intention of this issue is to define what needs to be defined and described for any reference formulation to be properly handled, and to have a place where we put those definitions. Otherwise we risk that every implementation handles the same reference formulation in a different way.

Since we already have several reference formulations that are broadly in use, the proposal is to define each of these as a note.
Any other new reference formulations that gain broad usage could later on also be added as a separate note at a location that we as a community deem fit.

DylanVanAssche · 2024-03-19T11:46:08Z

The intention of this issue is to define what needs to be defined and described for any reference formulation to be properly handled, and to have a place where we put those definitions.

My 2 cents here: we already 'enforce' specific behavior for reference formulations we use in the spec, ontology, and test cases. In the test cases we have already implicitly 'defined' what happens with a certain source + reference formulation. These notes are meant to make this implicit behavior explicit, so that developers do not have to read other implementations and interpret the output of each test case to know how a given reference formulation behaves.

andimou · 2024-03-19T13:47:00Z

Only to clearly define those that are already mentioned in the specs and test cases

These are indeed examples of reference formulations. We can include a few but we cannot produce Notes. From the W3C types of documents: "A W3C Draft Note is a document produced by a W3C Working Group, a W3C Interest Group, the Advisory Board (AB), or the W3C Technical Architecture Group (TAG)."

As a CG we publish a report. If specific Reference Formulations come in the report and we use this report for the WG, then the Reference Formulations will become part of the candidate recommendation. Even if we include them as notes, then these are notes for the RML-IO and not RML-core as RML-core is independent of reference formulations.

@DylanVanAssche how do we do that? The test cases in RML-core are independent of reference formulations. In RML-core we are in a situation where we have already retrieved the data and we have key-value pairs with which we deal. There is not (or should not be) anything in RML-core that is reference-formulation-dependent.

DylanVanAssche · 2024-03-19T14:05:54Z

These are indeed examples of reference formulations. We can include a few but we cannot produce Notes. From the W3C types of documents: "A W3C Draft Note is a document produced by a W3C Working Group, a W3C Interest Group, the Advisory Board (AB), or the W3C Technical Architecture Group (TAG)."

Note is maybe not the right wording given the W3C definitions. What is meant by 'note' here is a document with examples and a description of how a reference formulation is supposed to work. See it as a set of guidelines for implementations. Not a hard requirement they MUST follow, but more like a SHOULD as seen from good practice. The same assumptions about the reference formulations documented in there are also made for the output of the test cases.

The test cases in RML-core are independent of reference formulations.

That's definitely not the case currently. We depend on CSV (more abstractly: tabular data) there, assuming we move all other data formats away as proposed in an issue, and we still require implementations to interpret a CSV reference formulation as iterating over each row to correctly generate the triples/quads. If RML-Core were truly independent, no Logical Source would appear in its test cases, but then the test cases would no longer be the integration tests they are now. We cannot use an 'abstract' reference in RML-Core's test cases, because at this point a reference is always tied to some data source defined by the Logical Source. At this point, this 'iterate over each row' behavior is only implicitly defined through test cases that assume it. In R2RML this is explicitly defined in the spec:

Each logical table is mapped to RDF using a triples map. The triples map is a rule that maps each row in the logical table to a number of RDF triples

So this is actually mentioned for R2RML implementations, but not RML implementations. R2RML implementations now know they have to follow a row-based iteration model for RDBs as it is clearly mentioned in the spec. Where we put our 'guidelines' on this matter is of course a point of discussion.
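As a rough illustration of the row-based iteration model the Core test cases currently assume for CSV, consider the sketch below. It is a simplification under stated assumptions: `expand_template` and `map_rows` are hypothetical helper names, the example IRIs are made up, and IRI-safe encoding and literal escaping are deliberately left out.

```python
import csv
import io
import re


def expand_template(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the row's value for that column."""
    return re.sub(r"\{([^}]+)\}", lambda m: row[m.group(1)], template)


def map_rows(csv_text: str, subject_template: str, predicate: str, object_column: str):
    """Yield one N-Triples line per CSV row (no escaping, for illustration only)."""
    # The iteration model: every row of the logical source is one record,
    # and each record is mapped independently of the others.
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = expand_template(subject_template, row)
        yield f'<{subject}> <{predicate}> "{row[object_column]}" .'


triples = list(
    map_rows(
        "id,name\n1,Alice\n2,Bob\n",
        "http://example.com/person/{id}",
        "http://xmlns.com/foaf/0.1/name",
        "name",
    )
)
```

The loop over `csv.DictReader` is exactly the part that R2RML states for logical tables and that RML currently leaves to the test cases.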

In RML-core we are in a situation where we have already retrieved the data and we have key-value pairs with which we deal. There is not (or should not be) anything in RML-core that is reference-formulation-dependent.

That's what RML-Core is supposed to be, yes, but the test cases do not reflect this. How to improve this is a hard question, as the references in rml:reference and rml:template always depend on the reference formulation. Regarding the key-value pairs, that's RML Field. The latter could be the abstracted reference formulation that decouples RML-Core completely. However, that requires moving Fields into Core.

pmaria · 2024-03-19T15:25:21Z

These are indeed examples of reference formulations. We can include a few but we cannot produce Notes. From the W3C types of documents: "A W3C Draft Note is a document produced by a W3C Working Group, a W3C Interest Group, the Advisory Board (AB), or the W3C Technical Architecture Group (TAG)."

Note is maybe not the right wording given the W3C definitions. What is meant by 'note' here is a document with examples and a description of how a reference formulation is supposed to work. See it as a set of guidelines for implementations. Not a hard requirement they MUST follow, but more like a SHOULD as seen from good practice. The same assumptions about the reference formulations documented in there are also made for the output of the test cases.

IMO these descriptions will be more than a SHOULD, so then maybe these should also be reports and the W3C process can decide how to label them later on.

As for the test cases:

I see the test cases as functional tests, not as unit tests.
With functional tests it is quite normal to add dependencies that are not part of the module under test.

My proposal would be to:

use JSONPath as the reference formulation for all core tests (JSONPath in order to have examples of multiple values).
introduce tests for specific reference formulations together with the report on that reference formulation.
These tests can focus on the specifics of the reference formulation, like correct natural mapping of datatypes and such.

DylanVanAssche · 2024-03-19T18:02:37Z

IMO these descriptions will be more than a SHOULD, so then maybe these should also be reports and the W3C process can decide how to label them later on.

Yeah you can also do more 'enforcement' here, I first want to have a proper agreement on the rest before deciding the level of enforcement.

I see the test cases as functional tests, not as unit tests.

That's possible as well, but that contradicts the 'key-value pairs' view IMO.
If the reference formulations are properly defined in some kind of document, it fixes a lot for the test cases in Core, because the document then clearly says how to iterate over the data.
We just need to decide on this ;)

use JSONPath as the reference formulation for all core tests (JSONPath in order to have examples of multiple values).

I prefer a tabular data source here for Core because it aligns better with R2RML making the transition less difficult. If we want adoption from R2RML implementations, Core should be as easy as possible to implement.

introduce tests for specific reference formulations together with the report on that reference formulation.
These tests can focus on the specifics of the reference formulation, like correct natural mapping of datatypes and such.

+1

SimonBin · 2024-05-14T08:09:01Z

Just a comment: we also noticed that TC0002a-JSON fails because the "expected" result does not respect the natural datatype mapping.
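To illustrate what a natural datatype mapping for JSON values could involve, here is a small sketch. The mapping table is an assumption chosen for the example (JSON booleans, integers, and other numbers to `xsd:boolean`, `xsd:integer`, and `xsd:double`); it is not the rule that the RML specs or TC0002a-JSON define, and `natural_datatype` is a hypothetical helper name.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def natural_datatype(value):
    """Return an assumed XSD datatype IRI for a parsed JSON value, or None."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    # Strings (and anything else) stay plain literals in this sketch.
    return None


# Made-up record, not taken from the test case.
record = json.loads('{"id": 10, "score": 3.5, "active": true, "name": "Venus"}')
datatypes = {key: natural_datatype(value) for key, value in record.items()}
```

An expected result that types every value as a plain string would disagree with such a mapping for `id`, `score`, and `active`, which is the kind of mismatch a written-down mapping per reference formulation would settle.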

dachafra · 2024-07-05T08:20:34Z

If the reference formulations are properly defined in some kind of document, it fixes a lot for the test cases in Core, because the document then clearly says how to iterate over the data.
We just need to decide on this ;)

It's done right? @DylanVanAssche, any action here?

DylanVanAssche · 2024-07-05T10:46:01Z

It's done right? @DylanVanAssche, any action here?

rml-io-registry repo is created but the actual task of creating these documents did not happen yet.

dachafra · 2024-07-05T11:26:27Z

but then is this a rml-io-registry issue?

DylanVanAssche · 2024-07-05T11:27:48Z

Yes, but that didn't exist back then when the issue was created. This is one of the issues that triggered the creation of the registry.

DylanVanAssche added the "help wanted" and "question" labels · Mar 5, 2024

This was referenced Mar 11, 2024

LogicalTarget serialization, compression, and encoding domains kg-construct/rml-io#44

Closed

Does SPARQL TSV Results make sense? kg-construct/rml-io#48

Open

rml:Target: add more properties (write mode) kg-construct/rml-io#49

Closed

pmaria mentioned this issue Mar 27, 2024

datatype inference kg-construct/rml-io#63

Open

dachafra closed this as completed Jul 5, 2024
