Unreproducible results for type values from CSV vocabularies with empty first column #227

Open
schivmeister opened this issue Nov 13, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@schivmeister

Background

RMLMapper is used within another library by Meaningfy (essentially a wrapper around rmlmapper and other tools) to map OP TED notices from XML to RDF. The mapping team at Meaningfy ran the transformation in July 2023, but the same results can no longer be reproduced in November 2023, despite using the same rmlmapper version, v6.1.3.

One of these potential regressions is the appearance, in some of the data, of the properties epo:hasBuyerLegalType and epo:hasMainActivityType, whose object values come from the vocabularies buyer_legal_type.csv and main_activity.csv, respectively. We are seeking help to determine the root cause of this behaviour.

Problem

Expected

No occurrence of epo:hasBuyerLegalType or epo:hasMainActivityType in the resulting RDF data, wherever there is no XML element mapping in the object/value reference data vocabulary (empty first column).

Actual

Occurrences of epo:hasBuyerLegalType and epo:hasMainActivityType in the resulting RDF data with unexpected values, wherever there is no XML element mapping in the object/value reference data vocabulary (empty first column).

Observations

It was later found that the issue occurs in cases where the above-cited CSV vocabulary file has an empty cell value (no XML element and therefore no mapping to be expected). Placing a hyphen - or a white space in place of the empty first cells appears to fix this. However, this is unexpected, as the previous transformation in July 2023 did not exhibit this behaviour, and there were no such occurrences. It is uncertain if this relates in any way to #140.
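To make the observation above easier to check, here is a minimal Python sketch that lists the data rows of a CSV vocabulary whose first cell is strictly empty, and that applies the hyphen workaround. The column names and sample rows are hypothetical and not taken from buyer_legal_type.csv or main_activity.csv, and a header row is assumed. It only inspects the CSV and says nothing about how RMLMapper itself treats empty cells.

```python
import csv
import io


def empty_first_column_rows(csv_text):
    """Return 1-based data-row numbers whose first cell is strictly empty.

    A cell holding only white space is not reported, because the thread
    notes that a white space in the cell already avoids the behaviour.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (assumed to be present)
    return [i for i, row in enumerate(reader, start=1) if row and row[0] == ""]


def fill_empty_first_column(csv_text, placeholder="-"):
    """Return a copy of the CSV with empty first cells set to a placeholder."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row and row[0] == "":
            row[0] = placeholder  # the hyphen workaround described above
        writer.writerow(row)
    return out.getvalue()


# Hypothetical vocabulary: the second data row has no XML code to map from.
SAMPLE = "xml_code,uri\n1,http://example.org/a\n,http://example.org/b\n"

if __name__ == "__main__":
    print(empty_first_column_rows(SAMPLE))
    print(fill_empty_first_column(SAMPLE))
```

Running the check on the real vocabulary files before and after applying the workaround would show which rows are candidates for producing the unexpected values.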

MWE

As the transformation involves multiple RML files/modules, and a very minimal example without the contextual data would not be useful, a reproduction test suite (a mostly-minimal working example) is attached to this ticket. It also contains the MWE for another potential regression, #226, identified alongside this one.

mfy-rml-mwe.zip

@DylanVanAssche
Contributor

Hi @schivmeister ,

Thanks for the detailed issue.
You mention that both executions used the same version of the RMLMapper, so I'm not sure that this is a bug in the RMLMapper; the same goes for #226. If the input data differed between the two executions, the results would indeed differ.
Empty values should be ignored by RMLMapper.

@schivmeister
Author

Hi @DylanVanAssche thanks for looking! That's the thing - the input data is the same, the version is the same, but we are seeing different results! The attached MWEs show exactly this.

The expected result was the one we last generated. The MWE will produce new output that is different. So, we were wondering if you might have any clue as to what else it could be. It requires a bit of time investment in following the MWE.

@DylanVanAssche
Contributor

I already had a look at the MWE, but I fail to understand which data is the 'old' data and which is the 'new' data.
I would expect the MWE to have 2 versions then, one from July and one from November?
Maybe I missed it :)

@schivmeister
Author

@DylanVanAssche sorry about the confusion. The file expected.ttl is the "old" output from July. The rest of the files (the XML, RMLs, CSVs and JSONs) are all the original files used to generate that output TTL.

The scope of the MWE is to generate the "new" output actual.ttl exactly from these old resources, so that the tester can follow and compare how it comes about, both with and without applying the discovered workarounds.

Given this context, let me know if the MWE then is still hard to follow. We'll attempt to minimize whatever complexity still remains.
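For anyone following the MWE, a rough way to compare the two outputs is to count how often the two properties occur in each file. The sketch below is text-level only and not RDF-aware: it assumes both Turtle files serialize the properties as the prefixed names epo:hasBuyerLegalType and epo:hasMainActivityType. The function names and sample strings are hypothetical and are not part of the attached test suite.

```python
import re

PREDICATES = ("epo:hasBuyerLegalType", "epo:hasMainActivityType")


def count_predicates(ttl_text, predicates=PREDICATES):
    """Count whole-token occurrences of each prefixed name in Turtle text."""
    counts = {}
    for name in predicates:
        # Guard both ends so longer names containing this one do not match.
        pattern = r"(?<![\w:-])" + re.escape(name) + r"(?![\w-])"
        counts[name] = len(re.findall(pattern, ttl_text))
    return counts


def changed_predicates(expected_ttl, actual_ttl, predicates=PREDICATES):
    """Map each predicate whose count differs to (expected, actual) counts."""
    expected = count_predicates(expected_ttl, predicates)
    actual = count_predicates(actual_ttl, predicates)
    return {
        name: (expected[name], actual[name])
        for name in predicates
        if expected[name] != actual[name]
    }


# Hypothetical snippets standing in for expected.ttl and actual.ttl.
EXPECTED = "ex:buyer a epo:Buyer .\n"
ACTUAL = "ex:buyer a epo:Buyer ;\n    epo:hasBuyerLegalType ex:x .\n"

if __name__ == "__main__":
    print(changed_predicates(EXPECTED, ACTUAL))
```

With the real files, the two strings could be read with `pathlib.Path("expected.ttl").read_text()` and `pathlib.Path("actual.ttl").read_text()`. A proper graph-level comparison would need an RDF library; this sketch only flags where the counts diverge.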

@schivmeister
Author

We are in the process of preparing simpler MWEs for reporting the discovered causes as new tickets. Perhaps that will allow us to better comprehend these issues, and lead the way to finding the root cause of the behaviour described here (inability to reproduce specific prior results). We will start with the potential cause identified in #226, as that is more pressing at this time.

@schivmeister changed the title from "Unexpected properties with type values since v6.1.3" to "Unreproducible results for values from CSV vocabularies with empty first column" Nov 14, 2023
@schivmeister changed the title from "Unreproducible results for values from CSV vocabularies with empty first column" to "Unreproducible results for type values from CSV vocabularies with empty first column" Nov 14, 2023
@DylanVanAssche added the bug label Mar 15, 2024