Modify TriG serializer to not generate new prefixes for graph URIs #2467
Summary of changes
I personally have found it very annoying that the TriG serializer generates new prefixes for named graph URIs when there is no appropriate prefix for that graph. This is a common occurrence when compiling TriG files where the named graphs each contain an ontology, and the names of the graphs are the URIs of the ontologies. Many ontology URIs are the preferred namespace without the trailing `#` or `/`. For example, the W3C Data Catalog Vocabulary URI is `http://www.w3.org/ns/dcat`, while the preferred prefix/namespace is `PREFIX dcat: <http://www.w3.org/ns/dcat#>`. If you run a Python snippet that simulates loading the DCAT ontology into the named graph `http://www.w3.org/ns/dcat` (or at least one triple in it) and serialize the result as TriG, the output contains a generated `ns1` prefix.
That `ns1` prefix is generally unhelpful and unexpected, especially since you're probably expecting to see the URI written out in full like in the source file. Since a new namespace is generated per graph (when the conditions are right), you can end up with a bunch of unhelpful prefixes if you load a bunch of graphs into the dataset. An example would be merging all the QUDT Turtle files into one TriG file.

This PR includes modifications to two lines in the TriG serializer to prevent this from happening. Consequently, running the same snippet with this PR results in the graph URI being written out in full, which I think most people would expect.
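To make the before/after behaviour concrete, here is a small standalone sketch of the decision involved. It is not rdflib's code: `graph_label` and its `generate_prefix` flag are hypothetical names, and the URI splitting is deliberately simplified.

```python
def graph_label(graph_uri, namespaces, generate_prefix):
    """Return the label a serializer might write in front of a graph block.

    `namespaces` maps prefix -> namespace URI and may be extended in place.
    Hypothetical helper for illustration only.
    """
    # Split at the last '#' or '/', a simplification of real QName splitting.
    cut = max(graph_uri.rfind("#"), graph_uri.rfind("/")) + 1
    namespace, local = graph_uri[:cut], graph_uri[cut:]

    # Reuse an existing binding when one matches the namespace exactly.
    for prefix, ns in namespaces.items():
        if ns == namespace and local:
            return f"{prefix}:{local}"

    # Old behaviour: invent ns1, ns2, ... when nothing matches.
    if generate_prefix and local:
        n = 1
        while f"ns{n}" in namespaces:
            n += 1
        namespaces[f"ns{n}"] = namespace
        return f"ns{n}:{local}"

    # New behaviour: fall back to the full URI.
    return f"<{graph_uri}>"


bindings = {"dcat": "http://www.w3.org/ns/dcat#"}
old = graph_label("http://www.w3.org/ns/dcat", dict(bindings), True)
new = graph_label("http://www.w3.org/ns/dcat", dict(bindings), False)
print(old)  # ns1:dcat
print(new)  # <http://www.w3.org/ns/dcat>
```

The existing `dcat` binding does not match because its namespace ends in `#`, which is exactly the situation described above.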
Of course, if an appropriate namespace already exists or one gets bound, the graph URI will still be shortened appropriately. Binding such a namespace at the end of the snippet and serializing again still results in the expected shortened output.
I'm not sure whether there are any concerns around backwards compatibility; the content of the file doesn't really change, it's just cosmetic.