Modify TriG serializer to not generate new prefixes for graph URIs #2467

Merged 2 commits on Jul 31, 2023

mgberg (Contributor) commented Jul 3, 2023

Summary of changes

I personally have found it very annoying that the TriG serializer will generate new prefixes for named graph URIs if there is no appropriate prefix for that graph. This is a common occurrence when compiling TriG files where the named graphs each contain an ontology, and the names of the graphs are the URIs of the ontology. Many ontology URIs are the preferred namespace without the trailing # or /.

For example, the W3C Data Catalog Vocabulary URI is http://www.w3.org/ns/dcat while the preferred prefix/namespace is PREFIX dcat: <http://www.w3.org/ns/dcat#>. If you run the following Python snippet, which simulates loading the DCAT ontology (or at least one triple of it) into the named graph http://www.w3.org/ns/dcat:

from rdflib import Dataset, URIRef
from rdflib.namespace import RDF, OWL

DCAT_URI = URIRef("http://www.w3.org/ns/dcat")

ds = Dataset()
g = ds.graph(DCAT_URI)
g.add((DCAT_URI, RDF.type, OWL.Ontology))

print(ds.serialize(format="trig"))

you should see the following output:

@prefix ns1: <http://www.w3.org/ns/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

ns1:dcat {
    ns1:dcat a owl:Ontology .
}

That ns1 prefix is generally unhelpful and unexpected, especially since you're probably expecting to see the URI written out in full as in the source file. Because a new namespace is generated per graph (when the conditions are right), loading many graphs into the dataset can leave you with many unhelpful prefixes. An example would be merging all the QUDT Turtle files into one TriG file.
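To illustrate how such prefixes accumulate, here is a rough, stdlib-only sketch of prefix generation. It is not rdflib's actual code: `split_uri_sketch` and `generate_prefixes` are hypothetical helpers that assume a namespace is everything up to and including the last `#` or `/`, and the example.org graph URIs are placeholders.

```python
def split_uri_sketch(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'.

    Assumption: a simplification for illustration, not rdflib's real rules.
    """
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def generate_prefixes(graph_uris):
    """Mimic creating a fresh nsN prefix for every namespace not seen before."""
    prefixes = {}  # namespace -> generated prefix
    for uri in graph_uris:
        namespace, _local = split_uri_sketch(uri)
        if namespace not in prefixes:
            prefixes[namespace] = f"ns{len(prefixes) + 1}"
    return prefixes


# Placeholder graph names: one ontology per named graph.
graph_uris = [
    "http://www.w3.org/ns/dcat",
    "http://example.org/vocab/unit",
    "http://example.org/schema/core",
]
generated = generate_prefixes(graph_uris)
for namespace, prefix in generated.items():
    print(f"@prefix {prefix}: <{namespace}> .")
```

Under these assumptions, three graphs with three different parent paths produce three generated prefixes, none of which the author of the data chose.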

This PR modifies two lines in the TriG serializer to prevent this from happening. Consequently, running the above snippet with this PR will result in the following, which I think most people would expect.

@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://www.w3.org/ns/dcat> {
    <http://www.w3.org/ns/dcat> a owl:Ontology .
}

Of course, if an appropriate namespace does already exist or one gets bound, the graph URI will still be shortened appropriately. Adding this snippet at the end:

ds.bind("MANUAL_NS", "http://www.w3.org/ns/")
print(ds.serialize(format="trig"))

still results in the expected output:

@prefix MANUAL_NS: <http://www.w3.org/ns/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

MANUAL_NS:dcat {
    MANUAL_NS:dcat a owl:Ontology .
}
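The lookup-only behaviour described above can be sketched without rdflib. This is a hedged illustration rather than the serializer's implementation: `label_for_graph` and the `bound` mapping are hypothetical names, and the local-name check is deliberately simplistic.

```python
def label_for_graph(uri, bound):
    """Return a prefixed name if a bound namespace covers `uri`, else `<uri>`.

    `bound` maps prefix -> namespace. No new prefix is ever created here.
    """
    # Try the longest namespace first so the most specific binding wins.
    for prefix, namespace in sorted(bound.items(), key=lambda kv: -len(kv[1])):
        local = uri[len(namespace):]
        # Simplistic validity check (assumption): the local part must be a
        # non-empty identifier-like string.
        if uri.startswith(namespace) and local and local.isidentifier():
            return f"{prefix}:{local}"
    return f"<{uri}>"


dcat = "http://www.w3.org/ns/dcat"
bound = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "dcat": "http://www.w3.org/ns/dcat#",  # does not cover the bare ontology URI
}
before = label_for_graph(dcat, bound)

bound["MANUAL_NS"] = "http://www.w3.org/ns/"
after = label_for_graph(dcat, bound)

print(before)
print(after)
```

With only the `owl` and `dcat` bindings, the graph name is written as a full URI. Once `MANUAL_NS` is bound, the same function shortens it, mirroring the two outputs shown above.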

I'm not sure whether there are any backwards compatibility concerns. The content of the file doesn't really change; the difference is purely cosmetic.

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Checked that all tests and type checking passes.
  • Considered adding additional documentation.
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

aucampia (Member) commented Jul 3, 2023

@mgberg thanks for the PR. I don't have any backwards compatibility concerns with this because, as you said, it is just a cosmetic difference. It is somewhat a matter of preference, though, and that is not easy to gauge. I somewhat prefer the behaviour this PR introduces, but it may be good to reach out on Twitter to see if there are any concerns.

I will wait a while and see if there is any objection before merging.

I think long term we need to make serializers more easily customizable and then provide flags to control this behaviour, but right now adding custom flags for this may not be ideal.

mgberg (Contributor Author) commented Jul 3, 2023

> I think long term we need to make serializers more easily customizable and then provide flags to control this behaviour, but right now adding custom flags for this may not be ideal.

I also think making the serializers more customizable would be a good feature.

…ct namespaces to be generated for graph URIs
coveralls commented

Coverage: 90.9% (+0.009%) from 90.891% when pulling 83d3fb2 on corning-incorporated:trig-serializer-graph-uris into 8582691 on RDFLib:main.

aucampia requested a review from a team on July 11, 2023
aucampia added the labels "review wanted" and "ready to merge" on Jul 11, 2023
aucampia merged commit bd797ac into RDFLib:main on Jul 31, 2023