rdfstar parsers and serializers update #2115
base: main
Conversation
@nicholascar all the new stuff is from https://github.com/RDFLib/rdflib-rdfstar
Run the serializer tests with python test_serializer_trigstar.py, python test_serializer_turtlestar.py, or python test_serializer_ntriplesstar.py in rdflib, and run the parser tests with pytest test_parser_turtlestar.py, pytest test_parser_trigstar.py, or pytest test_parser_ntriplesstar.py in rdflib/test.
The parser goes through a preprocessing stage that rewrites RDF-star into reification so the old parser can understand it, because the new "{|" annotation syntax in RDF-star is hard to expand with the old parser. Then, instead of adding the statements with blank nodes, I edited the old parser so that it creates an RdfstarTriple object for each <<:s :p :o>> quoted triple (if any), and indexes only the one statement. This implementation is also compatible with plain RDF documents, because any part without the new syntax is left unchanged by the preprocessing stage.
@nicholascar so, compared to treating it as syntactic sugar, we reduce the number of statements stored in the graph. But to save memory, so that we don't need to store the whole nested rdflib.term.RdfstarTriple object, we need better support in the Store and Memory classes: store the RdfstarTriple as a tuple (:b3f5e185425e8b59dc2f2c4178794ea6RdfstarTriple, s, p, o) and treat the RdfstarTriple object just like a BNode that only has an id. But adding that support and editing the Store and Memory classes needs a huge amount of work, so I may leave it as future work.
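As a rough illustration of the preprocessing pass described above (this is not the PR's code; it ignores nesting, literals containing whitespace, and the {| ... |} annotation syntax), each << s p o >> quoted triple can be replaced by a generated node id plus classic reification triples that an RDF 1.1 parser understands:

```python
import re

def expand_quoted_triples(text: str) -> str:
    """Rewrite << s p o >> quoted triples as generated ids + reification."""
    counter = 0
    extra = []

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        s, p, o = match.group(1).split()
        node = f"_:qt{counter}"  # stand-in for the PR's RdfstarTriple id
        extra.append(
            f"{node} a rdf:Statement ;\n"
            f"    rdf:subject {s} ;\n"
            f"    rdf:predicate {p} ;\n"
            f"    rdf:object {o} ."
        )
        return node

    # NOTE: a real implementation must handle nested quoting and the
    # annotation syntax; this non-greedy regex deliberately does not.
    rewritten = re.sub(r"<<\s*(.+?)\s*>>", replace, text)
    return rewritten + "\n" + "\n".join(extra)

print(expand_quoted_triples("<< :s :p :o >> :says :zebra ."))
```

Any document with no << ... >> occurrences passes through unchanged, which mirrors the compatibility claim above.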
While this is admirable work, I'm not sure merging this before RDF-star is a REC is wise. There are still open issues on RDF-star itself. Also, I see uses of ... Furthermore, one of the open questions AFAIK is still whether RDF-star should turn triples into nodes in the core RDF model itself. I believe that this PR does just that (by adding ...). (I do know RDFLib already has e.g. Notation 3 support with variables (and formulae?), while that is still not a standard. One of the things the W3C must do here is make a decision on whether these drafts are in conflict or complementary. I recommend proceeding with caution in RDFLib, to keep some check on the complexities that can arise.) Also, are we to replace all parsers in RDFLib with Lark-generated ones, or use a mix of them? (There's been some discussion on that in other issues.) (For the record, I'm still in favour of a non-monolithic RDFLib with pluggable parsers and serializers, so people can choose which specs and implementations to use.)
Hi @niklasl
Hi, this is a new ontology we designed to identify whether a reification refers to a quoted, asserted, or both kind of statement for further processing in the old parser: rdfstar:QuotedStatement and rdfstar:AssertedStatement. Because rewriting the whole parser with a Lark parser is too difficult and would take too much time, I decided to preprocess all RDF-star syntax into plain RDF style and then further edit the old parser.
I think GraphDB/rdf4j did that as well (https://rdf4j.org/documentation/programming/rdfstar/): for example, if we have a Turtle file, it will first be parsed into a standard reification.
I don't think it is complete; correct me if I am wrong.
How is it incomplete in your view?
@gjhiggins I looked into the formulae design before I started, but I couldn't figure out how it works. I did find there is an option in the store to add a quoted triple/statement, but it is set to False by default, and you need to specify the context if you want to set it to True? (I don't know what that means or how it works.) And I don't know whether it relates to the RDF-star extension or is for some other use.
Ah okay, then I can confirm that i) it's not incomplete and ii) it would be inappropriate to (ab)use the QuotedGraph for RDF-star reification purposes; its intended use is for representing rules expressed in N3 syntax:

LOG_NS = Namespace("http://www.w3.org/2000/10/swap/log#")
rulegraph = """@prefix : <urn:example:> .
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
:bob :hasmother :alice .
:john :hasmother :alice .
@forAll :x, :y, :z .
{ :x :hasmother :z . :y :hasmother :z } log:implies { :x :issibling :y } .
"""
def test_formula():
ds = Dataset()
g = ds.parse( data=rulegraph, format="n3")
rule_statement = list(g.triples((None, LOG_NS.implies, None)))[0]
assert (
repr(rule_statement)
== "(<Graph identifier=_:Formula2 (<class 'rdflib.graph.QuotedGraph'>)>, "
+ "rdflib.term.URIRef('http://www.w3.org/2000/10/swap/log#implies'), "
+ "<Graph identifier=_:Formula3 (<class 'rdflib.graph.QuotedGraph'>)>)"
)
quoted_graph_sub_obj = list(g.subject_objects(predicate=LOG_NS.implies))
assert (
str(quoted_graph_sub_obj)
== "[("
+ "<Graph identifier=_:Formula2 (<class 'rdflib.graph.QuotedGraph'>)>, "
+ "<Graph identifier=_:Formula3 (<class 'rdflib.graph.QuotedGraph'>)>"
+ ")]"
)
formula2, formula3 = quoted_graph_sub_obj[0]
assert str(sorted(list(formula2))) == (
"["
"(rdflib.term.Variable('urn:example:x'), "
"rdflib.term.URIRef('urn:example:hasmother'), "
"rdflib.term.Variable('urn:example:z')), "
"(rdflib.term.Variable('urn:example:y'), "
"rdflib.term.URIRef('urn:example:hasmother'), "
"rdflib.term.Variable('urn:example:z'))"
"]"
)
assert (
str(sorted(list(formula3)))
== "[("
+ "rdflib.term.Variable('urn:example:x'), "
+ "rdflib.term.URIRef('urn:example:issibling'), "
+ "rdflib.term.Variable('urn:example:y')"
+ ")]"
)
A possible compromise here is to add RDFStarTriple as provisional. On the one hand, I am hesitant to commit to a public interface prematurely, but on the other hand, I would rather take sub-optimal RDFStar support than no RDFStar support, and making the API for it provisional kind of gets us out of this bind.
For the parsers that are not based on existing formats, Lark presents quite an attractive offering, in that it is easier to deal with than hand-written parsers, and it can generate standalone parsers that don't depend on third-party libraries. I'm fine with using a mix of them, though we should try to avoid leaking any internals so we can swap out the implementations if need be.
At the moment resources for maintenance are still quite sparse. If we had a good cookiecutter template integrated with cruft, that could make it easier to manage more repos, but even then there would still be a lot of challenges keeping everything working and aligned. If we do things right, and make usage of any third-party libraries optional (i.e. by using extras), then the only real downside I see here is size, and at least to me the size of RDFLib is still quite manageable: our wheel is less than 500 KB, and a significant chunk of that (>33%) is from our pre-compiled namespaces.
@alooba1 thanks for the PR. I will have a more thorough look at it over the weekend, but do try to look at the CI failures; if you are unsure about their causes, just ask and we will try to help. Also, just a reminder that there is a gitter/matrix chat, and I will be happy to provide assistance to you there as well.
fwiw, the import for a Lark LALR standalone parser (...)
Nevertheless, if the work that I've done (implementing a suite of Lark-based parsers for RDFLib¹) ever sees the light of day, that's probably the route I'll be choosing for making that work available. The same goes for the SQLite-based stores (one SQL, one key-value) and an RDNA serializer. ¹ I was interested in assessing the degree of any performance penalty that might accrue from a switch to Lark-based parsers, and the only truly reliable way to investigate that was to implement the parsers.
You should make a draft PR for this, or at least drop the link to the branch or commits somewhere; having it as a baseline to look at will be helpful to others, I think.
OK, I will start fixing the CI failures. There is still a problem with some nested annotations I am trying to fix; I think it will be fixed today or tomorrow.
@alooba1 I believe I understand. It is an interesting approach, and it would be good to discuss its design with the RDF-star CG. How can I get back the triple itself from the "tagged" bnode ID if the reified form isn't sent to storage? I do wonder if this multiple-passes approach is an unnecessary complication, though. Using Lark, this should be simple enough to do in one sweep (see the Turtle parser in Pymantic for an example of that). A separate utility function could then be used if someone wanted to convert between RDF and RDF-star, following the CG report. Of course, if using Lark here, I believe it would be wise to adapt the regular Turtle and TriG parsers to this approach rather than mixing them up. Regarding that, there seems to be a lot of code duplication in this PR. The contents of ... Note that with RDF-star, all triples must be usable as subjects and objects (asserted or not). Is this addressed in this PR, or is the ... Have you considered the relative merits of defining e.g. ...
@aucampia What would the concrete use case be for adding possibly sub-optimal support which results in objects with an unstable interface? How would users of RDFLib be able to use that without depending on the resulting types (particularly when using type annotations)? If this is to be added mostly to support conversion between formats, note that there is only one other syntax with pending support for this extension to RDF, and that is JSON-LD-star. That isn't complete either (though I hope it will be in sync with the RDF-star REC timeline). I would suggest keeping this in a branch (for further cleanup) and tracking the progress of RDF-star towards REC (which might entail little to no changes if all its features are universally approved). Note that RDF-star has gotten some criticism for adding complexity (being possibly at odds with named graphs/datasets, and not aligned with Notation 3, for better or worse). I can see its value, though (the old form of RDF reification has left most of us wanting, and RDF-star addresses that, to some extent). It does turn RDF graphs into hypergraphs, however, and you cannot easily go back from that new level of complexity. Having implemented parsing and serializing of TriG-star in both LDTR (in JS, mainly for teaching and visualization of RDF) and TRLD (in Python), I know that adding the star syntax to existing TriG parsers/serializers does not have to be that hard, even with a handwritten parser. The hard part is stabilizing and agreeing upon the semantics and underlying data model. So I wouldn't rush it out into a stable library release before it's thoroughly vetted. The effect on the RDF community at large is the real concern here. Also, since maintenance time is scarce, I believe screening for quality and code reuse is essential to keep the RDFLib code base in a decent state. Copying and pasting will eventually make it unmanageable.
I know you do a lot of screening and quality improvements, and I am humbled by your continued efforts. I just want to stress that adding this in a sub-optimal form right now might increase the maintenance burden. (Side note: some of the precompiled namespaces should, IMHO, not be part of RDFLib core. I'd put those in another package, installable for those who need it.)
@gjhiggins I do think reworking most/all parsers to use Lark could be beneficial. Of course, it needs to be meticulously done, but following the specs is much easier using the official EBNF, so it would probably raise the quality of RDFLib parsers. I did some simple benchmarking, and both Pymantic and TRLD parse Turtle about ten times faster than RDFLib right now (that could be due more to the internal data model (or namespace handling) than the actual string wrangling, though). Properly wiring the Turtle/TriG/-star EBNF into RDFLib graphs ought to reduce the code a lot too (following the lead of Pymantic, and attributing them for the approach if we're looking at their code). I would be interested in having a go at replacing RDFLib parsers with Lark-based ones (well, time permitting). As @aucampia says, would you like to publish your efforts, or collaborate on this in some other way? Of course, the results have to be thoroughly assessed for compatibility, speed and ease of maintenance. We could then incorporate things from this PR, unless @alooba1 has the time to also join in this Lark readaptation. As I commented above, I believe clearly defining how to extend RDFLib to the hypergraph space of RDF-star is the hard part, and that goes much deeper than mere surface syntaxes. Unless doing clever tricks to keep triples-as-subjects as mere BNodes with special IDs is doable; albeit perhaps at odds with the RDF-star ambitions. (You could even consider URI-encoding N-Triples to keep them as "just" RDF identifiers. That is a hack of course, flattening the hypergraph into opaque triples again, making it unnavigable "below" the triple level.)
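The URI-encoding hack mentioned in the last parenthesis can be sketched with the standard library alone (the urn:ntriples: scheme here is made up purely for illustration):

```python
from urllib.parse import quote, unquote

# Percent-encode the N-Triples form of a quoted triple into one opaque
# IRI, so a plain RDF store can hold it as an ordinary node. This
# flattens the hypergraph: the triple is no longer navigable below the
# identifier level, exactly the trade-off described above.
def triple_to_iri(s: str, p: str, o: str) -> str:
    return "urn:ntriples:" + quote(f"{s} {p} {o} .", safe="")

def iri_to_triple(iri: str) -> tuple:
    ntriple = unquote(iri[len("urn:ntriples:"):])
    s, p, o = ntriple.rstrip(" .").split(" ", 2)
    return (s, p, o)

iri = triple_to_iri("<http://example/s>", "<http://example/p>", "<http://example/o>")
original = iri_to_triple(iri)
```

The round trip works for simple IRI terms; literals with embedded spaces or trailing dots would need a real N-Triples parser rather than this naive split.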
I'm in complete agreement with this. I don't want to accept sub-optimal code, and I don't want duplication in human-maintained code. I'm fine with duplication in computer-maintained (i.e. computer-generated) code, though even there my preference would be to reduce it as much as possible. I can see some things in this PR which do look like manual duplication, as the code contains copyright from Tim Berners-Lee, and I think the code in this PR needs some work, but I will see what I can do to get it to a quality that I'm happy with before approving. Regarding my suggestion of adding the RDFStar support as provisional: provisional should not be taken to mean sub-optimal; it should only be taken to imply that it does not carry the guarantees of our public interfaces. It does not mean the code that implements it can or should be of lower quality. The interface may be sub-optimal, or incomplete, but it should be as optimal and complete as it can be given the situation. In this case, because there are still open questions about RDFStar, there are still open questions about what the final interface should look like, and even if there were no open questions about RDFStar, there would still be open questions about what constitutes an optimal interface, I think.
To me the main benefit of merging the RDFStar support as provisional, over not merging it, is that it allows us to see what does and doesn't work in terms of an interface, and it makes it possible for people to try out RDFStar. If we see that the interface for RDFStar has no flaws, we promote the provisional interface to a stable interface, and then all code that was using the provisional interface keeps working. Most of the current interfaces of RDFLib are not optimal, and making optimal interfaces is extremely difficult. Having an interface that is as optimal as we can conceive, but does not carry the guarantees of our public interface, gives us an opportunity to adjust it if necessary, but also allows us to finalize it with no negative impact on its users if we find that it is suitable.
They would depend on the interface; they just won't have the guarantees associated with the rest of our public interface.
Given that the quality of the code is sufficient, and the interface is as good as we can make it right now, I would much rather merge the code into the main branch as provisional than keep it in a feature branch, because I think the most likely outcome of that is that the feature branch will become defunct. If it is in the main branch, then any changes to the main branch have to include accommodations for it. I do realize, though, that the quality of the code as it stands is possibly not quite sufficient, and the interface could be improved, but as I said I will try to work to get the PR where it needs to be before approving. Another thing to keep in mind is that we do plan to do a massive overhaul of the core interfaces of RDFLib. This will be a lengthy process, and it is quite unlikely that the RDFStar support will look similar after the overhaul, but we should consider RDFStar as part of this redesign, I think. It does add complexity, but to some extent I hope that RDFStar gains widespread adoption, because I think the value it gives is commensurate with the complexity it adds. And to me, having some RDFStar support with the current core interfaces makes it easier to design new core interfaces that are even better suited to RDFStar.
Just as I stated in the previous sections: because I preprocess the input sequence to a level the old parser can understand, and then use and further edit the old parser so that it indexes the reification the way RDF-star does (not just adding blank nodes), the code uses part of the n3 parser. Maybe I can do an import for the parts that I never touched?
@niklasl the s, p, o that an RdfstarTriple contains is recorded in the triple (nestedly) and sent to storage. As I discussed above, this stores some duplicate information in memory if the RdfstarTriple is used multiple times. So it would be better to have support for graph.rdfstartriple((hashvalueoftriple, s, p, o)), so that the statement (rdfhash s p o) represents the nested relationship, like the BNode one, and the RdfstarTriple becomes only an id. Then the relationships won't be stored multiple times even if the RdfstarTriple is reused in the document.
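The storage layout proposed here can be sketched in a few stdlib-only lines (this illustrates the idea, it is not code from the PR; triple_id and add_quoted are invented names):

```python
import hashlib

# Give each quoted triple a stable hash-derived id, store one flat
# (id -> (s, p, o)) record for it, and use the bare id wherever the
# quoted triple appears — so reuse costs no extra memory, and the
# "tagged" id behaves like a BNode label.
def triple_id(s: str, p: str, o: str) -> str:
    digest = hashlib.sha1(f"{s} {p} {o}".encode()).hexdigest()
    return f"_:qt{digest[:12]}"

store = {}   # id -> (s, p, o): one record per distinct quoted triple
graph = []   # asserted statements, referencing quoted triples by id

def add_quoted(s: str, p: str, o: str) -> str:
    qid = triple_id(s, p, o)
    store[qid] = (s, p, o)   # stored once, however often it is reused
    return qid

qid = add_quoted(":s", ":p", ":o")
graph.append((qid, ":says", ":zebra"))
graph.append((qid, ":source", ":bob"))   # reuse: no duplicate (s, p, o)
```

Because the id is derived from the triple's content, the same quoted triple always maps to the same record, which is what makes the deduplication above automatic.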
Yeah, it is not that efficient to do a preprocessing pass before parsing. I noticed that part of the code in Pymantic; maybe we can shift to pure Lark in the future.
It's covered; that's why we need the asserted-triple and quoted-triple tags: the quoted triple is stored in the parser's dictionary and used as subject/object when necessary. I think the RDF-star test cases cover all the possible combinations there.
That's kind of time-consuming, since I haven't fully looked into the structure of the Memory and Store classes; maybe put it as future work.
@alooba1 to get rid of the pre-commit.ci failures you can write a comment saying "pre-commit.ci autofix"; you can also install pre-commit and run it locally. You can also use the devcontainer to get the same thing done:

docker-compose run --rm devcontainer task validate:fix

But alternatively you can also use tox, which is what the CI pipeline runs:

# run tox for all environments (this will take very long and run for python 3.7 -> 3.10)
tox
# run tox for only one specific environment, like python 3.7
tox -e py37
<http://example/a> << <http://example/s> <http://example/p> <http://example/o> >> <http://example/z> . |
These files should be relocated to test/data/, and test/data/fetcher.py should be updated to fetch them. For more info see test/data/README.md.
I'm happy to move it and add it to fetcher.py if you are fine with it.
g = Graph()
g.parse("test/turtlestar-evaluation/turtle-star-eval-quoted-annotation-2.ttl", format="ttls")
print(g.serialize(format="ttlstar"))
You should rather use logging than print, as logging can be controlled with the --log-cli-level / --log-level pytest flags.
g = Graph()
g.parse(data="test/turtlestar-evaluation/turtle-star-eval-01.ttl", format="ttls")
print(g.serialize(format="ttlstar"))
Tests should not be bare in the file; they should be inside test functions so that pytest correctly reports on them. Further, this should ideally be a parameterized test; most likely the right thing to do here is to use test/test_roundtrip.py. Probably a good baseline is to look at the n3 suite roundtrips being done:
Lines 490 to 504 in 3a41821
@pytest.mark.parametrize(
    "checker, args",
    make_cases(
        N3_W3C_SUITE_FILES,
        formats={"n3"},
        # NOTE: Isomorphic check does not work on Quoted Graphs
        checks={Check.SET_EQUALS_WITHOUT_BLANKS},
        graph_type=Graph,
        same_public_id=True,
    ),
)
def test_n3_suite(
    checker: Callable[[str, str, Path], None], args: Tuple[str, str, Path]
):
    checker(*args)
g.parse("test/turtlestar-evaluation/turtle-star-eval-02.ttl", format="ttls")
print(g.serialize(format="ttlstar"))
It would be best to attempt some manner of equivalence checking here. Blank nodes may complicate matters, but we have GraphHelper.assert_sets_equals, which could be adapted to deal with RDFStar nodes; the best would be to use bnode_handling=BNodeHandling.COLLAPSE. We should actually change GraphHelper.assert_sets_equals to also compare size, now that I think about it.
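A standalone sketch of the bnode-collapsing idea (GraphHelper.assert_sets_equals and BNodeHandling.COLLAPSE are the real helpers in rdflib's test suite; this stdlib-only version only mimics the behaviour):

```python
# Replace every blank node label with a single placeholder before
# comparing triple sets, so graphs differing only in bnode naming
# compare equal. Collapsing can merge distinct triples into one, which
# is why comparing set sizes as well (as suggested above) matters.
def collapse_bnodes(triples):
    def collapse(term: str) -> str:
        return "_:bnode" if term.startswith("_:") else term
    return {tuple(collapse(t) for t in triple) for triple in triples}

g1 = {("_:b1", ":p", ":o"), (":s", ":q", "_:b1")}
g2 = {("_:x9", ":p", ":o"), (":s", ":q", "_:x9")}
same = collapse_bnodes(g1) == collapse_bnodes(g2)
```

This is a much weaker check than true isomorphism, but it is cheap, deterministic, and catches most serializer round-trip errors.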
test/test_parser_ntriplesstar.py
register(
    "ntstar",
    Parser,
    "rdflib.plugins.parsers.ntriples-star",
    "NtriplesStarParser",
)
Is this (and similar in other parser tests) needed here if you already have ...
Lines 267 to 284 in e0d5746
register(
    "ntstar",
    Serializer,
    "rdflib.plugins.serializers.ntriples-star",
    "NtriplesStarSerializer"
)
register(
    "ttlstar",
    Serializer,
    "rdflib.plugins.serializers.turtlestar",
    "TurtlestarSerializer"
)
register(
    "trigstar",
    Serializer,
    "rdflib.plugins.serializers.trigstar",
    "TrigstarSerializer"
)
# tests should be past |
You should be using the N-Triples-star test manifest [ref] for this, together with our manifest processor [ref]. For an example see rdflib/test/test_w3c_spec/test_nt_w3c.py:
Lines 85 to 96 in 3a41821
@pytest.mark.parametrize(
    ["manifest_entry"],
    params_from_sources(
        MAPPER,
        ManifestEntry,
        LOCAL_BASE_DIR / "manifest.ttl",
        mark_dict=MARK_DICT,
        report_prefix="rdflib_w3c_ntriples",
    ),
)
def test_entry(manifest_entry: ManifestEntry) -> None:
    check_entry(manifest_entry)
This file (and other test files which essentially run the test suites) should be in test/test_w3c_spec/.
<http://example/a> << <http://example/s> <http://example/p> <http://example/o> >> <http://example/z> . |
It seems like you have the same files in test/rdf-star/turtle-star/ and test/turtle-star.
@alooba1 as far as I can tell you are not using Lark in standalone mode, as you are importing lark; is there a reason for this? If you do this we will have to update the RDFLib dependencies to include lark, and that is not entirely ideal. Also, can you include the grammar files you used as inputs to Lark? It would also be best to add the steps to generate the code in ... And optimally the generated code should be untouched unless there is a very good reason that it must be hand-edited, in which case I would prefer that such changes are in the form of a unified diff patch so we can reapply them when we regenerate the code; but even just having somewhat of a separation between generated and written code is a good start.
@aucampia thanks for your advice. I will try to fix these issues ASAP, but since I'm working on this as a student research project that ends at the end of October, I need to implement SPARQL-star first as much as possible, then write the thesis, and after that, ideally in October, I will start polishing my code. I know it is very bad style-wise.
I'll work towards it. The work's not in a suitable state for exposure as yet; I'm fairly deep in the weeds, as I'll explain later. I'd like to reiterate, my interest was/is in assessing the degree of the performance penalty, if any, that accrues from using Lark-based parsers. I based my work on pymantic's implementation of Lark-based turtle/ntriples/nquads parsers ... but ... because I was focussed on assessing performance, I retained quite a bit of the pymantic support code so's I could get on with the job without having to do a complete port to RDFLib first. To wit, I rewired pymantic's BaseParser to use RDFLib objects and then bound a slightly-modified version of their Turtle Transformer class to the Lark parser instantiation (as one is enabled to do, and advised: "Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster)") ...

def unpack_predicateobjectlist(subject, pol):
    # ...

class TurtleTransformer(BaseParser, Transformer):
    # ...

turtle_lark = Lark(
    open(
        os.path.abspath(
            os.path.join(os.path.dirname(__file__), "grammar", "turtle.lark")
        ),
        "r",
    ).read(),
    parser="lalr",
    transformer=TurtleTransformer(),
    debug=False,
    # _plugins=lark_cython.plugins,
)
class LarkTurtleParser(rdflib.Parser):
    format = None

    def parse(self, string_or_stream, graph=None, base="", **kwargs):
        if hasattr(string_or_stream, "readline"):
            string = string_or_stream.read()
        else:
            # Presume string.
            string = string_or_stream
        if isinstance(string_or_stream, bytes):
            string = string_or_stream.decode("utf-8")
        else:
            string = string_or_stream
        # TODO: stringify the remaining input sources
        if isinstance(string, FileInputSource):
            string = string.file.read().decode()
        elif isinstance(string, StringInputSource):
            string = string.getCharacterStream().read()
        tf = turtle_lark.options.transformer
        tf.base_iri = base
        tf.preserve_bnode_ids = kwargs.get("preserve_bnode_ids", False)
        if graph is None:
            graph = tf._make_graph()
        tf._prepare_parse(graph)
        statements = list(turtle_lark.parse(string))
        tf._cleanup_parse()
        return graph

The above is near-boilerplate for the rest of the RDF syntax formats, as even a casual inspection of the three pymantic RDF parsers will reveal. The approach results in an RDF Parser class that conforms to the current RDFLib API and can be registered via the plugin interface. The thing is, there are tradeoffs to consider. The tradeoff comments in the documentation of Lark's extensive set of Visitor/Transformer classes provoke some thoughtful consideration, and I've only explored a few options as yet. Of note in the above illustration is that the returned statements are ignored; this is because the graph is accessible in the Transformer (via the BaseParser inheritance) and I chose to bang the triples into the graph ...

    def triples(self, children):
        if len(children) == 2:
            subject = children[0]
            for triple in unpack_predicate_object_list(subject, children[1]):
                self._call_state.graph.add(triple)
                yield triple
        elif len(children) == 1:
            for triple_or_node in children[0]:
                if isinstance(triple_or_node, rdflib.graph.Triple):
                    self._call_state.graph.add(
                        (
                            triple_or_node.subject,
                            triple_or_node.predicate,
                            triple_or_node.object,
                        )
                    )
                    yield triple_or_node

vs the original pymantic code ...

    def triples(self, children):
        if len(children) == 2:
            subject = children[0]
            for triple in unpack_predicate_object_list(subject, children[1]):
                yield triple
        elif len(children) == 1:
            for triple_or_node in children[0]:
                if isinstance(triple_or_node, Triple):
                    yield triple_or_node

And the parser returns a populated RDFLib Graph, as per standard. I should stress that I've really only just scratched the surface; there's a fair amount of work still to be done in terms of working through the memory use/speed tradeoff combos, as well as investigating the opportunities for generalising the various Transformer implementations.
If you are open to it, I would be willing to address some of these issues in your branch; if there are limits within which this is acceptable to you, please set out adequate constraints.
Sure, I am open to all advice. When I started writing this code I didn't think too deeply; I just wanted the functionality to work first and then to start polishing my code. Now I can think about performance and style; actually, I could avoid most of the global variables and improve the time complexity, but the time for the project is too tight for me.
@aucampia I'm actually very worried about my serializer, because I didn't really understand how the old one works (the getQName thing), and it would be very nice if you know of documentation about the old serializer or could give me some advice about my implementation. I basically rewrote the whole thing and there are a lot of redundancies at the moment. Also, currently the serializer is just designed to pass the tests, so the Turtle-star that gets parsed in is serialized into N-Triples, and TriG-star is serialized into N-Quads.
The extent of the documentation about serializers is really this; if you have specific questions, do ask though. It may be worth considering basing your serializer on https://github.com/RDFLib/rdflib/blob/main/rdflib/plugins/serializers/turtle.py#L39, but I'm not that sure about it.
If it is valid turtle-star (which ntriples-star should be) and valid trig-star (which nquads-star should be), it is technically correct™, which is the best kind of correct 😄. However, I would somewhat favour us just having an ntriples-star and an nquads-star serializer if that is all they really are. We can potentially alias them to turtle-star and trig-star, but I can't really think of any compelling reason to do that. If we have ntriples-star and nquads-star serializers then we can do some round-tripping, which would already be very useful.
The best advice I have here is to see where you can reuse code: maybe try to collect your code into something for nquads-star and then just see where you can branch the logic to do the same as ntriples-star.
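To illustrate the kind of reuse being suggested, here is a hypothetical sketch (QuotedTriple, term_star and statement_star are invented names, not the PR's API): one recursive term writer shared by N-Triples-star and N-Quads-star output, where the only branch is the optional graph label.

```python
from dataclasses import dataclass

# QuotedTriple stands in for the PR's RdfstarTriple; terms are plain
# strings already in N-Triples form, to keep the sketch self-contained.
@dataclass(frozen=True)
class QuotedTriple:
    s: object
    p: object
    o: object

def term_star(term) -> str:
    if isinstance(term, QuotedTriple):  # recurse for nested quoted triples
        return f"<< {term_star(term.s)} {term_star(term.p)} {term_star(term.o)} >>"
    return term

def statement_star(s, p, o, graph: str = "") -> str:
    # graph == "" -> N-Triples-star line; graph given -> N-Quads-star line
    core = f"{term_star(s)} {term_star(p)} {term_star(o)}"
    return f"{core} {graph} ." if graph else f"{core} ."

qt = QuotedTriple("<http://example/s>", "<http://example/p>", "<http://example/o>")
nt_line = statement_star(qt, "<http://example/z>", '"note"')
nq_line = statement_star(qt, "<http://example/z>", '"note"', "<http://example/g>")
```

Since every valid N-Triples-star document is also valid Turtle-star (and likewise for N-Quads-star and TriG-star), aliasing as discussed above would be a one-line registration on top of this shared core.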
Hmm. I wonder if there's a need to add an attribute declaring a Graph as an RDF-star graph with selective referential transparency enabled? According to 6.4.5 Selective referential transparency ...
By default, quoted RDF-star statements are referentially opaque; the above allows selective referential transparency to be declared for certain predicates. I'm wondering about the situation where statements in RDF-star graphs G¹ and G² include the same predicate p, where p is declared as referentially transparent in G¹ but not in G², and the consequent impact on set-theoretic operations such as G¹ ∪ G². Or (as is likely the case) is my understanding flawed?
I support this too. Using branches within demo code, as opposed to development, is a pain, and this is a demo of capability. The main purpose of this work is to provide another RDF-star implementation in readiness for the moves to standardise RDF-star. RDFLib is an RDF / Linked Data implementation testbed as much as it is a stable, mature library. We have needed a major piece of work to push RDF-star support in RDFLib ahead, as we have had several failed starts previously. If the final RDF-star standard is different from the provisional specification we have now, all other implementations (GraphDB, Jena, RDF4J etc.) will have to change too; but anyway, the changes are hardly going to be completely revolutionary, only small updates to this base. I'll be meeting with @alooba1 shortly (in a few hours) and we'll try to focus on what code quality updates and replies to the comments above can be made ASAP. I think his effort on this work will conclude in about a week, so we will need to carry on from where he gets to. I wanted Lark to be used, for it's pretty clear to me that a Lark implementation of our RDF parsers is the way to go. It's going to be either the same or better in performance, and the use of common, modern Python will make parser maintenance much, much easier.
As invited ...
@gjhiggins the pure Lark-based parser is amazing. I looked into the experimental Lark branch and found that some nested annotation and quoted triple tests are WIP. I wired my preprocessing class (for expanding annotations into pure quoted-triple form) into your current work, and now it is able to pass all tests, including all the nested ones, although it is not the most efficient way.
Also @aucampia I cleaned up the turtle-star and trig-star parsers a bit; I only kept the functions I overrode. I am still working on SPARQL-star at the moment; simple SELECT/INSERT/DELETE are not supported here yet but are on another branch. I think I may need to start a new draft PR for the SPARQL-star once it is completed?
Glad you found it useful. All the credit should go to Nicholas Pilon and Gavin Carothers, it's nearly all their code, slightly adapted for RDFLib.
It doesn't impact your RDF-star work, but fwiw, there's also some work left to do on the basic Lark Turtle parser. The N3 folk introduced a couple of new tests of the Turtle grammar's ...
Hi, first of all, thank you for all your contributions :) I would like to attach information to triples, and what I'm doing now is as follows: ...
If I could directly create RDF-star with RDFLib, it would be really great!
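Until RDF-star support lands, the workaround available in plain RDF is classic reification, as discussed earlier in this thread. A stdlib-only sketch, with tuples standing in for rdflib terms (the annotation predicate and statement id are illustrative):

```python
# Annotating :s :p :o with a confidence value via classic reification:
# mint a statement node, describe the triple with four rdf: properties,
# and attach the annotation to that node.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(s: str, p: str, o: str, stmt_id: str):
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]

graph = [(":s", ":p", ":o")]                      # the asserted triple
graph += reify(":s", ":p", ":o", "_:stmt1")       # four reification triples
graph.append(("_:stmt1", ":confidence", "0.9"))   # the annotation itself

# With RDF-star annotation syntax this collapses to a single statement,
# e.g.  :s :p :o {| :confidence 0.9 |} .
```

The six-triples-versus-one contrast here is exactly the storage overhead the RDF-star annotation syntax is meant to eliminate.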
Here's an informal status update regarding the RDF-star WG chartered to define RDF 1.2 (I'm a member, but I write here as myself, i.e. not representing the group as a whole): we will have a charter extension until August 2026 (basically one year for finalizing, one year for maintenance). There's been quite some discussion and some agreed-upon changes. This will reasonably affect when this PR (and/or subsequent ones) can be merged into something releasable. As things stand (debates are ongoing, but this is the currently agreed baseline), RDF 1.2 will have triple terms, which are to be related to regular resources via a new predicate, currently named ... (Again, this is my interpretation and summary, and not an official one from the WG.) I will try to keep an eye on the work here and possibly assist in it. I want to help solidify an addition to RDFLib for what will be standardized. The WG will also define tests for N-Triples, Turtle and SPARQL, which this addition would reasonably need to pass.
Would be really nice if this is merged! |
Summary of changes

Checklist
- Checked that there aren't other open pull requests for the same change.
- Considered adding an example in ./examples for new features.
- Considered updating CHANGELOG.md.
- "Allow edits from maintainers" is enabled, so maintainers can fix minor issues and keep the PR up to date.