GH#2256 Introduce a query optimizer concept #2257

JervenBolleman · 2023-03-10T21:54:51Z

First implementation is to make it more likely that VALUES is used on the left side of a join.

Added a test, but the optimizer is not invoked by default at this time

Summary of changes

Introduced a way to add new generic query optimizers that work on the query algebra.

Checklist

Checked that there aren't other open pull requests for
the same change.
Added tests for any changes that have a runtime impact.
Checked that all tests and type checking passes.
For changes that have a potential impact on users of this project:
- Updated relevant documentation to avoid inaccuracies.
- Considered adding additional documentation.
- Considered adding an example in ./examples for new features.
- Considered updating our changelog (CHANGELOG.md).
Considered granting push permissions to the PR branch,
so maintainers can fix minor issues and keep your PR up to date.

First implementation is to make it more likely that VALUES is used on the left side of a join. Added a test, but the optimizer is not invoked by default at this time

aucampia · 2023-03-10T22:31:25Z

rdflib/plugins/sparql/processor.py

@@ -63,8 +64,9 @@ def update(self, strOrQuery, initBindings={}, initNs={}):


 class SPARQLProcessor(Processor):
-    def __init__(self, graph):
+    def __init__(self, graph, optimizers: List[SPARQLOptimizer] = None):


I think it would be best to just accept any callable that maps a query to a query:

So somewhere before this:

_QueryTranslatorType = Callable[[Query],Query]

Then:

Suggested change

def __init__(self, graph, optimizers: List[SPARQLOptimizer] = None):

def __init__(self, graph, query_translators: Optional[List[_QueryTranslatorType]] = None):

That way, users can pass methods or free functions, and even have multiple different translator methods on one class.

Your tests should still work fine with that.

aucampia · 2023-03-10T22:34:39Z

rdflib/plugins/sparql/optimizer.py

+        return query
+
+
+class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):


While this is valuable, I think it may be better to keep it in rdflib._contrib, as we don't necessarily want to offer the same level of compatibility guarantees as we do for other code.

aucampia · 2023-03-10T22:36:32Z

rdflib/plugins/sparql/optimizer.py

+
+class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):
+
+    def optimize(self, query: Query) -> Query:


Suggested change

def optimize(self, query: Query) -> Query:

@classmethod

def optimize(cls, query: Query) -> Query:

As these methods don't use the class state and are side effect free it is best to make them class methods, that way it is clearer to users that they don't have to be concerned with concurrency issues.

aucampia · 2023-03-10T22:36:52Z

rdflib/plugins/sparql/optimizer.py

+        query.algebra = self._optimize_node(main)
+        return query
+
+    def _optimize_node(self, cv: Any) -> Any:


Suggested change

def _optimize_node(self, cv: Any) -> Any:

@classmethod

def _optimize_node(cls, cv: Any) -> Any:

Big doubt about the _.contrib location.

…-optimizers

aucampia · 2023-04-09T10:11:02Z

@JervenBolleman I made a PR against your branch with some changes: JervenBolleman#5

One problem with taking translators as an argument to SPARQLProcessor.__init__ is that it is a bit awkward to call it, it may be better to take translators as an argument to SPARQLProcessor.query(), that way someone can just pass translators when calling Graph.query().

There may be reasons to not do this also, and keep it in the constructor, but I'm not sure that there is.

CC: @niklasl @westurner @RDFLib/core-reviewers

JervenBolleman · 2023-04-09T20:05:11Z

@aucampia thank you for helping with this.

The reason why I did it via the constructor, is that stores normally know how to optimize queries the best. And if you just want to modify a query before sending it for evaluation, there is no need for the store to have any knowledge of the modification, or do it for the user. My natural habitat being RDF4j I think that the approach of query optimizer pipelines is natural, and they are store constructor based.

westurner · 2023-04-11T00:52:51Z

If this is a config file parameter, constructor args may be easier than passing those through for each .query() I don't know whether it makes a difference for testing whether constructor args create a default SPARQLProcessor/Optimizer on a public attr or the/*every* store just has extra private methods and args/kwarg?

…

On Sun, Apr 9, 2023, 4:05 PM JervenBolleman ***@***.***> wrote: @aucampia <https://github.com/aucampia> thank you for helping with this. The reason why I did it via the constructor, is that stores normally know how to optimize queries the best. And if you just want to modify a query before sending it for evaluation, there is no need for the store to have any knowledge of the modification, or do it for the user. My natural habitat being RDF4j I think that the approach of query optimizer pipelines is natural, and they are store constructor based. — Reply to this email directly, view it on GitHub <#2257 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAMNS5LS57KU3BRZHR2Z4LXAMJADANCNFSM6AAAAAAVW7RLYA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

aucampia · 2023-04-12T20:35:29Z

I will have to double-check how exactly this will be called then by passing it to the store, using just graph with the default store, however, the current approach requires this as far as I can tell:

https://github.com/JervenBolleman/rdflib/blob/4dfaaec2b599748360613f58bd6734da225ec7c5/test/test_sparql/test_translators.py#L65-L94

def test_graph_query(rdfs_graph: Graph):
    requested_query_string = """
    PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?x {
        ?x rdfs:label "subClassOf".
    }
    """

    translated_query_string = """
    PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?x {
        ?x rdfs:label "subPropertyOf".
    }
    """

    requested_query = _prepare_query(requested_query_string)
    translated_query = _prepare_query(translated_query_string)

    translate = Mock(return_value=translated_query)

    processor = SPARQLProcessor(rdfs_graph, translators=[translate])
    result = rdfs_graph.query(requested_query, processor)
    rows = []
    for row in result:
        assert isinstance(row, ResultRow)
        rows.append(row.asdict())

    assert rows == [{"x": RDFS.subPropertyOf}]

    translate.assert_called_once_with(requested_query)

It may be worth adding an example of how users should be using this, and also adding a tests for the idiomatic end-to-end usage.

@JervenBolleman when you have a moment do check JervenBolleman#5 - happy to update it if there is something you want different.

westurner · 2023-04-12T21:13:12Z

To not have to pass the processor as a positional arg on every query,:

result = rdfs_graph.query(requested_query, processor)

(If it isn't already) couldn't it be:

Store.__init__(*, processor:SPARQLProcessor=None)

store = Store(, processor=processor)
# ...
result = rdfs_graph.query(requested_query)

aucampia · 2023-04-12T21:33:09Z

To not have to pass the processor as a positional arg on every query,:
result = rdfs_graph.query(requested_query, processor)
(If it isn't already) couldn't it be:
Store.__init__(*, processor:SPARQLProcessor=None)

store = Store(, processor=processor)
# ...
result = rdfs_graph.query(requested_query)

Will check this weekend if this will work or can be made to work.

aucampia · 2023-05-21T18:39:25Z

PRs to V6 is closed until further notice. See this for more details:

RDFLib V6 and V7 #2395

aucampia · 2023-05-22T22:21:53Z

PRs to V6 is closed until further notice. See this for more details:

RDFLib V6 and V7 #2395

We will be open for PRs again once this is resolved:

docs: add guidelines for breaking changes #2402

GH#2256 Introduce a query optimizer concept

66a266a

First implementation is to make it more likely that VALUES is used on the left side of a join. Added a test, but the optimizer is not invoked by default at this time

aucampia reviewed Mar 10, 2023

View reviewed changes

cthoyt mentioned this pull request Mar 12, 2023

Implement an identifier mapping service biopragmatics/curies#41

Merged

JervenBolleman and others added 2 commits March 17, 2023 11:46

GH#2256 Apply suggested changes regarding the query translator concept.

bf71d4a

Big doubt about the _.contrib location.

Merge remote-tracking branch 'upstream/main' into RDFLibGH-2256-query…

94ede34

…-optimizers

aucampia added the on hold Progress on this issue is blocked by something. label May 21, 2023

aucampia removed the on hold Progress on this issue is blocked by something. label Jun 2, 2023

Merge branch 'main' into RDFLibGH-2256-query-optimizers

78e9d90

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GH#2256 Introduce a query optimizer concept #2257

GH#2256 Introduce a query optimizer concept #2257

JervenBolleman commented Mar 10, 2023

aucampia Mar 10, 2023 •

edited

Loading

aucampia Mar 10, 2023

aucampia Mar 10, 2023

aucampia Mar 10, 2023

aucampia Mar 10, 2023

aucampia commented Apr 9, 2023

JervenBolleman commented Apr 9, 2023

westurner commented Apr 11, 2023 via email

aucampia commented Apr 12, 2023

westurner commented Apr 12, 2023

aucampia commented Apr 12, 2023

aucampia commented May 21, 2023

aucampia commented May 22, 2023

	def __init__(self, graph, optimizers: List[SPARQLOptimizer] = None):
	def __init__(self, graph, query_translators: Optional[List[_QueryTranslatorType]] = None):

		return query


		class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):


		class ValuesToTheLeftOfTheJoin(SPARQLOptimizer):

		def optimize(self, query: Query) -> Query:

	def optimize(self, query: Query) -> Query:
	@classmethod
	def optimize(cls, query: Query) -> Query:

	def _optimize_node(self, cv: Any) -> Any:
	@classmethod
	def _optimize_node(cls, cv: Any) -> Any:

GH#2256 Introduce a query optimizer concept #2257

Are you sure you want to change the base?

GH#2256 Introduce a query optimizer concept #2257

Conversation

JervenBolleman commented Mar 10, 2023

Summary of changes

Checklist

aucampia Mar 10, 2023 • edited Loading

Choose a reason for hiding this comment

aucampia Mar 10, 2023

Choose a reason for hiding this comment

aucampia Mar 10, 2023

Choose a reason for hiding this comment

aucampia Mar 10, 2023

Choose a reason for hiding this comment

aucampia Mar 10, 2023

Choose a reason for hiding this comment

aucampia commented Apr 9, 2023

JervenBolleman commented Apr 9, 2023

westurner commented Apr 11, 2023 via email

aucampia commented Apr 12, 2023

westurner commented Apr 12, 2023

aucampia commented Apr 12, 2023

aucampia commented May 21, 2023

aucampia commented May 22, 2023

aucampia Mar 10, 2023 •

edited

Loading