-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft][Please review] Make Paginator use simpler queries when there are no sql joins used (#8278) #11595
base: 3.3.x
Are you sure you want to change the base?
[Draft][Please review] Make Paginator use simpler queries when there are no sql joins used (#8278) #11595
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ | |
use Doctrine\ORM\Internal\SQLResultCasing; | ||
use Doctrine\ORM\NoResultException; | ||
use Doctrine\ORM\Query; | ||
use Doctrine\ORM\Query\Expr\Join; | ||
use Doctrine\ORM\Query\Parameter; | ||
use Doctrine\ORM\Query\Parser; | ||
use Doctrine\ORM\Query\ResultSetMapping; | ||
|
@@ -21,6 +22,7 @@ | |
use function array_map; | ||
use function array_sum; | ||
use function assert; | ||
use function count; | ||
use function is_string; | ||
|
||
/** | ||
|
@@ -38,6 +40,16 @@ | |
private readonly Query $query; | ||
private bool|null $useOutputWalkers = null; | ||
private int|null $count = null; | ||
/** | ||
* @var bool The auto-detection of queries style was added a lot later to this class, and this | ||
* class historically was by default using the more complex queries style, which means that | ||
* the simple queries style is potentially very under-tested in production systems. The purpose | ||
* of this variable is to not introduce breaking changes until an impression is developed that | ||
* the simple queries style has been battle-tested enough. | ||
*/ | ||
private bool $queryStyleAutoDetectionEnabled = false; | ||
/** @var bool|null Null means "undetermined". */ | ||
private bool|null $queryHasHavingClause = null; | ||
|
||
/** @param bool $fetchJoinCollection Whether the query joins a collection (true by default). */ | ||
public function __construct( | ||
|
@@ -51,6 +63,29 @@ | |
$this->query = $query; | ||
} | ||
|
||
/** | ||
* Create an instance of Paginator with auto-detection of whether the provided | ||
* query is suitable for simple (and fast) pagination queries, or whether a complex | ||
* set of pagination queries has to be used. | ||
*/ | ||
public static function newWithAutoDetection(QueryBuilder $queryBuilder): self | ||
{ | ||
/** @var array<string, Join[]> $joinsPerRootAlias */ | ||
$joinsPerRootAlias = $queryBuilder->getDQLPart('join'); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would it be possible to also support There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Are you ok with this new function's supporting only instances of QueryBuilder if I decide that I ran out of time for this PR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @d-ph Supporting I'm not asking for the public API to drop support for QueryBuilder. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Supporting queries is important to support cases using
The QueryBuilder cannot configuring the fetch mode of relations. So even when using the QueryBuilder (which I often do), I often have cases where I pass a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What about the comment I raised in point 1. above about the cost of constructing a Query's AST? My intention was to at some point add the support for Query, but in a form where there would be separate routines for inspecting the provided instance of Query or QueryBuilder, so that the AST construction is avoided when a QueryBuilder is provided. Something along the following lines:
Are you saying that you'd like to disregard the fact that constructing an AST is not a cheap operation (and which depends on the complexity of the query)?
👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I could accept a fast path introspecting the QueryBuilder instead of parsing the Query as an optimization if it has indeed an impact. However, it would have to support the same capabilities. See my other comment related to the detection of joining collections rather than any joins. Maybe the QueryBuilder introspection would not be viable for that. However, my personal opinion would be that we should not merge this without support for Query as it would prevent downstream projects from migrating to it (for instance, the Pagerfanta DoctrineOrmAdapter uses this Paginator and currently supports There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @stof when there are multiple to-one associations and the eager loading fetch mode is used, does this cause multiple result rows for a single entity and this is something where we need to be careful with regard to the goal of this PR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. to-one associations don't create multiple rows in the SQL results. Only There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, even when there are multiple of them 🤦🏻♂️ |
||
|
||
// For now, only check whether there are any sql joins present in the query. Note, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be great to implement the proper detection of joining collections. Taking the code of my work project, almost all my paginated queries involve some toOne joins but the majority of them don't join a collection (making them compatible with the fast queries, which would then mean that switching to auto-detection would be a performance regression right now). |
||
// however, that it is doable to detect a presence of only *ToOne joins via the access to | ||
// joined entity classes' metadata (see: QueryBuilder::getEntityManager()->getClassMetadata(className)). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Performing this analysis on a QueryBuilder will be complex, as it has to account for joins done on previous joins and not on the root entity, replicating work already done by the query parser as part of the AST. |
||
$sqlJoinsPresent = count($joinsPerRootAlias) > 0; | ||
|
||
$paginator = new self($queryBuilder, $sqlJoinsPresent); | ||
|
||
$paginator->queryStyleAutoDetectionEnabled = true; | ||
$paginator->queryHasHavingClause = $queryBuilder->getDQLPart('having') !== null; | ||
|
||
return $paginator; | ||
} | ||
|
||
/** | ||
* Returns the query. | ||
*/ | ||
|
@@ -171,13 +206,25 @@ | |
/** | ||
* Determines whether to use an output walker for the query. | ||
*/ | ||
private function useOutputWalker(Query $query): bool | ||
private function useOutputWalker(Query $query, bool $forCountQuery = false): bool | ||
{ | ||
if ($this->useOutputWalkers === null) { | ||
return (bool) $query->getHint(Query::HINT_CUSTOM_OUTPUT_WALKER) === false; | ||
if ($this->useOutputWalkers !== null) { | ||
return $this->useOutputWalkers; | ||
} | ||
|
||
return $this->useOutputWalkers; | ||
// When a custom output walker already present, then do not use the Paginator's. | ||
if ($query->getHint(Query::HINT_CUSTOM_OUTPUT_WALKER) !== false) { | ||
return false; | ||
} | ||
|
||
// When not joining onto *ToMany relations, then do not use the more complex CountOutputWalker. | ||
return ! ( | ||
$forCountQuery | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we have a separate detection for count query and result query, I would rather extract the check for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You said:
Did you mean to say "extract (...) in a separate function"? In other words, are you proposing splitting the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIRC, I meant This would look like that: // ...
if (!$this->queryStyleAutoDetectionEnabled) {
return true; // Old simple guessing
}
if ($forCountQuery) {
return /* condition for the support of CountWalker */;
}
return /* condition for the support of LimitSubqueryWalker */; |
||
&& $this->queryStyleAutoDetectionEnabled | ||
&& ! $this->fetchJoinCollection | ||
// CountWalker doesn't support the "having" clause, while CountOutputWalker does. | ||
&& $this->queryHasHavingClause === false | ||
); | ||
} | ||
|
||
/** | ||
|
@@ -205,10 +252,20 @@ | |
$countQuery = $this->cloneQuery($this->query); | ||
|
||
if (! $countQuery->hasHint(CountWalker::HINT_DISTINCT)) { | ||
$countQuery->setHint(CountWalker::HINT_DISTINCT, true); | ||
$hintDistinctDefaultTrue = true; | ||
|
||
// When not joining onto *ToMany relations, then use a simpler COUNT query in the CountWalker. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see what you mean. It appears I misunderstood the purpose of that variable. I gave a thought about what to do with it, and I initially concluded that that variable is just misnamed. Consider the following:
In other words, both the following DQL queries require the extra "SELECT DISTINCT" subquery to happen in the
It appears then that the docblocks are under an impression that that variable is "true" regardless of whether a *ToMany relation is fetch-loaded or not. Do you agree that that instance variable is not named correctly (or the author incorrectly thought that not fetch-loading a *ToMany relation is immune to the "duplicated records being returned from the database" problem)? Would you agree for me to just rename that instance variable to something like: "queryHasToManyJoins"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, maybe it is indeed badly named. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that there is a case where joined collections that are not selected will not impact the count or the iteration: when using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just so that I understand: you're trying to say that running the following DQLs without an "outputWalker" would be fine because it would produce the correct results, therefore the literal meaning of
I agree, however note that in the current code there is nothing that would ensure that DISTINCT is added/present in the In case there is a legitimate worry that there might be someone in the userspace that does the (undocumented) due diligence and adds the DISTINCT keyword to their queries when they set Other that that, I admit that there is a potential future improvement, where So, are you ok for me to rename that instance variable, or would you rather that I should not touch it, and instead introduce a new instance variable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That was a lot to take in for me. After a substantial amount of thought, I would like to propose and get your opinion on the following.
Q1. Are you personally ok with the proposition above?
Q2. Could you please confirm that you meant:
Thanks for mentioning the case of
That is good to know, however note that I do not intend to touch the region of the code that makes the decision whether the walker-based or tree-walker fetching query should be used.
Fair point, however this again is not something that I'm going to change or touch in my PR. Note that if a query uses composite primary keys AND produces duplicate rows (i.e. is not compatible with the "fast" result query), then the Paginator will not able to be used at all, and it's no different to the current situation in the Paginator.
This again is good to know, however I don't intend to auto-detect/guess in this PR which of the two walkers should be used by the result-fetching query. Could you reply to Q1 and Q2, please, and share any other thought you have about what I said above? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
your code provides guessing for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, are you referring to the following diff in the current PR? private function useOutputWalker(Query $query): bool
{
if ($this->useOutputWalkers === null) {
return (bool) $query->getHint(Query::HINT_CUSTOM_OUTPUT_WALKER) === false;
}
return $this->useOutputWalkers;
} -> private function useOutputWalker(Query $query, bool $forCountQuery = false): bool
{
if ($this->useOutputWalkers !== null) {
return $this->useOutputWalkers;
}
// When a custom output walker already present, then do not use the Paginator's.
if ($query->getHint(Query::HINT_CUSTOM_OUTPUT_WALKER) !== false) {
return false;
}
// When not joining onto *ToMany relations, then do not use the more complex CountOutputWalker.
return ! (
$forCountQuery
// (rest of the diff not relevant)
);
} Do we both agree that this code diff does not result in any change to whether the walker-based or the tree-walker result-fetching query should be used? Or do you refer to the proposition of splitting Note that I started this PR with only the intention of adding code that "if there are any dql-joins present, then use the fast version of the queries". I believe this alone would immediately address 80% of the use cases in the wild. Anything more than this, I consider a nicety. However, I do agree that going the extra mile and checking for *ToMany dql-joins should be easy enough, so as an exception from my original plan, I'm willing to attempt to make that change in the current PR. Anything other than that is a "less than 5% of use cases" matter in my mind, and although I do appreciate us discussing it, I do not intend to address it in this PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. when There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree.
I agree.
Do we both agree that my code diff does not change the behaviour of
Are you trying to say that because the new If you're not referring to the consequences of the |
||
if ( | ||
$this->queryStyleAutoDetectionEnabled | ||
&& ! $this->fetchJoinCollection | ||
) { | ||
$hintDistinctDefaultTrue = false; | ||
} | ||
|
||
$countQuery->setHint(CountWalker::HINT_DISTINCT, $hintDistinctDefaultTrue); | ||
} | ||
|
||
if ($this->useOutputWalker($countQuery)) { | ||
if ($this->useOutputWalker($countQuery, forCountQuery: true)) { | ||
$platform = $countQuery->getEntityManager()->getConnection()->getDatabasePlatform(); // law of demeter win | ||
|
||
$rsm = new ResultSetMapping(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a property phpdoc, the description is just the phpdoc description, not something attached to the tag (and then, we don't need
@var
at all as there is a native type).