Fix issue 195 "leading hint + join methods hint cannot totally force the join order" #207
Hi! I am excited to share a fix for #195. Before writing any code, I discussed the approach with others in #195. As a reminder, I will briefly describe the problem, its cause, the solution, and the limitations of the solution.
Problem Description
The problem is that PostgreSQL with pg_hint_plan generates an execution plan inconsistent with the input hints. In particular, for the following query
The generated execution plan (reproduced on my server) is
The actual join order is inconsistent with the join order given in the hint: neither table t nor table chn follows the leading hint:
Reason and Solution
There are two reasons for this issue.
1. PostgreSQL does not include disable_cost for disabled operators in the first phase of cost estimation.
PostgreSQL with pg_hint_plan supports disabling certain operators (e.g., hash join, seq scan) by setting PostgreSQL parameters such as “set enable_hashjoin = false”. This setting causes PostgreSQL to add a high disable_cost (e.g., 1e10) to the estimated cost of the hash join operator, which effectively prevents the planner from selecting hash joins because of the inflated cost. pg_hint_plan also supports enforcing a specific join order. To do this, it disables all join algorithms whenever it encounters a join order that is inconsistent with the hint, so that disable_cost is added to each such join operator. As a result, only the assigned join order should be selected. This is the mechanism behind pg_hint_plan.
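The penalty mechanism described above can be sketched with a small toy model. This is only an illustration, not PostgreSQL or pg_hint_plan code: the function names, the candidate names, and the base costs are hypothetical, and only the 1e10 penalty value comes from the text.

```python
DISABLE_COST = 1.0e10  # penalty value quoted in the text above


def penalized_cost(base_cost: float, enabled: bool) -> float:
    """Return the cost that gets compared: base cost, plus a penalty if disabled."""
    return base_cost if enabled else base_cost + DISABLE_COST


def choose(candidates: dict) -> str:
    """Pick the candidate with the lowest penalized cost."""
    return min(candidates, key=lambda name: penalized_cost(*candidates[name]))


# Hypothetical candidates: name -> (base cost, enabled?)
candidates = {
    "hash_join": (100.0, False),    # cheapest on its own, but disabled
    "nested_loop": (5000.0, True),  # more expensive, but allowed
}

chosen = choose(candidates)
print(chosen)
```

In this sketch the disabled operator is never removed from consideration; it only loses the comparison because of the penalty. That is why the size of the penalty, and the point at which it is added, both matter.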
In the given example, the hint specifies a join order (rt (it ((n (chn (mc (mi (t (ci an)))))) cn))), but the generated join order is (rt (it ((n ((mc (mi ((ci an) t))) chn)) cn))). Here, PostgreSQL generates sub-join order ((ci an) t) instead of the assigned sub-join order (t (ci an)), and ((mc (mi ((ci an) t))) chn) instead of (chn (mc (mi ((ci an) t)))). This discrepancy arises because PostgreSQL estimates operator costs in two phases. In the first phase, it filters out paths that are obviously suboptimal based on estimated costs. However, it does not include disable_cost for disabled operators in this phase, only doing so in the second phase. While (t (ci an)) would use a regular nested loop join, ((ci an) t) uses an index-based nested loop join with an index scan on t, which is significantly faster. Consequently, (t (ci an)) is filtered out after the first phase of cost estimation. The same reasoning applies to (chn (mc (mi ((ci an) t)))).
To solve this problem, we can simply include disable_cost (set to 1e10) in the first phase of cost estimation for disabled operators.
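The effect of adding the penalty in the first phase can be illustrated with a deliberately simplified two-phase model. It is a sketch under stated assumptions, not PostgreSQL's actual pruning logic: the `plan` function, the `TOLERANCE` factor, and all cost numbers are hypothetical, and only the two join orders and the 1e10 penalty are taken from the text.

```python
DISABLE_COST = 1.0e10  # penalty value quoted in the text
TOLERANCE = 1.01       # hypothetical slack used by the phase-1 filter


def plan(paths, penalize_in_phase1):
    """Return the name of the chosen path in a toy two-phase planner."""
    def penalty(p):
        return DISABLE_COST if p["disabled"] else 0.0

    def phase1_cost(p):
        # Preliminary estimate; the penalty is included only with the fix.
        return p["prelim"] + (penalty(p) if penalize_in_phase1 else 0.0)

    # Phase 1: drop paths whose preliminary cost is clearly not competitive.
    cutoff = min(phase1_cost(p) for p in paths) * TOLERANCE
    survivors = [p for p in paths if phase1_cost(p) <= cutoff]

    # Phase 2: full cost estimate, always including the penalty.
    return min(survivors, key=lambda p: p["full"] + penalty(p))["name"]


# Hypothetical costs for the two sub-join orders discussed above.
paths = [
    {"name": "(t (ci an))", "prelim": 5000.0, "full": 6000.0, "disabled": False},
    {"name": "((ci an) t)", "prelim": 50.0, "full": 60.0, "disabled": True},
]

without_fix = plan(paths, penalize_in_phase1=False)
with_fix = plan(paths, penalize_in_phase1=True)
print(without_fix, with_fix)
```

Without the penalty in phase 1, the hinted order is dropped before phase 2 ever sees it, so the disabled order is the only survivor. With the penalty applied in both phases, the hinted order survives and is chosen.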
2. The disable_cost (set to 1e10) defined in PG is not large enough.
After applying the modifications introduced in section 1, we generated the following plan:
The actual join order is still inconsistent with the join order given in the hint: table chn now follows the leading hint, but table t still does not:
By examining the cost of the desired subplan Nestloop(t Hashjoin(ci an)), I found that its cost estimate is 25367763419.74, which is even larger than disable_cost (set to 1e10). This cost estimate can be approximated as row(t)*cost(Hashjoin(ci an)) = 567391 * 777302.92 ≈ 44e10. A very expensive assigned plan may therefore have a larger cost estimate than disable_cost, in which case a disabled join order may have a smaller cost estimate and be selected as the final plan.
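The arithmetic above can be checked with a short script. The row count, the hash join cost, and the reported cost estimate are the figures quoted in the text; the base cost of the competing disabled plan (`CHEAP_DISABLED_COST`) and the helper `hinted_plan_wins` are hypothetical and exist only to make the comparison concrete.

```python
ROWS_T = 567_391                          # row(t), from the text
COST_HASHJOIN_CI_AN = 777_302.92          # cost(Hashjoin(ci an)), from the text
REPORTED_HINTED_COST = 25_367_763_419.74  # planner estimate quoted in the text
CHEAP_DISABLED_COST = 1.0e6               # hypothetical cost of a disabled alternative

# Rough approximation used in the text: rows of t times the inner join cost.
approx_hinted_cost = ROWS_T * COST_HASHJOIN_CI_AN


def hinted_plan_wins(hinted_cost, disabled_base_cost, penalty):
    """True if the hinted plan is cheaper than the penalized disabled plan."""
    return hinted_cost < disabled_base_cost + penalty


print(f"{approx_hinted_cost:.3e}")
print(hinted_plan_wins(REPORTED_HINTED_COST, CHEAP_DISABLED_COST, 1.0e10))
print(hinted_plan_wins(REPORTED_HINTED_COST, CHEAP_DISABLED_COST, 1.0e20))
```

With a penalty of 1e10 the hinted plan loses, because its own estimate already exceeds the penalty. With a much larger penalty such as 1e20 it wins under these assumed numbers.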
To solve this problem, I use pg_hint_diable_cost (set to 1e20) to replace the disable_cost in PG. With these two modifications, we get the desired plans:
Limitations of this Solution
There are two main limitations of this solution:
(1) PostgreSQL has no hook for its cost estimation functions, so I had to copy many routine functions in order to modify the cost estimation functions.
(2) Some plans may have an estimated cost larger than 1e20 (I think this is not possible for a centralized database); in that case, the assigned plan will not be selected.
I implemented this solution on PG16; extending it to other PG versions should not be hard.
I look forward to your reply!