-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
leading hint + join methods hint cannot totally force the join order. #195
Comments
I have the same question. I also wonder why the planning time is so long, even though I have restricted the join order and join method with pg_hint_plan. |
Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me. |
You can follow these steps to set up the JOB benchmark 1. Download imdb data:
in postgresql
Download JOB Benchmark:
|
Hi Micahel-san,
Regarding to How to set up JOB, I hope this info helps you:
https://github.com/yamatattsu/pg_plan_advsr/blob/master/JOB/how_to_setup.md
…On Fri, Sep 20, 2024 at 1:56 PM KaiZhong ***@***.***> wrote:
Could you produce a self-contained test case that could be used for
investigation, please? I am not familar with the JOB benchmark nor any of
its queries, hence it would be a serious time gain for me.
You can follow these steps to set up the JOB benchmark
download JOB benchmark (VLDB version)
1. Download imdb data:
wget http://homepages.cwi.nl/~boncz/job/imdb.tgz
tar -zxvf imdb.tar.gz
in postgresql
CREATE DATABASE imdb;
\c imdb
\i /PATH/TO/imdb/schematext.sql
\copy aka_name from '/Downloads/imdb_data/aka_name.csv' csv delimiter ','
quote '"' escape '';
\copy aka_title from '/Downloads/imdb_data/aka_title.csv' csv delimiter
',' quote '"' escape '';
\copy cast_info from '/Downloads/imdb_data/cast_info.csv' csv delimiter
',' quote '"' escape '';
\copy char_name from '/Downloads/imdb_data/char_name.csv' csv delimiter
',' quote '"' escape '';
\copy comp_cast_type from '/Downloads/imdb_data/comp_cast_type.csv' csv
delimiter ',' quote '"' escape '';
\copy company_name from '/Downloads/imdb_data/company_name.csv' csv
delimiter ',' quote '"' escape '';
\copy company_type from '/Downloads/imdb_data/company_type.csv' csv
delimiter ',' quote '"' escape '';
\copy complete_cast from '/Downloads/imdb_data/complete_cast.csv' csv
delimiter ',' quote '"' escape '';
\copy info_type from '/Downloads/imdb_data/info_type.csv' csv delimiter
',' quote '"' escape '';
\copy keyword from '/Downloads/imdb_data/keyword.csv' csv delimiter ','
quote '"' escape '';
\copy kind_type from '/Downloads/imdb_data/kind_type.csv' csv delimiter
',' quote '"' escape '';
\copy link_type from '/Downloads/imdb_data/link_type.csv' csv delimiter
',' quote '"' escape '';
\copy movie_companies from '/Downloads/imdb_data/movie_companies.csv' csv
delimiter ',' quote '"' escape '';
\copy movie_info from '/Downloads/imdb_data/movie_info.csv' csv delimiter
',' quote '"' escape '';
\copy movie_info_idx from '/Downloads/imdb_data/movie_info_idx.csv' csv
delimiter ',' quote '"' escape '';
\copy movie_keyword from '/Downloads/imdb_data/movie_keyword.csv' csv
delimiter ',' quote '"' escape '';
\copy movie_link from '/Downloads/imdb_data/movie_link.csv' csv delimiter
',' quote '"' escape '';
\copy name from '/Downloads/imdb_data/name.csv' csv delimiter ',' quote
'"' escape '';
\copy person_info from '/Downloads/imdb_data/person_info.csv' csv
delimiter ',' quote '"' escape '';
\copy role_type from '/Downloads/imdb_data/role_type.csv' csv delimiter
',' quote '"' escape '';
\copy title from '~/Downloads/imdb_data/title.csv' csv delimiter ',' quote
'"' escape '';
\i /PATH/TO/imdb/fkindexes.sql
2. Download JOB Benchmark:
Git clone https://github.com/gregrahn/join-order-benchmark
—
Reply to this email directly, view it on GitHub
<#195 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADSHQGYUMSHQ4B6P65VUWPLZXOTHTAVCNFSM6AAAAABJ43O25CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRSHAZDENJSGU>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hi, all. I am working on this issue. I found that the join order inconsistent with the input hints is caused by the implementation of PG and is not a bug of pg_hint_plan. In particular, PG disables certain operators by adding a disable_cost(1.0e10) to its estimated cost; then, the planner would not select this operator due to its large cost. To assign a join order, pg_hint_plan disables all join operators when considering inconsistent join orders (by adding a disable cost to the cost of join operators). This is how pg_hint_plan works. In this issue, the bug in the second example of @ZhengtongYan is it generates sub-join order ((ci an) t) for assigned sub-join order (t (ci an)) and ((mc (mi ((ci an) t))) chn) for (chn (mc (mi ((ci an) t)))). This is because PG estimates the cost of operators in two phases. Moreover, PG will filter some obviously inferior based on the estimated cost in the first phase. However, PG does not include disable_cost for disabled operators and only considers it in the second phase. While (t (ci an)) will utilize a normal nest loop join and access relation t with a seq scan, ((ci an) t) will utilize an index-based nest loop join and access relation t with an index scan, which is obviously faster. Thus, (t (ci an)) will be filtered after the first phase of cost estimation. Same reason to (chn (mc (mi ((ci an) t)))). The reason why the third example works is that forces PG to select seqscan for relation t and chn, thus PG would not consider index-based nest loop joins for them. I have addressed the problem in the second example by considering disable_cost for the disabled operators in the first phase of cost estimation. I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq |
Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see: |
I see. Thanks for the reminder. |
@michaelpq Hi, I found a way to solve this problem we could just write a wrapper for the function of cost estimation of PostgreSQL. How do you like this idea? |
Hi, all! I have created a pull-request in #207. |
I tested Q19a query of JOB benchmark in the latest PG17beta1 version. I found the following problems which indicate that hints cannot totally force the decisions of optimizer and optimizer will not follow the hints in some cases.
Execution plan is:
Execution plan is:
Actual join order and join algorithms:
We could see that:
t
andchn
are not following the leading hint((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
.t
andchn
use index scan, so they are used as the inner tables.If we add hints
SeqScan(t) SeqScan(chn)
to force the access methods, the join order will become right.Execution plan is:
Based on the above results, we can get the following conclusions:
t
andchn
as the inner table in the nest loop join.Recommendation: fix this bug by completely forcing the optimizer following the hints such as a combination hints with join order and join algorithms!
The text was updated successfully, but these errors were encountered: