Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leading hint + join methods hint cannot totally force the join order. #195

Open
ZhengtongYan opened this issue Jun 26, 2024 · 9 comments
Open

Comments

@ZhengtongYan
Copy link

I tested Q19a query of JOB benchmark in the latest PG17beta1 version. I found the following problems which indicate that hints cannot totally force the decisions of optimizer and optimizer will not follow the hints in some cases.

  1. Leading hint only: work correctly!
/*+ Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))) */
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=1129471.94..1129471.95 rows=1 width=64)
   ->  Nested Loop  (cost=1031727.69..1129471.94 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=1031727.69..1129453.01 rows=1 width=36)
               ->  Nested Loop  (cost=1031727.69..1129453.00 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=1031727.69..1129449.85 rows=59 width=40)
                           ->  Hash Join  (cost=1031727.27..1129253.56 rows=163 width=44)
                                 Hash Cond: (n.id = ci.person_id)
                                 ->  Gather  (cost=1000.00..83655.92 rows=9516 width=19)
                                       Workers Planned: 2
                                       ->  Parallel Seq Scan on name n  (cost=0.00..81704.32 rows=3965 width=19)
                                             Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Hash  (cost=1029835.88..1029835.88 rows=71311 width=37)
                                       ->  Merge Join  (cost=928724.12..1029835.88 rows=71311 width=37)
                                             Merge Cond: (chn.id = ci.person_role_id)
                                             ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..92048.75 rows=3140288 width=4)
                                             ->  Materialize  (cost=928688.51..929418.41 rows=145979 width=41)
                                                   ->  Sort  (cost=928688.51..929053.46 rows=145979 width=41)
                                                         Sort Key: ci.person_role_id
                                                         ->  Gather  (cost=861411.88..911672.87 rows=145979 width=41)
                                                               Workers Planned: 2
                                                               ->  Parallel Hash Join  (cost=860411.88..896074.97 rows=60825 width=41)
                                                                     Hash Cond: (mc.movie_id = t.id)
                                                                     ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..35131.06 rows=128227 width=8)
                                                                           Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text))
)
                                                                     ->  Parallel Hash  (cost=859150.63..859150.63 rows=100900 width=49)
                                                                           ->  Parallel Hash Join  (cost=601038.73..859150.63 rows=100900 width=49)
                                                                                 Hash Cond: (mi.movie_id = t.id)
                                                                                 ->  Parallel Seq Scan on movie_info mi  (cost=0.00..253676.43 rows=226580 width=8)
                                                                                       Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                                                 ->  Parallel Hash  (cost=597052.41..597052.41 rows=187225 width=41)
                                                                                       ->  Parallel Hash Join  (cost=525612.09..597052.41 rows=187225 width=41)
                                                                                             Hash Cond: (t.id = ci.movie_id)
                                                                                             ->  Parallel Seq Scan on title t  (cost=0.00..51808.72 rows=235421 width=21)
                                                                                                   Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                                                                             ->  Parallel Hash  (cost=510231.45..510231.45 rows=837731 width=20)
                                                                                                   ->  Parallel Hash Join  (cost=21334.10..510231.45 rows=837731 width=20
)
                                                                                                         Hash Cond: (ci.person_id = an.person_id)
                                                                                                         ->  Parallel Seq Scan on cast_info ci  (cost=0.00..479269.70 row
s=345362 width=16)
                                                                                                               Filter: (note = ANY ('{(voice),"(voice: Japanese version)"
,"(voice) (uncredited)","(voice: English version)"}'::text[]))
                                                                                                         ->  Parallel Hash  (cost=15171.60..15171.60 rows=375560 width=4)
                                                                                                               ->  Parallel Seq Scan on aka_name an  (cost=0.00..15171.60
 rows=375560 width=4)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)
  1. Leading hint + join methods hint: wrong results.
/*+
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=27678801700.75..27678801700.76 rows=1 width=64)
   ->  Nested Loop  (cost=20000031697.50..27678801700.75 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=20000031697.50..27678801681.81 rows=1 width=36)
               ->  Nested Loop  (cost=20000031697.50..27678801681.81 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=20000031697.50..27678801678.66 rows=59 width=40)
                           ->  Nested Loop  (cost=20000031697.08..27678801482.37 rows=163 width=44)
                                 Join Filter: (n.id = ci.person_id)
                                 ->  Seq Scan on name n  (cost=0.00..118171.96 rows=9516 width=19)
                                       Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Materialize  (cost=20000031697.08..27663195186.54 rows=71311 width=37)
                                       ->  Nested Loop  (cost=20000031697.08..27663194271.99 rows=71311 width=37)
                                             ->  Nested Loop  (cost=10000031696.65..17663060878.15 rows=145979 width=41)
                                                   Join Filter: (t.id = mc.movie_id)
                                                   ->  Seq Scan on movie_companies mc  (cost=0.00..57960.93 rows=307745 width=8)
                                                         Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
                                                   ->  Materialize  (cost=10000031696.65..15817340640.79 rows=242159 width=49)
                                                         ->  Nested Loop  (cost=10000031696.65..15817337065.00 rows=242159 width=49)
                                                               Join Filter: (t.id = mi.movie_id)
                                                               ->  Seq Scan on movie_info mi  (cost=0.00..382516.23 rows=543793 width=8)
                                                                     Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                               ->  Materialize  (cost=10000031696.65..10003749915.92 rows=449341 width=41)
                                                                     ->  Nested Loop  (cost=10000031696.65..10003743719.21 rows=449341 width=41)
                                                                           ->  Hash Join  (cost=31696.22..852385.40 rows=2010555 width=20)
                                                                                 Hash Cond: (ci.person_id = an.person_id)
                                                                                 ->  Seq Scan on cast_info ci  (cost=0.00..796439.28 rows=828870 width=16)
                                                                                       Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)",
"(voice: English version)"}'::text[]))
                                                                                 ->  Hash  (cost=20429.43..20429.43 rows=901343 width=4)
                                                                                       ->  Seq Scan on aka_name an  (cost=0.00..20429.43 rows=901343 width=4)
                                                                           ->  Index Scan using title_pkey on title t  (cost=0.43..1.44 rows=1 width=21)
                                                                                 Index Cond: (id = ci.movie_id)
                                                                                 Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                             ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..0.91 rows=1 width=4)
                                                   Index Cond: (id = ci.person_role_id)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Actual join order and join algorithms:

(rt (it ((n ((mc (mi ((ci an) t))) chn)) cn)))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)

We could see that:

  • both table t and chn are not following the leading hint ((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))).
  • both table t and chn use index scan, so they are used as the inner tables.
  1. Leading hint + join methods hint + access methods hint: work correctly!.
    If we add hints SeqScan(t) SeqScan(chn) to force the access methods, the join order will become right.
/*+
SeqScan(t) SeqScan(chn)
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=42280242141.33..42280242141.34 rows=1 width=64)
   ->  Nested Loop  (cost=32696.64..42280242141.32 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=32696.64..42280242122.39 rows=1 width=36)
               ->  Nested Loop  (cost=32696.64..42280242122.39 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=32696.64..42280242119.24 rows=59 width=40)
                           ->  Nested Loop  (cost=32696.22..42280241922.95 rows=163 width=44)
                                 Join Filter: (n.id = ci.person_id)
                                 ->  Seq Scan on name n  (cost=0.00..118171.96 rows=9516 width=19)
                                       Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Materialize  (cost=32696.22..42264635627.12 rows=71311 width=37)
                                       ->  Nested Loop  (cost=32696.22..42264634712.57 rows=71311 width=37)
                                             Join Filter: (chn.id = ci.person_role_id)
                                             ->  Seq Scan on char_name chn  (cost=0.00..67850.88 rows=3140288 width=4)
                                             ->  Materialize  (cost=32696.22..31356197189.35 rows=145979 width=41)
                                                   ->  Nested Loop  (cost=32696.22..31356195175.46 rows=145979 width=41)
                                                         Join Filter: (t.id = mc.movie_id)
                                                         ->  Seq Scan on movie_companies mc  (cost=0.00..57960.93 rows=307745 width=8)
                                                               Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
                                                         ->  Materialize  (cost=32696.22..29510474938.10 rows=242159 width=49)
                                                               ->  Nested Loop  (cost=32696.22..29510471362.30 rows=242159 width=49)
                                                                     Join Filter: (t.id = mi.movie_id)
                                                                     ->  Seq Scan on movie_info mi  (cost=0.00..382516.23 rows=543793 width=8)
                                                                           Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                                     ->  Materialize  (cost=32696.22..23696884213.22 rows=449341 width=41)
                                                                           ->  Nested Loop  (cost=32696.22..23696878016.52 rows=449341 width=41)
                                                                                 Join Filter: (t.id = ci.movie_id)
                                                                                 ->  Seq Scan on title t  (cost=0.00..73929.74 rows=565011 width=21)
                                                                                       Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                                                                 ->  Materialize  (cost=32696.22..640936.59 rows=2010555 width=20)
                                                                                       ->  Hash Join  (cost=32696.22..619102.82 rows=2010555 width=20)
                                                                                             Hash Cond: (ci.person_id = an.person_id)
                                                                                             ->  Gather  (cost=1000.00..563156.70 rows=828870 width=16)
                                                                                                   Workers Planned: 2
                                                                                                   ->  Parallel Seq Scan on cast_info ci  (cost=0.00..479269.70 rows=3453
62 width=16)
                                                                                                         Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voi
ce) (uncredited)","(voice: English version)"}'::text[]))
                                                                                             ->  Hash  (cost=20429.43..20429.43 rows=901343 width=4)
                                                                                                   ->  Seq Scan on aka_name an  (cost=0.00..20429.43 rows=901343 width=4)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Based on the above results, we can get the following conclusions:

  • If we only use the leading hint, the optimizer will choose the appropriate join methods to follow the specified join order.
  • If we use both leading hint and join methods hint, the join order will not strictly follow the join order hint in some cases. For example, in our example the optimizer choose to use the index nested loop join rather than the normal nested loop join. So it put t and chn as the inner table in the nest loop join.
  • Use only leading hint means the optimizer has some freedom to choose the join methods and access methods.
  • The pg_hint_plan has bugs that optimizer's decisions cannot be totally forced and controlled by hints when some hints combinations are used. This is a bug of the hint and need to be fixed. There is a similar issue on github about this problem.

Recommendation: fix this bug by completely forcing the optimizer following the hints such as a combination hints with join order and join algorithms!

@ZhengtongYan ZhengtongYan changed the title Leading hint + join methods hint cannot totally force the join order. leading hint + join methods hint cannot totally force the join order. Jun 26, 2024
@zk9907
Copy link

zk9907 commented Jul 5, 2024

I have the same question. I also wonder why the planning time is so long, even though I have restricted the join order and join method with pg_hint_plan.

@michaelpq
Copy link
Collaborator

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

@zk9907
Copy link

zk9907 commented Sep 20, 2024

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

You can follow these steps to set up the JOB benchmark

1. Download imdb data:

wget http://homepages.cwi.nl/~boncz/job/imdb.tgz
tar -zxvf imdb.tar.gz

in postgresql

CREATE DATABASE imdb;
\c imdb
\i /PATH/TO/imdb/schematext.sql
\copy aka_name from '~/Downloads/imdb_data/aka_name.csv' csv delimiter ',' quote '"' escape '\';
\copy aka_title from '~/Downloads/imdb_data/aka_title.csv' csv delimiter ',' quote '"' escape '\';
\copy cast_info from '~/Downloads/imdb_data/cast_info.csv' csv delimiter ',' quote '"' escape '\';
\copy char_name from '~/Downloads/imdb_data/char_name.csv' csv delimiter ',' quote '"' escape '\';
\copy comp_cast_type from '~/Downloads/imdb_data/comp_cast_type.csv' csv delimiter ',' quote '"' escape '\';
\copy company_name from '~/Downloads/imdb_data/company_name.csv' csv delimiter ',' quote '"' escape '\';
\copy company_type from '~/Downloads/imdb_data/company_type.csv' csv delimiter ',' quote '"' escape '\';
\copy complete_cast from '~/Downloads/imdb_data/complete_cast.csv' csv delimiter ',' quote '"' escape '\';
\copy info_type from '~/Downloads/imdb_data/info_type.csv' csv delimiter ',' quote '"' escape '\';
\copy keyword from '~/Downloads/imdb_data/keyword.csv' csv delimiter ',' quote '"' escape '\';
\copy kind_type from '~/Downloads/imdb_data/kind_type.csv' csv delimiter ',' quote '"' escape '\';
\copy link_type from '~/Downloads/imdb_data/link_type.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_companies from '~/Downloads/imdb_data/movie_companies.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_info from '~/Downloads/imdb_data/movie_info.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_info_idx from '~/Downloads/imdb_data/movie_info_idx.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_keyword from '~/Downloads/imdb_data/movie_keyword.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_link from '~/Downloads/imdb_data/movie_link.csv' csv delimiter ',' quote '"' escape '\';
\copy name from '~/Downloads/imdb_data/name.csv' csv delimiter ',' quote '"' escape '\';
\copy person_info from '~/Downloads/imdb_data/person_info.csv' csv delimiter ',' quote '"' escape '\';
\copy role_type from '~/Downloads/imdb_data/role_type.csv' csv delimiter ',' quote '"' escape '\';
\copy title from '~/Downloads/imdb_data/title.csv' csv delimiter ',' quote '"' escape '\';
\i /PATH/TO/imdb/fkindexes.sql

Download JOB Benchmark:

git clone https://github.com/gregrahn/join-order-benchmark

@yamatattsu
Copy link
Member

yamatattsu commented Sep 26, 2024 via email

@HennyNile
Copy link

HennyNile commented Oct 28, 2024

Hi, all. I am working on this issue. I found that the join order inconsistent with the input hints is caused by the implementation of PG and is not a bug of pg_hint_plan.

In particular, PG disables certain operators by adding a disable_cost(1.0e10) to its estimated cost; then, the planner would not select this operator due to its large cost. To assign a join order, pg_hint_plan disables all join operators when considering inconsistent join orders (by adding a disable cost to the cost of join operators). This is how pg_hint_plan works.

In this issue, the bug in the second example of @ZhengtongYan is it generates sub-join order ((ci an) t) for assigned sub-join order (t (ci an)) and ((mc (mi ((ci an) t))) chn) for (chn (mc (mi ((ci an) t)))). This is because PG estimates the cost of operators in two phases. Moreover, PG will filter some obviously inferior based on the estimated cost in the first phase. However, PG does not include disable_cost for disabled operators and only considers it in the second phase. While (t (ci an)) will utilize a normal nest loop join and access relation t with a seq scan, ((ci an) t) will utilize an index-based nest loop join and access relation t with an index scan, which is obviously faster. Thus, (t (ci an)) will be filtered after the first phase of cost estimation. Same reason to (chn (mc (mi ((ci an) t)))).

The reason why the third example works is that forces PG to select seqscan for relation t and chn, thus PG would not consider index-based nest loop joins for them. I have addressed the problem in the second example by considering disable_cost for the disabled operators in the first phase of cost estimation.

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

@michaelpq
Copy link
Collaborator

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

@HennyNile
Copy link

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see: https://wiki.postgresql.org/wiki/Submitting_a_Patch

I see. Thanks for the reminder.

@HennyNile
Copy link

@michaelpq Hi, I found a way to solve this problem we could just write a wrapper for the function of cost estimation of PostgreSQL. How do you like this idea?

@HennyNile
Copy link

Hi, all! I have created a pull-request in #207.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants