leading hint + join methods hint cannot totally force the join order. #195

ZhengtongYan · 2024-06-26T01:01:03Z

I tested Q19a query of JOB benchmark in the latest PG17beta1 version. I found the following problems which indicate that hints cannot totally force the decisions of optimizer and optimizer will not follow the hints in some cases.

Leading hint only: work correctly!

/*+ Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))) */
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=1129471.94..1129471.95 rows=1 width=64)
   ->  Nested Loop  (cost=1031727.69..1129471.94 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=1031727.69..1129453.01 rows=1 width=36)
               ->  Nested Loop  (cost=1031727.69..1129453.00 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=1031727.69..1129449.85 rows=59 width=40)
                           ->  Hash Join  (cost=1031727.27..1129253.56 rows=163 width=44)
                                 Hash Cond: (n.id = ci.person_id)
                                 ->  Gather  (cost=1000.00..83655.92 rows=9516 width=19)
                                       Workers Planned: 2
                                       ->  Parallel Seq Scan on name n  (cost=0.00..81704.32 rows=3965 width=19)
                                             Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Hash  (cost=1029835.88..1029835.88 rows=71311 width=37)
                                       ->  Merge Join  (cost=928724.12..1029835.88 rows=71311 width=37)
                                             Merge Cond: (chn.id = ci.person_role_id)
                                             ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..92048.75 rows=3140288 width=4)
                                             ->  Materialize  (cost=928688.51..929418.41 rows=145979 width=41)
                                                   ->  Sort  (cost=928688.51..929053.46 rows=145979 width=41)
                                                         Sort Key: ci.person_role_id
                                                         ->  Gather  (cost=861411.88..911672.87 rows=145979 width=41)
                                                               Workers Planned: 2
                                                               ->  Parallel Hash Join  (cost=860411.88..896074.97 rows=60825 width=41)
                                                                     Hash Cond: (mc.movie_id = t.id)
                                                                     ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..35131.06 rows=128227 width=8)
                                                                           Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text))
)
                                                                     ->  Parallel Hash  (cost=859150.63..859150.63 rows=100900 width=49)
                                                                           ->  Parallel Hash Join  (cost=601038.73..859150.63 rows=100900 width=49)
                                                                                 Hash Cond: (mi.movie_id = t.id)
                                                                                 ->  Parallel Seq Scan on movie_info mi  (cost=0.00..253676.43 rows=226580 width=8)
                                                                                       Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                                                 ->  Parallel Hash  (cost=597052.41..597052.41 rows=187225 width=41)
                                                                                       ->  Parallel Hash Join  (cost=525612.09..597052.41 rows=187225 width=41)
                                                                                             Hash Cond: (t.id = ci.movie_id)
                                                                                             ->  Parallel Seq Scan on title t  (cost=0.00..51808.72 rows=235421 width=21)
                                                                                                   Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                                                                             ->  Parallel Hash  (cost=510231.45..510231.45 rows=837731 width=20)
                                                                                                   ->  Parallel Hash Join  (cost=21334.10..510231.45 rows=837731 width=20
)
                                                                                                         Hash Cond: (ci.person_id = an.person_id)
                                                                                                         ->  Parallel Seq Scan on cast_info ci  (cost=0.00..479269.70 row
s=345362 width=16)
                                                                                                               Filter: (note = ANY ('{(voice),"(voice: Japanese version)"
,"(voice) (uncredited)","(voice: English version)"}'::text[]))
                                                                                                         ->  Parallel Hash  (cost=15171.60..15171.60 rows=375560 width=4)
                                                                                                               ->  Parallel Seq Scan on aka_name an  (cost=0.00..15171.60
 rows=375560 width=4)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Leading hint + join methods hint: wrong results.

/*+
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=27678801700.75..27678801700.76 rows=1 width=64)
   ->  Nested Loop  (cost=20000031697.50..27678801700.75 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=20000031697.50..27678801681.81 rows=1 width=36)
               ->  Nested Loop  (cost=20000031697.50..27678801681.81 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=20000031697.50..27678801678.66 rows=59 width=40)
                           ->  Nested Loop  (cost=20000031697.08..27678801482.37 rows=163 width=44)
                                 Join Filter: (n.id = ci.person_id)
                                 ->  Seq Scan on name n  (cost=0.00..118171.96 rows=9516 width=19)
                                       Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Materialize  (cost=20000031697.08..27663195186.54 rows=71311 width=37)
                                       ->  Nested Loop  (cost=20000031697.08..27663194271.99 rows=71311 width=37)
                                             ->  Nested Loop  (cost=10000031696.65..17663060878.15 rows=145979 width=41)
                                                   Join Filter: (t.id = mc.movie_id)
                                                   ->  Seq Scan on movie_companies mc  (cost=0.00..57960.93 rows=307745 width=8)
                                                         Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
                                                   ->  Materialize  (cost=10000031696.65..15817340640.79 rows=242159 width=49)
                                                         ->  Nested Loop  (cost=10000031696.65..15817337065.00 rows=242159 width=49)
                                                               Join Filter: (t.id = mi.movie_id)
                                                               ->  Seq Scan on movie_info mi  (cost=0.00..382516.23 rows=543793 width=8)
                                                                     Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                               ->  Materialize  (cost=10000031696.65..10003749915.92 rows=449341 width=41)
                                                                     ->  Nested Loop  (cost=10000031696.65..10003743719.21 rows=449341 width=41)
                                                                           ->  Hash Join  (cost=31696.22..852385.40 rows=2010555 width=20)
                                                                                 Hash Cond: (ci.person_id = an.person_id)
                                                                                 ->  Seq Scan on cast_info ci  (cost=0.00..796439.28 rows=828870 width=16)
                                                                                       Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)",
"(voice: English version)"}'::text[]))
                                                                                 ->  Hash  (cost=20429.43..20429.43 rows=901343 width=4)
                                                                                       ->  Seq Scan on aka_name an  (cost=0.00..20429.43 rows=901343 width=4)
                                                                           ->  Index Scan using title_pkey on title t  (cost=0.43..1.44 rows=1 width=21)
                                                                                 Index Cond: (id = ci.movie_id)
                                                                                 Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                             ->  Index Only Scan using char_name_pkey on char_name chn  (cost=0.43..0.91 rows=1 width=4)
                                                   Index Cond: (id = ci.person_role_id)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Actual join order and join algorithms:

(rt (it ((n ((mc (mi ((ci an) t))) chn)) cn)))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)

We could see that:

both table t and chn are not following the leading hint ((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))).
both table t and chn use index scan, so they are used as the inner tables.

Leading hint + join methods hint + access methods hint: work correctly!.
If we add hints SeqScan(t) SeqScan(chn) to force the access methods, the join order will become right.

/*+
SeqScan(t) SeqScan(chn)
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
       MIN(t.title) AS voiced_movie
FROM aka_name AS an,
     char_name AS chn,
     cast_info AS ci,
     company_name AS cn,
     info_type AS it,
     movie_companies AS mc,
     movie_info AS mi,
     name AS n,
     role_type AS rt,
     title AS t
WHERE ci.note IN ('(voice)',
                  '(voice: Japanese version)',
                  '(voice) (uncredited)',
                  '(voice: English version)')
  AND cn.country_code ='[us]'
  AND it.info = 'release dates'
  AND mc.note IS NOT NULL
  AND (mc.note LIKE '%(USA)%'
       OR mc.note LIKE '%(worldwide)%')
  AND mi.info IS NOT NULL
  AND (mi.info LIKE 'Japan:%200%'
       OR mi.info LIKE 'USA:%200%')
  AND n.gender ='f'
  AND n.name LIKE '%Ang%'
  AND rt.role ='actress'
  AND t.production_year BETWEEN 2005 AND 2009
  AND t.id = mi.movie_id
  AND t.id = mc.movie_id
  AND t.id = ci.movie_id
  AND mc.movie_id = ci.movie_id
  AND mc.movie_id = mi.movie_id
  AND mi.movie_id = ci.movie_id
  AND cn.id = mc.company_id
  AND it.id = mi.info_type_id
  AND n.id = ci.person_id
  AND rt.id = ci.role_id
  AND n.id = an.person_id
  AND ci.person_id = an.person_id
  AND chn.id = ci.person_role_id;

Execution plan is:

 Aggregate  (cost=42280242141.33..42280242141.34 rows=1 width=64)
   ->  Nested Loop  (cost=32696.64..42280242141.32 rows=1 width=32)
         Join Filter: (rt.id = ci.role_id)
         ->  Seq Scan on role_type rt  (cost=0.00..18.88 rows=4 width=4)
               Filter: ((role)::text = 'actress'::text)
         ->  Materialize  (cost=32696.64..42280242122.39 rows=1 width=36)
               ->  Nested Loop  (cost=32696.64..42280242122.39 rows=1 width=36)
                     Join Filter: (it.id = mi.info_type_id)
                     ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=4)
                           Filter: ((info)::text = 'release dates'::text)
                     ->  Nested Loop  (cost=32696.64..42280242119.24 rows=59 width=40)
                           ->  Nested Loop  (cost=32696.22..42280241922.95 rows=163 width=44)
                                 Join Filter: (n.id = ci.person_id)
                                 ->  Seq Scan on name n  (cost=0.00..118171.96 rows=9516 width=19)
                                       Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
                                 ->  Materialize  (cost=32696.22..42264635627.12 rows=71311 width=37)
                                       ->  Nested Loop  (cost=32696.22..42264634712.57 rows=71311 width=37)
                                             Join Filter: (chn.id = ci.person_role_id)
                                             ->  Seq Scan on char_name chn  (cost=0.00..67850.88 rows=3140288 width=4)
                                             ->  Materialize  (cost=32696.22..31356197189.35 rows=145979 width=41)
                                                   ->  Nested Loop  (cost=32696.22..31356195175.46 rows=145979 width=41)
                                                         Join Filter: (t.id = mc.movie_id)
                                                         ->  Seq Scan on movie_companies mc  (cost=0.00..57960.93 rows=307745 width=8)
                                                               Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
                                                         ->  Materialize  (cost=32696.22..29510474938.10 rows=242159 width=49)
                                                               ->  Nested Loop  (cost=32696.22..29510471362.30 rows=242159 width=49)
                                                                     Join Filter: (t.id = mi.movie_id)
                                                                     ->  Seq Scan on movie_info mi  (cost=0.00..382516.23 rows=543793 width=8)
                                                                           Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
                                                                     ->  Materialize  (cost=32696.22..23696884213.22 rows=449341 width=41)
                                                                           ->  Nested Loop  (cost=32696.22..23696878016.52 rows=449341 width=41)
                                                                                 Join Filter: (t.id = ci.movie_id)
                                                                                 ->  Seq Scan on title t  (cost=0.00..73929.74 rows=565011 width=21)
                                                                                       Filter: ((production_year >= 2005) AND (production_year <= 2009))
                                                                                 ->  Materialize  (cost=32696.22..640936.59 rows=2010555 width=20)
                                                                                       ->  Hash Join  (cost=32696.22..619102.82 rows=2010555 width=20)
                                                                                             Hash Cond: (ci.person_id = an.person_id)
                                                                                             ->  Gather  (cost=1000.00..563156.70 rows=828870 width=16)
                                                                                                   Workers Planned: 2
                                                                                                   ->  Parallel Seq Scan on cast_info ci  (cost=0.00..479269.70 rows=3453
62 width=16)
                                                                                                         Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voi
ce) (uncredited)","(voice: English version)"}'::text[]))
                                                                                             ->  Hash  (cost=20429.43..20429.43 rows=901343 width=4)
                                                                                                   ->  Seq Scan on aka_name an  (cost=0.00..20429.43 rows=901343 width=4)
                           ->  Index Scan using company_name_pkey on company_name cn  (cost=0.42..1.20 rows=1 width=4)
                                 Index Cond: (id = mc.company_id)
                                 Filter: ((country_code)::text = '[us]'::text)

Based on the above results, we can get the following conclusions:

If we only use the leading hint, the optimizer will choose the appropriate join methods to follow the specified join order.
If we use both leading hint and join methods hint, the join order will not strictly follow the join order hint in some cases. For example, in our example the optimizer choose to use the index nested loop join rather than the normal nested loop join. So it put t and chn as the inner table in the nest loop join.
Use only leading hint means the optimizer has some freedom to choose the join methods and access methods.
The pg_hint_plan has bugs that optimizer's decisions cannot be totally forced and controlled by hints when some hints combinations are used. This is a bug of the hint and need to be fixed. There is a similar issue on github about this problem.

Recommendation: fix this bug by completely forcing the optimizer following the hints such as a combination hints with join order and join algorithms!

The text was updated successfully, but these errors were encountered:

zk9907 · 2024-07-05T05:41:58Z

I have the same question. I also wonder why the planning time is so long, even though I have restricted the join order and join method with pg_hint_plan.

michaelpq · 2024-08-20T04:09:00Z

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

zk9907 · 2024-09-20T04:56:00Z

Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me.

You can follow these steps to set up the JOB benchmark

1. Download imdb data:

wget http://homepages.cwi.nl/~boncz/job/imdb.tgz
tar -zxvf imdb.tar.gz

in postgresql

CREATE DATABASE imdb;
\c imdb
\i /PATH/TO/imdb/schematext.sql

\copy aka_name from '~/Downloads/imdb_data/aka_name.csv' csv delimiter ',' quote '"' escape '\';
\copy aka_title from '~/Downloads/imdb_data/aka_title.csv' csv delimiter ',' quote '"' escape '\';
\copy cast_info from '~/Downloads/imdb_data/cast_info.csv' csv delimiter ',' quote '"' escape '\';
\copy char_name from '~/Downloads/imdb_data/char_name.csv' csv delimiter ',' quote '"' escape '\';
\copy comp_cast_type from '~/Downloads/imdb_data/comp_cast_type.csv' csv delimiter ',' quote '"' escape '\';
\copy company_name from '~/Downloads/imdb_data/company_name.csv' csv delimiter ',' quote '"' escape '\';
\copy company_type from '~/Downloads/imdb_data/company_type.csv' csv delimiter ',' quote '"' escape '\';
\copy complete_cast from '~/Downloads/imdb_data/complete_cast.csv' csv delimiter ',' quote '"' escape '\';
\copy info_type from '~/Downloads/imdb_data/info_type.csv' csv delimiter ',' quote '"' escape '\';
\copy keyword from '~/Downloads/imdb_data/keyword.csv' csv delimiter ',' quote '"' escape '\';
\copy kind_type from '~/Downloads/imdb_data/kind_type.csv' csv delimiter ',' quote '"' escape '\';
\copy link_type from '~/Downloads/imdb_data/link_type.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_companies from '~/Downloads/imdb_data/movie_companies.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_info from '~/Downloads/imdb_data/movie_info.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_info_idx from '~/Downloads/imdb_data/movie_info_idx.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_keyword from '~/Downloads/imdb_data/movie_keyword.csv' csv delimiter ',' quote '"' escape '\';
\copy movie_link from '~/Downloads/imdb_data/movie_link.csv' csv delimiter ',' quote '"' escape '\';
\copy name from '~/Downloads/imdb_data/name.csv' csv delimiter ',' quote '"' escape '\';
\copy person_info from '~/Downloads/imdb_data/person_info.csv' csv delimiter ',' quote '"' escape '\';
\copy role_type from '~/Downloads/imdb_data/role_type.csv' csv delimiter ',' quote '"' escape '\';
\copy title from '~/Downloads/imdb_data/title.csv' csv delimiter ',' quote '"' escape '\';

\i /PATH/TO/imdb/fkindexes.sql

Download JOB Benchmark:

git clone https://github.com/gregrahn/join-order-benchmark

yamatattsu · 2024-09-26T00:24:01Z

Hi Micahel-san, Regarding to How to set up JOB, I hope this info helps you: https://github.com/yamatattsu/pg_plan_advsr/blob/master/JOB/how_to_setup.md

…

On Fri, Sep 20, 2024 at 1:56 PM KaiZhong ***@***.***> wrote: Could you produce a self-contained test case that could be used for investigation, please? I am not familar with the JOB benchmark nor any of its queries, hence it would be a serious time gain for me. You can follow these steps to set up the JOB benchmark download JOB benchmark (VLDB version) 1. Download imdb data: wget http://homepages.cwi.nl/~boncz/job/imdb.tgz tar -zxvf imdb.tar.gz in postgresql CREATE DATABASE imdb; \c imdb \i /PATH/TO/imdb/schematext.sql \copy aka_name from '/Downloads/imdb_data/aka_name.csv' csv delimiter ',' quote '"' escape ''; \copy aka_title from '/Downloads/imdb_data/aka_title.csv' csv delimiter ',' quote '"' escape ''; \copy cast_info from '/Downloads/imdb_data/cast_info.csv' csv delimiter ',' quote '"' escape ''; \copy char_name from '/Downloads/imdb_data/char_name.csv' csv delimiter ',' quote '"' escape ''; \copy comp_cast_type from '/Downloads/imdb_data/comp_cast_type.csv' csv delimiter ',' quote '"' escape ''; \copy company_name from '/Downloads/imdb_data/company_name.csv' csv delimiter ',' quote '"' escape ''; \copy company_type from '/Downloads/imdb_data/company_type.csv' csv delimiter ',' quote '"' escape ''; \copy complete_cast from '/Downloads/imdb_data/complete_cast.csv' csv delimiter ',' quote '"' escape ''; \copy info_type from '/Downloads/imdb_data/info_type.csv' csv delimiter ',' quote '"' escape ''; \copy keyword from '/Downloads/imdb_data/keyword.csv' csv delimiter ',' quote '"' escape ''; \copy kind_type from '/Downloads/imdb_data/kind_type.csv' csv delimiter ',' quote '"' escape ''; \copy link_type from '/Downloads/imdb_data/link_type.csv' csv delimiter ',' quote '"' escape ''; \copy movie_companies from '/Downloads/imdb_data/movie_companies.csv' csv delimiter ',' quote '"' escape ''; \copy movie_info from '/Downloads/imdb_data/movie_info.csv' csv delimiter ',' quote '"' escape ''; \copy movie_info_idx from '/Downloads/imdb_data/movie_info_idx.csv' csv delimiter ',' quote '"' escape ''; \copy movie_keyword from '/Downloads/imdb_data/movie_keyword.csv' csv delimiter ',' quote '"' escape ''; \copy movie_link from '/Downloads/imdb_data/movie_link.csv' csv delimiter ',' quote '"' escape ''; \copy name from '/Downloads/imdb_data/name.csv' csv delimiter ',' quote '"' escape ''; \copy person_info from '/Downloads/imdb_data/person_info.csv' csv delimiter ',' quote '"' escape ''; \copy role_type from '/Downloads/imdb_data/role_type.csv' csv delimiter ',' quote '"' escape ''; \copy title from '~/Downloads/imdb_data/title.csv' csv delimiter ',' quote '"' escape ''; \i /PATH/TO/imdb/fkindexes.sql 2. Download JOB Benchmark: Git clone https://github.com/gregrahn/join-order-benchmark — Reply to this email directly, view it on GitHub <#195 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADSHQGYUMSHQ4B6P65VUWPLZXOTHTAVCNFSM6AAAAABJ43O25CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRSHAZDENJSGU> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

HennyNile · 2024-10-28T07:59:46Z

Hi, all. I am working on this issue. I found that the join order inconsistent with the input hints is caused by the implementation of PG and is not a bug of pg_hint_plan.

In particular, PG disables certain operators by adding a disable_cost(1.0e10) to its estimated cost; then, the planner would not select this operator due to its large cost. To assign a join order, pg_hint_plan disables all join operators when considering inconsistent join orders (by adding a disable cost to the cost of join operators). This is how pg_hint_plan works.

In this issue, the bug in the second example of @ZhengtongYan is it generates sub-join order ((ci an) t) for assigned sub-join order (t (ci an)) and ((mc (mi ((ci an) t))) chn) for (chn (mc (mi ((ci an) t)))). This is because PG estimates the cost of operators in two phases. Moreover, PG will filter some obviously inferior based on the estimated cost in the first phase. However, PG does not include disable_cost for disabled operators and only considers it in the second phase. While (t (ci an)) will utilize a normal nest loop join and access relation t with a seq scan, ((ci an) t) will utilize an index-based nest loop join and access relation t with an index scan, which is obviously faster. Thus, (t (ci an)) will be filtered after the first phase of cost estimation. Same reason to (chn (mc (mi ((ci an) t)))).

The reason why the third example works is that forces PG to select seqscan for relation t and chn, thus PG would not consider index-based nest loop joins for them. I have addressed the problem in the second example by considering disable_cost for the disabled operators in the first phase of cost estimation.

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

michaelpq · 2024-10-29T02:57:01Z

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see:
https://wiki.postgresql.org/wiki/Submitting_a_Patch

HennyNile · 2024-10-29T12:52:45Z

I want to create a pull request to Postgres and make PG consider disable_cost for the disabled operators in the first phase of cost estimation. Would you like to help me? @michaelpq

Contributions to PostgreSQL are not accepted through github. Upstream community has its own process, see: https://wiki.postgresql.org/wiki/Submitting_a_Patch

I see. Thanks for the reminder.

HennyNile · 2024-11-01T14:25:44Z

@michaelpq Hi, I found a way to solve this problem we could just write a wrapper for the function of cost estimation of PostgreSQL. How do you like this idea?

HennyNile · 2024-11-12T11:15:16Z

Hi, all! I have created a pull-request in #207.

ZhengtongYan changed the title ~~Leading hint + join methods hint cannot totally force the join order.~~ leading hint + join methods hint cannot totally force the join order. Jun 26, 2024

michaelpq added help wanted bug labels Aug 20, 2024

michaelpq mentioned this issue Aug 21, 2024

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

Closed

HennyNile mentioned this issue Nov 12, 2024

Fix issue 195 "leading hint + join methods hint cannot totally force the join order" #207

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

leading hint + join methods hint cannot totally force the join order. #195

leading hint + join methods hint cannot totally force the join order. #195

ZhengtongYan commented Jun 26, 2024

zk9907 commented Jul 5, 2024

michaelpq commented Aug 20, 2024

zk9907 commented Sep 20, 2024 •

edited

Loading

yamatattsu commented Sep 26, 2024 via email

HennyNile commented Oct 28, 2024 •

edited

Loading

michaelpq commented Oct 29, 2024

HennyNile commented Oct 29, 2024

HennyNile commented Nov 1, 2024

HennyNile commented Nov 12, 2024

leading hint + join methods hint cannot totally force the join order. #195

leading hint + join methods hint cannot totally force the join order. #195

Comments

ZhengtongYan commented Jun 26, 2024

zk9907 commented Jul 5, 2024

michaelpq commented Aug 20, 2024

zk9907 commented Sep 20, 2024 • edited Loading

1. Download imdb data:

in postgresql

Download JOB Benchmark:

yamatattsu commented Sep 26, 2024 via email

HennyNile commented Oct 28, 2024 • edited Loading

michaelpq commented Oct 29, 2024

HennyNile commented Oct 29, 2024

HennyNile commented Nov 1, 2024

HennyNile commented Nov 12, 2024

zk9907 commented Sep 20, 2024 •

edited

Loading

HennyNile commented Oct 28, 2024 •

edited

Loading