pg_plan_advsr

pg_plan_advsr is a PostgreSQL extension that provides Automated execution plan tuning using feedback loop. This extension might help you if you have an analytic query which has many joins and aggregates and you'd like to get an efficient plan to reduce execution time. This extension is intended to use in a plan tuning phase at the end of system development.

Note: This extension is intended to be used in a validation environment, not a commercial environment.
master branch for PostgreSQL 12 or above

1 Cookbook

This is a simple example of how to use pg_plan_advsr for auto execution plan tuning. More detailed information will be provided in the sections Options and Usage.

Try running regression tests inside the sql directory: pg_plan_advsr/sql. First, create a test table, etc. to execute init.sql. After that, run base.sql and observe the results of automatic plan tuning.

As shown below, you can see that a Nested Loop join during the first run has changed to a Hash join after tuning. The error in the estimated number of rows should be zero, and the query execution time was successfully reduced.

Note that base.sql will clear these tables and view such as hint_plan.hints, plan_repo.norm_queries, plan_repo.raw_queries, plan_repo.plan_history, and pg_store_plans view. So, please take care if you use it.

psql
\i init.sql
\i base.sql

Before tuning (first run)

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=295.29..515.35 rows=1 width=16) (actual time=24.591..164.421 rows=9991 loops=1)
   ->  Hash Join  (cost=295.00..515.01 rows=1 width=16) (actual time=24.500..46.584 rows=10000 loops=1)
         Hash Cond: ((a.c1 = b.c1) AND (a.c2 = b.c2))
         ->  Seq Scan on table_a a  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.032..3.899 rows=10000 loops=1)
         ->  Hash  (cost=145.00..145.00 rows=10000 width=8) (actual time=24.359..24.359 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 519kB
               ->  Seq Scan on table_b b  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.026..5.771 rows=10000 loops=1)
   ->  Index Scan using ind_c_c2 on table_c c  (cost=0.29..0.33 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=10000)
         Index Cond: ((c2 = a.c2) AND (c2 >= 10))
         Filter: ((c1 > 1) AND (c1 = a.c1))
 Planning Time: 3.952 ms
 Execution Time: 168.867 ms
(12 rows)

After tuning (final run)

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=590.00..934.88 rows=9991 width=16) (actual time=37.030..74.904 rows=9991 loops=1)
   Hash Cond: ((a.c1 = b.c1) AND (a.c2 = b.c2))
   ->  Hash Join  (cost=295.00..564.93 rows=9991 width=16) (actual time=17.751..40.944 rows=9991 loops=1)
         Hash Cond: ((c.c1 = a.c1) AND (c.c2 = a.c2))
         ->  Seq Scan on table_c c  (cost=0.00..195.00 rows=9990 width=8) (actual time=0.060..7.318 rows=9991 loops=1)
               Filter: ((c1 > 1) AND (c2 >= 10))
               Rows Removed by Filter: 9
         ->  Hash  (cost=145.00..145.00 rows=10000 width=8) (actual time=17.658..17.660 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 519kB
               ->  Seq Scan on table_a a  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.019..4.141 rows=10000 loops=1)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=8) (actual time=19.249..19.250 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 519kB
         ->  Seq Scan on table_b b  (cost=0.00..145.00 rows=10000 width=8) (actual time=0.022..4.799 rows=10000 loops=1)
 Planning Time: 3.961 ms
 Execution Time: 78.294 ms
(15 rows)

See: Usage for more details.

2 Objects created by the extension

Functions

FUNCTION pg_plan_advsr_enable_feedback() RETURNS void
FUNCTION pg_plan_advsr_disable_feedback() RETURNS void
FUNCTION plan_repo.get_hint(bigint) RETURNS text
- If you give a pgsp_planid as an argument, it will return the hints to reproduce the plan based on pgsp_planid
FUNCTION plan_repo.get_extstat(bigint) RETURNS text
- If you give a queryid as an argument, it will return the syntax for generating extended statistics. This function supports PG14 or above since it uses compute_query_id.

Tables

plan_repo.plan_history
plan_repo.norm_queries
plan_repo.raw_queries

Table "plan_repo.plan_history"

      Column         |            Type             | Description
---------------------+-----------------------------+-------------------------------------------------------------------------------
 id                  | integer                     | Sequence as a primary key: nextval('plan_repo.plan_history_id_seq'::regclass)
 norm_query_hash     | text                        | MD5 based on normalized query text
 pgsp_queryid        | bigint                      | Queryid of pg_store_plans
 pgsp_planid         | bigint                      | Planid of pg_sotre_plans
 execution_time      | numeric                     | Execution time (ms) of this planid
 rows_hint           | text                        | Rows hint of this plan
 scan_hint           | text                        | Scan hint of this plan
 join_hint           | text                        | Join hint of this plan
 lead_hint           | text                        | Leading hint of this plan
 scan_rows_err       | numeric                     | Sum of estimation row error of scans
 scan_err_ratio      | numeric                     | Maximum estimation row error ratio of scans
 join_rows_err       | numeric                     | Sum of estimation row error of joins
 join_err_ratio      | numeric                     | Maximum estimation row error ratio of joins
 scan_cnt            | integer                     | Number of scan nodes in this plan
 join_cnt            | integer                     | Number of Join nodes in this plan
 application_name    | text                        | Application name of client tool such as "psql"
 timestamp           | timestamp without time zone | Timestamp of this record inserted

Table "plan_repo.norm_queries"

      Column       |           Type             | Description
-------------------+----------------------------+-----------------------------------
 norm_query_hash   | text                       | MD5 based on normalized query text
 norm_query_string | text                       | Normalized query text

Table "plan_repo.raw_queries"

      Column      |            Type             | Description
------------------+-----------------------------+----------------------------------------------------------------------------------------
 norm_query_hash  | text                        | MD5 based on normalized query text
 raw_query_id     | integer                     | Sequence of raw query text: nextval('plan_repo.raw_queries_raw_query_id_seq'::regclass)
 raw_query_string | text                        | Raw query text (not normalized)
 timestamp        | timestamp without time zone | Timestamp of this record inserted

Views

plan_repo.plan_history_pretty

Columns are same as plan_history table, but number of decimal places are reduced for readability

3 Options

pg_plan_advsr.enabled

"ON": Enable pg_plan_advsr. It allows creating various hints for fixing row estimation errors and also for reproducing a plan. It also stores them in the plan_history table. If you want to use "auto plan tuning using feedback loop", you have to execute below function "pg_plan_advsr_enable_feedback(). Default setting is "ON".
pg_plan_advsr.quieted

"ON": Enable quiet mode. It allows to disable emmiting the following messages when your EXPLAIN ANALYZE commmand finished.
```
  pgsp_queryid
  pgsp_planid
  Execution time
  Hints for current plan
  Rows hint (feedback info)
  and so on
```
Default setting is "OFF".
pg_plan_advsr.widely

"ON": Enable creating hints even if EXPLAIN command without ANALYZE option. It allows creating various hints needed for reproducing a plan, but it doesn't create hints for fixing row estimation errors because there is no information of Actual rows. It also stores them in the plan_history table. If you want to get hints to reproduce a plan, this option helps you. Default setting is "OFF".
pg_plan_advsr_enable_feedback()

This function allows you to use feedback loop for plan tuning. Actually, it is a wrapper for these commands:
```
  set pg_plan_advsr.enabled to on;
  set pg_hint_plan.enable_hint_table to on;
  set pg_hint_plan.debug_print to on;
```
pg_plan_advsr_disable_feedback()

This function disables using feedback loop for plan tuning. It is a wrapper for these commands:
```
  set pg_plan_advsr.enabled to on;
  set pg_hint_plan.enable_hint_table to off;
  set pg_hint_plan.debug_print to off;
```

4 Usage

TBA

There are four types of usage:

Displaying Execution Plan Characteristics
Automatic Execution Plan Tuning
Automatic Hint Clause Generation
Extended Statistics Suggestion

Details on how to use each are shown below.

Displaying Execution Plan Characteristics

First, Make sure pg_plan_advsr.enabled to on. Then, Execute EXPLAIN ANALYZE command (which is your query). You can get the result with the DESCRIBE section appended, and you can see the number of joins, scans, errors in row count estimation, and so on.

e.g.

  explain (analyze, verbose) select * from t where a = 1 and b = 1;
  
  		                                               QUERY PLAN
  --------------------------------------------------------------------------------------------------------
   Seq Scan on public.t  (cost=0.00..195.00 rows=100 width=8) (actual time=0.052..6.269 rows=100 loops=1)
     Output: a, b
     Filter: ((t.a = 1) AND (t.b = 1))
     Rows Removed by Filter: 9900
   Query Identifier: -3455024416178978571
   Planning Time: 0.930 ms
   Execution Time: 9.136 ms

   DESCRIBE
   ------------------------
   application:    psql
   pgsp_queryid:   -3455024416178978571
   pgsp_planid:    3455841613
   join_cnt:       0
   join_rows_err:  0
   join_err_ratio: 0.00
   scan_cnt:       1
   scan_rows_err:  0
   scan_err_ratio: 0.00
   lead hint:      LEADING( t  )
   join hint:
   scan hint:      SEQSCAN(t)
   rows hint:
  (23 rows)

For auto plan tuning

First, Run select pg_plan_advsr_enable_feedback();. Then, Execute EXPLAIN ANALYZE command (which is your query) repeatedly until row estimation errors had vanished. Finally, You can check a result of the tuning by using the below queries:
```
select pgsp_queryid, pgsp_planid, execution_time, scan_hint, join_hint, lead_hint from plan_repo.plan_history order by id;

select queryid, planid, plan from pg_store_plans where queryid='your pgsp_queryid in plan_history' order by first_call;
```
See shell script file as an example: JOB/auto_tune_31c.sh

Demo of auto tuning (3x speed)

If you'd like to reproduce the execution plans on other environments, you'd be better to read the other.

Note:
- A plan may temporarily worse than an initial plan during auto tuning phase.
- Use stable data for auto plan tuning. This extension doesn't get converged plan (the ideal plan for the data) if it was updating concurrently.

For getting hints of current query

First, Run select pg_plan_advsr_disable_feedback();. Then, Execute EXPLAIN ANALYZE command (which is your query). Finally, You can get hints by using the below queries:

select pgsp_queryid, pgsp_planid, execution_time, scan_hint, join_hint, lead_hint from plan_repo.plan_history order by id;

e.g.

 pgsp_queryid | pgsp_planid | execution_time |                scan_hint                |     join_hint      |        lead_hint
--------------+-------------+----------------+-----------------------------------------+--------------------+-------------------------
   4173287301 |  3707748199 |        265.179 | SEQSCAN(t2) SEQSCAN(x) INDEXSCAN(t1)    | HASHJOIN(t2 t1 x) +| LEADING( (t2 (x t1 )) )
              |             |                |                                         | NESTLOOP(t1 x)     |
   4173287301 |  1101439786 |          2.149 | SEQSCAN(x) INDEXSCAN(t1) INDEXSCAN(t2)  | NESTLOOP(t2 t1 x) +| LEADING( ((x t1 )t2 ) )
              |             |                |                                         | NESTLOOP(t1 x)     |

# \a
Output format is unaligned.
# \t
Tuples only is on.
  
select plan_repo.get_hint(1101439786);
  
/*+
LEADING( ((x t1 )t2 ) )
NESTLOOP(t2 t1 x)
NESTLOOP(t1 x)
SEQSCAN(x) INDEXSCAN(t1) INDEXSCAN(t2)
*/
--1101439786

You can use the hints to reproduce the execution plan anywhere. It also can be used to modify the execution plan by changing the hints manually.

For getting extended statistics suggestion

This feature is enabled when you use PG14 or above with pg_qualstats. First, Make sure pg_plan_advsr.enabled to on. Then, Execute EXPLAIN ANALYZE command (which is your query). Finally, You can get extended statistics suggestion by using the below queries:
```
select * from plan_repo.get_extstat(queryid);
```
e.g.
```
  # select * from plan_repo.get_extstat(-3455024416178978571);
                   suggest
  ------------------------------------------
   CREATE STATISTICS ON a, b FROM public.t;
  (1 row)
```

5 Installation Requirements

pg_plan_advsr uses pg_hint_plan and pg_store_plans cooperatively.

PostgreSQL 12 or above
pg_hint_plan
pg_store_plans
pg_qualstats
- if you'd like to use Extended statistic suggestion feature on PG14 or above
RHEL/CentOS/Rocky = 7.x or above

6 Installation

TBA

There are two methods to install the extension: Using building pg_plan_advsr manually.

Build and install (make && make install)

Prerequisite for installation
- Install postgresql-devel package if you installed PostgreSQL by rpm files
- Set the PATH environment variable to pg_config of your PostgreSQL

Operations

git clone extensions

-- Required
git clone https://github.com/ossc-db/pg_hint_plan.git
git clone https://github.com/ossc-db/pg_store_plans.git
git clone https://github.com/ossc-db/pg_plan_advsr.git

-- Optional: if you use Extended statistic suggestion feature on PG14 or above
git clone https://github.com/powa-team/pg_qualstats.git

git checkout

Set the appropriate version of PostgreSQL for the VERSION variable. For example, If you use PG12, see below:

export VERSION=12
cd pg_hint_plan
git checkout -b PG${VERSION} origin/PG${VERSION} && git checkout $(git describe --tag)
cd ../pg_store_plans
git checkout $(git describe --tag)

build and install

-- Required
cd ../pg_hint_plan
make -s && make -s install
cp pg_stat_statements.c ../pg_plan_advsr/
cp normalize_query.h ../pg_plan_advsr/

cd ../pg_store_plans
make -s USE_PGXS=1 all install
cp pgsp_json*.[ch] ../pg_plan_advsr/

cd ../pg_plan_advsr
git describe --alway
make
make install

-- Optional
cd ../pg_qualstats
make
make install

edit PostgreSQL.conf

vi $PGDATA/postgresql.conf

---- Add these lines -----------------------------------------------------
-- Required
shared_preload_libraries = 'pg_hint_plan, pg_plan_advsr, pg_store_plans'
max_parallel_workers_per_gather = 0
max_parallel_workers = 0
compute_query_id = on

or

-- Optional
shared_preload_libraries = 'pg_hint_plan, pg_plan_advsr, pg_store_plans, pg_qualstats'
max_parallel_workers_per_gather = 0
max_parallel_workers = 0
compute_query_id = on
pg_qualstats.resolve_oids = true
pg_qualstats.sample_rate = 1
-----------------------------------------------------------------------------------

---- Consider tweak these numbers -------------------------------------------------
-- Use a large value than join numbers of your query
geqo_threshold = 12 -> 20
from_collapse_limit = 8 -> 20
join_collapse_limit = 8 -> 20

-- Optional
random_page_cost = 4 -> 1
-----------------------------------------------------------------------------------

run create extension commands on psql

pg_ctl start
psql

-- Required
create extension pg_hint_plan;
create extension pg_store_plans;
create extension pg_plan_advsr;

-- Optional
create extension pg_qualstats;

You can try this extension with Join Order Benchmark as a example. See: how_to_setup.md in JOB directory

Dockerfile (experimental)

Operations
```
  \# cd pg_plan_advsr/docker
  \# ./build.sh
```
See: build.sh and Dockerfile

7 Internals

TBA

These presentation materials are useful to know concepts and its architecture, and these show a benchmark result by using Join order benchmark:

8 Limitations

Not supported

Handle InitPlans and SubPlans
Handle Append and MergeAppend
Fix bese-relation's estimated row error (This is pg_hint_plan's limitation)
Concurrent execution
Extended Statistics Suggestion for Grouping columuns and Expressions
Extended Statistics Suggestion on PG13 or below

Not tested

Parallel query
Partitioned Table
JIT

See: TODO file

pg_plan_advsr uses pg_hint_plan and pg_store_plans, it would be better to check these document to know their limitations.

9 Support

If you want to report a problem with pg_plan_advsr, please include the following information because we will analyze it by reproducing your problem:

Versions
- PostgreSQL
- pg_hint_plan
- pg_store_plans
Query
DDL
- CREATE TABLE
- CREATE INDEX
Data (If possible)

If you have a problem or question or any kind of feedback, the preferred option is to open an issue on GitHub: https://github.com/ossc-db/pg_plan_advsr/issues This requires a GitHub account. Of course, any Pull request welcome!

10 Author

Tatsuro Yamada (yamatattsu at gmail dot com)

11 Acknowledgments

The following individuals (in alphabetical order) have contributed to pg_plan_advsr as patch authors, reviewers, testers, advisers, or reporters of issues. Thanks a lot!

Amit Langote
David Pitts
Etsuro Fujita
Hironobu Suzuki
Julien Rouhaud
Kaname Furutani
Kyotaro Horiguchi
Laurenz Albe
Nuko Yokohama
Sam Xu

Name		Name	Last commit message	Last commit date
Latest commit History 120 Commits
.github/workflows		.github/workflows
JOB		JOB
docker		docker
expected		expected
sql		sql
Makefile		Makefile
README.md		README.md
TODO.md		TODO.md
pg_plan_advsr--0.0.sql		pg_plan_advsr--0.0.sql
pg_plan_advsr--0.1.sql		pg_plan_advsr--0.1.sql
pg_plan_advsr.c		pg_plan_advsr.c
pg_plan_advsr.control		pg_plan_advsr.control

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pg_plan_advsr

Contents

1 Cookbook

Before tuning (first run)

After tuning (final run)

2 Objects created by the extension

Functions

Tables

Views

3 Options

4 Usage

5 Installation Requirements

6 Installation

Set the appropriate version of PostgreSQL for the VERSION variable. For example, If you use PG12, see below:

7 Internals

8 Limitations

Not supported

Not tested

9 Support

10 Author

11 Acknowledgments

About

Releases

Packages

Languages

yamatattsu/pg_plan_advsr

Folders and files

Latest commit

History

Repository files navigation

pg_plan_advsr

Contents

1 Cookbook

Before tuning (first run)

After tuning (final run)

2 Objects created by the extension

Functions

Tables

Views

3 Options

4 Usage

5 Installation Requirements

6 Installation

Set the appropriate version of PostgreSQL for the VERSION variable. For example, If you use PG12, see below:

7 Internals

8 Limitations

Not supported

Not tested

9 Support

10 Author

11 Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages