Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg_repack is not terminating conflicting queries when repacking only indexes #310

Open
markosutic opened this issue May 24, 2022 · 5 comments

Comments

@markosutic
Copy link

markosutic commented May 24, 2022

Hello,

pg_repack is not terminating conflicting queries when repacking indexes only - wait-timeout is specified.

$ /usr/pgsql-11/bin/pg_repack --version
pg_repack 1.4.7
\d t1
                                     Table "public.t1"
   Column   |         Type          | Collation | Nullable |            Default
------------+-----------------------+-----------+----------+--------------------------------
 id         | integer               |           | not null | nextval('t1_id_seq'::regclass)
 code       | character varying(10) |           | not null |
 article    | text                  |           |          |
 name       | text                  |           | not null |
 department | character varying(4)  |           | not null |
Indexes:
    "t1_code_department_key" UNIQUE CONSTRAINT, btree (code, department)
    "t1_id_key" UNIQUE CONSTRAINT, btree (id)
    "code_key" btree (code)
    "name_article_key" btree (name, article)

SESSION 1:

postgres=# begin;
BEGIN
postgres=# update t1 set name='Test' where id=5;
UPDATE 1

SESSION 2:

$ /usr/pgsql-11/bin/pg_repack --table=t1 --dbname=postgres --only-indexes --no-order --wait-timeout=5 --echo

WARNING: option -n (--no-order) has no effect while repacking indexes
LOG: (query) SET search_path TO pg_catalog, pg_temp, public
LOG: (query) SET search_path TO pg_catalog, pg_temp, public
LOG: (query) select repack.version(), repack.version_sql()
LOG: (query) SET statement_timeout = 0
LOG: (query) SET search_path = pg_catalog, pg_temp, public
LOG: (query) SET client_min_messages = warning
LOG: (query) SELECT r FROM (VALUES ($1)) AS given_t(r) WHERE NOT EXISTS(  SELECT FROM repack.tables WHERE relid=to_regclass(given_t.r) )
LOG:    (param:0) = t1
LOG: (query) SELECT repack.oid2text(i.oid), idx.indexrelid, idx.indisvalid, idx.indrelid, $1::text, n.nspname FROM pg_index idx JOIN pg_class i ON i.oid = idx.indexrelid JOIN pg_namespace n ON n.oid = i.relnamespace WHERE idx.indrelid = $1::regclass ORDER BY indisvalid DESC, i.relname, n.nspname
LOG:    (param:0) = t1
INFO: repacking indexes of "t1"
LOG: (query) SELECT pg_try_advisory_lock($1, CAST(-2147483648 + $2::bigint AS integer))
LOG:    (param:0) = 16185446
LOG:    (param:1) = 21486
INFO: repacking index "public.code_key"
LOG: (query) SELECT pgc.relname, nsp.nspname FROM pg_class pgc INNER JOIN pg_namespace nsp ON nsp.oid = pgc.relnamespace WHERE pgc.relname = 'index_21525' AND nsp.nspname = $1
LOG:    (param:0) = public
LOG: (query) SELECT repack.repack_indexdef($1, $2, $3, true)
LOG:    (param:0) = 21525
LOG:    (param:1) = 21486
LOG:    (param:2) = (null)
LOG: (query) CREATE INDEX CONCURRENTLY index_21525 ON public.t1 USING btree (code) TABLESPACE pg_default
WARNING: Error creating index "public"."index_21525": ERROR:  canceling statement due to lock timeout
INFO: repacking index "public.name_article_key"
LOG: (query) SELECT pgc.relname, nsp.nspname FROM pg_class pgc INNER JOIN pg_namespace nsp ON nsp.oid = pgc.relnamespace WHERE pgc.relname = 'index_21526' AND nsp.nspname = $1
LOG:    (param:0) = public
LOG: (query) SELECT repack.repack_indexdef($1, $2, $3, true)
LOG:    (param:0) = 21526
LOG:    (param:1) = 21486
LOG:    (param:2) = (null)
LOG: (query) CREATE INDEX CONCURRENTLY index_21526 ON public.t1 USING btree (name, article) TABLESPACE pg_default
WARNING: Error creating index "public"."index_21526": ERROR:  canceling statement due to lock timeout
INFO: repacking index "public.t1_code_department_key"
LOG: (query) SELECT pgc.relname, nsp.nspname FROM pg_class pgc INNER JOIN pg_namespace nsp ON nsp.oid = pgc.relnamespace WHERE pgc.relname = 'index_21495' AND nsp.nspname = $1
LOG:    (param:0) = public
LOG: (query) SELECT repack.repack_indexdef($1, $2, $3, true)
LOG:    (param:0) = 21495
LOG:    (param:1) = 21486
LOG:    (param:2) = (null)
LOG: (query) CREATE UNIQUE INDEX CONCURRENTLY index_21495 ON public.t1 USING btree (code, department) TABLESPACE pg_default
WARNING: Error creating index "public"."index_21495": ERROR:  canceling statement due to lock timeout
INFO: repacking index "public.t1_id_key"
LOG: (query) SELECT pgc.relname, nsp.nspname FROM pg_class pgc INNER JOIN pg_namespace nsp ON nsp.oid = pgc.relnamespace WHERE pgc.relname = 'index_21493' AND nsp.nspname = $1
LOG:    (param:0) = public
LOG: (query) SELECT repack.repack_indexdef($1, $2, $3, true)
LOG:    (param:0) = 21493
LOG:    (param:1) = 21486
LOG:    (param:2) = (null)
LOG: (query) CREATE UNIQUE INDEX CONCURRENTLY index_21493 ON public.t1 USING btree (id) TABLESPACE pg_default
WARNING: Error creating index "public"."index_21493": ERROR:  canceling statement due to lock timeout
WARNING: Skipping index swapping for "t1", since no new indexes built
INFO: Skipping drop of index_21525
INFO: Skipping drop of index_21526
INFO: Skipping drop of index_21495
INFO: Skipping drop of index_21493
WARNING: repack failed for "t1"

Operation failed and we have invalid indexes created:

postgres=# \d t1
                                     Table "public.t1"
   Column   |         Type          | Collation | Nullable |            Default
------------+-----------------------+-----------+----------+--------------------------------
 id         | integer               |           | not null | nextval('t1_id_seq'::regclass)
 code       | character varying(10) |           | not null |
 article    | text                  |           |          |
 name       | text                  |           | not null |
 department | character varying(4)  |           | not null |
Indexes:
    "index_21493" UNIQUE, btree (id) INVALID
    "index_21495" UNIQUE, btree (code, department) INVALID
    "t1_code_department_key" UNIQUE CONSTRAINT, btree (code, department)
    "t1_id_key" UNIQUE CONSTRAINT, btree (id)
    "code_key" btree (code)
    "index_21525" btree (code) INVALID
    "index_21526" btree (name, article) INVALID
    "name_article_key" btree (name, article)

Env:
OS: Centos 7
PG 11/14

@xzilla
Copy link

xzilla commented Oct 10, 2022

ISTM you have a default lock_timeout that is kicking in before repack gets to the point where it's own wait_timeout would activate.

@markosutic
Copy link
Author

Hello,
we have lock_timeout=30sec and pg_repack wait-timeout=5sec. lock_timeout is kicking because pg_repack did not terminate conflicting sessions under 30 seconds. It should activate in 5 seconds based on the wait-timeout parameter - try to cancel conflicting sessions after 5 seconds, try again after 5 seconds, if lock still exits terminate conflicting sessions.

Or I am mistaken...

btw
When repacking table pg_repack successfully executes cancel/terminate of conflicting sessions.

@xzilla
Copy link

xzilla commented Oct 19, 2022

-T SECS, --wait-timeout=SECS
pg_repack needs to take an exclusive lock at the end of the reorganization. This setting controls how many seconds pg_repack will wait to acquire this lock. If the lock cannot be taken after this duration and --no-kill-backend option is not specified, pg_repack will forcibly cancel the conflicting queries. If you are using PostgreSQL version 8.4 or newer, pg_repack will fall back to using pg_terminate_backend() to disconnect any remaining backends after twice this timeout has passed. The default is 60 seconds.

Per the docs, the wait-timeout of pg_repack only kicks in at the point of doing the pg_catalog updates for swapping a repacked table for the original table, it does not kick in (and in fact, because it uses the built in reindex sql command, cannot kick int) when rebuilding indexes.

I wonder if it might be possible to set a local lock_timeout and then nest the index reindexes using savepoints and then terminate connections if it fails due to a lock_timeout error and retry, but the locking ramifications seem concerning. As a work around, it is probably easier to turn off the lock timeout on your pg_repack run. Out of curiosity, do you know what the query/queries are that lead to these 30 second lock waits?

@markosutic
Copy link
Author

You are correct, probably turning off lock timeout for pg_repack run would resolve issue. I just wasn't aware that --wait-timeout will work only for swapping repacked table for the original table.

It would be useful if pg_repack would terminate conflicting transactions when you specify wait-timeout for repacking indexes also.

I don't have conflicting query/queries - I think problem is with highly concurrent tables. For example even when one conflict transaction finishes, newly opened transaction started meanwhile on the table will block CREATE INDEX CONCURRENTLY. If you have very concurrent table CREATE INDEX CONCURRENTLY could wait long time. This is why it would be useful to have option to terminate conflicting transactions for repacking indexes.

Another problem is, if you have lock_timeout, after every pg_repack failure you are left with INVALID indexes on the table.

@MichaelDBA
Copy link
Collaborator

MichaelDBA commented Oct 24, 2022

IMHO, I would avoid using pg_repack with --only-indexes option. PG can only do 1 concurrent index creation at a time (per official documentation). So if you have multiple indexes to be concurrently created, this could cause problems for other competing transactions that might come in between the 1st and 2nd concurrent index build. I think this --only-indexes option should be done away with or else pg_repack has to get a lot more complicated to handle this option.

The official pg_repack documentation alludes to this scenario when describing error outcomes:

ERROR: Another pg_repack command may be running on the table. Please try again later.
There is a chance of deadlock when two concurrent pg_repack commands are run on the same table. So, try to run the command after some time.
WARNING: Cannot create index "schema"."index_xxxxx", already exists
DETAIL: An invalid index may have been left behind by a previous pg_repack on the table which was interrupted. Please use DROP INDEX "schema"."index_xxxxx" to remove this index and try again.

I would suggest using ** REINDEX TABLE CONCURRENTLY table**, which rebuilds all the indexes without blocking and without regard to whether or not they already exist and are in INVALID state. In fact, I fail to see the difference between pg_repack --only-indexes and REINDEX table CONCURRENTLY table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants