Use plain "create table as" if possible #251

Open
wants to merge 1 commit into base: master

Conversation

Melkij
Collaborator

@Melkij Melkij commented Oct 12, 2020

Hello

If any per-column storage settings are needed, the copying of data into the new table has to be split into separate commands.
In this case we must use this order:

  • create table as .. with no data;
  • alter table .. alter column ..
  • insert into .. select

But this split is pointless if no column needs to be altered.
At the same time, a single "create table as" command can be very useful for --order-by, because it can use PostgreSQL's parallel query execution. INSERT .. SELECT currently cannot use parallel plans.
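
To make the proposed decision concrete, here is a rough sketch of the statement selection, written in Python only for illustration (the actual patch lives in pg_repack's C code and may differ). The function name `build_copy_statements`, its parameters, and the use of `SET STORAGE` as the per-column setting are assumptions made for this example.

```python
from typing import Dict, List


def build_copy_statements(
    target: str,
    select_sql: str,
    column_storage: Dict[str, str],
) -> List[str]:
    """Return the SQL statements that would populate `target` from `select_sql`.

    `column_storage` maps a column name to a storage mode (e.g. "EXTERNAL");
    an empty dict means no per-column settings are needed.
    Hypothetical helper: identifiers are assumed to be quoted by the caller.
    """
    if not column_storage:
        # Nothing to alter: a single CREATE TABLE AS is enough,
        # and the planner may choose a parallel plan for its SELECT.
        return [f"CREATE TABLE {target} AS {select_sql}"]

    # Per-column settings must be applied before any rows are written,
    # so the table is created empty, altered, and only then filled.
    statements = [f"CREATE TABLE {target} AS {select_sql} WITH NO DATA"]
    for column, storage in column_storage.items():
        statements.append(
            f"ALTER TABLE {target} ALTER COLUMN {column} SET STORAGE {storage}"
        )
    statements.append(f"INSERT INTO {target} {select_sql}")
    return statements
```

With an empty `column_storage` the sketch yields one statement; with at least one entry it yields the three-step sequence listed above.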

@dvarrazzo
Member

This is interesting, but I wonder what is the locking behaviour?

If a CREATE TABLE AS takes a long time, is the system catalog locked? Can you run concurrent DDL instructions concurrently?

@Melkij
Collaborator Author

Melkij commented Oct 12, 2020

is the system catalog locked?

There is no such global system catalog lock in PostgreSQL.

Can you run concurrent DDL instructions concurrently?

On other objects, yes. A concurrent transaction that creates a table with the same name in the same schema will have to wait, but running two pg_repack instances in parallel is already a bad idea. pg_repack also forbids DDL on the table being processed (by holding an explicit LOCK TABLE).

Also, transaction-level locks are released only at commit (or rollback). LWLocks should not be held for a long time (otherwise it's a PostgreSQL bug). So there is no actual difference in behaviour: currently, create table and insert .. select are already called in the same transaction.

@xzilla

xzilla commented Dec 5, 2020

In non-parallel cases, this seems like it might cause repack to be slower, but the locking mechanics are no worse, so it should be just as safe as the current repack.

@Melkij
Collaborator Author

Melkij commented Dec 5, 2020

In non-parallel cases, this seems like it might cause repack to be slower

hm, why?

@xzilla

xzilla commented Dec 6, 2020

You'll have to sort the data before it can be inserted; the larger the table, the more expensive / time consuming that process will be. I don't see it as a show stopper, but something people may want to think about (especially as I don't see a lot of systems being well tuned for parallel work).

@Melkij
Collaborator Author

Melkij commented Dec 6, 2020

Right. But there is no performance regression:

  • when cluster mode is requested (by the --order-by= option, or because the table was clustered), we currently run insert .. select .. order by .., which is always non-parallel. In the worst case, create table as will perform the same.
  • when cluster mode is not requested (the table was not clustered, or the --no-order option was given), there is no order by clause at all, so nothing has to be sorted.
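
The two cases can be sketched as follows. This is an illustration in Python, not pg_repack's code; `build_select` and its parameters are hypothetical names. The point is that the sort only exists when an ordering was requested, regardless of whether the query is then run through insert .. select or create table as.

```python
from typing import Optional


def build_select(source: str, order_by: Optional[str]) -> str:
    """Build the SELECT that feeds the new table.

    `order_by` is the sort key list when cluster mode is requested,
    or None when it is not. Hypothetical helper for illustration only.
    """
    query = f"SELECT * FROM ONLY {source}"
    if order_by:
        # Cluster mode: the same ORDER BY is present in both the current
        # INSERT .. SELECT and the proposed CREATE TABLE AS.
        query += f" ORDER BY {order_by}"
    # Without cluster mode no ORDER BY is emitted, so no sort is added.
    return query
```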

@andreasscherbaum
Collaborator

@Melkij Any way you can test the correct creation of the table, with all options and also include the order by?

@bz007

bz007 commented Apr 15, 2024

@Melkij would you update the patch?

bz007 pushed a commit to bz007/pg_repack that referenced this pull request Apr 22, 2024
Use CREATE TABLE as SELECT allowing parallel plans if it is possible.

5 participants