
[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #13

Open
2 tasks done
Bonnevie opened this issue Mar 31, 2023 · 8 comments
Labels
bug Something isn't working snapshots Issues related to dbt's snapshot functionality

Comments

@Bonnevie

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I've set up a snapshot table in a Postgres database, and it has a column named annotation_type that uses a custom enum datatype. This worked without problems for weeks, but after the table had to be truncated and restored due to a faulty replica setup, we encountered the following error when trying to run the snapshot:

12:22:53  Database Error in snapshot annotations_image_snapshot (snapshots/annotations_image_snapshot.sql)
12:22:53    syntax error at or near "USER"
12:22:53    LINE 3: ...ions_image_snapshot" add column "annotation_type" USER-DEFIN...

I cannot find the SQL file that the error refers to.
The snapshot table was also suddenly missing the annotation_type column, despite snapshots supposedly never deleting columns.
We are at a development stage where we can still afford to do a full restart of the snapshot, but we'd rather not encounter this again in the future.
Our hypothesis is that the custom type is somehow not matched appropriately after the source table is regenerated.
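This hypothesis matches how Postgres reports enum columns: information_schema.columns returns the literal string USER-DEFINED as the data_type for any enum column, which is exactly the token that ends up in the failing ALTER TABLE statement. A minimal sketch of the behavior, using a hypothetical annotation_kind enum (not the actual schema from this issue):

```sql
CREATE TYPE annotation_kind AS ENUM ('box', 'polygon');
CREATE TABLE annotations (id int, annotation_type annotation_kind);

SELECT column_name, data_type, udt_name
FROM information_schema.columns
WHERE table_name = 'annotations';
-- column_name     | data_type    | udt_name
-- id              | integer      | int4
-- annotation_type | USER-DEFINED | annotation_kind
```

Any tooling that pastes data_type verbatim into DDL will therefore generate invalid SQL for enum columns, while udt_name still carries the real type name.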

Expected Behavior

I would expect snapshots to be robust to no-ops like this, and at least to never incur data loss.
If custom types are unsupported/problematic, it'd be nice to be warned.

Steps To Reproduce

Not easily reproduced, but the most likely sequence of events:

  1. build source table with custom enum type in postgres
  2. take snapshot
  3. truncate table, then rebuild
  4. take snapshot

Relevant log output

No response

Environment

No response

Which database adapter are you using with dbt?

postgres

Additional Context

No response

@Bonnevie Bonnevie added bug Something isn't working triage labels Mar 31, 2023
@github-actions github-actions bot changed the title [Bug] snapshot with custom datatypes breaks when source table is regenerated [CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated Mar 31, 2023
@dbeatty10
Contributor

Thanks for opening this @Bonnevie !

I made an attempt at reproducing this, but wasn't able to trigger the same error as you. Could you tweak the simple example below to reproduce what you were seeing?

I am using a local postgres database as described here, so you'll need to tweak your database, schema, etc. as needed.

dbt project

models/_sources.yml

version: 2

sources:
  - name: dbt_dbeatty
    database: postgres
    schema: dbt_dbeatty 
    tables:
      - name: person

snapshots/my_snapshot.sql

{% snapshot my_snapshot %}

{{
    config(
      target_database='postgres',
      target_schema='snapshots',
      strategy='check',
      check_cols='all',
      unique_key='name'
    )
}}

select * from {{ source("dbt_dbeatty", "person") }}

{% endsnapshot %}

Steps

  1. Build source table with custom enum type in postgres (using example here)

Using postgres IDE of choice:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE dbt_dbeatty.person (
    name text,
    current_mood mood
);
INSERT INTO dbt_dbeatty.person VALUES ('Moe', 'happy');
  2. Take a snapshot:
dbt build
  3. Truncate table, then rebuild

Using postgres IDE of choice again:

truncate TABLE dbt_dbeatty.person;
INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');
  4. Take another snapshot:
dbt build

@benorourke

benorourke commented Apr 25, 2023

I'm also experiencing this issue. The macro postgres__get_columns_in_relation, which is called when the snapshot checks for column changes, selects data_type from the information_schema. For any user-defined type (UDT), this will return USER-DEFINED. We can instead conditionally select udt_name to get the actual name of the enum.

As a temporary fix, I have overridden postgres__get_columns_in_relation in override_default_macros.sql to resolve udt names along with their schema:

{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          column_name,
          case
            when (data_type = 'USER-DEFINED') then udt_schema || '.' || udt_name
            else data_type
          end as data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from {{ relation.information_schema('columns') }}
      where table_name = '{{ relation.identifier }}'
        {% if relation.schema %}
        and table_schema = '{{ relation.schema }}'
        {% endif %}
      order by ordinal_position

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

@dbeatty10
Contributor

dbeatty10 commented Apr 25, 2023

After attempting again with the following steps, I was able to reproduce the reported issue.

Also confirmed that the solution from @benorourke worked 🏅

Steps

  1. Create a simple table that uses only built-in data types

Using postgres IDE of choice:

CREATE TABLE dbt_dbeatty.person (
    name text
);

INSERT INTO dbt_dbeatty.person VALUES ('Moe');
  2. Take a snapshot:
dbt snapshot
  3. Add a column that uses a custom data type

Build source table with custom enum type in postgres (using example here).

Using postgres IDE of choice again:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

-- Yeah, a simple ALTER TABLE ... would be better here! But this is sufficient to reproduce
DROP TABLE dbt_dbeatty.person;
CREATE TABLE dbt_dbeatty.person (
    name text,
    current_mood mood
);

INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');
  4. Take another snapshot:
dbt snapshot

💥

  5. Override the postgres__get_columns_in_relation macro

Copy-paste the solution described here (untested) or here into macros/overrides.sql

  6. Retry taking another snapshot:
dbt snapshot

@dbeatty10 dbeatty10 removed the triage label Apr 25, 2023
@aBBDnGus

aBBDnGus commented Sep 6, 2023

I have a similar bug, which relates to this:
I have an incremental model with the option on_schema_change='sync_all_columns'. If I now add a column with a user-defined type to the incremental model, the same error occurs.
It can be fixed with the provided solution.

Environment:
PostgreSQL, dbt 1.6.1
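For anyone hitting the incremental variant of this, a minimal model config that exercises the same code path might look like the following (the model name, column names, and source are hypothetical, not taken from this issue):

```sql
-- models/my_incremental_model.sql
{{
    config(
        materialized='incremental',
        on_schema_change='sync_all_columns',
        unique_key='id'
    )
}}

select
    id,
    current_mood  -- current_mood uses a custom enum type
from {{ source("dbt_dbeatty", "person") }}
```

Adding a column with a user-defined type to the source after the first build makes sync_all_columns emit the same broken `add column ... USER-DEFINED` statement, so the macro override above applies here as well.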

@dbeatty10
Contributor

Here is an alternative override for postgres__get_columns_in_relation that might be better (but I didn't test it out with this issue):

{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}

    select
      col.attname as column_name,

      -- Two options for formatting the data type:
      -- 1) do not include the type modifier (e.g., `geometry`)
      -- 2) include the type modifier when applicable (e.g., `geometry(Point,4326)`)

      -- Option 1:
      pg_catalog.format_type(col.atttypid, null) as data_type,

      -- Option 2:
      -- pg_catalog.format_type(col.atttypid, col.atttypmod) as data_type,

      null as character_maximum_length,
      null as numeric_precision,
      null as numeric_scale

    from pg_catalog.pg_namespace sch
    join pg_catalog.pg_class tbl on tbl.relnamespace = sch.oid
    join pg_catalog.pg_attribute col on col.attrelid = tbl.oid
    left outer join pg_catalog.pg_description tbl_desc on (tbl_desc.objoid = tbl.oid and tbl_desc.objsubid = 0)
    left outer join pg_catalog.pg_description col_desc on (col_desc.objoid = tbl.oid and col_desc.objsubid = col.attnum)

    where 1=1
      and not pg_is_other_temp_schema(sch.oid) -- not a temporary schema belonging to another session
      and tbl.relpersistence in ('p', 'u') -- [p]ermanent table or [u]nlogged table. Exclude [t]emporary tables
      and tbl.relkind in ('r', 'v', 'f', 'p') -- o[r]dinary table, [v]iew, [f]oreign table, [p]artitioned table. Other values are [i]ndex, [S]equence, [c]omposite type, [t]OAST table, [m]aterialized view
      and col.attnum > 0 -- negative numbers are used for system columns such as oid
      and not col.attisdropped -- column has not been dropped

      and tbl.relname = '{{ relation.identifier }}'
        {% if relation.schema %}
        and sch.nspname = '{{ relation.schema }}'
        {% endif %}

    order by
        sch.nspname,
        tbl.relname,
        col.attnum

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

@mikealfare mikealfare transferred this issue from dbt-labs/dbt-core Feb 13, 2024
@dbeatty10 dbeatty10 added the snapshots Issues related to dbt's snapshot functionality label Feb 13, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 12, 2024
@magno32

magno32 commented Aug 19, 2024

This issue also exists in unit tests. The workaround provided by @dbeatty10 gets us past the USER-DEFINED cast in the generated SQL for the unit-test, but any default geometry values given in the input/expect rows are wrapped as strings. I'm guessing there is another macro somewhere that would fix that.

@dbeatty10
Contributor

@magno32 could you please open an issue here for using the geometry data type within unit tests?
