[Bug] 'dbt docs generate' creates catalog for the relations not used in the models #1316

Open
2 tasks done
maxmullerfitu opened this issue Aug 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

maxmullerfitu commented Aug 2, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Our project contains sources in multiple GCP projects. The source table for a given model changes at runtime, and this is parameterized via vars.
When the 'dbt docs generate' command is run, it fails with the following error:
Encountered an error while generating catalog: Database Error Access Denied: Table xxxxxxxxx-gold-dev:xxxxx_gold.INFORMATION_SCHEMA.COLUMNS: User does not have permission to query table
The service account cannot access the table in the other project.
According to the logs, 'dbt docs generate' now tries to build the catalog for all sources, whereas before it only built the catalog for the relations used in the models.
To work around this issue, we had to remove the sources that refer to the other projects.
This behavior started after we upgraded our Composer environment, which changed the dbt-bigquery version from v1.5.4 to v1.8.2.

Expected Behavior

It is expected that 'dbt docs generate' creates documentation only for the relations used in models, so that sources can contain tables from multiple projects.

Steps To Reproduce

  1. Create sources in several projects:

       - name: source1
         project: project1
         tables:
           - name: table1
       - name: source2
         project: project2
         schema: schema2
         tables:
           - name: table2

  2. Alternate between the sources in the model:

       FROM {{ ref("some_model") }} t1
       JOIN
       {% if var('env') == 'dev' %}
       {{ source("source1", "table1") }} AS t2
       {% else %}
       {{ source("source2", "table2") }} AS t2
       {% endif %}
       ON t1.id = t2.id

  3. Set {%- set env = "dev" -%} so that source1 is used in the model.
  4. Run 'dbt docs generate'.
  5. Observe the dbt logs to confirm that a query is issued to retrieve information from source2.information_schema.
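To make the difference between the expected and the reported behavior concrete, here is a small Python sketch. It is a toy model of the steps above, not dbt's actual catalog code: the `Relation` class and the helper functions are hypothetical names made up for illustration, and the only dbt behavior it assumes is that a source without an explicit schema uses its own name as the schema.

```python
# Toy model of the behavior described in this issue; not dbt's implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    project: str
    schema: str
    table: str


# Sources declared in the reproduction steps. source1 declares no schema,
# so its name is used as the schema here (assumed dbt default).
SOURCE1 = Relation("project1", "source1", "table1")
SOURCE2 = Relation("project2", "schema2", "table2")
DECLARED_SOURCES = {SOURCE1, SOURCE2}


def used_sources(env: str) -> set[Relation]:
    """Mirror the Jinja branch in the model: dev uses source1, otherwise source2."""
    return {SOURCE1} if env == "dev" else {SOURCE2}


def schemas_to_query(relations: set[Relation]) -> set[tuple[str, str]]:
    """Each distinct (project, schema) pair implies one INFORMATION_SCHEMA query."""
    return {(r.project, r.schema) for r in relations}


# Expected: only relations the models actually use are cataloged.
expected = schemas_to_query(used_sources("dev"))
# Reported: every declared source is cataloged.
reported = schemas_to_query(DECLARED_SOURCES)

# The extra schema is the one the service account cannot read.
unexpected = reported - expected
print(sorted(unexpected))  # [('project2', 'schema2')]
```

With env set to "dev", the only extra schema in the reported behavior is the one in project2, which matches the Access Denied error on INFORMATION_SCHEMA.COLUMNS.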

Relevant log output

For dbt-bigquery 1.5.4, source2 is skipped:
20:43:05.556313 [debug] [MainThread]: Acquiring new bigquery connection 'generate_catalog'
20:43:05.557479 [info ] [MainThread]: Building catalog
20:43:05.560116 [debug] [MainThread]: Opening a new connection, currently in state init
20:43:07.064956 [debug] [MainThread]: BigQuery adapter: Skipping catalog for xxxxxxxxxxx - schema does not exist
20:43:07.066819 [debug] [ThreadPool]: Acquiring new bigquery connection 'yyyyyyyyy'
20:43:07.073633 [debug] [ThreadPool]: Acquiring new bigquery connection 'zzzzzzzzz'

For dbt-bigquery 1.8.2, source2 is not skipped:
14:20:51.521164 [debug] [MainThread]: Acquiring new bigquery connection 'generate_catalog'
14:20:51.524088 [info ] [MainThread]: Building catalog
14:20:51.538783 [debug] [ThreadPool]: Acquiring new bigquery connection 'yyyyyyyyy'
14:20:51.540263 [debug] [ThreadPool]: Acquiring new bigquery connection 'zzzzzzzzz'
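For anyone comparing their own debug logs, the two excerpts above can be diffed with a short script. This is only a sketch: the regular expressions are inferred from the log lines quoted in this issue, and the `summarize` helper is a hypothetical name, not part of dbt.

```python
import re

# Line shape inferred from the excerpts: "<time> [<level>] [<thread>]: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<level>\w+)\s*\] \[(?P<thread>\w+)\]: (?P<msg>.*)$"
)
CONNECTION = re.compile(r"Acquiring new bigquery connection '([^']+)'")
SKIPPED = re.compile(r"Skipping catalog for (\S+) - schema does not exist")

FIRST_EXCERPT = """\
20:43:05.556313 [debug] [MainThread]: Acquiring new bigquery connection 'generate_catalog'
20:43:05.557479 [info ] [MainThread]: Building catalog
20:43:05.560116 [debug] [MainThread]: Opening a new connection, currently in state init
20:43:07.064956 [debug] [MainThread]: BigQuery adapter: Skipping catalog for xxxxxxxxxxx - schema does not exist
20:43:07.066819 [debug] [ThreadPool]: Acquiring new bigquery connection 'yyyyyyyyy'
20:43:07.073633 [debug] [ThreadPool]: Acquiring new bigquery connection 'zzzzzzzzz'
"""

SECOND_EXCERPT = """\
14:20:51.521164 [debug] [MainThread]: Acquiring new bigquery connection 'generate_catalog'
14:20:51.524088 [info ] [MainThread]: Building catalog
14:20:51.538783 [debug] [ThreadPool]: Acquiring new bigquery connection 'yyyyyyyyy'
14:20:51.540263 [debug] [ThreadPool]: Acquiring new bigquery connection 'zzzzzzzzz'
"""


def summarize(log_text: str) -> dict[str, list[str]]:
    """Collect connection names and skipped catalogs from dbt debug log text."""
    connections: list[str] = []
    skipped: list[str] = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # ignore lines that do not look like dbt log lines
        msg = match.group("msg")
        conn = CONNECTION.search(msg)
        skip = SKIPPED.search(msg)
        if conn:
            connections.append(conn.group(1))
        elif skip:
            skipped.append(skip.group(1))
    return {"connections": connections, "skipped": skipped}


first = summarize(FIRST_EXCERPT)
second = summarize(SECOND_EXCERPT)
print("skipped in first excerpt only:", set(first["skipped"]) - set(second["skipped"]))
```

In these excerpts the connection names are identical; the only difference the script surfaces is the "Skipping catalog" message, which appears in the first excerpt and is absent from the second.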

Environment

- OS:GCP Composer image composer-2.8.5-airflow-2.7.3

- Python:3.11.8
- dbt-core:1.8.3
- dbt-bigquery:1.8.2

Additional Context

No response

maxmullerfitu added the bug (Something isn't working) and triage labels Aug 2, 2024
@jeremychia

We are experiencing the same error as well.
