Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enable persist_docs for views #337

Merged
merged 12 commits into from
Aug 4, 2023
Merged

fix: enable persist_docs for views #337

merged 12 commits into from
Aug 4, 2023

Conversation

roslovets
Copy link
Contributor

Description

Since Athena view is just a Glue table it's an omission that persist_docs feature is disabled for views. It works exactly the same as with Athena tables.

Checklist

  • You followed contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

Copy link
Contributor

@Jrmyy Jrmyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last time I implemented the feature there was an error when trying to put the comments in the glue view, leading to the view being unusable.
Can you give us a model to reproduce, have you tried to persist the docs ? Run the queries on the view afterwards ?

@roslovets
Copy link
Contributor Author

I'm sorry, you are right. I thought, I tested it enough, but views don't work after creating.
Fortunately the problem is not Athena related. It's a bug in the current Glue logic implementation.
I'm working on fix and will provide it with test results soon.

@roslovets
Copy link
Contributor Author

roslovets commented Jun 19, 2023

Current verison of adapter (1.5.0) breaks views when executing persistent_docs because some important fields of Glue Table representing Athena views are missing in source code: https://github.com/dbt-athena/dbt-athena/blob/ff479c147eb922119b774acb42375f924bc968a1/dbt/adapters/athena/impl.py#L647
Like 'ViewOriginalText' - the whole SQL code of view is truncated from view with current implementation.

I fixed updating of Glue table Description taking code from awswrangler library as a reference:
https://github.com/roslovets/dbt-athena/blob/main/dbt/adapters/athena/impl.py#L859

Now it works like a charm for tables and views updating Glue Table Description (straightforward way) and custom Parameter called 'comment' (optional, but useful for compatibility with common approach in databases world).

Additionally I put some efforts trying optimize creation of new version of Glue Tables during deployment. I think that ideally:

  • you should have 1 new version of Glue Table after dbt run (not 2-3)
  • for incremental tables: new version should not be created if descriptions (table+columns) were not modified in dbt project

Manual testing

View

First run of view model:

view1

Populated: Table Description, Table 'comment' Property, Column Comment.

Only one version of table exists after single dbt run.

Deployed view works and it is selectable.

Seconf run of view model:

view2

One Table Version added.

Table

table1

We have two versions from scratch (this is a flaw in table creation process).

Incremental table

Initial run:

inc1

We have two versions right after creation (similar to previous example).

Second run (with the same model description in schema.yml):

inc2

Table description was not updated because there is no need to do it. No new Table Version was produced.

Third run (with new model description):

inc3

We have new Table Version with updated description.

Conclusion

Suggested fix is simple and it allows to populate model/columns descriptions to Glue Table for Athena views.

Tested and verified in my AWS environment.

@roslovets roslovets requested a review from Jrmyy June 19, 2023 23:26
@roslovets
Copy link
Contributor Author

Thank you for checking this fix. It will help me to simplify postprocessing of Glue Tables deployed with dbt by custom scheduled script.

Copy link
Contributor

@Jrmyy Jrmyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, a few comments 🔥 👍🏻

dbt/adapters/athena/impl.py Outdated Show resolved Hide resolved
dbt/adapters/athena/impl.py Outdated Show resolved Hide resolved
dbt/include/athena/macros/adapters/persist_docs.sql Outdated Show resolved Hide resolved
dbt/include/athena/macros/adapters/persist_docs.sql Outdated Show resolved Hide resolved
@roslovets
Copy link
Contributor Author

@Jrmyy Thanks a lot for your time and comments. I thinks I fixed it all.

@roslovets roslovets requested a review from Jrmyy June 30, 2023 22:29
@Jrmyy Jrmyy added the enable-functional-tests Label to trigger functional testing label Jul 3, 2023
Jrmyy
Jrmyy previously approved these changes Jul 3, 2023
dbt/adapters/athena/impl.py Outdated Show resolved Hide resolved
@Jrmyy Jrmyy self-requested a review July 12, 2023 13:36
@svdimchenko
Copy link
Contributor

@roslovets great job done with updating table metadata 👍 🌟 🚀 ! Some minor comments, mostly about typing which will make the code more easy to understand.

@roslovets
Copy link
Contributor Author

Guys, thank you for smart suggestions. It looks beter now.

@Jrmyy Jrmyy merged commit 8d2a80c into dbt-labs:main Aug 4, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enable-functional-tests Label to trigger functional testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants