Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sqlparse fallback for formatting queries #30578

Merged
merged 2 commits into from
Oct 11, 2024
Merged

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Oct 10, 2024

SUMMARY

An improvement to #30350.

Today, when a DB engine spec has no corresponding sqlglot dialect, we use a generic one to parse queries, which works fine. When converting the AST back to a string to pretty-print SQL, though, we hit a problem: in some databases the query becomes invalid. For example, this Firebolt query:

SELECT * FROM t1 WHERE col1 NOT IN (1, 2);

Gets formatted to:

SELECT * FROM t1 WHERE NOT col1 IN (1, 2);

While these two queries are equivalent, the latter is invalid in Firebolt:

Line 1, Column 24: operator 'not' for input types (integer) not found, try adding explicit casts.

The long term solution is to add a Firebolt dialect to sqlglot, at least handling this edge case. As a more generic solution, this PR implements logic where, when a DB engine spec doesn't have a corresponding dialect in sqlglot, the query is pretty-printed using sqlparse.format. This way we have the strictness and correctness of sqlglot when parsing queries, and the leniency of sqlparse when formatting them.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added unit tests covering the problem.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@betodealmeida betodealmeida force-pushed the reformat-sqlglot branch 3 times, most recently from 6d62f0f to 0588776 Compare October 11, 2024 14:28
if isinstance(statement, str)
else statement
)
self._sql = statement
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR the Statement classes didn't store a reference to the original SQL, only the AST. So I had to modify it here.

@michael-s-molina
Copy link
Member

michael-s-molina commented Oct 11, 2024

@betodealmeida We discussed this PR during the daily PR review meeting, and we think we should aim for removing any dependency of sqlparse in accordance to [SIP-117] Improve SQL parsing. This means that we prefer to remove formatting for these databases in favor of removing sqlparse. That being said, we understand that this would constitute a breaking change given that this feature exists in 4.0 for these databases. We agreed on keeping your solution but adding a deprecated annotation/comment to the code and remove the sqlparse dependency in 5.0 even if the dialects are not correctly supported by then. I added the following ticket to the 5.0 board. Could you add a deprecated annotation/comment to the code?

cc @villebro @justinpark @sadpandajoe

@betodealmeida
Copy link
Member Author

@michael-s-molina I agree, I think ideally we want to have a close relationship with sqlglot (since we need to have a robust parser in order to enforce DAR and RLS reliably), so I hope we'll able to quickly address problems like this in the future by contributing upstream — creating a Firebolt dialect instead of relying on sqlparse, eg.

I'll add the deprecation notice. For 5.0 we can simply return the query as-is for DB engine specs without a sqlglot dialect.

@sadpandajoe sadpandajoe added the v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch label Oct 11, 2024
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@betodealmeida betodealmeida merged commit 47c1e09 into master Oct 11, 2024
34 checks passed
sadpandajoe pushed a commit that referenced this pull request Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preset-io size/L v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch
Projects
Status: Cherried
Development

Successfully merging this pull request may close these issues.

4 participants