Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chore: Bump to Sqlalchemy 2.0 #396

Merged
merged 12 commits into from
Sep 23, 2024
Merged

Chore: Bump to Sqlalchemy 2.0 #396

merged 12 commits into from
Sep 23, 2024

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Sep 23, 2024

Following this guide: https://docs.sqlalchemy.org/en/20/changelog/migration_20.html

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced SQL command handling with the adoption of sqlalchemy.text() for improved SQL execution.
    • Improved SQL connection management during command execution in tests for better reliability.
  • Bug Fixes

    • Addressed issues with SQL command execution and connection handling in integration tests.
  • Chores

    • Updated SQLAlchemy dependency to version ^2.0.35 to leverage new features and improvements.
    • Streamlined import statements for better organization and clarity.
    • Added new dependency sqlalchemy2-stubs to support type checking.

Copy link

coderabbitai bot commented Sep 23, 2024

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

Commits

Files that changed from the base of the PR and between dda85a4 and c281a8d.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
Files selected for processing (1)
  • pyproject.toml (3 hunks)
__________________
< Zero bugs given. >
------------------
 \
  \   \
       \ /\
       ( )
     .( o ).
Walkthrough

Walkthrough

The changes involve updates to SQL command handling and execution across multiple files in the Airbyte codebase. Key modifications include the adoption of sqlalchemy.text() for SQL execution, simplification of SQL statement construction, and adjustments to import statements for consistency. Additionally, method return statements have been explicitly defined to clarify control flow. The pyproject.toml file reflects an update to the SQLAlchemy version constraint, indicating a transition to a newer major version.

Changes

File Path Change Summary
airbyte/_processors/sql/bigquery.py Modified BigQueryTypeConverter methods to return specific SQLAlchemy types instead of strings.
airbyte/_processors/sql/snowflake.py Updated SQL statement construction using f-strings and sqlalchemy.text() for execution.
airbyte/caches/_catalog_backend.py Changed import of declarative_base to streamline SQLAlchemy imports.
airbyte/caches/_state_backend.py Consolidated import statements for SQLAlchemy without altering functionality.
airbyte/datasets/_sql.py Updated import statements and changed count_query construction from alias() to subquery().
airbyte/shared/sql_processor.py Moved TextClause import, updated SQL engine options, and modified _execute_sql method.
pyproject.toml Updated sqlalchemy version constraint from 1.4.51 to ^2.0.35 and removed warning suppression.
tests/integration_tests/conftest.py Added future=True to engine creation and improved SQL command execution in schema management.
tests/integration_tests/test_all_cache_types.py Enhanced SQL command execution for dropping a column using a context manager.

Possibly related PRs

Wdyt?

Tip

You can validate your CodeRabbit configuration file in your editor.

If your editor has YAML language server, you can enable auto-completion and validation by adding # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json at the top of your CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range and nitpick comments (9)
airbyte/_processors/sql/snowflake.py (2)

Line range hint 150-163: Great job simplifying the SQL statement construction!

The use of an f-string for the copy_statement and text() for SQL execution is a nice improvement. It's more readable and aligns well with SQLAlchemy 2.0 practices.

One small suggestion: wdyt about using a multi-line string for better readability? Something like:

copy_statement = f"""
    COPY INTO {temp_table_name}
    ({columns_list_str})
    FROM (
        SELECT {variant_cols_str}
        FROM {internal_sf_stage_name}
    )
    FILES = ({files_list})
    FILE_FORMAT = (TYPE = JSON, COMPRESSION = GZIP)
"""

This could make the SQL structure even clearer. What do you think?


177-183: Solid update to use text() here as well!

This change is consistent with the earlier updates and follows SQLAlchemy 2.0 best practices. Nice work!

A small suggestion for consistency: wdyt about using a multi-line string here too? It could look like:

connection.execute(
    text("""
        ALTER SESSION SET
        QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE
        MULTI_STATEMENT_COUNT = 0
    """)
)

This would match the style used in the copy_statement. What do you think?

tests/integration_tests/conftest.py (1)

124-124: Nice touch using text() for BigQuery schema dropping!

Great job consistently applying text() for SQL queries across different database engines. It's a solid practice for SQL injection prevention and readability.

Quick thought: I noticed we're using a general Exception in the suppress context. Wdyt about catching more specific exceptions here? It might help with debugging if unexpected errors occur. Maybe something like:

from sqlalchemy.exc import SQLAlchemyError

with suppress(SQLAlchemyError):
    # ... rest of the code

This way, we're only suppressing SQLAlchemy-related errors. What do you think?

airbyte/caches/_catalog_backend.py (1)

16-16: LGTM! How about a small tweak?

The change looks good and aligns with the SQLAlchemy 2.0 migration. Nice job consolidating the imports! 👍

Just a tiny suggestion: wdyt about alphabetizing the imports from sqlalchemy.orm? It might make it easier to scan in the future. Something like:

from sqlalchemy.orm import declarative_base, Session
tests/integration_tests/test_all_cache_types.py (1)

250-256: Nice update to SQLAlchemy 2.0 style! How about a small tweak?

The changes look good and align well with SQLAlchemy 2.0 best practices. Using the context manager for the SQL connection and the text() function for the SQL command is spot on.

One small suggestion: wdyt about extracting the SQL command into a separate variable for better readability? Something like this:

sql_command = text(
    f"ALTER TABLE {new_generic_cache.schema_name}.{table_name} "
    "DROP COLUMN _airbyte_raw_id"
)
with new_generic_cache.processor.get_sql_connection() as conn:
    conn.execute(sql_command)

This could make the code a bit more readable and easier to maintain. What do you think?

airbyte/cloud/sync_results.py (2)

280-280: Explicit return looks good! Maybe we could add a type hint?

The addition of the explicit return statement improves clarity. Nice job! 👍

What do you think about adding a return type hint to make it even more explicit? Something like:

def get_sql_engine(self) -> sqlalchemy.engine.Engine:
    return self.get_sql_cache().get_sql_engine()

This would make the method's contract crystal clear. WDYT?


291-291: Explicit return looks great! How about adding type hints?

The addition of the explicit return statement enhances clarity. Well done! 🎉

To make it even more robust, what do you think about adding parameter and return type hints? Something like:

def get_sql_table(self, stream_name: str) -> sqlalchemy.Table:
    return self.get_sql_cache().processor.get_sql_table(stream_name)

This would make the method's input and output types crystal clear. Does that sound good to you?

airbyte/shared/sql_processor.py (2)

135-135: Good SQLAlchemy 2.0 preparation!

Adding future=True is a great step towards SQLAlchemy 2.0 compatibility. To make the intention clear for future readers, what do you think about adding a brief comment explaining why we're using future=True? Something like:

# Enable SQLAlchemy 2.0 future mode for forward compatibility
future=True,

This could help other developers understand the purpose of this parameter. wdyt?


1117-1119: Updated query construction - looks good!

The changes to the select statement in _emulated_merge_temp_table_to_final_table seem to align with SQLAlchemy 2.0 query construction practices. To make this clear for future readers, what do you think about adding a brief comment? Something like:

# Construct the select statement using SQLAlchemy 2.0 style
select_new_records_stmt = (
    select(temp_table).select_from(joined_table).where(where_not_exists_clause)
)

This could help other developers understand that this is intentionally using the new style. wdyt?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2fae5a3 and 85ef6f0.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
<details>
Files selected for processing (9)
  • airbyte/_processors/sql/snowflake.py (4 hunks)
  • airbyte/caches/_catalog_backend.py (1 hunks)
  • airbyte/caches/_state_backend.py (1 hunks)
  • airbyte/cloud/sync_results.py (2 hunks)
  • airbyte/datasets/_sql.py (2 hunks)
  • airbyte/shared/sql_processor.py (3 hunks)
  • pyproject.toml (1 hunks)
  • tests/integration_tests/conftest.py (3 hunks)
  • tests/integration_tests/test_all_cache_types.py (2 hunks)
Files skipped from review due to trivial changes (1)
  • airbyte/caches/_state_backend.py
Additional comments not posted (15)
airbyte/_processors/sql/snowflake.py (2)

7-7: Nice job updating the imports!

The removal of dedent and addition of text from sqlalchemy align well with the changes in SQL statement handling. This is a good step towards adopting SQLAlchemy 2.0 practices. LGTM!

Also applies to: 15-15


Line range hint 1-183: Great job on the SQLAlchemy 2.0 migration for this file!

The changes you've made consistently apply the text() function for SQL execution and simplify SQL statement construction. These updates align perfectly with the PR objective of migrating to SQLAlchemy 2.0 and follow the recommendations in the migration guide.

It's particularly commendable that you've maintained the existing functionality while making these updates. The code looks cleaner and more consistent now.

One thing to consider: have you run the test suite to ensure these changes don't introduce any unexpected behavior? If not, it might be worth doing so as a final check.

Overall, excellent work on this migration! 👍

tests/integration_tests/conftest.py (3)

19-19: LGTM! Importing SQLAlchemy's text function.

Nice job importing text from SQLAlchemy! This aligns perfectly with the migration to SQLAlchemy 2.0. Using text() for SQL queries is a best practice for safety and clarity. Wdyt about adding a comment explaining why we're using text() for future maintainers?


118-121: Awesome job updating BigQuery engine creation!

The addition of future=True in the create_engine call for BigQuery is spot on. It's great to see consistency across different database engines in adopting SQLAlchemy 2.0 features. Keep up the good work!


93-98: Great updates to Snowflake cache cleanup!

Love the use of future=True in create_engine and text() for the SQL query. These changes align nicely with SQLAlchemy 2.0 best practices.

Quick question: I noticed we switched from begin() to connect(). Was this intentional? If we want to ensure the DROP SCHEMA is in a transaction, we might want to use with engine.begin() as connection: instead. Wdyt?

Verification successful

Let's give it another shot!

It seems like rg doesn't recognize --type python. How about we try using --type py instead to search for engine.begin() instances?

Wdyt?


Verification Complete: engine.begin() Usage Confirmed

Only one instance of engine.begin() was found in tests/integration_tests/conftest.py, indicating that the change is localized and everything looks good!

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify if there are any other instances of engine.begin() in the codebase
rg --type python 'engine\.begin\(\)'

Length of output: 69


Script:

#!/bin/bash
# Verify if there are any other instances of engine.begin() in the codebase
rg --type py 'engine\.begin\(\)'

Length of output: 117

airbyte/datasets/_sql.py (3)

Line range hint 1-200: Overall, great job on the SQLAlchemy 2.0 migration!

The changes in this file look good and align well with the migration to SQLAlchemy 2.0. The import statement has been simplified, and the query construction has been updated to use more explicit methods.

Just a thought - would it be helpful to add a comment mentioning the SQLAlchemy 2.0 compatibility at the top of the file? It could help future maintainers understand the context of these changes. wdyt?


96-96: Nice update to query construction!

The change from alias() to subquery() looks good and aligns with SQLAlchemy 2.0 practices. It's a more explicit way of constructing the subquery.

Have you had a chance to test this change with various types of queries to ensure it behaves as expected? Maybe we could add a quick unit test to verify the behavior? wdyt?


10-10: LGTM! Simplified import statement.

The import statement has been updated to directly import Selectable from sqlalchemy. This change aligns well with the SQLAlchemy 2.0 update. Looks good to me!

Just to double-check, have you verified that this import works correctly with SQLAlchemy 2.0 in your local environment? wdyt?

pyproject.toml (3)

41-41: Heads up on potential dependency conflicts 🕵️‍♂️

I noticed we're updating SQLAlchemy, which is awesome! Just wanted to make sure we've checked for any potential conflicts with other dependencies. For example, I see we're using snowflake-sqlalchemy (version ^1.5.1). Have we confirmed it's compatible with SQLAlchemy 2.0?

Maybe we could add a quick note in the PR description about any dependency checks we've done? What do you think about that idea?

#!/bin/bash
# Check for any SQLAlchemy version specifications in other dependencies
rg "sqlalchemy.*=.*" pyproject.toml

41-41: Embracing the future! 🚀

I noticed we removed the warning filter for sqlalchemy.exc.MovedIn20Warning. That's a bold move! Are we confident we've addressed all the warnings? It might be helpful to run the project and check for any new warnings that pop up.

What do you think about adding a TODO comment to review and remove any deprecated SQLAlchemy 1.x code in a follow-up PR? This could help us keep track of any lingering migration tasks. Wdyt?

#!/bin/bash
# Check for any remaining MovedIn20Warning suppressions
rg "MovedIn20Warning" -g '!pyproject.toml'

41-41: Exciting update to SQLAlchemy 2.0! 🎉

I see we're making the leap to SQLAlchemy 2.0. That's a big move! Have we thoroughly tested this across the project? It might be worth adding a note in the PR description about any major changes or gotchas we encountered during the upgrade process. What do you think about that idea? 🤔

Also, just to double-check, are we sure all our code is compatible with 2.0? Sometimes these major version jumps can be tricky. Wdyt about running a full test suite if we haven't already?

airbyte/cloud/sync_results.py (1)

Line range hint 280-291: Overall, these changes look great!

The explicit return statements in both get_sql_engine and get_sql_table methods improve code clarity and align well with SQLAlchemy 2.0 migration best practices. These changes make the code more readable without altering functionality.

If you decide to add the suggested type hints, it would be the cherry on top for this clean-up. Great work on making the codebase more explicit and maintainable! 🚀

airbyte/shared/sql_processor.py (3)

65-65: Nice type checking optimization!

Moving TextClause to the TYPE_CHECKING block is a good way to improve startup time and avoid potential circular imports. Just wanted to double-check: are we sure that TextClause is not used anywhere in the runtime code? If so, this change looks great! wdyt?


1117-1119: Simplified SQL execution logic - nice!

I like the simplification of the _execute_sql method. Removing the special handling for TextClause makes the code more straightforward. However, I wanted to check: are we sure that removing the autocommit=True for TextClause instances won't cause any issues? If this change is intentional and aligns with SQLAlchemy 2.0 practices, then it looks great! wdyt?


Line range hint 1-1119: Great work on SQLAlchemy 2.0 migration!

Overall, the changes in this file look really good and align well with the goal of updating to SQLAlchemy 2.0. The code has been simplified in places, and the query construction has been updated to the new style. The suggestions I've made are mostly about adding some comments to clarify the intentions behind the changes, which could be helpful for future maintenance.

One thing to keep in mind is the removal of the special handling for TextClause in the _execute_sql method. It might be worth double-checking that this doesn't cause any unexpected behavior changes in other parts of the codebase.

Great job on this update! The code looks cleaner and more future-proof now.

@aaronsteers aaronsteers enabled auto-merge (squash) September 23, 2024 21:13
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Outside diff range and nitpick comments (1)
airbyte/_processors/sql/bigquery.py (1)

102-102: Looks good, but have you considered setting an explicit max length?

The change to sqlalchemy.types.String(length=None) is great for the SQLAlchemy 2.0 migration. However, setting None for length might lead to unexpected behavior in some databases. What do you think about using an explicit max length instead? For example, sqlalchemy.types.String(length=MAX_LENGTH) where MAX_LENGTH is a constant defined based on BigQuery's limitations. WDYT?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 85ef6f0 and dd1d158.

Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
<details>
Files selected for processing (6)
  • airbyte/_processors/sql/bigquery.py (3 hunks)
  • airbyte/caches/_catalog_backend.py (3 hunks)
  • airbyte/caches/_state_backend.py (5 hunks)
  • airbyte/datasets/_sql.py (5 hunks)
  • airbyte/shared/sql_processor.py (4 hunks)
  • pyproject.toml (2 hunks)
Files skipped from review as they are similar to previous changes (5)
  • airbyte/caches/_catalog_backend.py
  • airbyte/caches/_state_backend.py
  • airbyte/datasets/_sql.py
  • airbyte/shared/sql_processor.py
  • pyproject.toml
Additional comments not posted (1)
airbyte/_processors/sql/bigquery.py (1)

17-17: Nice import addition!

The specific import of sqlalchemy.types as sqlalchemy_types looks good. It's a clean way to bring in just what we need. LGTM!

airbyte/_processors/sql/bigquery.py Outdated Show resolved Hide resolved
@aaronsteers
Copy link
Contributor Author

aaronsteers commented Sep 23, 2024

/poetry-lock

poetry lock job started... Check job output.

poetry lock applied successfully.

Resolving dependencies... changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update SQLAlchemy to 2.x
1 participant