Chore: Bump to Sqlalchemy 2.0 #396

aaronsteers · 2024-09-23T20:39:16Z

Following this guide: https://docs.sqlalchemy.org/en/20/changelog/migration_20.html

Summary by CodeRabbit

Release Notes

New Features
- Enhanced SQL command handling with the adoption of sqlalchemy.text() for improved SQL execution.
- Improved SQL connection management during command execution in tests for better reliability.
Bug Fixes
- Addressed issues with SQL command execution and connection handling in integration tests.
Chores
- Updated SQLAlchemy dependency to version ^2.0.35 to leverage new features and improvements.
- Streamlined import statements for better organization and clarity.
- Added new dependency sqlalchemy2-stubs to support type checking.

coderabbitai · 2024-09-23T20:39:24Z

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

Commits

Files that changed from the base of the PR and between dda85a4 and c281a8d.

Files ignored due to path filters (1)

poetry.lock is excluded by !**/*.lock

Files selected for processing (1)

pyproject.toml (3 hunks)

__________________
< Zero bugs given. >
------------------
 \
  \   \
       \ /\
       ( )
     .( o ).

Walkthrough

The changes involve updates to SQL command handling and execution across multiple files in the Airbyte codebase. Key modifications include the adoption of sqlalchemy.text() for SQL execution, simplification of SQL statement construction, and adjustments to import statements for consistency. Additionally, method return statements have been explicitly defined to clarify control flow. The pyproject.toml file reflects an update to the SQLAlchemy version constraint, indicating a transition to a newer major version.

Changes

File Path	Change Summary
`airbyte/_processors/sql/bigquery.py`	Modified `BigQueryTypeConverter` methods to return specific SQLAlchemy types instead of strings.
`airbyte/_processors/sql/snowflake.py`	Updated SQL statement construction using f-strings and `sqlalchemy.text()` for execution.
`airbyte/caches/_catalog_backend.py`	Changed import of `declarative_base` to streamline SQLAlchemy imports.
`airbyte/caches/_state_backend.py`	Consolidated import statements for SQLAlchemy without altering functionality.
`airbyte/datasets/_sql.py`	Updated import statements and changed `count_query` construction from `alias()` to `subquery()`.
`airbyte/shared/sql_processor.py`	Moved `TextClause` import, updated SQL engine options, and modified `_execute_sql` method.
`pyproject.toml`	Updated `sqlalchemy` version constraint from `1.4.51` to `^2.0.35` and removed warning suppression.
`tests/integration_tests/conftest.py`	Added `future=True` to engine creation and improved SQL command execution in schema management.
`tests/integration_tests/test_all_cache_types.py`	Enhanced SQL command execution for dropping a column using a context manager.

Possibly related PRs

Feat: Multiple perf improvements and viztracer performance profiling #298: The changes in the airbyte/_processors/sql/snowflake.py file involve SQL command handling, which may relate to the overall SQL processing improvements seen in the main PR's modifications to the BigQueryTypeConverter class.
Fix: Updated handling for Manifest-only connectors; Breaks: Remove Python 3.9 support #340: The updates to the DeclarativeExecutor class in airbyte/_executors/declarative.py include changes to how manifests are handled, which could be relevant to the SQL type conversions in the main PR as they both deal with data processing and type management.
Fix: Resolve issues in Postgres name normalization when names are >63 characters #359: The introduction of the PostgresNormalizer class in airbyte/_processors/sql/postgres.py focuses on name normalization for Postgres, which is conceptually related to the type conversion changes in the main PR, as both involve ensuring data integrity and compliance with specific database constraints.

Wdyt?

Tip

You can validate your CodeRabbit configuration file in your editor.

If your editor has YAML language server, you can enable auto-completion and validation by adding # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json at the top of your CodeRabbit configuration file.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
-- @coderabbitai generate unit testing code for this file.
-- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
-- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
-- @coderabbitai read src/utils.ts and generate unit testing code.
-- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
-- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

Outside diff range and nitpick comments (9)

airbyte/_processors/sql/snowflake.py (2)
Line range hint 150-163: Great job simplifying the SQL statement construction!

The use of an f-string for the copy_statement and text() for SQL execution is a nice improvement. It's more readable and aligns well with SQLAlchemy 2.0 practices.

One small suggestion: wdyt about using a multi-line string for better readability? Something like:
copy_statement = f"""
    COPY INTO {temp_table_name}
    ({columns_list_str})
    FROM (
        SELECT {variant_cols_str}
        FROM {internal_sf_stage_name}
    )
    FILES = ({files_list})
    FILE_FORMAT = (TYPE = JSON, COMPRESSION = GZIP)
"""
This could make the SQL structure even clearer. What do you think?

177-183: Solid update to use text() here as well!

This change is consistent with the earlier updates and follows SQLAlchemy 2.0 best practices. Nice work!

A small suggestion for consistency: wdyt about using a multi-line string here too? It could look like:
connection.execute(
    text("""
        ALTER SESSION SET
        QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE
        MULTI_STATEMENT_COUNT = 0
    """)
)
This would match the style used in the copy_statement. What do you think?
tests/integration_tests/conftest.py (1)
124-124: Nice touch using text() for BigQuery schema dropping!

Great job consistently applying text() for SQL queries across different database engines. It's a solid practice for SQL injection prevention and readability.

Quick thought: I noticed we're using a general Exception in the suppress context. Wdyt about catching more specific exceptions here? It might help with debugging if unexpected errors occur. Maybe something like:
from sqlalchemy.exc import SQLAlchemyError

with suppress(SQLAlchemyError):
    # ... rest of the code
This way, we're only suppressing SQLAlchemy-related errors. What do you think?
airbyte/caches/_catalog_backend.py (1)
16-16: LGTM! How about a small tweak?

The change looks good and aligns with the SQLAlchemy 2.0 migration. Nice job consolidating the imports! 👍

Just a tiny suggestion: wdyt about alphabetizing the imports from sqlalchemy.orm? It might make it easier to scan in the future. Something like:
from sqlalchemy.orm import declarative_base, Session
tests/integration_tests/test_all_cache_types.py (1)
250-256: Nice update to SQLAlchemy 2.0 style! How about a small tweak?

The changes look good and align well with SQLAlchemy 2.0 best practices. Using the context manager for the SQL connection and the text() function for the SQL command is spot on.

One small suggestion: wdyt about extracting the SQL command into a separate variable for better readability? Something like this:
sql_command = text(
    f"ALTER TABLE {new_generic_cache.schema_name}.{table_name} "
    "DROP COLUMN _airbyte_raw_id"
)
with new_generic_cache.processor.get_sql_connection() as conn:
    conn.execute(sql_command)
This could make the code a bit more readable and easier to maintain. What do you think?
airbyte/cloud/sync_results.py (2)
280-280: Explicit return looks good! Maybe we could add a type hint?

The addition of the explicit return statement improves clarity. Nice job! 👍

What do you think about adding a return type hint to make it even more explicit? Something like:
def get_sql_engine(self) -> sqlalchemy.engine.Engine:
    return self.get_sql_cache().get_sql_engine()
This would make the method's contract crystal clear. WDYT?

291-291: Explicit return looks great! How about adding type hints?

The addition of the explicit return statement enhances clarity. Well done! 🎉

To make it even more robust, what do you think about adding parameter and return type hints? Something like:
def get_sql_table(self, stream_name: str) -> sqlalchemy.Table:
    return self.get_sql_cache().processor.get_sql_table(stream_name)
This would make the method's input and output types crystal clear. Does that sound good to you?
airbyte/shared/sql_processor.py (2)
135-135: Good SQLAlchemy 2.0 preparation!

Adding future=True is a great step towards SQLAlchemy 2.0 compatibility. To make the intention clear for future readers, what do you think about adding a brief comment explaining why we're using future=True? Something like:
# Enable SQLAlchemy 2.0 future mode for forward compatibility
future=True,
This could help other developers understand the purpose of this parameter. wdyt?

1117-1119: Updated query construction - looks good!

The changes to the select statement in _emulated_merge_temp_table_to_final_table seem to align with SQLAlchemy 2.0 query construction practices. To make this clear for future readers, what do you think about adding a brief comment? Something like:
# Construct the select statement using SQLAlchemy 2.0 style
select_new_records_stmt = (
    select(temp_table).select_from(joined_table).where(where_not_exists_clause)
)
This could help other developers understand that this is intentionally using the new style. wdyt?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2fae5a3 and 85ef6f0.

Files ignored due to path filters (1)

poetry.lock is excluded by !**/*.lock

<details>

Files selected for processing (9)

airbyte/_processors/sql/snowflake.py (4 hunks)
airbyte/caches/_catalog_backend.py (1 hunks)
airbyte/caches/_state_backend.py (1 hunks)
airbyte/cloud/sync_results.py (2 hunks)
airbyte/datasets/_sql.py (2 hunks)
airbyte/shared/sql_processor.py (3 hunks)
pyproject.toml (1 hunks)
tests/integration_tests/conftest.py (3 hunks)
tests/integration_tests/test_all_cache_types.py (2 hunks)

Files skipped from review due to trivial changes (1)

airbyte/caches/_state_backend.py

Additional comments not posted (15)

airbyte/_processors/sql/snowflake.py (2)

7-7: Nice job updating the imports!

The removal of dedent and addition of text from sqlalchemy align well with the changes in SQL statement handling. This is a good step towards adopting SQLAlchemy 2.0 practices. LGTM!

Also applies to: 15-15

Line range hint 1-183: Great job on the SQLAlchemy 2.0 migration for this file!

The changes you've made consistently apply the text() function for SQL execution and simplify SQL statement construction. These updates align perfectly with the PR objective of migrating to SQLAlchemy 2.0 and follow the recommendations in the migration guide.

It's particularly commendable that you've maintained the existing functionality while making these updates. The code looks cleaner and more consistent now.

One thing to consider: have you run the test suite to ensure these changes don't introduce any unexpected behavior? If not, it might be worth doing so as a final check.

Overall, excellent work on this migration! 👍
tests/integration_tests/conftest.py (3)
19-19: LGTM! Importing SQLAlchemy's text function.

Nice job importing text from SQLAlchemy! This aligns perfectly with the migration to SQLAlchemy 2.0. Using text() for SQL queries is a best practice for safety and clarity. Wdyt about adding a comment explaining why we're using text() for future maintainers?

118-121: Awesome job updating BigQuery engine creation!

The addition of future=True in the create_engine call for BigQuery is spot on. It's great to see consistency across different database engines in adopting SQLAlchemy 2.0 features. Keep up the good work!

93-98: Great updates to Snowflake cache cleanup!

Love the use of future=True in create_engine and text() for the SQL query. These changes align nicely with SQLAlchemy 2.0 best practices.

Quick question: I noticed we switched from begin() to connect(). Was this intentional? If we want to ensure the DROP SCHEMA is in a transaction, we might want to use with engine.begin() as connection: instead. Wdyt?

Verification successful

Let's give it another shot!

It seems like rg doesn't recognize --type python. How about we try using --type py instead to search for engine.begin() instances?

Wdyt?

Verification Complete: engine.begin() Usage Confirmed

Only one instance of engine.begin() was found in tests/integration_tests/conftest.py, indicating that the change is localized and everything looks good!
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify if there are any other instances of engine.begin() in the codebase
rg --type python 'engine\.begin'
Length of output: 69

Script:
#!/bin/bash
# Verify if there are any other instances of engine.begin() in the codebase
rg --type py 'engine\.begin'
Length of output: 117
airbyte/datasets/_sql.py (3)

Line range hint 1-200: Overall, great job on the SQLAlchemy 2.0 migration!

The changes in this file look good and align well with the migration to SQLAlchemy 2.0. The import statement has been simplified, and the query construction has been updated to use more explicit methods.

Just a thought - would it be helpful to add a comment mentioning the SQLAlchemy 2.0 compatibility at the top of the file? It could help future maintainers understand the context of these changes. wdyt?

96-96: Nice update to query construction!

The change from alias() to subquery() looks good and aligns with SQLAlchemy 2.0 practices. It's a more explicit way of constructing the subquery.

Have you had a chance to test this change with various types of queries to ensure it behaves as expected? Maybe we could add a quick unit test to verify the behavior? wdyt?

10-10: LGTM! Simplified import statement.

The import statement has been updated to directly import Selectable from sqlalchemy. This change aligns well with the SQLAlchemy 2.0 update. Looks good to me!

Just to double-check, have you verified that this import works correctly with SQLAlchemy 2.0 in your local environment? wdyt?
pyproject.toml (3)
41-41: Heads up on potential dependency conflicts 🕵️‍♂️

I noticed we're updating SQLAlchemy, which is awesome! Just wanted to make sure we've checked for any potential conflicts with other dependencies. For example, I see we're using snowflake-sqlalchemy (version ^1.5.1). Have we confirmed it's compatible with SQLAlchemy 2.0?

Maybe we could add a quick note in the PR description about any dependency checks we've done? What do you think about that idea?
#!/bin/bash
# Check for any SQLAlchemy version specifications in other dependencies
rg "sqlalchemy.*=.*" pyproject.toml
41-41: Embracing the future! 🚀

I noticed we removed the warning filter for sqlalchemy.exc.MovedIn20Warning. That's a bold move! Are we confident we've addressed all the warnings? It might be helpful to run the project and check for any new warnings that pop up.

What do you think about adding a TODO comment to review and remove any deprecated SQLAlchemy 1.x code in a follow-up PR? This could help us keep track of any lingering migration tasks. Wdyt?
#!/bin/bash
# Check for any remaining MovedIn20Warning suppressions
rg "MovedIn20Warning" -g '!pyproject.toml'
41-41: Exciting update to SQLAlchemy 2.0! 🎉

I see we're making the leap to SQLAlchemy 2.0. That's a big move! Have we thoroughly tested this across the project? It might be worth adding a note in the PR description about any major changes or gotchas we encountered during the upgrade process. What do you think about that idea? 🤔

Also, just to double-check, are we sure all our code is compatible with 2.0? Sometimes these major version jumps can be tricky. Wdyt about running a full test suite if we haven't already?
airbyte/cloud/sync_results.py (1)

Line range hint 280-291: Overall, these changes look great!

The explicit return statements in both get_sql_engine and get_sql_table methods improve code clarity and align well with SQLAlchemy 2.0 migration best practices. These changes make the code more readable without altering functionality.

If you decide to add the suggested type hints, it would be the cherry on top for this clean-up. Great work on making the codebase more explicit and maintainable! 🚀

airbyte/shared/sql_processor.py (3)

65-65: Nice type checking optimization!

Moving TextClause to the TYPE_CHECKING block is a good way to improve startup time and avoid potential circular imports. Just wanted to double-check: are we sure that TextClause is not used anywhere in the runtime code? If so, this change looks great! wdyt?

1117-1119: Simplified SQL execution logic - nice!

I like the simplification of the _execute_sql method. Removing the special handling for TextClause makes the code more straightforward. However, I wanted to check: are we sure that removing the autocommit=True for TextClause instances won't cause any issues? If this change is intentional and aligns with SQLAlchemy 2.0 practices, then it looks great! wdyt?

Line range hint 1-1119: Great work on SQLAlchemy 2.0 migration!

Overall, the changes in this file look really good and align well with the goal of updating to SQLAlchemy 2.0. The code has been simplified in places, and the query construction has been updated to the new style. The suggestions I've made are mostly about adding some comments to clarify the intentions behind the changes, which could be helpful for future maintenance.

One thing to keep in mind is the removal of the special handling for TextClause in the _execute_sql method. It might be worth double-checking that this doesn't cause any unexpected behavior changes in other parts of the codebase.

Great job on this update! The code looks cleaner and more future-proof now.

coderabbitai

Actionable comments posted: 1

Outside diff range and nitpick comments (1)

airbyte/_processors/sql/bigquery.py (1)

102-102: Looks good, but have you considered setting an explicit max length?

The change to sqlalchemy.types.String(length=None) is great for the SQLAlchemy 2.0 migration. However, setting None for length might lead to unexpected behavior in some databases. What do you think about using an explicit max length instead? For example, sqlalchemy.types.String(length=MAX_LENGTH) where MAX_LENGTH is a constant defined based on BigQuery's limitations. WDYT?

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 85ef6f0 and dd1d158.

Files ignored due to path filters (1)

poetry.lock is excluded by !**/*.lock

<details>

Files selected for processing (6)

airbyte/_processors/sql/bigquery.py (3 hunks)
airbyte/caches/_catalog_backend.py (3 hunks)
airbyte/caches/_state_backend.py (5 hunks)
airbyte/datasets/_sql.py (5 hunks)
airbyte/shared/sql_processor.py (4 hunks)
pyproject.toml (2 hunks)

Files skipped from review as they are similar to previous changes (5)

airbyte/caches/_catalog_backend.py
airbyte/caches/_state_backend.py
airbyte/datasets/_sql.py
airbyte/shared/sql_processor.py
pyproject.toml

Additional comments not posted (1)

airbyte/_processors/sql/bigquery.py (1)

17-17: Nice import addition!

The specific import of sqlalchemy.types as sqlalchemy_types looks good. It's a clean way to bring in just what we need. LGTM!

airbyte/_processors/sql/bigquery.py

aaronsteers · 2024-09-23T21:48:16Z

/poetry-lock

poetry lock job started... Check job output.

✅ poetry lock applied successfully.

Resolving dependencies... changes

aaronsteers added 6 commits September 23, 2024 10:28

chore: remove warning filter for SQLAlchemy20

6515aba

bump sqlalchemy to latest pre-2.0 version

4128eec

fix imports

5502509

apply sqlalchemy 2.0 compat fixes

8fcd035

poetry add sqlalchemy@latest

d185c53

chore: more sqlalchemy 2.0 syntax updates

85ef6f0

coderabbitai bot reviewed Sep 23, 2024

View reviewed changes

coderabbitai bot approved these changes Sep 23, 2024

View reviewed changes

aaronsteers added 2 commits September 23, 2024 14:12

improve linting situation

7c27f10

add sqlalchemy2 stubs

dd1d158

aaronsteers enabled auto-merge (squash) September 23, 2024 21:13

coderabbitai bot requested changes Sep 23, 2024

View reviewed changes

airbyte/_processors/sql/bigquery.py Outdated Show resolved Hide resolved

coderabbitai bot approved these changes Sep 23, 2024

View reviewed changes

aaronsteers added 2 commits September 23, 2024 14:31

Use 'string' instead of 'varchar'

dda85a4

bump versions

7c53b64

aaronsteers mentioned this pull request Sep 23, 2024

Chore(deps): Bump sqlalchemy from 1.4.51 to 2.0.35 #389

Closed

Merge branch 'main' into chore/bump-to-sqlalchemy-2x

a5360ab

Auto-commit Updating dependencies

c281a8d

Resolving dependencies... changes

aaronsteers linked an issue Sep 23, 2024 that may be closed by this pull request

Update SQLAlchemy to 2.x #382

Closed

aaronsteers merged commit b22a6b4 into main Sep 23, 2024
15 checks passed

aaronsteers deleted the chore/bump-to-sqlalchemy-2x branch September 23, 2024 22:09

coderabbitai bot mentioned this pull request Sep 23, 2024

Chore: Bump CDK dependency, DuckDB, Postgres, Pandas, and others #383

Merged

aaronsteers mentioned this pull request Sep 23, 2024

Apply security and compatibility updates for PyAirbyte dependencies (Sep 2024) #386

Closed

This was referenced Oct 1, 2024

Feat: Add 'execute_sql' command on caches, add DuckDB WAL cleanup step #407

Merged

Feat: Support Snowflake's travel time #414

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chore: Bump to Sqlalchemy 2.0 #396

Chore: Bump to Sqlalchemy 2.0 #396

aaronsteers commented Sep 23, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 23, 2024 •

edited

Loading

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot left a comment

aaronsteers commented Sep 23, 2024 •

edited by github-actions bot

Loading

Chore: Bump to Sqlalchemy 2.0 #396

Chore: Bump to Sqlalchemy 2.0 #396

Conversation

aaronsteers commented Sep 23, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Sep 23, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

aaronsteers commented Sep 23, 2024 • edited by github-actions bot Loading

aaronsteers commented Sep 23, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 23, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

aaronsteers commented Sep 23, 2024 •

edited by github-actions bot

Loading