
💡 Feature request: Postgres source #82

Open
WangCHEN9 opened this issue Feb 29, 2024 · 5 comments

Comments

@WangCHEN9

It would be really nice if we could support more database source connectors :)

@aaronsteers
Contributor

aaronsteers commented Feb 29, 2024

@WangCHEN9 - Thanks for logging this. We're interested in learning more about your use case. Specifically:

  1. Do you want to replicate data from Postgres to another cache/destination, like Snowflake or a different Postgres DB? Or do you just want to get that data locally so it is available to your Python code, in pandas/AI/etc.?
  2. For your use case, do you want to take advantage of built-in Postgres-native CDC features, such as auto-detecting new records from the WAL log (described here)? The alternative would be column-based incremental sync, for instance using an updated_at column or similar to detect new records (see the sketch after this list).
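
For illustration only, a minimal sketch of the column-based incremental pattern; the table, columns, connection string, and cursor handling below are placeholders, not an actual PyAirbyte API:

```python
import psycopg2  # assumed Postgres client; connection details are placeholders

# Column-based incremental sync: remember the highest updated_at value seen so
# far and, on the next run, fetch only the rows that are newer than that cursor.
last_cursor = "2024-02-28 00:00:00"  # would normally be persisted between runs

conn = psycopg2.connect("dbname=appdb host=localhost user=app")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id, payload, updated_at FROM public.orders "
        "WHERE updated_at > %s ORDER BY updated_at",
        (last_cursor,),
    )
    rows = cur.fetchall()
    if rows:
        last_cursor = rows[-1][2]  # highest updated_at; cursor for the next run
```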

@aaronsteers changed the title from "Support postgres source" to "Feature request: Postgres source" on Feb 29, 2024
@WangCHEN9
Author

Hi @aaronsteers,

I have two main use cases in mind:

  • A seamless local workflow using PyAirbyte/DBT/DuckDB for quick POCs. This would put ELT power in the hands of data analysts (they may not be as strong at Python as data scientists, but strong enough to use PyAirbyte/DBT).
  • Using PyAirbyte as a lightweight EL tool. (Here we would probably replicate to files in S3 first, before ingesting them into Snowflake, so that we can switch to Airbyte OSS/Enterprise later on.)

For your questions:

  1. Yes, I am interested in replicating data from Postgres to S3 (with the help of the DuckDB COPY function); see the sketch after this list.
  2. I would prefer to use an updated_at column for incrementally loading new records (it is easier for ingestion later on when you want to load it as a file).
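
For illustration, roughly the kind of export I have in mind, using plain DuckDB rather than any existing PyAirbyte API; the connection string, table, bucket, and credentials are placeholders:

```python
import duckdb

con = duckdb.connect()

# httpfs enables s3:// paths; the postgres extension reads directly from Postgres.
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL postgres; LOAD postgres;")
con.execute("SET s3_region='eu-west-1';")
con.execute("SET s3_access_key_id='...'; SET s3_secret_access_key='...';")

# Attach the source database read-only (connection string is a placeholder).
con.execute(
    "ATTACH 'dbname=appdb host=localhost user=app' AS pg (TYPE POSTGRES, READ_ONLY);"
)

# Copy only the rows newer than the last cursor straight to Parquet on S3.
last_cursor = "2024-02-28 00:00:00"
con.execute(
    f"""
    COPY (
        SELECT * FROM pg.public.orders
        WHERE updated_at > TIMESTAMP '{last_cursor}'
    ) TO 's3://my-bucket/raw/orders/orders_2024-02-29.parquet' (FORMAT PARQUET);
    """
)
```

From there, Snowflake can ingest the Parquet files directly, which is what should make a later switch to Airbyte OSS/Enterprise easier.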

Thanks,
Wang

@aaronsteers changed the title from "Feature request: Postgres source" to "💡 Feature request: Postgres source" on Mar 1, 2024
@aaronsteers
Contributor

aaronsteers commented Mar 1, 2024

@WangCHEN9 - Thanks very much for this explanation.

I've logged a couple of different paths forward. Unfortunately, none of these approaches is trivial...

The most direct/obvious solution would be #87, but there are some technical barriers to us implementing it. There's another path forward in #85, which might be smoother for your use case. This 'cache-to-cache' implementation also has its own challenges, but those are more about us designing a good developer experience and less about actual technical hurdles.

In #87 I noted a workaround, which would be to pre-install the Java connector. Would love your thoughts and upvotes on any of those approaches. Thanks! 🙏

@WangCHEN9
Author

Hi @aaronsteers,

I will definitely upvote #85, because it will unlock many more use cases, especially with the power of DuckDB.

As for #87, personally I don't like it. Asking users to install Java or Docker is too much work; we would kind of lose the advantage of PyAirbyte.

Wang

@aaronsteers
Contributor

aaronsteers commented Mar 4, 2024

@WangCHEN9 - This feedback is very helpful. Thank you!

Will keep you posted.
