Skip to content
This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Reuse connection #112

Closed

Conversation

ivanovyordan
Copy link

Problem

Tap-postgres opens a new connection every time it needs to cast a value. This is highly inefficient as opening a connection is usually a slow and resource-intensive operation.

Proposed changes

An easy fix would be to use something like PgBouncer, but it's even better if we open just one connection and reuse it for all queries.

To fix the issue we do two things:

  1. We created a Singleton Postgres connection wrapper. This wrapper actually holds up to two connections, since we need two different connection factories. The connect method returns the connection we need based on the arguments provided.
  2. Remove when statements when asking for a connection. When statements are great every time we need to ensure a resource is properly closed after it's being used, but in our specific case, we don't want to close connections after each query.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@tomslade1
Copy link

Hey @Samira-El or @koszti, sorry to bother you both! Would you be able to take a look at this PR for us?

@Samira-El
Copy link
Contributor

Hi @tomslade1 and @ivanovyordan, thank you for this contribution, this is a substantial change so will take a look at it some this or next week when have time.

@Samira-El
Copy link
Contributor

Can you please in the meantime make sure the PR is passing the CI checks? TY

@tomslade1
Copy link

Hi @tomslade1 and @ivanovyordan, thank you for this contribution, this is a substantial change so will take a look at it some this or next week when have time.

Can you please in the meantime make sure the PR is passing the CI checks? TY

Thanks @Samira-El - very much appreciated 🙇 ; Will cover off the CI checks in the mean time!

@ivanovyordan
Copy link
Author

Hey @Samira-El. I fixed a few more warnings, but it looks like the CI fails because of some config. I'm still not sure how I'm supposed to pass the -c argument. Can you advise on that, please?

@Samira-El
Copy link
Contributor

Samira-El commented Sep 6, 2021

Hey @Samira-El. I fixed a few more warnings, but it looks like the CI fails because of some config. I'm still not sure how I'm supposed to pass the -c argument. Can you advise on that, please?

The issue is in the method Postgres.get_configuration, it's calling utils.parse_args with empty dict but some args are required.
You'll have to rethink the design of this class.

@ivanovyordan
Copy link
Author

Hey @Samira-El Thank you so much for helping me with this.

I decided to use the original design of the project. All I do now is to extract the code of the open_connection function to the singleton class. That way we have minimal changes in the code except it's not all wrapped in with statements.

It seems to be working well on my machine, and I really hope I did it right this time. :)

@Samira-El
Copy link
Contributor

Hey @ivanovyordan, something is wrong, the tests in CI are hanging. Can you please run the tests locally to make sure it's all good code-wise .

@ivanovyordan
Copy link
Author

Hey @ivanovyordan, something is wrong, the tests in CI are hanging. Can you please run the tests locally to make sure it's all good code-wise .

Yep. Looking at it ATM. :)

Tap-postgres opens a new connection every time it needs to cast a value.
This is highly inefficient as opening a connection is usually a slow and
resource-intensive operation. An easy fix would be to use something like
PgBouncer, but it's even better if we open just once connection and
reuse it for all queries.

An easy way to fix that is to implement Singleton, but we decided to go
even simpler and store our connections in the db module we already have.
All we do is to store two types of connections (transactional and
logical). Every time we need a connection we check if we already have an
open connection. If we don't we open it and store it. Then just return
that connection.
@ivanovyordan
Copy link
Author

Hey @Samira-El . I finally had the time to look into my changes properly. I decided to go super simple and store the two connections in the db file.

Something I'm not sure you'd like is I use global variables.

Other than that, I have two failing tests, that also fail on the master branch. Not sure why they are failing on my machine and not on the CI.

FAILED tests/test_discovery.py::TestColumnGrants::test_catalog - AssertionError: {(): [25 chars][], 'schema-name': 'public', 'database-name': [495 chars]rue}} != {(): [25 chars][], 'database-name': 'postgres', 'schema-name'[147 chars]er'}}
FAILED tests/test_full_table_interruption.py::LogicalInterruption::test_catalog - AssertionError: {'col[67 chars]1T10:40:59+00:00', 'timestamp_tz': '2020-09-01T01:50:59+03:00'} != {'col[67 chars]1T10:40:59+00:00', 'timestamp_tz': '2020-08-31T22:50:59+00:00'}

@Tolsto
Copy link

Tolsto commented Apr 24, 2022

I don't think that opening multiple connections is the problem that should be solved here. Instead, the entire remote casting should be removed (see #173).
Another approach would be to use a separate UNIX socket connection to a Postgres instance that runs on the same host as the tap. I've tested this but it's still a severe bottleneck and much much slower compared to in-code type casting.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants