Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix database existence check #463

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

bernt-matthias
Copy link
Contributor

@bernt-matthias bernt-matthias commented Jul 9, 2020

follow up to #372
fixes #462

TODO:

  • always dispose engines (also for other dialects)
  • just use the else branch for postgres? other drivers starting with "postgres" used the else branch before anyway
  • docs

@CaselIT
Copy link

CaselIT commented Jul 9, 2020

Other than the than the disposal of the engines, I think this should work fine 👍

@bernt-matthias bernt-matthias force-pushed the topic/372-followup branch 2 times, most recently from 3243079 to cc53be0 Compare July 9, 2020 13:27
@bernt-matthias
Copy link
Contributor Author

Wondering if one could extend a test case such that a postgres DB without CONNECT privilege for the postgres database is tested.

class TestDatabasePostgres(DatabaseTest):

Co-authored-by: Nicola Soranzo <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
- postgres: return for the first positive test
- use immutable for default argument
sqlalchemy_utils/functions/database.py Outdated Show resolved Hide resolved
sqlalchemy_utils/functions/database.py Outdated Show resolved Hide resolved
Co-authored-by: Nicola Soranzo <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
except (ProgrammingError, OperationalError):
pass
finally:
engine.dispose()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set engine to None here -- is it safe to call dispose() twice on an engine? A second dispose() will be called on line 510

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. how about this e481582

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now we aren't disposing engines between each iteration of the loop if there are not exceptions. I'm not the author of this code, so don't want to offer too many opinions, but this code looks a little brittle in terms of the manual calls to Engine.dispose(). I would just use a context manager here...one that called dispose() in exit and then use with everywhere I created an Engine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=dispose#connectionless-execution-implicit-execution correctly, the current code uses implicit connections (i.e. calling execute on the engine). This may be changed to

with engine.connect() as connection:
    connection.execute(...)

then we might no need to use dispose at all https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=dispose#engine-disposal ..

but I'm not an sqlalchemy expert at all .. but I'm fine with implementing the change if you agree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also dispose is called in get_scalar_result .. i.e. in the case of a successful existence check dispose is currently called twice... but according to the docs it just closes connections.

Copy link

@jtbeach jtbeach Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do add a context manager similar to contextlib.closing (https://docs.python.org/3/library/contextlib.html#contextlib.closing) that does this:

from contextlib import contextmanager

@contextmanager
def disposing_engine(engine):
    try:
        yield engine
    finally:
        engine.dispose()

That way you can just write:

for pdb in postgres_db:
    url.database = pdb
    with disposing_engine(sa.create_engine(url)) as engine:
        if get_scalar_result(engine, text):
            return True
return False

You can follow this pattern for other dialects as well so you don't need an engine var or 'ret` var.

elif dialect_name == 'mysql':
    with disposing_engine(sa.create_engine(url)) as engine:
        text = ...
        return bool(get_scalar_result(engine, text))

It seems a little strange that get_scalar_result would call Engine.dispose() but seems like it is safe to call twice then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanations. I tried a slightly different way in 6b5cdb2

  1. use a connection for the execution and close the connection (via with). Already with this there should be no need to dispose the engine (since there should be no open connections in the pool that need to be closed).
  2. explicitly use the Null connection pool .. just to be sure.

Lets see if this passes tests.

- use a connection (which is closed automatically) for
  data base existence check
- explicitely use Null connection pool

already with the 1st change disposal of the engine (which closes all
open connections) is not necessary anymore. with the second change
we are completely sure.
@bernt-matthias
Copy link
Contributor Author

Tests are passing. Will test now the postgres part with Galaxy.

@bernt-matthias
Copy link
Contributor Author

Works.

"""Check if a database exists.

:param url: A SQLAlchemy engine URL.
:postgres_db: Only applies to postgres. List of databases to try to connect
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this parameter apply to any other databases? In other words does for example Oracle or MSSQL have similar things that a default database should be provided in order to check the existence of another database. If so, we should use some different naming for this.

Furthermore the description of the parameter suggests this parameter is a list of values but the name does not. Either the parameter should be named postgres_databases (or something similar) OR this parameter could be string.

I'm leaning towards the latter option as the end user who is using this function should know which default database to use and just provide that as a string parameter for this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this parameter apply to any other databases?

I have no idea. But it seems to be postgres specific (I also found no issue that indicates such a problem for other databases). Still - if desired - I could change the parameter name to databases and indicate in the docs that it currently only applies to postgres.

Furthermore the description of the parameter suggests this parameter is a list of values but the name does not.

I would go for postgres_databases. The list has the advantage will work for many settings automatically.

engine.dispose()
return result
with engine.connect() as conn:
return conn.scalar(sql)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's only a two-liner, is the inline function still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say so. Still avoids some code duplication. Do you have some alternative in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but personally I don't really like dynamically defined functions. Maybe move it to the module level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I don't really like dynamically defined functions

Can do. Out of interest: is there a downside, apart from violating style preferences?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is created every time and that most likely eats up a bit of performance.

@kvesteri
Copy link
Owner

kvesteri commented Aug 3, 2020

The more I think about this the more I'm convinced that we should change the function signature of create_database(url), drop_database(url) and database_exists(url) to following:

drop_database(engine_or_connection, database_name)
create_database(engine_or_connection, database_name)
drop_database(engine_or_connection, database_name)

This would solve the issue raised in this PR. It would also solve #467 and it would have the benefit of being able to use existing connection / engine. What do you guys think?

@bernt-matthias
Copy link
Contributor Author

Thanks for the feedback @kvesteri .

I guess I agree .. given also the code duplication for the engine creation in the current implementation of the three functions.

But I think one would also like to add a function to establish a connection (or engine) -- that could be passed to the three functions. Otherwise one would just delegate the "complexity" to establish a connection which is currently hidden in the three functions to the calling code.

Alternatively one could also add such a function and call it in each of the three function -- just to remove the code duplication. This would have the advantage that the API is unchanged.

@ziima
Copy link
Contributor

ziima commented Aug 4, 2020

I would definitely keep the current API, since the most common use case, in my opinion, is most likely to check/create/drop a database using the same connection string as is used in create_engine. I find that extremely helpful.

On the other hand, I agree that a new set of functions *_database_internal(engine_or_connection, database_name) could help to solve some of the issues and reduce some of the complexity of the current implementation.

@kvesteri
Copy link
Owner

kvesteri commented Aug 4, 2020

@ziima I'd like to hear more of your reasoning since I don't find it cumbersome at all to first create an engine / connection object and then re-use that in subsequent check / create / drop calls.

I do find the proposed solution quite cumbersome though with a postgresql specific keyword argument. The solution proposed in this PR does not solve #467 . It can't reuse an existing connection (= creates unnecessary connections). Furthermore I want the function signature to avoid database specific parameters unless absolutely necessary.

@ziima
Copy link
Contributor

ziima commented Aug 5, 2020

In my use case I create a test database when running unittests, and as such I try to keep it as simple as possible.

Minimal working example

import unittest

from sqlalchemy import create_engine
from sqlalchemy_utils.functions import (create_database, database_exists, drop_database)

TEST_DATABASE = 'postgresql:///test_db'

class MyTest(unittest.TestCase):
    def setUp(self):
        if not database_exists(TEST_DATABASE):
            create_database(TEST_DATABASE)

    def tearDown(self):
        drop_database(TEST_DATABASE)

    def test_foo(self):
        engine = create_engine(TEST_DATABASE)
        self.assertTrue(False)

I don't know much about other databases than sqlite3 and postgres, but using your proposal, the example would expanded significantly:

import unittest
from copy import copy

from sqlalchemy import create_engine
from sqlalchemy.engine.url import make_url
from sqlalchemy_utils.functions import (create_database, database_exists, drop_database)

TEST_DATABASE = 'postgresql:///test_db'

class MyTest(unittest.TestCase):
    def setUp(self):
        test_database_url = make_url(TEST_DATABASE)
        self.test_db_name = test_database_url.database
        # Here the whole backend specific code would have been
        admin_database = copy(test_database_url)
        if test_database_url.drivername == 'postgresql':
            admin_database.database = 'template1'  # Still simple version, I ignore other possible database names and a case for #467
        # elif...
        self.admin_connection = create_engine(admin_database)
        if not database_exists(self.admin_connection, self.test_db_name):
            create_database(self.admin_connection, self.test_db_name)

    def tearDown(self):
        drop_database(self.admin_connection, self.test_db_name)

    def test_foo(self):
        engine = create_engine(TEST_DATABASE)
        self.assertTrue(False)

As @bernt-matthias noted above, the complexity would just have to be handled before the functions are called. That's not very handy.

Regarding the problems you mentioned, they might be easier to manage if we create some sort of DatabaseManager with methods check, create and drop. It could maintain a connection between the subsequent operations. The current API could be kept for simple cases, where you don't care about connection reusability.

@jakabk
Copy link

jakabk commented Oct 8, 2020

Waiting for release!

@kvesteri
Copy link
Owner

kvesteri commented Dec 1, 2020

@ziima the problems you referred to in your comment seem to be limitations of UnitTest. With pytest for example, one could just use fixtures and reuse connection fixture on subsequent function calls. Thus I'd like to see this changed so that the functions would take connection / engine as the first parameter. Introducing DatabaseManager seems counter-intuitive and a bit clumsy.

@bernt-matthias
Copy link
Contributor Author

Thanks for coming back to this @kvesteri

Sounds like a larger change and I'm not sure if I will be able to implement this since actually I have never used the sqlalchemy library as a programmer (it's just used in a project that I'm involved in). So I have for instance no idea of the concepts behind an engine.

With a more precise plan I might give it a go. Maybe something like pseudo code for the at least one of the functions...

@ziima
Copy link
Contributor

ziima commented Dec 2, 2020

@kvesteri I didn't quite befriended pytests, but it seems it would require the same setup, just using other tools. Do you have any particular use case in mind?

Comment on lines 156 to +157
import random
import string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this standard library imports just go to the top of the file?


elif engine.dialect.name == 'mysql':
if databases is None:
databases = ('postgres', 'template0', 'template1', None)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If url passed into this function provided a db name, for example:

postgresql://user:[email protected]/dbname

Wouldn't we want to check dbname first here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok nevermind, I see that the SQL command provided by text will handle that.

@nsoranzo
Copy link
Contributor

FYI, in the https://github.com/nsoranzo/sqlalchemy-utils/tree/sqlalchemy14 branch I'm trying to combine this PR with #487 (and more).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0.36.8 breaks database_exists() call
8 participants