Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix database existence check #463

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
48 changes: 31 additions & 17 deletions sqlalchemy_utils/functions/database.py
Original file line number Diff line number Diff line change
Expand Up @@ -420,10 +420,12 @@ def is_auto_assigned_date_column(column):
)


def database_exists(url):
def database_exists(url, postgres_db=None):
"""Check if a database exists.

:param url: A SQLAlchemy engine URL.
:postgres_db: Only applies to postgres. List of databases to try to connect
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this parameter apply to any other databases? In other words does for example Oracle or MSSQL have similar things that a default database should be provided in order to check the existence of another database. If so, we should use some different naming for this.

Furthermore the description of the parameter suggests this parameter is a list of values but the name does not. Either the parameter should be named postgres_databases (or something similar) OR this parameter could be string.

I'm leaning towards the latter option as the end user who is using this function should know which default database to use and just provide that as a string parameter for this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this parameter apply to any other databases?

I have no idea. But it seems to be postgres specific (I also found no issue that indicates such a problem for other databases). Still - if desired - I could change the parameter name to databases and indicate in the docs that it currently only applies to postgres.

Furthermore the description of the parameter suggests this parameter is a list of values but the name does not.

I would go for postgres_databases. The list has the advantage will work for many settings automatically.

to.

Performs backend-specific testing to quickly determine if a database
exists on the server. ::
Expand Down Expand Up @@ -459,41 +461,53 @@ def sqlite_file_exists(database):

url = copy(make_url(url))
database, url.database = url.database, None
engine = sa.create_engine(url)
engine = None
dialect_name = url.get_dialect().name

if engine.dialect.name == 'postgresql':
if dialect_name == 'postgresql':
ret = False
text = "SELECT 1 FROM pg_database WHERE datname='%s'" % database
return bool(get_scalar_result(engine, text))
if postgres_db is None:
postgres_db = ('postgres', 'template0', 'template1', None)
for pdb in postgres_db:
bernt-matthias marked this conversation as resolved.
Show resolved Hide resolved
url.database = pdb
engine = sa.create_engine(url)
try:
ret = bool(get_scalar_result(engine, text))
break
except (ProgrammingError, OperationalError):
engine.dispose()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set engine to None here -- is it safe to call dispose() twice on an engine? A second dispose() will be called on line 510

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. how about this e481582

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now we aren't disposing engines between each iteration of the loop if there are not exceptions. I'm not the author of this code, so don't want to offer too many opinions, but this code looks a little brittle in terms of the manual calls to Engine.dispose(). I would just use a context manager here...one that called dispose() in exit and then use with everywhere I created an Engine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=dispose#connectionless-execution-implicit-execution correctly, the current code uses implicit connections (i.e. calling execute on the engine). This may be changed to

with engine.connect() as connection:
    connection.execute(...)

then we might no need to use dispose at all https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=dispose#engine-disposal ..

but I'm not an sqlalchemy expert at all .. but I'm fine with implementing the change if you agree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also dispose is called in get_scalar_result .. i.e. in the case of a successful existence check dispose is currently called twice... but according to the docs it just closes connections.

Copy link

@jtbeach jtbeach Jul 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do add a context manager similar to contextlib.closing (https://docs.python.org/3/library/contextlib.html#contextlib.closing) that does this:

from contextlib import contextmanager

@contextmanager
def disposing_engine(engine):
    try:
        yield engine
    finally:
        engine.dispose()

That way you can just write:

for pdb in postgres_db:
    url.database = pdb
    with disposing_engine(sa.create_engine(url)) as engine:
        if get_scalar_result(engine, text):
            return True
return False

You can follow this pattern for other dialects as well so you don't need an engine var or 'ret` var.

elif dialect_name == 'mysql':
    with disposing_engine(sa.create_engine(url)) as engine:
        text = ...
        return bool(get_scalar_result(engine, text))

It seems a little strange that get_scalar_result would call Engine.dispose() but seems like it is safe to call twice then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanations. I tried a slightly different way in 6b5cdb2

  1. use a connection for the execution and close the connection (via with). Already with this there should be no need to dispose the engine (since there should be no open connections in the pool that need to be closed).
  2. explicitly use the Null connection pool .. just to be sure.

Lets see if this passes tests.

engine = None

elif engine.dialect.name == 'mysql':
elif dialect_name == 'mysql':
engine = sa.create_engine(url)
text = ("SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA "
"WHERE SCHEMA_NAME = '%s'" % database)
return bool(get_scalar_result(engine, text))
ret = bool(get_scalar_result(engine, text))

elif engine.dialect.name == 'sqlite':
elif dialect_name == 'sqlite':
engine = sa.create_engine(url)
if database:
return database == ':memory:' or sqlite_file_exists(database)
ret = database == ':memory:' or sqlite_file_exists(database)
else:
# The default SQLAlchemy database is in memory,
# and :memory is not required, thus we should support that use-case
return True

ret = True
else:
engine.dispose()
engine = None
text = 'SELECT 1'
try:
url.database = database
engine = sa.create_engine(url)
result = engine.execute(text)
result.close()
return True
ret = True

except (ProgrammingError, OperationalError):
return False
finally:
if engine is not None:
engine.dispose()
ret = False

if engine is not None:
engine.dispose()
return ret


def create_database(url, encoding='utf8', template=None):
Expand Down