-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix database existence check #463
base: master
Are you sure you want to change the base?
fix database existence check #463
Conversation
follow up to kvesteri#372
Other than the than the disposal of the engines, I think this should work fine 👍 |
3243079
to
cc53be0
Compare
Wondering if one could extend a test case such that a postgres DB without CONNECT privilege for the postgres database is tested.
|
Co-authored-by: Nicola Soranzo <[email protected]>
cc53be0
to
15dc668
Compare
Co-authored-by: Nicola Soranzo <[email protected]>
- postgres: return for the first positive test - use immutable for default argument
…s/sqlalchemy-utils into topic/372-followup
Co-authored-by: Nicola Soranzo <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
except (ProgrammingError, OperationalError): | ||
pass | ||
finally: | ||
engine.dispose() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set engine to None here -- is it safe to call dispose() twice on an engine? A second dispose() will be called on line 510
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. how about this e481582
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think now we aren't disposing engines between each iteration of the loop if there are not exceptions. I'm not the author of this code, so don't want to offer too many opinions, but this code looks a little brittle in terms of the manual calls to Engine.dispose(). I would just use a context manager here...one that called dispose() in exit and then use with
everywhere I created an Engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=dispose#connectionless-execution-implicit-execution correctly, the current code uses implicit connections (i.e. calling execute on the engine). This may be changed to
with engine.connect() as connection:
connection.execute(...)
then we might no need to use dispose
at all https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=dispose#engine-disposal ..
but I'm not an sqlalchemy expert at all .. but I'm fine with implementing the change if you agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also dispose is called in get_scalar_result
.. i.e. in the case of a successful existence check dispose is currently called twice... but according to the docs it just closes connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do add a context manager similar to contextlib.closing
(https://docs.python.org/3/library/contextlib.html#contextlib.closing) that does this:
from contextlib import contextmanager
@contextmanager
def disposing_engine(engine):
try:
yield engine
finally:
engine.dispose()
That way you can just write:
for pdb in postgres_db:
url.database = pdb
with disposing_engine(sa.create_engine(url)) as engine:
if get_scalar_result(engine, text):
return True
return False
You can follow this pattern for other dialects as well so you don't need an engine
var or 'ret` var.
elif dialect_name == 'mysql':
with disposing_engine(sa.create_engine(url)) as engine:
text = ...
return bool(get_scalar_result(engine, text))
It seems a little strange that get_scalar_result would call Engine.dispose() but seems like it is safe to call twice then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanations. I tried a slightly different way in 6b5cdb2
- use a connection for the execution and close the connection (via
with
). Already with this there should be no need to dispose the engine (since there should be no open connections in the pool that need to be closed). - explicitly use the Null connection pool .. just to be sure.
Lets see if this passes tests.
6b5cdb2
to
fd6b773
Compare
- use a connection (which is closed automatically) for data base existence check - explicitely use Null connection pool already with the 1st change disposal of the engine (which closes all open connections) is not necessary anymore. with the second change we are completely sure.
fd6b773
to
74b3513
Compare
Tests are passing. Will test now the postgres part with Galaxy. |
Works. |
"""Check if a database exists. | ||
|
||
:param url: A SQLAlchemy engine URL. | ||
:postgres_db: Only applies to postgres. List of databases to try to connect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this parameter apply to any other databases? In other words does for example Oracle or MSSQL have similar things that a default database should be provided in order to check the existence of another database. If so, we should use some different naming for this.
Furthermore the description of the parameter suggests this parameter is a list of values but the name does not. Either the parameter should be named postgres_databases
(or something similar) OR this parameter could be string.
I'm leaning towards the latter option as the end user who is using this function should know which default database to use and just provide that as a string parameter for this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this parameter apply to any other databases?
I have no idea. But it seems to be postgres specific (I also found no issue that indicates such a problem for other databases). Still - if desired - I could change the parameter name to databases
and indicate in the docs that it currently only applies to postgres.
Furthermore the description of the parameter suggests this parameter is a list of values but the name does not.
I would go for postgres_databases
. The list has the advantage will work for many settings automatically.
engine.dispose() | ||
return result | ||
with engine.connect() as conn: | ||
return conn.scalar(sql) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's only a two-liner, is the inline function still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say so. Still avoids some code duplication. Do you have some alternative in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but personally I don't really like dynamically defined functions. Maybe move it to the module level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally I don't really like dynamically defined functions
Can do. Out of interest: is there a downside, apart from violating style preferences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is created every time and that most likely eats up a bit of performance.
The more I think about this the more I'm convinced that we should change the function signature of
This would solve the issue raised in this PR. It would also solve #467 and it would have the benefit of being able to use existing connection / engine. What do you guys think? |
Thanks for the feedback @kvesteri . I guess I agree .. given also the code duplication for the engine creation in the current implementation of the three functions. But I think one would also like to add a function to establish a connection (or engine) -- that could be passed to the three functions. Otherwise one would just delegate the "complexity" to establish a connection which is currently hidden in the three functions to the calling code. Alternatively one could also add such a function and call it in each of the three function -- just to remove the code duplication. This would have the advantage that the API is unchanged. |
I would definitely keep the current API, since the most common use case, in my opinion, is most likely to check/create/drop a database using the same connection string as is used in On the other hand, I agree that a new set of functions |
@ziima I'd like to hear more of your reasoning since I don't find it cumbersome at all to first create an engine / connection object and then re-use that in subsequent I do find the proposed solution quite cumbersome though with a postgresql specific keyword argument. The solution proposed in this PR does not solve #467 . It can't reuse an existing connection (= creates unnecessary connections). Furthermore I want the function signature to avoid database specific parameters unless absolutely necessary. |
In my use case I create a test database when running unittests, and as such I try to keep it as simple as possible. Minimal working example import unittest
from sqlalchemy import create_engine
from sqlalchemy_utils.functions import (create_database, database_exists, drop_database)
TEST_DATABASE = 'postgresql:///test_db'
class MyTest(unittest.TestCase):
def setUp(self):
if not database_exists(TEST_DATABASE):
create_database(TEST_DATABASE)
def tearDown(self):
drop_database(TEST_DATABASE)
def test_foo(self):
engine = create_engine(TEST_DATABASE)
self.assertTrue(False) I don't know much about other databases than sqlite3 and postgres, but using your proposal, the example would expanded significantly: import unittest
from copy import copy
from sqlalchemy import create_engine
from sqlalchemy.engine.url import make_url
from sqlalchemy_utils.functions import (create_database, database_exists, drop_database)
TEST_DATABASE = 'postgresql:///test_db'
class MyTest(unittest.TestCase):
def setUp(self):
test_database_url = make_url(TEST_DATABASE)
self.test_db_name = test_database_url.database
# Here the whole backend specific code would have been
admin_database = copy(test_database_url)
if test_database_url.drivername == 'postgresql':
admin_database.database = 'template1' # Still simple version, I ignore other possible database names and a case for #467
# elif...
self.admin_connection = create_engine(admin_database)
if not database_exists(self.admin_connection, self.test_db_name):
create_database(self.admin_connection, self.test_db_name)
def tearDown(self):
drop_database(self.admin_connection, self.test_db_name)
def test_foo(self):
engine = create_engine(TEST_DATABASE)
self.assertTrue(False) As @bernt-matthias noted above, the complexity would just have to be handled before the functions are called. That's not very handy. Regarding the problems you mentioned, they might be easier to manage if we create some sort of |
Waiting for release! |
@ziima the problems you referred to in your comment seem to be limitations of UnitTest. With pytest for example, one could just use fixtures and reuse connection fixture on subsequent function calls. Thus I'd like to see this changed so that the functions would take connection / engine as the first parameter. Introducing DatabaseManager seems counter-intuitive and a bit clumsy. |
Thanks for coming back to this @kvesteri Sounds like a larger change and I'm not sure if I will be able to implement this since actually I have never used the sqlalchemy library as a programmer (it's just used in a project that I'm involved in). So I have for instance no idea of the concepts behind an engine. With a more precise plan I might give it a go. Maybe something like pseudo code for the at least one of the functions... |
@kvesteri I didn't quite befriended pytests, but it seems it would require the same setup, just using other tools. Do you have any particular use case in mind? |
import random | ||
import string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this standard library imports just go to the top of the file?
|
||
elif engine.dialect.name == 'mysql': | ||
if databases is None: | ||
databases = ('postgres', 'template0', 'template1', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If url
passed into this function provided a db name, for example:
postgresql://user:[email protected]/dbname
Wouldn't we want to check dbname
first here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok nevermind, I see that the SQL command provided by text
will handle that.
FYI, in the https://github.com/nsoranzo/sqlalchemy-utils/tree/sqlalchemy14 branch I'm trying to combine this PR with #487 (and more). |
follow up to #372
fixes #462
TODO:
"postgres"
used the else branch before anyway