Make sqlalchemy use connection pooling for catalog db #710

Merged
5 commits merged into bluesky:main on Apr 25, 2024

Conversation

@nmaytan (Contributor) commented on Apr 5, 2024

This PR results from the investigation done in #663, as well as a group coding session to implement a new benchmark for the catalog db (especially with input from @genematx, @Kezzsim, @danielballan).

I spent some extra time experimenting with asv to better understand what it does before putting up this PR. The new benchmark can be singled out by running: asv run --environment existing:same --bench "time_repeated_write".

While some initial benchmarks seemed to show a modest performance improvement from connection pooling, I found (after running additional benchmarks) that this improvement was really just variance in the benchmark results. Instead, benchmarks seem to show that performance is ~identical regardless of connection pooling. This at least means that there is no apparent regression, which was our primary concern.

asv has several parameters that adjust how benchmarks are taken, with the ability to customize things like warmup time, rounds per benchmark, samples per round, benchmark timeouts, etc. I played with these settings to see whether they had any impact on our results, and the answer seemed to be "no".

It is worth pointing out that asv defaults to timeit.default_timer (a wall-clock timer), so it should capture time spent in other threads and subprocesses. I was initially worried about this, since aiosqlite does its work in a separate thread (one per db connection).
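
For reference, an asv timing benchmark of this shape looks roughly like the following. This is a minimal, self-contained sketch only: the real time_repeated_write benchmark exercises the tiled catalog rather than the bare sqlite3 table used here, and the attribute values are illustrative.

import sqlite3


class TimeCatalogWrites:
    # asv per-benchmark attributes -- the knobs mentioned above.
    warmup_time = 0.5  # seconds of warmup before sampling begins
    repeat = 5         # samples collected per round
    rounds = 2         # rounds per benchmark
    timeout = 120      # seconds before asv forcibly stops the benchmark

    def setup(self):
        # Stand-in for building a catalog/client; here, just an in-memory table.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE records (value INTEGER)")

    def time_repeated_write(self):
        # asv times any method whose name starts with "time_".
        for i in range(100):
            self.conn.execute("INSERT INTO records (value) VALUES (?)", (i,))
            self.conn.commit()

    def teardown(self):
        self.conn.close()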

@danielballan (Member) commented:

For test failures, it looks like you are missing an import:

        engine = create_async_engine(
>           uri, echo=echo, json_serializer=json_serializer, poolclass=AsyncAdaptedQueuePool
        )
E       NameError: name 'AsyncAdaptedQueuePool' is not defined
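
Presumably the fix is just the missing import, along these lines (a sketch based on the traceback above; uri, echo, and json_serializer come from the surrounding function in tiled/catalog/adapter.py):

from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import AsyncAdaptedQueuePool  # the missing import

engine = create_async_engine(
    uri, echo=echo, json_serializer=json_serializer, poolclass=AsyncAdaptedQueuePool
)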

For linter issues:

pip install pre-commit
pre-commit install  # set up git hooks
pre-commit run --all-files
git commit -am "Fix linter issues."

@nmaytan (Contributor, Author) commented on Apr 5, 2024

Sorry, silly blunder.

@danielballan (Member) commented:

I see some pytest failures. They look related but it is not immediately obvious to me what broke.

@nmaytan (Contributor, Author) commented on Apr 5, 2024

Correction: I had not realized that asv was benchmarking the version of tiled installed in my environment rather than the local repository I was working in. With proper benchmarks that actually capture the change between pooling and no pooling, the speedup from pooling looks significant:

Pool  Run  Time (s)  Uncertainty (s)
Yes    1   2.69      0.02
Yes    2   2.84      0.2
Yes    3   2.94      0.05
No     4   5.17      0.07
No     5   5.26      0.07
No     6   5.31      0.1
Yes    7   2.87      0.02
Yes    8   3.04      0.1
Yes    9   2.77      0.02
No    10   4.84      0
No    11   5.18      0.1
No    12   5.41      0.01

Edit: this improvement scales; here is a run for range(1000) rather than range(100):

Pool  Run  Time (s)  Uncertainty (s)
No     1   failed    --
Yes    2   25.8      0.3
No     3   54.5      0.3

The first run was too slow for the default timeout, and I had to increase it:

asv run --environment existing:same --bench "time_repeated_write" -a timeout=300
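
As an aside, the timeout can also be set as an attribute of the benchmark itself, so it does not need to be passed on the command line each time. A sketch (the loop body here is a stand-in for the real repeated catalog writes):

def time_repeated_write():
    total = 0
    for i in range(1000):  # stand-in for the real repeated writes
        total += i


# asv reads per-benchmark attributes set on the function (or on its class).
time_repeated_write.timeout = 300  # seconds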

@danielballan (Member) commented:

The test failures arose from trying to use a pool with an in-memory (:memory:) SQLite database. Making the pool conditional, in a commit pushed above, resolved the problem. With this change in place, I re-verified the benchmarks before and after the following diff:

diff --git a/tiled/catalog/adapter.py b/tiled/catalog/adapter.py
index e1c2f1af..b297155d 100644
--- a/tiled/catalog/adapter.py
+++ b/tiled/catalog/adapter.py
@@ -1293,7 +1293,7 @@ def from_uri(
         uri = f"sqlite+aiosqlite:///{uri}"
 
     parsed_url = make_url(uri)
-    if (parsed_url.get_dialect().name == "sqlite") and (
+    if False and (parsed_url.get_dialect().name == "sqlite") and (
         parsed_url.database != ":memory:"
     ):
         # For file-backed SQLite databases, connection pooling offers a

Pool  Run  Time (s)  Uncertainty (s)
Yes    1   1.36      0.01
Yes    2   1.51      0.04
Yes    3   1.43      0.01
No     4   2.04      0.05
No     5   1.97      0.04
No     6   2.07      0.01
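
For readers following along, the conditional that this diff toggles has roughly the following shape (a sketch only; the real logic lives in from_uri in tiled/catalog/adapter.py and may differ in detail, and the wrapper function name here is hypothetical):

from sqlalchemy.engine import make_url
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import AsyncAdaptedQueuePool


def create_catalog_engine(uri, echo=False, json_serializer=None):
    parsed_url = make_url(uri)
    kwargs = {}
    if (parsed_url.get_dialect().name == "sqlite") and (
        parsed_url.database != ":memory:"
    ):
        # File-backed SQLite: reuse connections via an async-adapted queue pool.
        kwargs["poolclass"] = AsyncAdaptedQueuePool
    # In-memory SQLite keeps SQLAlchemy's default pooling, which avoids the
    # :memory: test failures described above.
    return create_async_engine(
        uri, echo=echo, json_serializer=json_serializer, **kwargs
    )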

@danielballan merged commit f44177e into bluesky:main on Apr 25, 2024.
9 checks passed.