-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Improve caching of work_group lookup #678
Conversation
sorry, will undo the code formatting and repush |
8386f54
to
29126e9
Compare
There, that's a better diff |
I don't understand why the test has failed. It doesn't seem like this change should cause a build to suddenly succeed (which is what I think has caused the test failure?) |
@namelessjon the functional test issue it's a random one due to some iceberg parallel testing - I re-trigger it, and it should work fine. |
Fixes dbt-labs#676 This moves client creation into the cached method, which makes the lru_cache depend only on the work_group name (which it has to depend on, as that drives the output) and the AthenaAdapter instance (via `self`, which is instantiated once per dbt thread it seems) but not the athena client (which gets instantiated once per lookup in the original implementation).
42f089b
to
9cd4995
Compare
That should now be fixed. Sorry, I'd not setup pre-commit |
@namelessjon don't worry, thanks for the quick fix. I think that what you propose make sense. Given the fact that the function cached has always the same workgroup input, there should be only 1 value in the cache. |
Yeah, exactly. Because this is caching an instance method, we actually end up with what seems to be 1 per thread (as the adapter itself is also part of the cache key) but I can live with that. Or perhaps the caching is per thread anyway with EDIT: and from my local testing, the memory usage caps out at the same ~400Mb I observed with configuring the decorator like |
Ah right, the amount of items in the cache it's the number of the threads, so you have a way to indirect controlling it. Let's merge this one, and consider if really necessary to make the amount of the cache controllable via a parameter if really necessary. Well done. |
Thanks for merging Reading some more, the multiple hits might be as simple as "the 4 threads make 4 simultaneous calls, and so the cache is not yet set" |
Yes, adding some debug logging about cache hits confirms that's the case. I see 4 cache misses, but the current size remains at 1 after all the calls stop |
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [dbt-athena-community](https://togithub.com/dbt-athena/dbt-athena) | patch | `==1.8.2` -> `==1.8.3` | --- ### Release Notes <details> <summary>dbt-athena/dbt-athena (dbt-athena-community)</summary> ### [`v1.8.3`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.3) [Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.8.2...v1.8.3) #### What's Changed ##### Features - feat: Update `AthenaColumn` to parse array types by [@​jeancochrane](https://togithub.com/jeancochrane) in [https://github.com/dbt-athena/dbt-athena/pull/672](https://togithub.com/dbt-athena/dbt-athena/pull/672) ##### Fixes - fix: edge case fix on iceberg target table being dropped instead on being renamed first by [@​nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/667](https://togithub.com/dbt-athena/dbt-athena/pull/667) - fix: Improve caching of work_group lookup by [@​namelessjon](https://togithub.com/namelessjon) in [https://github.com/dbt-athena/dbt-athena/pull/678](https://togithub.com/dbt-athena/dbt-athena/pull/678) ##### Chore - chore: improve function tests for iceberg retries by [@​nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/674](https://togithub.com/dbt-athena/dbt-athena/pull/674) - chore: increase num_iceberg retries by [@​nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/679](https://togithub.com/dbt-athena/dbt-athena/pull/679) - chore: prepare release 1.8.3 by [@​nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/680](https://togithub.com/dbt-athena/dbt-athena/pull/680) ##### Dependencies - chore: Update moto requirement from ~=5.0.8 to ~=5.0.9 by [@​dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/666](https://togithub.com/dbt-athena/dbt-athena/pull/666) - chore: Update pyupgrade requirement from ~=3.15 to ~=3.16 by [@​dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/669](https://togithub.com/dbt-athena/dbt-athena/pull/669) - chore: Update flake8 requirement from ~=7.0 to ~=7.1 by [@​dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/670](https://togithub.com/dbt-athena/dbt-athena/pull/670) - chore: Update dbt-tests-adapter requirement from ~=1.8.0 to ~=1.9.1 by [@​dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/673](https://togithub.com/dbt-athena/dbt-athena/pull/673) ##### Docs - chore: Added information about location table to readme ([#​628](https://togithub.com/dbt-athena/dbt-athena/issues/628)) by [@​cuff-links](https://togithub.com/cuff-links) in [https://github.com/dbt-athena/dbt-athena/pull/665](https://togithub.com/dbt-athena/dbt-athena/pull/665) #### New Contributors - [@​cuff-links](https://togithub.com/cuff-links) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/665](https://togithub.com/dbt-athena/dbt-athena/pull/665) - [@​namelessjon](https://togithub.com/namelessjon) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/678](https://togithub.com/dbt-athena/dbt-athena/pull/678) **Full Changelog**: dbt-labs/dbt-athena@v1.8.2...v1.8.3 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MjEuMiIsInVwZGF0ZWRJblZlciI6IjM3LjQyMS4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJhdXRvbWVyZ2UiXX0=-->
Description
Fixes #676
This moves client creation into the cached method, which makes the lru_cache depend only on the work_group name (which it has to depend on, as that drives the output) and the AthenaAdapter instance (via
self
, which is instantiated once per dbt thread it seems) but not the athena client (which gets instantiated once per lookup in the original implementation).Checklist