Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Improve caching of work_group lookup #678

Merged
merged 1 commit into from
Jun 27, 2024

Conversation

namelessjon
Copy link
Contributor

Description

Fixes #676

This moves client creation into the cached method, which makes the lru_cache depend only on the work_group name (which it has to depend on, as that drives the output) and the AthenaAdapter instance (via self, which is instantiated once per dbt thread it seems) but not the athena client (which gets instantiated once per lookup in the original implementation).

Checklist

  • You followed contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

@namelessjon namelessjon changed the title Improve caching of work_group lookup fix: Improve caching of work_group lookup Jun 27, 2024
@namelessjon
Copy link
Contributor Author

sorry, will undo the code formatting and repush

@nicor88 nicor88 added the enable-functional-tests Label to trigger functional testing label Jun 27, 2024
@namelessjon namelessjon force-pushed the fix_workgroup_caching branch 2 times, most recently from 8386f54 to 29126e9 Compare June 27, 2024 13:22
@namelessjon
Copy link
Contributor Author

There, that's a better diff

@namelessjon
Copy link
Contributor Author

I don't understand why the test has failed. It doesn't seem like this change should cause a build to suddenly succeed (which is what I think has caused the test failure?)

@nicor88
Copy link
Contributor

nicor88 commented Jun 27, 2024

@namelessjon the functional test issue it's a random one due to some iceberg parallel testing - I re-trigger it, and it should work fine.
but have a look at the pre-commit checks, there we have a linting issue I guess.
Edit: I run this pre-commit run --files dbt/adapters/athena/impl.py to make pre-commit happy.

Fixes dbt-labs#676

This moves client creation into the cached method, which makes the
lru_cache depend only on the work_group name (which it has to depend on,
as that drives the output) and the AthenaAdapter instance (via `self`,
which is instantiated once per dbt thread it seems) but not the athena
client (which gets instantiated once per lookup in the original
implementation).
@namelessjon
Copy link
Contributor Author

That should now be fixed. Sorry, I'd not setup pre-commit

@nicor88
Copy link
Contributor

nicor88 commented Jun 27, 2024

@namelessjon don't worry, thanks for the quick fix.

I think that what you propose make sense. Given the fact that the function cached has always the same workgroup input, there should be only 1 value in the cache.
Before it wasn't the case because for each model we had a different athena client for every call (where each client had a different signature), that was leading to many value in the cache equal to the amount of model runs/tests.

@namelessjon
Copy link
Contributor Author

namelessjon commented Jun 27, 2024

Yeah, exactly. Because this is caching an instance method, we actually end up with what seems to be 1 per thread (as the adapter itself is also part of the cache key) but I can live with that. Or perhaps the caching is per thread anyway with lru_cache? Whatever the reason for the 4/8 calls rather than the 1 it seems it should be, it's a lot better than one per model.

EDIT: and from my local testing, the memory usage caps out at the same ~400Mb I observed with configuring the decorator like @lru_cache(maxsize=4)

@nicor88
Copy link
Contributor

nicor88 commented Jun 27, 2024

Ah right, the amount of items in the cache it's the number of the threads, so you have a way to indirect controlling it.

Let's merge this one, and consider if really necessary to make the amount of the cache controllable via a parameter if really necessary. Well done.

@nicor88 nicor88 merged commit ab93867 into dbt-labs:main Jun 27, 2024
10 checks passed
@namelessjon
Copy link
Contributor Author

Thanks for merging

Reading some more, the multiple hits might be as simple as "the 4 threads make 4 simultaneous calls, and so the cache is not yet set"

@namelessjon
Copy link
Contributor Author

Yes, adding some debug logging about cache hits confirms that's the case. I see 4 cache misses, but the current size remains at 1 after all the calls stop

@namelessjon namelessjon deleted the fix_workgroup_caching branch June 27, 2024 14:52
kodiakhq bot referenced this pull request in cloudquery/policies Jul 1, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [dbt-athena-community](https://togithub.com/dbt-athena/dbt-athena) | patch | `==1.8.2` -> `==1.8.3` |

---

### Release Notes

<details>
<summary>dbt-athena/dbt-athena (dbt-athena-community)</summary>

### [`v1.8.3`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.3)

[Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.8.2...v1.8.3)

#### What's Changed

##### Features

-   feat: Update `AthenaColumn` to parse array types by [@&#8203;jeancochrane](https://togithub.com/jeancochrane) in [https://github.com/dbt-athena/dbt-athena/pull/672](https://togithub.com/dbt-athena/dbt-athena/pull/672)

##### Fixes

-   fix: edge case fix on iceberg target table being dropped instead on being renamed first by [@&#8203;nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/667](https://togithub.com/dbt-athena/dbt-athena/pull/667)
-   fix: Improve caching of work_group lookup by [@&#8203;namelessjon](https://togithub.com/namelessjon) in [https://github.com/dbt-athena/dbt-athena/pull/678](https://togithub.com/dbt-athena/dbt-athena/pull/678)

##### Chore

-   chore: improve function tests for iceberg retries by [@&#8203;nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/674](https://togithub.com/dbt-athena/dbt-athena/pull/674)
-   chore: increase num_iceberg retries by [@&#8203;nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/679](https://togithub.com/dbt-athena/dbt-athena/pull/679)
-   chore: prepare release 1.8.3 by [@&#8203;nicor88](https://togithub.com/nicor88) in [https://github.com/dbt-athena/dbt-athena/pull/680](https://togithub.com/dbt-athena/dbt-athena/pull/680)

##### Dependencies

-   chore: Update moto requirement from ~=5.0.8 to ~=5.0.9 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/666](https://togithub.com/dbt-athena/dbt-athena/pull/666)
-   chore: Update pyupgrade requirement from ~=3.15 to ~=3.16 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/669](https://togithub.com/dbt-athena/dbt-athena/pull/669)
-   chore: Update flake8 requirement from ~=7.0 to ~=7.1 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/670](https://togithub.com/dbt-athena/dbt-athena/pull/670)
-   chore: Update dbt-tests-adapter requirement from ~=1.8.0 to ~=1.9.1 by [@&#8203;dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/673](https://togithub.com/dbt-athena/dbt-athena/pull/673)

##### Docs

-   chore: Added information about location table to readme ([#&#8203;628](https://togithub.com/dbt-athena/dbt-athena/issues/628)) by [@&#8203;cuff-links](https://togithub.com/cuff-links) in [https://github.com/dbt-athena/dbt-athena/pull/665](https://togithub.com/dbt-athena/dbt-athena/pull/665)

#### New Contributors

-   [@&#8203;cuff-links](https://togithub.com/cuff-links) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/665](https://togithub.com/dbt-athena/dbt-athena/pull/665)
-   [@&#8203;namelessjon](https://togithub.com/namelessjon) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/678](https://togithub.com/dbt-athena/dbt-athena/pull/678)

**Full Changelog**: dbt-labs/dbt-athena@v1.8.2...v1.8.3

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MjEuMiIsInVwZGF0ZWRJblZlciI6IjM3LjQyMS4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJhdXRvbWVyZ2UiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enable-functional-tests Label to trigger functional testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] dbt-athena leaks memory via workgroups cache
2 participants