
fix: vacuum more runs needed error #703

Merged: 2 commits, Aug 23, 2024


Jrmyy
Contributor

@Jrmyy Jrmyy commented Aug 13, 2024

When there are too many files to process during a vacuum, the dbt model fails with this error: "ICEBERG_VACUUM_MORE_RUNS_NEEDED: Removed 20000 files in this round of vacuum, but there are more files remaining. Please run another VACUUM command to process the remaining files."

We therefore apply the same logic as we did for OPTIMIZE, running the command again while Athena reports that more runs are needed. This PR also factors the shared code out, since both operations follow the same logic.
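A minimal Python sketch of the "rerun while more runs are needed" logic described above. It assumes the Athena error surfaces as an exception whose message contains the error code. The helper name `rerun_while_more_runs_needed`, the `max_runs` cap, the `FakeAthena` stand-in, and the `ICEBERG_OPTIMIZE_MORE_RUNS_NEEDED` code for OPTIMIZE are all illustrative assumptions, not necessarily what dbt-athena implements.

```python
from typing import Callable

# Error codes signalling "this round made progress, run me again".
# The VACUUM code is quoted in this PR; the OPTIMIZE one is assumed by analogy.
MORE_RUNS_NEEDED_CODES = {
    "OPTIMIZE": "ICEBERG_OPTIMIZE_MORE_RUNS_NEEDED",
    "VACUUM": "ICEBERG_VACUUM_MORE_RUNS_NEEDED",
}


def rerun_while_more_runs_needed(
    execute: Callable[[str], None], sql: str, op: str, max_runs: int = 100
) -> int:
    """Hypothetical shared helper: re-run `sql` until Athena stops asking for more runs.

    Returns the number of rounds executed. Any other error is re-raised unchanged.
    """
    code = MORE_RUNS_NEEDED_CODES[op]
    for run in range(1, max_runs + 1):
        try:
            execute(sql)
            return run
        except Exception as exc:
            if code not in str(exc):
                raise  # a genuine failure, not a "run again" signal
            # This round made progress; loop and issue the same statement again.
    raise RuntimeError(f"{op} still needed more runs after {max_runs} rounds")


class FakeAthena:
    """Hypothetical stand-in executor that removes at most `per_round` files per VACUUM."""

    def __init__(self, files_to_remove: int, per_round: int = 20000):
        self.remaining = files_to_remove
        self.per_round = per_round

    def execute(self, sql: str) -> None:
        removed = min(self.per_round, self.remaining)
        self.remaining -= removed
        if self.remaining > 0:
            raise Exception(
                f"ICEBERG_VACUUM_MORE_RUNS_NEEDED: Removed {removed} files in this "
                "round of vacuum, but there are more files remaining."
            )


athena = FakeAthena(files_to_remove=45000)
rounds = rerun_while_more_runs_needed(athena.execute, "VACUUM my_table", "VACUUM")
print(rounds, athena.remaining)  # → 3 0
```

The `max_runs` cap is a design choice of this sketch so that a misbehaving table cannot loop forever; the PR does not say whether the actual fix bounds the number of rounds.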

Description

Models used to test - Optional

Checklist

  • You followed the contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

@Jrmyy Jrmyy added the enable-functional-tests Label to trigger functional testing label Aug 13, 2024
@nicor88
Contributor

nicor88 commented Aug 13, 2024

@Jrmyy do we have a way to test this in the CI?

I can imagine setting up an Iceberg table with vacuum_max_snapshot_age_seconds set to 1 second, then inserting into the same table many times so that Iceberg has many snapshots to expire, and finally running VACUUM on that table with its many commits.

PS: the code looks good. I re-triggered the CI, which failed randomly because of a functional test that runs concurrent Iceberg inserts.
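The scenario proposed above could be expressed as the statement sequence below. This is a hedged Python sketch: the helper `vacuum_stress_statements`, the table name, the columns, the S3 location, and the number of inserts are hypothetical placeholders, and whether this actually produces enough files for a second VACUUM round is exactly what the rest of the thread questions.

```python
def vacuum_stress_statements(table: str, location: str, inserts: int) -> list[str]:
    """Build the SQL for the proposed (hypothetical) CI scenario:
    an Iceberg table whose snapshots expire after 1 second, many small
    inserts to pile up snapshots, then a single VACUUM."""
    create = (
        f"CREATE TABLE {table} (id int, payload string) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('table_type' = 'ICEBERG', "
        "'vacuum_max_snapshot_age_seconds' = '1')"
    )
    # Each INSERT commits a new Iceberg snapshot.
    insert_sql = [
        f"INSERT INTO {table} VALUES ({i}, 'row-{i}')" for i in range(inserts)
    ]
    return [create, *insert_sql, f"VACUUM {table}"]


stmts = vacuum_stress_statements(
    "test_schema.vacuum_many_snapshots",
    "s3://example-bucket/vacuum_many_snapshots/",
    inserts=3,
)
for stmt in stmts:
    print(stmt)
```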

@Jrmyy
Contributor Author

Jrmyy commented Aug 22, 2024

I can try what you suggest! Since VACUUM only fails when there are more than 20,000 files to remove, we will have to insert a lot of rows, haha 🙈

I will give it a try and let you know if it works 🔥

@nicor88
Contributor

nicor88 commented Aug 22, 2024

@Jrmyy I totally understand that reproducing a vacuum failure can be cumbersome; if we don't manage to reproduce it, we'll leave it like that.

@Jrmyy
Contributor Author

Jrmyy commented Aug 22, 2024

I tried to write some SQL queries that maximise entropy in order to generate as many files as possible, to make sure VACUUM would need several rounds. But it did not scale very well.
With 100 epochs of this query, I managed to get "only" 1,000 files, which is nowhere near the 20k files needed to trigger another round of vacuum. And it took nearly 7 minutes...
So I think testing it would be counter-productive 😞
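A back-of-the-envelope extrapolation of those numbers, in Python. It assumes, for illustration only, that file count and runtime grow linearly with the number of epochs; real file generation may not scale that way.

```python
import math

# Approximate figures reported in the comment above.
EPOCHS_RUN = 100
FILES_PRODUCED = 1_000
MINUTES_TAKEN = 7
# Athena removes at most this many files per VACUUM round (from the error text).
FILES_PER_VACUUM_ROUND = 20_000

# Assumption: files and time scale linearly with epochs.
files_per_epoch = FILES_PRODUCED / EPOCHS_RUN    # 10.0
minutes_per_epoch = MINUTES_TAKEN / EPOCHS_RUN   # 0.07

# A second round is only needed once more than 20,000 files are eligible.
files_needed = FILES_PER_VACUUM_ROUND + 1
epochs_needed = math.ceil(files_needed / files_per_epoch)
minutes_needed = epochs_needed * minutes_per_epoch

print(epochs_needed, round(minutes_needed))  # → 2001 140
```

Under that linear assumption, the test would need roughly 2,000 epochs and over two hours of CI time, which supports the conclusion that it is not worth adding.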

@nicor88
Contributor

nicor88 commented Aug 22, 2024

Thanks @Jrmyy let's leave it as it is.

@Jrmyy Jrmyy marked this pull request as ready for review August 23, 2024 06:55
@nicor88 nicor88 merged commit 9b79444 into main Aug 23, 2024
10 checks passed
@nicor88 nicor88 deleted the fix-vacuum-more-runs-needed branch August 23, 2024 07:37
kodiakhq bot referenced this pull request in cloudquery/policies Sep 1, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [dbt-athena-community](https://togithub.com/dbt-athena/dbt-athena) | patch | `==1.8.3` -> `==1.8.4` |

---

### Release Notes

<details>
<summary>dbt-athena/dbt-athena (dbt-athena-community)</summary>

### [`v1.8.4`](https://togithub.com/dbt-athena/dbt-athena/releases/tag/v1.8.4)

[Compare Source](https://togithub.com/dbt-athena/dbt-athena/compare/v1.8.3...v1.8.4)

#### What's Changed

##### Fixes

-   fix: Remove catalog from the DDL SQL generated by on_schema_change=sync_all_columns by [@iconara](https://togithub.com/iconara) in [https://github.com/dbt-athena/dbt-athena/pull/684](https://togithub.com/dbt-athena/dbt-athena/pull/684)
-   fix: Query comment for create table statement by [@sanromeo](https://togithub.com/sanromeo) in [https://github.com/dbt-athena/dbt-athena/pull/702](https://togithub.com/dbt-athena/dbt-athena/pull/702)
-   fix: remove leading whitespaces on post-hook operations by [@sanromeo](https://togithub.com/sanromeo) in [https://github.com/dbt-athena/dbt-athena/pull/705](https://togithub.com/dbt-athena/dbt-athena/pull/705)
-   fix: vacuum more runs needed error by [@Jrmyy](https://togithub.com/Jrmyy) in [https://github.com/dbt-athena/dbt-athena/pull/703](https://togithub.com/dbt-athena/dbt-athena/pull/703)

##### Dependencies

-   chore: Update dbt-tests-adapter requirement from ~=1.9.1 to ~=1.9.2 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/687](https://togithub.com/dbt-athena/dbt-athena/pull/687)
-   chore: Update pytest requirement from ~=8.2 to ~=8.3 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/690](https://togithub.com/dbt-athena/dbt-athena/pull/690)
-   chore: Update pyupgrade requirement from ~=3.16 to ~=3.17 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/692](https://togithub.com/dbt-athena/dbt-athena/pull/692)
-   chore: Update tenacity requirement from ~=8.2 to >=8.2,<10.0 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/693](https://togithub.com/dbt-athena/dbt-athena/pull/693)
-   chore: Update black requirement from ~=24.4 to ~=24.8 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/694](https://togithub.com/dbt-athena/dbt-athena/pull/694)
-   chore: Update boto3-stubs\[s3] requirement from ~=1.34 to ~=1.35 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/707](https://togithub.com/dbt-athena/dbt-athena/pull/707)
-   chore: Update moto requirement from ~=5.0.12 to ~=5.0.13 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/708](https://togithub.com/dbt-athena/dbt-athena/pull/708)
-   chore: Update pyparsing requirement from ~=3.1.2 to ~=3.1.4 by [@dependabot](https://togithub.com/dependabot) in [https://github.com/dbt-athena/dbt-athena/pull/709](https://togithub.com/dbt-athena/dbt-athena/pull/709)

#### New Contributors

-   [@iconara](https://togithub.com/iconara) made their first contribution in [https://github.com/dbt-athena/dbt-athena/pull/684](https://togithub.com/dbt-athena/dbt-athena/pull/684)

**Full Changelog**: dbt-labs/dbt-athena@v1.8.3...v1.8.4

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---


This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).