Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exploit iceberg row-count metadata #74

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

peterboncz
Copy link

@peterboncz peterboncz commented Oct 9, 2024

Examine the row-counts in the manifest and count how many rows are in the (existing or added) data resp. deletion files.

Use these two counts to pass the new named_parameter "explicit_cardinality" into the generated parquet_scans.

Note that because iceberg is passing in an explicit schema into the parquet_scan, it does not open any file during bind. That means the parquet_scans it generates did not have any cardinality information during query optimization.

That can cause rather bad query plans. This PR (pb/explicit-iceberg-cardinality) fixes that. Note that this PR needs the DuckDB PR pb/explicit-parquet-cardinality that adds the "explicit_cardinality" named_parameter to parquet_scan.

peter added 2 commits October 9, 2024 14:09
Examine the row-counts in the manifest and count how many rows are in the (existing or added) data an deletion files.

Use these two counts to pass the new named_parameter "explicit_cardinality" into the generated parquet_scans.

Note that because iceberg is passing in an explicit schema into the parquet_scan, it does not open any file during
bind. That means the parquet_scans it generates did not have any cardinality information during query optimization.

That can cause rather bad query plans. This PR (pb/explicit-iceberg-cardinality) fixes that. Note that this PR needs
the DuckDB PR pb/exicit-parquet-cardinality that adds the "explicit_cardinality" named_parameter to parquet_scan.
@peterboncz
Copy link
Author

This CI is expected to fail (unknown named_parameter 'explicit_cardinality') until duckdb/duckdb#14292 would have been merged

@peterboncz peterboncz marked this pull request as draft October 11, 2024 17:59
@peterboncz peterboncz marked this pull request as ready for review October 11, 2024 17:59
@peterboncz peterboncz marked this pull request as draft October 11, 2024 18:00
@peterboncz peterboncz marked this pull request as ready for review October 11, 2024 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant