Delta Lake may be able to generate partition filters for a query whenever a partition column is defined by one of the following expressions:
CAST(col AS DATE) and the type of col is TIMESTAMP.
YEAR(col) and the type of col is TIMESTAMP.
Two partition columns defined by YEAR(col), MONTH(col) and the type of col is TIMESTAMP.
Three partition columns defined by YEAR(col), MONTH(col), DAY(col) and the type of col is TIMESTAMP.
Four partition columns defined by YEAR(col), MONTH(col), DAY(col), HOUR(col) and the type of col is TIMESTAMP.
SUBSTRING(col, pos, len) and the type of col is STRING.
DATE_FORMAT(col, format) and the type of col is TIMESTAMP.
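As a rough illustration of why these expressions make pruning possible: when year, month, and day are generated from a timestamp column, a range predicate on that column bounds which partition values can match. The sketch below is a pure-Python model of that reasoning, not Delta Lake's implementation. The function names are hypothetical, and the column is assumed to be named `year`.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def year_partition_filter(lo: datetime, hi: datetime) -> str:
    """Partition predicate implied by `lo <= ts_col <= hi` when the table is
    partitioned by a column generated as YEAR(ts_col)."""
    return f"year >= {lo.year} AND year <= {hi.year}"


def candidate_day_partitions(lo: datetime, hi: datetime) -> List[Tuple[int, int, int]]:
    """(year, month, day) partition values that may contain rows with
    `lo <= ts_col <= hi` when partitioning by YEAR/MONTH/DAY(ts_col).
    Files in any partition not returned here can be skipped unread."""
    if lo > hi:
        return []
    days = []
    d = lo.date()
    while d <= hi.date():
        days.append((d.year, d.month, d.day))
        d += timedelta(days=1)
    return days


lo, hi = datetime(2021, 12, 31, 23, 0), datetime(2022, 1, 2, 1, 0)
print(year_partition_filter(lo, hi))     # year >= 2021 AND year <= 2022
print(candidate_day_partitions(lo, hi))  # [(2021, 12, 31), (2022, 1, 1), (2022, 1, 2)]
```

The derived predicate is conservative: it may keep a partition that ends up contributing no rows, but it never drops one that could match, which is what makes it safe to apply at file-scan time.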
Of particular interest are partition columns defined by YEAR(col), MONTH(col), DAY(col), and/or HOUR(col) derived from the tsdf.ts_col attribute.
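In practice, this means declaring the partition columns as generated from the TSDF's timestamp column. The helper below is a hypothetical sketch that builds such a Delta `CREATE TABLE` statement. The table name, `event_ts`, and the extra `symbol` column are placeholders. It assumes a runtime whose SQL dialect accepts `GENERATED ALWAYS AS`. The Delta docs also describe the `DeltaTableBuilder` API (in Python, `addColumn(..., generatedAlwaysAs=...)`), which may be the required route on some open-source Delta versions.

```python
from typing import Dict, Sequence

# Spark SQL function used to derive each supported partition column from ts_col.
PART_FUNCS = {"year": "YEAR", "month": "MONTH", "day": "DAY", "hour": "HOUR"}


def generated_partition_ddl(table: str, ts_col: str, other_cols: Dict[str, str],
                            parts: Sequence[str] = ("year", "month", "day")) -> str:
    """Build a Delta CREATE TABLE statement whose partition columns are
    generated from ts_col (hypothetical helper; names are placeholders)."""
    cols = [f"  {name} {dtype}" for name, dtype in other_cols.items()]
    cols.append(f"  {ts_col} TIMESTAMP")
    for p in parts:
        # Raises KeyError for a partition name not in PART_FUNCS.
        cols.append(f"  {p} INT GENERATED ALWAYS AS ({PART_FUNCS[p]}({ts_col}))")
    return (
        f"CREATE TABLE {table} (\n"
        + ",\n".join(cols)
        + "\n)\nUSING DELTA\n"
        + f"PARTITIONED BY ({', '.join(parts)})"
    )


ddl = generated_partition_ddl("events", "event_ts", {"symbol": "STRING"})
print(ddl)
```

The printed statement declares `year`, `month`, and `day` as INT columns generated from `event_ts` and partitions the table by them. It could then be executed with `spark.sql(ddl)`.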
Below is an example of partition pruning on "/databricks-datasets/amazon/test4K/", with and without generated columns.
With generated columns, a filter on ts_col could produce partition filters on year, month, and/or day at file-scan time.
Delta Lake docs on generated columns: https://docs.delta.io/latest/delta-batch.html#use-generated-columns
Without generated columns, the query produces the following plan for the Parquet scan:
Changing the CREATE TABLE statement to use generated columns for year, month, and day adds the corresponding partition filters to the physical plan.