Collecting statistics on a column containing long values such as string or binary is an expensive operation. To avoid collecting statistics on such columns you can configure the table property delta.dataSkippingNumIndexedCols. This property indicates the position index of a column in the table’s schema. All columns with a position index less than the delta.dataSkippingNumIndexedCols property will have statistics collected. For the purposes of collecting statistics, each field within a nested column is considered as an individual column. To avoid collecting statistics on columns containing long values, either set the delta.dataSkippingNumIndexedCols property so that the long value columns are after this index in the table’s schema, or move columns containing long strings to an index position greater than the delta.dataSkippingNumIndexedCols property by using ALTER TABLE ALTER COLUMN.
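To make the position-index rule concrete, here is a minimal sketch in plain Python. The helper names `columns_with_stats` and `set_num_indexed_cols_sql` are hypothetical (they are not part of Delta Lake), and the sketch assumes a flat schema and a non-negative property value; nested fields, which count as individual columns, are not handled.

```python
def columns_with_stats(column_names, num_indexed_cols):
    """Return the columns whose position index is below the threshold.

    Hypothetical helper: mirrors the rule that columns with a position
    index less than delta.dataSkippingNumIndexedCols get statistics.
    Assumes a flat schema (no nested fields).
    """
    if num_indexed_cols < 0:
        raise ValueError("this sketch only handles non-negative values")
    return list(column_names[:num_indexed_cols])


def set_num_indexed_cols_sql(table_name, num_indexed_cols):
    """Build the SQL statement that sets the table property (hypothetical helper)."""
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '{num_indexed_cols}')"
    )


# Example table (made up): the long string column "payload" sits last,
# so a threshold of 3 excludes it from statistics collection.
columns = ["id", "event_date", "amount", "payload"]
print(columns_with_stats(columns, 3))
print(set_num_indexed_cols_sql("events", 3))
```

The generated statement would be run through whatever SQL entry point is in use, for example `spark.sql(...)` in a Spark session.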
We should provide a helper method that reorders a DataFrame's columns so that the "best" column types for data skipping come first. The user should be able to specify the columns they commonly filter on (those go first), followed by the integer columns, and so on. I'm not sure how this would interact with Z ORDER. This needs more thought, but it seems important.
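One possible shape for such a helper, as a rough sketch rather than a settled design. The function name `order_columns_for_skipping`, the ranking (user-specified filter columns, then integer types, then other types, then string and binary), and the returned suggested threshold are all assumptions made for illustration. It works on `(name, type)` pairs like those returned by PySpark's `df.dtypes`, assumes a flat schema, and does not address Z ORDER.

```python
INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}
LONG_VALUE_TYPES = {"string", "binary"}


def order_columns_for_skipping(dtypes, filter_columns=()):
    """Suggest a column order for data skipping (hypothetical helper).

    dtypes: list of (column_name, type_string) pairs, e.g. from df.dtypes.
    filter_columns: columns the user commonly filters on; these go first.

    Returns (ordered_names, num_indexed), where num_indexed is a suggested
    value for delta.dataSkippingNumIndexedCols that leaves the remaining
    string/binary columns after the threshold.
    """
    known = [name for name, _ in dtypes]
    missing = [c for c in filter_columns if c not in known]
    if missing:
        raise ValueError(f"unknown filter columns: {missing}")

    def rank(dtype):
        # Lower rank means earlier in the schema.
        if dtype in INTEGER_TYPES:
            return 0
        if dtype in LONG_VALUE_TYPES:
            return 2
        return 1

    first = list(filter_columns)
    rest = [(name, dtype) for name, dtype in dtypes if name not in first]
    # sorted() is stable, so the original order is kept within each rank.
    rest_sorted = sorted(rest, key=lambda pair: rank(pair[1]))

    ordered = first + [name for name, _ in rest_sorted]
    num_indexed = len(first) + sum(1 for _, dtype in rest if rank(dtype) < 2)
    return ordered, num_indexed


# Made-up schema for illustration.
dtypes = [
    ("payload", "string"),
    ("id", "bigint"),
    ("country", "string"),
    ("amount", "double"),
    ("event_date", "date"),
]
ordered, num_indexed = order_columns_for_skipping(dtypes, filter_columns=["country"])
print(ordered)
print(num_indexed)
```

With a PySpark DataFrame, the result could be applied with `df.select(*ordered)` before writing the table. Filter columns are kept first even when they are strings, on the assumption that the user wants statistics on them regardless of type.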
See the Delta Lake docs on data skipping; the relevant section is the passage quoted above.