Food for thought: I've had to impose limits on the size of the local symbol table to avoid requiring a reader to hold a hash map with, say, more than a million entries in memory. We may wish either to warn users not to write high-cardinality columns out as symbols, or to add another property that caps the number of distinct symbols that can be added from a given column before we give up and write the remaining values as strings.
Add a warning to the documentation regarding this issue, for example:
Warning: When writing or reading Ion data, the local symbol table (LST) must be kept in memory for the current split. With a
large number of unique symbols, the worker may run into memory issues. Ion symbols are used both for column names and for
textual columns configured to serialize as symbols.
We recommend not serializing high-cardinality textual columns as Ion symbols. Apart from the
memory cost, high-cardinality columns gain little from Ion symbols, because the compression gains are
directly correlated with how often values repeat.
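To make the point about repetition concrete, here is a rough sketch of how one might estimate whether a column is a good candidate for symbols. The function name `repetition_ratio` and the sample data are made up for illustration and are not part of the SerDe; the ratio is only a heuristic, not a measurement of actual Ion output size.

```python
def repetition_ratio(values):
    """Return the fraction of values that repeat an earlier value.

    0.0 means every value is unique (symbols buy nothing and only
    grow the symbol table); values near 1.0 mean heavy repetition.
    """
    values = list(values)
    if not values:
        return 0.0
    return 1.0 - len(set(values)) / len(values)


# Hypothetical sample columns.
low_cardinality = ["US", "CA", "US", "US", "CA", "MX", "US", "CA"]
high_cardinality = [f"user-{i}" for i in range(8)]

print(repetition_ratio(low_cardinality))   # 3 distinct values out of 8
print(repetition_ratio(high_cardinality))  # 8 distinct values out of 8
```

A column like the second one would add one symbol table entry per row while saving nothing, which is the case the warning is meant to steer users away from.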
Create a new SerDe property to limit the size of the LST, with two modes:
Flip from Ion symbols to Ion strings once the LST reaches the configured size
Fail once the LST reaches the configured size
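The two modes could behave roughly as in the sketch below. This is only an illustration of the idea, not the SerDe's implementation: the class `CappedSymbolTable`, the exception `SymbolTableLimitExceeded`, and the `max_size` and `on_limit` parameters are all hypothetical names, and symbol IDs here start at 0 for simplicity rather than following Ion's real symbol ID assignment.

```python
class SymbolTableLimitExceeded(Exception):
    """Raised in 'fail' mode when a new symbol would exceed the cap."""


class CappedSymbolTable:
    """Toy symbol table that stops growing at max_size entries."""

    def __init__(self, max_size, on_limit="string"):
        if on_limit not in ("string", "fail"):
            raise ValueError("on_limit must be 'string' or 'fail'")
        self.max_size = max_size
        self.on_limit = on_limit
        self._ids = {}  # text -> symbol ID (hypothetical, 0-based)

    def encode(self, text):
        """Return ('symbol', id) or, past the cap, ('string', text)."""
        if text in self._ids:
            # Already interned: always safe to reuse the existing ID.
            return ("symbol", self._ids[text])
        if len(self._ids) < self.max_size:
            self._ids[text] = len(self._ids)
            return ("symbol", self._ids[text])
        if self.on_limit == "fail":
            raise SymbolTableLimitExceeded(
                f"symbol table limit of {self.max_size} reached"
            )
        # 'string' mode: give up on interning and write a plain string.
        return ("string", text)


table = CappedSymbolTable(max_size=2)
encoded = [table.encode(v) for v in ["a", "b", "a", "c", "b"]]
print(encoded)
```

Note that in the flip mode, values interned before the cap was reached keep being written as symbols; only values that would require a new table entry fall back to strings. Whether the real property should behave that way, or switch the whole column to strings, is an open design choice.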
Zack's comment: