Add a configurable limit to the LocalSymbolTable (LST) size #5

Open
raganhan opened this issue Aug 30, 2018 · 0 comments

@raganhan
Contributor

Zack's comment:

Food for thought: I've had to impose limits on the size of the local symbol table to avoid requiring a reader to hold a hashmap with, say, 1 million+ entries in memory. We may wish to either warn users not to write high-cardinality columns out as symbols, or we might wish to have another property that caps the number of distinct symbols that can be added from a given column before we give up and write the remaining values as strings.

Add a warning to the documentation about this issue, for example:

Warning: When writing or reading Ion data, the LST for the current split must be kept in memory, so with a large number
of unique symbols the worker may run into memory issues. Ion symbols are used both for column names and for textual
columns configured to serialize as symbols.

To avoid this issue, we recommend not serializing high-cardinality textual columns as Ion symbols. Apart from the
memory cost, high-cardinality columns gain little from Ion symbols, since the compression gains are directly
correlated with how often values repeat.

Create a new SerDe property to limit the size of the LST, with two modes:

  • Fall back to writing values as Ion strings instead of Ion symbols once the LST reaches the configured size
  • Fail once the LST reaches the configured size
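A minimal sketch of how the proposed property's two modes could behave, assuming a Java implementation. The class `SymbolLimiter`, its enums, and the mode names are hypothetical and not part of the SerDe or ion-java. It only tracks the distinct symbols the current split's LST would hold and tells the caller whether to write a value as a symbol or a string; a real implementation would plug this decision into the writer, for example choosing between ion-java's `IonWriter.writeSymbol` and `IonWriter.writeString`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed LST size limit; names are illustrative only.
final class SymbolLimiter {

    // The two proposed behaviours once the limit is reached.
    enum OverflowMode { FALLBACK_TO_STRING, FAIL }

    // How the caller should serialize a given textual value.
    enum Encoding { SYMBOL, STRING }

    private final int maxSymbols;
    private final OverflowMode mode;
    // Distinct symbols the local symbol table would hold for the current split.
    private final Set<String> symbols = new HashSet<>();

    SymbolLimiter(int maxSymbols, OverflowMode mode) {
        if (maxSymbols < 0) {
            throw new IllegalArgumentException("maxSymbols must be >= 0");
        }
        this.maxSymbols = maxSymbols;
        this.mode = mode;
    }

    // SYMBOL if the text is already in the table or there is room to add it;
    // otherwise STRING or an exception, depending on the configured mode.
    Encoding encode(String text) {
        if (symbols.contains(text)) {
            return Encoding.SYMBOL; // already interned, the table does not grow
        }
        if (symbols.size() < maxSymbols) {
            symbols.add(text);
            return Encoding.SYMBOL;
        }
        if (mode == OverflowMode.FAIL) {
            throw new IllegalStateException(
                "Local symbol table limit of " + maxSymbols + " symbols exceeded");
        }
        return Encoding.STRING; // table is full: write the value as an Ion string
    }

    int size() {
        return symbols.size();
    }

    // Call when a new split (and therefore a new local symbol table) starts.
    void reset() {
        symbols.clear();
    }
}
```

In this sketch, values already in the table keep being written as symbols after the limit is hit, so the fallback only affects new distinct values. Column names also occupy LST slots, so a real implementation would probably need to count them against the same limit.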