# Approaches to Enums in Spark 2.x

This project showcases approaches you can take to work around Spark's missing enum support.

The limitation stems from the inability to encode an ADT as a Dataset column. Doing so would require providing a custom Encoder, which is not currently possible.

Kryo will not help you either; see "Fake case class parent" to understand why.
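To make the limitation concrete, here is a minimal sketch (the `Color` ADT and the commented-out Spark call are illustrative, not taken from the project):

```scala
// A typical Scala 2 "enum": an ADT modelled as a sealed trait with case objects.
sealed trait Color
case object Red   extends Color
case object Green extends Color
case object Blue  extends Color

// Plain Scala collections handle the ADT without any ceremony:
val colors: Seq[Color] = Seq(Red, Green, Blue)

// With Spark, however, the analogous line fails to compile, because no
// implicit Encoder[Color] can be derived for a sealed trait — roughly:
//
//   import spark.implicits._
//   spark.createDataset(colors)  // error: Unable to find encoder for type Color
```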


## What and how?

Each approach is showcased with a test suite that compares two situations:

  • A regular Scala collection of constructed objects
  • A Spark-ingested Dataset built from that collection

The test is linked in each title.
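The comparison pattern can be sketched in plain Scala (hypothetical names throughout; the Spark encode/decode roundtrip is simulated here with a String codec rather than a real SparkSession):

```scala
// A small ADT standing in for one of the showcased enums.
sealed trait Suit
case object Hearts extends Suit
case object Spades extends Suit

object Suit {
  // Simulates Spark's encode step: ADT value -> primitive column value.
  def encode(s: Suit): String = s.toString

  // Simulates the decode step: primitive -> ADT. Unknown raw values have
  // nowhere to go, which is exactly the kind of loss the assertions catch.
  def decode(raw: String): Option[Suit] = raw match {
    case "Hearts" => Some(Hearts)
    case "Spades" => Some(Spades)
    case _        => None
  }
}

// The "two situations" compared by each suite, in miniature:
val original   = Seq[Suit](Hearts, Spades)
val roundTrip  = original.map(Suit.encode).flatMap(Suit.decode)
```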

Keep in mind that in some cases, Spark loses certain data during the encoding/decoding process, which is always reflected in the assertions!

## Approaches

  1. Case class wrapper
  2. Extra field with primitive column
  3. Fake case class parent
  4. Type alias
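As a flavour of the first two approaches, here is a hedged sketch of wrapping an enum behind a case class that stores only a primitive field Spark can encode (all names are hypothetical, not the project's actual code):

```scala
// The enum itself, as a sealed-trait ADT.
sealed trait Status
case object Active   extends Status
case object Inactive extends Status

// The wrapper persists only primitive members (String/Int), so Spark could
// derive an Encoder for it; the ADT is reconstructed on demand.
final case class User(name: String, statusName: String) {
  def status: Status = statusName match {
    case "Active"   => Active
    case "Inactive" => Inactive
    case other      => throw new IllegalArgumentException(s"Unknown status: $other")
  }
}

object User {
  // Convenience constructor taking the ADT directly.
  def apply(name: String, status: Status): User = User(name, status.toString)
}
```

The trade-off: the Dataset column is just a `String`, so invalid values are only caught at reconstruction time, not by the type system.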