# Approaches to Enums in Spark 2.x

This project showcases approaches you can take to work around Spark's missing enum support.

The limitation stems from the inability to encode an ADT as a Dataset column. Doing so would require providing a custom Encoder, which is not currently possible.

Kryo will not help you either; see "Fake case class parent" to understand why.
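To make the limitation concrete, here is a minimal sketch (the `Color` ADT and the commented-out Spark call are illustrative, not taken from the project):

```scala
// A typical Scala 2 "enum": an ADT modelled as a sealed trait with case objects.
sealed trait Color
case object Red   extends Color
case object Green extends Color
case object Blue  extends Color

// Plain Scala collections handle the ADT without any ceremony:
val colors: Seq[Color] = Seq(Red, Green, Blue)

// With Spark, however, the analogous line fails to compile, because no
// implicit Encoder[Color] can be derived for a sealed trait — roughly:
//
//   import spark.implicits._
//   spark.createDataset(colors)  // error: Unable to find encoder for type Color
```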


## What and how?

Each approach is showcased with a test suite that compares two situations:

  • A regular Scala collection of constructed objects
  • A Spark-ingested Dataset built from that collection

The test is linked in each title.
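The comparison pattern can be sketched in plain Scala (hypothetical names throughout; the Spark encode/decode roundtrip is simulated here with a String codec rather than a real SparkSession):

```scala
// A small ADT standing in for one of the showcased enums.
sealed trait Suit
case object Hearts extends Suit
case object Spades extends Suit

object Suit {
  // Simulates Spark's encode step: ADT value -> primitive column value.
  def encode(s: Suit): String = s.toString

  // Simulates the decode step: primitive -> ADT. Unknown raw values have
  // nowhere to go, which is exactly the kind of loss the assertions catch.
  def decode(raw: String): Option[Suit] = raw match {
    case "Hearts" => Some(Hearts)
    case "Spades" => Some(Spades)
    case _        => None
  }
}

// The "two situations" compared by each suite, in miniature:
val original   = Seq[Suit](Hearts, Spades)
val roundTrip  = original.map(Suit.encode).flatMap(Suit.decode)
```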

Keep in mind that in some cases, Spark loses certain data during the encoding/decoding process, which is always reflected in the assertions!

## Approaches

  1. Case class wrapper
  2. Extra field with primitive column
  3. Fake case class parent
  4. Type alias
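As a flavour of the first two approaches, here is a hedged sketch of wrapping an enum behind a case class that stores only a primitive field Spark can encode (all names are hypothetical, not the project's actual code):

```scala
// The enum itself, as a sealed-trait ADT.
sealed trait Status
case object Active   extends Status
case object Inactive extends Status

// The wrapper persists only primitive members (String/Int), so Spark could
// derive an Encoder for it; the ADT is reconstructed on demand.
final case class User(name: String, statusName: String) {
  def status: Status = statusName match {
    case "Active"   => Active
    case "Inactive" => Inactive
    case other      => throw new IllegalArgumentException(s"Unknown status: $other")
  }
}

object User {
  // Convenience constructor taking the ADT directly.
  def apply(name: String, status: Status): User = User(name, status.toString)
}
```

The trade-off: the Dataset column is just a `String`, so invalid values are only caught at reconstruction time, not by the type system.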