Tuples
Inspired by ScalaTuplesInKotlin, the API introduces many helper extension functions to make working with Scala Tuples a breeze in your Kotlin Spark projects. While working with data classes is encouraged, Scala Tuples are recommended for pair-like Datasets / RDDs / DStreams, both for the useful helper functions and for Spark performance. To enable these features, simply add
import org.jetbrains.kotlinx.spark.api.tuples.*
to the start of your file.
Tuples can be created in the following ways:
val a: Tuple2<Int, Long> = tupleOf(1, 2L)
val b: Tuple3<String, Double, Int> = t("test", 1.0, 2)
val c: Tuple3<Float, String, Int> = 5f X "aaa" X 1
NOTE: While the X method is the quickest way to create a tuple, some caution is necessary, as
tupleOf(1) X 2 != tupleOf(tupleOf(1), 2)
but, due to the way the infix method works,
tupleOf(1) X 2 == tupleOf(1, 2)
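This flattening follows from two Kotlin rules: infix calls are left-associative, and overload resolution prefers the most specific receiver. An X whose left operand is already a tuple therefore extends that tuple instead of nesting it. Below is a minimal, self-contained sketch of the mechanism. Mini1, Mini2, and Mini3 are hypothetical stand-ins for the Scala tuple classes, and the three X overloads are an assumption about how such an operator could be declared, not the library's actual source.

```kotlin
// Hypothetical stand-ins for scala.Tuple1/2/3, for illustration only.
data class Mini1<A>(val a: A)
data class Mini2<A, B>(val a: A, val b: B)
data class Mini3<A, B, C>(val a: A, val b: B, val c: C)

// Generic case: any two values form a pair.
infix fun <A, B> A.X(other: B): Mini2<A, B> = Mini2(this, other)

// More specific receivers: an existing tuple is extended, not nested.
infix fun <A, B> Mini1<A>.X(other: B): Mini2<A, B> = Mini2(a, other)
infix fun <A, B, C> Mini2<A, B>.X(other: C): Mini3<A, B, C> = Mini3(a, b, other)

fun main() {
    // Parsed as (5f X "aaa") X 1; the second call resolves to the Mini2 overload.
    val c: Mini3<Float, String, Int> = 5f X "aaa" X 1
    check(c == Mini3(5f, "aaa", 1))

    // The Mini1 overload is preferred over the generic one, so the result is flat.
    val flat = Mini1(1) X 2
    check(flat == Mini2(1, 2))

    // To nest on purpose, build the outer tuple explicitly.
    val nested = Mini2(Mini1(1), 2)
    check(nested.a == Mini1(1))
}
```

With the real API, the explicit tupleOf(tupleOf(1), 2) form plays the role of the last case.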
Tuples can be expanded and merged like this:
// expand
tupleOf(1, 2).appendedBy(3) == tupleOf(1, 2, 3)
tupleOf(1, 2) + 3 == tupleOf(1, 2, 3)
tupleOf(2, 3).prependedBy(1) == tupleOf(1, 2, 3)
1 + tupleOf(2, 3) == tupleOf(1, 2, 3)
// merge
tupleOf(1, 2) concat tupleOf(3, 4) == tupleOf(1, 2, 3, 4)
tupleOf(1, 2) + tupleOf(3, 4) == tupleOf(1, 2, 3, 4)
// extend tuple instead of merging with it
tupleOf(1, 2).appendedBy(tupleOf(3, 4)) == tupleOf(1, 2, tupleOf(3, 4))
tupleOf(1, 2) + tupleOf(tupleOf(3, 4)) == tupleOf(1, 2, tupleOf(3, 4))
NOTE: Prepending a tuple with a String might result in unexpected behavior like this, since String has the operator fun plus(other: Any?):
"some string" + tupleOf(1, 2) == "some string(1,2)"
In these cases, use prependedBy instead:
tupleOf(1, 2).prependedBy("some string") == tupleOf("some string", 1, 2)
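The pitfall is not specific to tuples. Kotlin always prefers an applicable member function over an extension, and String.plus(other: Any?) accepts any value, so a String on the left of + always concatenates. Below is a dependency-free sketch that uses the standard library's Pair and Triple as stand-ins. The prependedBy function is a hypothetical re-implementation for Pair, written only for this illustration.

```kotlin
// Hypothetical named alternative to '+', defined here for kotlin.Pair only.
fun <A, B, C> Pair<B, C>.prependedBy(head: A): Triple<A, B, C> =
    Triple(head, first, second)

fun main() {
    // The member String.plus(Any?) is chosen, so the result is a String.
    val concatenated: Any = "some string" + Pair(1, 2)
    check(concatenated is String)

    // A named function sidesteps operator resolution entirely.
    val prepended = Pair(1, 2).prependedBy("some string")
    check(prepended == Triple("some string", 1, 2))
}
```

Because member functions take precedence, no extension operator can change what "some string" + x does; a named function is the only reliable route.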
The concept of EmptyTuple from Scala 3 is also already present:
tupleOf(1).dropLast() == tupleOf() == emptyTuple() == EmptyTuple
Finally, all these tuple helper functions are also baked in:
- componentX()
  - for destructuring: val (a, b) = tuple
- contains(x)
  - for: if (x in tuple) { ... }
- iterator()
  - for: for (x in tuple) { ... }
  - generalizes types to smallest common ancestor
- asIterable()
  - generalizes types to smallest common ancestor
- size
- get(n) / get(i..j)
  - for: tuple[1] / tuple[i..j]
  - returns single item or list of items
  - generalizes types to smallest common ancestor
  - can throw IndexOutOfBoundsException
- getOrNull(n) / getOrNull(i..j)
  - same as get(n), but returns null instead of throwing an exception
- getAs<T>(n) / getAs<T>(i..j)
  - returns a single item or list of items cast to T
  - can throw ClassCastException and IndexOutOfBoundsException
- getAsOrNull<T>(n) / getAsOrNull<T>(i..j)
  - same as getAs<T>(n), but returns null instead of throwing an exception
- copy(_1 = ..., _5 = ...)
  - similar to data classes, this returns a copy of the Tuple with only the provided arguments replaced
- first() / last()
- _1, _6 etc. (instead of _1(), _6())
- zip
  - zips two tuples as one large Tuple of Tuple2s
  - is infix
  - on different sizes, the smallest size is kept
- dropLast() / dropFirst()
  - returns a new tuple without the last or first element
  - same as dropLast1() / dropFirst1()
- dropN() / dropLastN()
  - returns a new tuple with the first or last N elements dropped
  - used like drop11()
  - drop0() simply copies the tuple
  - returns EmptyTuple if all elements are dropped
- takeN() / takeLastN()
  - returns a new tuple containing only the first or last N elements
  - used like take11()
  - take0() simply returns EmptyTuple
- splitAtN()
  - returns a Tuple2 with the original split at position N
  - for:
    val a: Tuple3<Int, Double, String> = tupleOf(1, 2.0, "3.0")
    val (c: Tuple2<Int, Double>, d: Tuple1<String>) = a.splitAt2()
  - can also return EmptyTuple when splitAt0() is used
- map
  - generalizes types to smallest common ancestor
  - can be used to convert all values in a tuple at once
- cast
  - used to cast contents of a tuple
  - used like tuple.cast<Int, String, Int>()
  - can throw ClassCastException
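Several of the helpers above follow conventions that the Kotlin standard library already uses for collections: a throwing accessor paired with an OrNull variant, a checked cast paired with a safe cast, and a zip that truncates to the shorter input. The sketch below illustrates those conventions with a plain List as a stand-in. It is an analogy under that assumption and does not call the tuple API itself.

```kotlin
fun main() {
    // Stand-in for tupleOf(1, 2.0, "3.0"); element type generalizes to Any.
    val items: List<Any> = listOf(1, 2.0, "3.0")

    // Throwing vs. null-returning access, analogous to get(n) and getOrNull(n).
    val outOfRange = runCatching { items[5] }.exceptionOrNull()
    check(outOfRange is IndexOutOfBoundsException)
    check(items.getOrNull(5) == null)

    // Checked vs. safe cast, analogous to getAs<T>(n) and getAsOrNull<T>(n).
    val badCast = runCatching { items[2] as Int }.exceptionOrNull()
    check(badCast is ClassCastException)
    check((items[2] as? Int) == null)

    // zip is infix and keeps the smaller size.
    val zipped = listOf(1, 2, 3) zip listOf("a", "b")
    check(zipped == listOf(1 to "a", 2 to "b"))
}
```

The OrNull variants are the safer default when an index or element type comes from runtime data; the throwing variants suit cases where a mismatch indicates a programming error.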