Column functions
Jolan Rensen edited this page Aug 2, 2022 · 3 revisions
Similar to the Scala API for Columns, many of the operator functions could be ported over to Kotlin.
For example:
ds.select( col("colA") + 5 )
// datasets can also be invoked to get a column
ds.select( ds("colA") / ds("colB") )
dataset.where( col("colA") `===` 6 )
// or alternatively
dataset.where( col("colA") eq 6 )
In short, all supported operators are:
- ==
  - same as equals()
- !=
  - same as !equals()
- eq / `===`
  - in Scala: ===
  - in Java: equalTo()
- neq / `=!=`
  - in Scala: =!=
  - in Java: notEqual()
- -col(...)
  - same in Scala
  - in Java: negate(col())
- !col(...)
  - same in Scala
  - in Java: not(col())
- gt
  - in Scala: >
  - same in Java but also infix
- lt
  - in Scala: <
  - same in Java but also infix
- geq
  - in Scala: >=
  - same in Java but also infix
- leq
  - in Scala: <=
  - same in Java but also infix
- or
  - in Scala: ||
  - same in Java but also infix
  - `||` is unfortunately an illegal function name on Windows
- and / `&&`
  - in Scala: &&
  - in Java: and()
- +
  - same in Scala
  - in Java: plus()
- -
  - same in Scala
  - in Java: minus()
- *
  - same in Scala
  - in Java: multiply()
- /
  - same in Scala
  - in Java: divide()
- %
  - same in Scala
  - in Java: mod()
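The mapping in the list above relies on two Kotlin language features: operator conventions (for example, `a * b` compiles to `a.times(b)`) and infix functions, which may carry backtick-quoted symbolic names. The following is a minimal, self-contained sketch of that mechanism. It uses a hypothetical toy class `Expr` (with a hypothetical `render` helper and its own `col`) that only records an expression as text. It is not Spark's Column, and it is not how this library is implemented.

```kotlin
// Toy stand-in for a column expression: it only records the expression as text.
// `Expr` and everything defined on it are hypothetical, not Spark classes.
class Expr(val text: String) {
    // Java-style methods, named after the ones in the list above.
    fun multiply(other: Any): Expr = Expr("($text * ${render(other)})")
    fun divide(other: Any): Expr = Expr("($text / ${render(other)})")
    fun equalTo(other: Any): Expr = Expr("($text = ${render(other)})")
    fun negate(): Expr = Expr("(-$text)")
}

// Render another expression by its text and any other value via toString().
fun render(value: Any): String = if (value is Expr) value.text else value.toString()

fun col(name: String): Expr = Expr(name)

// Operator conventions: `a * b` compiles to a.times(b), `a / b` to a.div(b),
// and `-a` to a.unaryMinus().
operator fun Expr.times(other: Any): Expr = multiply(other)
operator fun Expr.div(other: Any): Expr = divide(other)
operator fun Expr.unaryMinus(): Expr = negate()

// Infix functions allow `a eq b`; backticks allow a symbolic name such as `===`.
infix fun Expr.eq(other: Any): Expr = equalTo(other)
infix fun Expr.`===`(other: Any): Expr = equalTo(other)

fun main() {
    println((col("colA") * 5).text)            // (colA * 5)
    println((col("colA") / col("colB")).text)  // (colA / colB)
    println((-col("colA") eq 6).text)          // ((-colA) = 6)
    println((col("colA") `===` 6).text)        // (colA = 6)
}
```

Note that `==` cannot be redirected this way, because Kotlin always compiles it to `equals()`, which must return a Boolean. That is why the list offers `eq` and `===` for building an equality expression.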
Secondly, there are some quality-of-life additions as well:
In Kotlin, ranges are often used to make the inclusive or exclusive bounds of an interval explicit. So, instead of between(a, b), you can now do:
dataset.where( col("colA") inRangeOf 0..2 )
Also, for columns containing map- or array-like types, instead of getItem() we have:
dataset.where( col("colB")[0] geq 5 )
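Both additions can be pictured as thin Kotlin wrappers around the Java-style methods: an infix function that unpacks a closed range into two bounds, and a `get` operator that forwards to `getItem()`. The sketch below again uses a hypothetical toy class `Expr` that records text rather than a real Spark Column, so the signatures shown are assumptions for illustration only.

```kotlin
// Hypothetical toy column that records the expression as text; not a Spark class.
class Expr(val text: String) {
    // Java-style methods named after the ones mentioned above.
    fun between(lower: Any, upper: Any): Expr = Expr("($text BETWEEN $lower AND $upper)")
    fun getItem(key: Any): Expr = Expr("${text}[${key}]")
}

fun col(name: String): Expr = Expr(name)

// A closed range carries both bounds, so one argument replaces between(a, b).
infix fun <T : Comparable<T>> Expr.inRangeOf(range: ClosedRange<T>): Expr =
    between(range.start, range.endInclusive)

// The `get` operator convention turns column[key] into column.get(key).
operator fun Expr.get(key: Any): Expr = getItem(key)

fun main() {
    println((col("colA") inRangeOf 0..2).text)  // (colA BETWEEN 0 AND 2)
    println(col("colB")[0].text)                // colB[0]
    println(col("colC")["key"].text)            // colC[key]
}
```

A closed range such as `0..2` includes both endpoints, which matches the inclusive bounds of `between`.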
Finally, thanks to Kotlin reflection, we can provide a type-safe and refactor-safe way to create TypedColumns and, with those, a new Dataset from pieces of another using the select() function:
val dataset: Dataset<YourClass> = ...
val newDataset: Dataset<Tuple2<TypeA, TypeB>> = dataset.select(col(YourClass::colA), col(YourClass::colB))
// Alternatively, for instance when working with a Dataset<Row>
val typedDataset: Dataset<Tuple2<String, Int>> = otherDataset.select(col<_, String>("a"), col<_, Int>("b"))
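To see why a property reference makes this type- and refactor-safe, consider the following standalone sketch. `YourClass` is given two made-up fields here, and `TypedCol` is a hypothetical stand-in for a typed column. The real library works with Spark's TypedColumn, so treat this only as an illustration of the idea.

```kotlin
import kotlin.reflect.KProperty1

// Hypothetical data class and typed-column stand-in; neither is part of Spark.
data class YourClass(val colA: String, val colB: Int)

class TypedCol<T, U>(val name: String)

// The property reference fixes both the owner type T and the value type U,
// and its `name` supplies the column name, so no string literal is needed.
fun <T, U> col(property: KProperty1<T, U>): TypedCol<T, U> = TypedCol(property.name)

fun main() {
    val a: TypedCol<YourClass, String> = col(YourClass::colA)
    val b: TypedCol<YourClass, Int> = col(YourClass::colB)
    println(a.name) // colA
    println(b.name) // colB
}
```

Renaming `colA` with an IDE refactoring updates `YourClass::colA` as well, and declaring `a` as `TypedCol<YourClass, Int>` would fail to compile, whereas a string-based column name would only fail at runtime.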