WindowedDStream (aka windowed stream) is an internal DStream that depends on its parent stream.
Note: It is the result of window operators.
windowDuration has to be a multiple of the parent stream’s slide duration.
slideDuration has to be a multiple of the parent stream’s slide duration.
Note: When windowDuration or slideDuration is not a multiple of the parent stream’s slide duration, an Exception is thrown.
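The two requirements above amount to a simple divisibility check. The following is a minimal plain-Python sketch of that check, not Spark's own code: the names `is_multiple_of` and `validate_window` are hypothetical, durations are assumed to be expressed in milliseconds, and `ValueError` stands in for the exception Spark throws.

```python
def is_multiple_of(duration_ms: int, base_ms: int) -> bool:
    """Return True when duration_ms is a whole multiple of base_ms."""
    return duration_ms % base_ms == 0


def validate_window(window_ms: int, slide_ms: int, parent_slide_ms: int) -> None:
    """Hypothetical helper: reject durations that do not line up with the parent's batches."""
    if not is_multiple_of(window_ms, parent_slide_ms):
        raise ValueError(
            f"window duration {window_ms} ms is not a multiple of "
            f"the parent's slide duration {parent_slide_ms} ms"
        )
    if not is_multiple_of(slide_ms, parent_slide_ms):
        raise ValueError(
            f"slide duration {slide_ms} ms is not a multiple of "
            f"the parent's slide duration {parent_slide_ms} ms"
        )


# A 30 s window sliding every 10 s over a parent with 5 s batches is accepted.
validate_window(window_ms=30_000, slide_ms=10_000, parent_slide_ms=5_000)
```

With a 7 s window over the same parent, the sketch raises, because 7000 is not divisible by 5000.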
The parent’s RDDs are automatically persisted at the MEMORY_ONLY_SER storage level, since they need to outlive the parent’s slide duration for this stream to generate its own RDDs.
The slide duration of the stream is given explicitly (and must be a multiple of the parent’s slide duration).
parentRememberDuration is extended to cover the parent’s rememberDuration and the window duration.
The compute method always returns an RDD, either PartitionerAwareUnionRDD or UnionRDD, depending on the number of distinct partitioners defined by the RDDs in the window. It uses the slice operator on the parent stream, with the slice window of [now - windowDuration + parent.slideDuration, now].
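To make the slice window concrete, here is a small plain-Python sketch of the arithmetic. It is an illustration only: `window_interval` and `batch_times_in_window` are hypothetical names, times are assumed to be in milliseconds, and the sketch assumes the parent produces one batch at every multiple of its slide duration.

```python
def window_interval(now_ms: int, window_ms: int, parent_slide_ms: int) -> tuple[int, int]:
    """Return the inclusive [start, end] slice window for the batch at now_ms."""
    start = now_ms - window_ms + parent_slide_ms
    return start, now_ms


def batch_times_in_window(now_ms: int, window_ms: int, parent_slide_ms: int) -> list[int]:
    """List the parent batch times that fall inside the slice window."""
    start, end = window_interval(now_ms, window_ms, parent_slide_ms)
    # end + 1 makes the upper bound inclusive, matching the closed interval.
    return list(range(start, end + 1, parent_slide_ms))


# A 30 s window over 10 s parent batches, evaluated at t = 60 s.
print(window_interval(60_000, 30_000, 10_000))        # (40000, 60000)
print(batch_times_in_window(60_000, 30_000, 10_000))  # [40000, 50000, 60000]
```

Adding parent.slideDuration to the lower bound keeps the number of parent batches at exactly windowDuration / parent.slideDuration (three in the example), since both ends of the interval are inclusive.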
If only one partitioner is used across the RDDs in the window, PartitionerAwareUnionRDD is created and you should see the following DEBUG message in the logs:
DEBUG WindowedDStream: Using partition aware union for windowing at [time]
Otherwise, when there are multiple different partitioners in use, UnionRDD is created and you should see the following DEBUG message in the logs:
DEBUG WindowedDStream: Using normal union for windowing at [time]
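The choice between the two union types can be modelled as counting distinct partitioners. The sketch below is a plain-Python approximation rather than Spark's implementation: `choose_union` is a hypothetical function, partitioners are represented as arbitrary hashable values, and `None` stands for an RDD with no partitioner. Ignoring such RDDs when counting, and falling back to UnionRDD when no partitioner is defined at all, are assumptions of this sketch.

```python
from typing import Hashable, Optional, Sequence


def choose_union(partitioners: Sequence[Optional[Hashable]]) -> str:
    """Pick the union type for a window, given one partitioner entry per RDD.

    Assumption of this sketch: entries equal to None (no partitioner) are
    ignored when counting the distinct partitioners in the window.
    """
    defined = {p for p in partitioners if p is not None}
    if len(defined) == 1:
        return "PartitionerAwareUnionRDD"
    # Zero or several distinct partitioners: use the plain union.
    return "UnionRDD"


print(choose_union(["hash-8", "hash-8", "hash-8"]))  # PartitionerAwareUnionRDD
print(choose_union(["hash-8", "range-4"]))           # UnionRDD
print(choose_union([None, None]))                    # UnionRDD
```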
Tip: Enable the DEBUG logging level for the WindowedDStream logger to see these messages. Add the following line to …