Streaming listeners are Spark listeners interested in streaming events like batch submitted, started or completed.
Streaming listeners implement org.apache.spark.streaming.scheduler.StreamingListener listener interface and process StreamingListenerEvent events.
The following streaming listeners are available in Spark Streaming:
-
StreamingListenerBatchSubmitted
is posted when streaming jobs are submitted for execution and triggersStreamingListener.onBatchSubmitted
(see StreamingJobProgressListener.onBatchSubmitted). -
StreamingListenerBatchStarted
triggersStreamingListener.onBatchStarted
-
StreamingListenerBatchCompleted
is posted to inform that a collection of streaming jobs has completed, i.e. all the streaming jobs in JobSet have stopped their execution.
StreamingJobProgressListener
is a streaming listener that collects information for StreamingSource and Streaming page in web UI.
Note
|
A StreamingJobProgressListener is created while StreamingContext is created and later registered as a StreamingListener and SparkListener when Streaming tab is created.
|
For StreamingListenerBatchSubmitted(batchInfo: BatchInfo)
events, it stores batchInfo
batch information in the internal waitingBatchUIData
registry per batch time.
The number of entries in waitingBatchUIData
registry contributes to numUnprocessedBatches
(together with runningBatchUIData
), waitingBatches
, and retainedBatches
. It is also used to look up the batch data for a batch time (in getBatchUIData
).
numUnprocessedBatches
, waitingBatches
are used in StreamingSource.
Note
|
waitingBatches and runningBatches are displayed together in Active Batches in Streaming tab in web UI.
|
retainedBatches
are waiting, running, and completed batches that web UI uses to display streaming statistics.
The number of retained batches is controlled by spark.streaming.ui.retainedBatches.