
CDR Processing with Spark

Tugdual Grall edited this page Mar 16, 2016 · 2 revisions

1 - MapR Streams + Spark Streaming: Here Spark consumes the messages directly from MapR Streams and checks the "state of the towers" based on the CDRs.

If, over the last 500 messages (or x minutes), a specific % of calls have failed, we should send an alert or change the tower state (for example, we can push a new message on an event topic to the UI, more or less what you have done in the racing car event). The idea:

  • 0% to 10% failure on the sliding window: tower in GREEN
  • 11% to 60% failure: tower in ORANGE
  • 61% to 90% failure: tower in RED (+ special alert)
  • 91% failure or more: tower in BLACK (+ special alert)
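
The thresholds above can be sketched in plain Python, independent of the Spark Streaming API, to make the sliding-window logic concrete. This is only a minimal sketch: the names `TowerWindow` and `tower_state` are hypothetical, and treating each boundary as inclusive on its upper end (for example, exactly 10% is still GREEN) is an assumption, since the list does not say where fractional percentages fall.

```python
from collections import deque

WINDOW_SIZE = 500  # "last 500 messages" from the text


def tower_state(failure_pct):
    """Map a failure percentage (0-100) to a tower colour.

    Assumption: upper bounds are inclusive (10.0 -> GREEN, 60.0 -> ORANGE, ...).
    """
    if failure_pct <= 10:
        return "GREEN"
    if failure_pct <= 60:
        return "ORANGE"
    if failure_pct <= 90:
        return "RED"    # would also trigger the special alert
    return "BLACK"      # would also trigger the special alert


class TowerWindow:
    """Keeps the outcome of the last `size` CDRs seen for one tower."""

    def __init__(self, size=WINDOW_SIZE):
        # deque(maxlen=...) drops the oldest outcome automatically
        self.outcomes = deque(maxlen=size)  # True means the call failed

    def add(self, failed):
        """Record one CDR outcome and return the resulting tower state."""
        self.outcomes.append(bool(failed))
        return self.state()

    def failure_pct(self):
        if not self.outcomes:
            return 0.0
        return 100.0 * sum(self.outcomes) / len(self.outcomes)

    def state(self):
        return tower_state(self.failure_pct())
```

In a streaming job, one such window would be kept per tower, and a change in the returned state is the point where a message could be pushed to the event topic.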

2 - Analytical Processing: Here we will use Spark to create aggregated views (aggregated documents) based on the CDR and tower data, for example:

  • stats by caller id (for example, one document in the JSON DB for each caller id) with some aggregated data: number of calls, avg duration, min/max duration, and % of failure
  • stats by tower : number of calls, avg duration, min/max duration, and % of failure
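
As an illustration of what such an aggregated document could look like, here is a small pure-Python sketch that groups CDRs by a key and computes the listed statistics. The field names (`caller_id`, `tower_id`, `duration`, `failed`) and the document layout are assumptions for the example, not the actual CDR schema.

```python
from collections import defaultdict


def aggregate(cdrs, key):
    """Build one stats document per distinct value of `key`.

    `cdrs` is a list of dicts; field names are hypothetical.
    """
    groups = defaultdict(list)
    for cdr in cdrs:
        groups[cdr[key]].append(cdr)

    docs = {}
    for value, rows in groups.items():
        durations = [r["duration"] for r in rows]
        failures = sum(1 for r in rows if r["failed"])
        docs[value] = {
            "_id": value,
            "calls": len(rows),
            "avg_duration": sum(durations) / len(rows),
            "min_duration": min(durations),
            "max_duration": max(durations),
            "failure_pct": 100.0 * failures / len(rows),
        }
    return docs


cdrs = [
    {"caller_id": "A", "tower_id": "T1", "duration": 10, "failed": False},
    {"caller_id": "A", "tower_id": "T2", "duration": 30, "failed": True},
    {"caller_id": "B", "tower_id": "T1", "duration": 20, "failed": False},
]
by_caller = aggregate(cdrs, "caller_id")  # one document per caller id
by_tower = aggregate(cdrs, "tower_id")    # one document per tower
```

The same function serves both views because only the grouping key changes.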

We can also aggregate by day, hour, and so on. Using pre-aggregated documents, this job can run every x minutes and apply incremental updates to keep stats for the whole dataset.
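
One way to make the incremental update work is to store counters (call count, failure count, total duration) rather than the derived averages, so each run only folds the new CDRs into the existing document. The sketch below assumes hypothetical field names and a plain dict standing in for the JSON DB, keyed by tower and day.

```python
def empty_stats():
    """Starting document for a bucket that has no CDRs yet."""
    return {"calls": 0, "failures": 0, "total_duration": 0,
            "min_duration": None, "max_duration": None}


def update(stats, duration, failed):
    """Fold one CDR into a running stats document (returns a new dict)."""
    lo, hi = stats["min_duration"], stats["max_duration"]
    return {
        "calls": stats["calls"] + 1,
        "failures": stats["failures"] + (1 if failed else 0),
        "total_duration": stats["total_duration"] + duration,
        "min_duration": duration if lo is None else min(lo, duration),
        "max_duration": duration if hi is None else max(hi, duration),
    }


def apply_batch(store, batch):
    """One job run: fold each new CDR into its (tower, day) document."""
    for cdr in batch:
        key = (cdr["tower_id"], cdr["day"])  # hypothetical field names
        store[key] = update(store.get(key, empty_stats()),
                            cdr["duration"], cdr["failed"])
    return store


def summary(stats):
    """Derive the reported figures from the stored counters."""
    n = stats["calls"]
    return {
        "calls": n,
        "avg_duration": stats["total_duration"] / n if n else None,
        "min_duration": stats["min_duration"],
        "max_duration": stats["max_duration"],
        "failure_pct": 100.0 * stats["failures"] / n if n else None,
    }
```

Because the average and the failure percentage are computed from counters at read time, running the job on two small batches gives the same result as running it once on their union.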

3 - Machine Learning: The idea here is to create a simple model and show how you can use it in applications, something like:

  • if the time since the last CDR is above the 99th percentile of the observed gaps between CDRs, mark as failed
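
A minimal sketch of that rule, in plain Python: the "model" is just a threshold learned from historical gaps between consecutive CDRs. The function names are hypothetical, timestamps are assumed to be numeric (for example, seconds), and the nearest-rank percentile definition is one choice among several.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fit_threshold(cdr_times, pct=99):
    """'Train' the model: the pct-th percentile of gaps between CDRs.

    Needs at least two timestamps, otherwise there is no gap to learn from.
    """
    ordered = sorted(cdr_times)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return percentile(gaps, pct)


def is_failed(last_cdr_time, now, threshold):
    """Apply the model: has it been silent for longer than the threshold?"""
    return (now - last_cdr_time) > threshold
```

An application would fit the threshold on historical data (possibly per tower, which is an assumption here) and then call `is_failed` periodically with the timestamp of the most recent CDR.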