- Install Node.js
- Install RethinkDB
- Install ZooKeeper with Homebrew: `brew install zookeeper`
- Install Kafka with Homebrew: `brew install kafka`
- Run `npm install` to install the dependencies.
- Update `app.js` to include the correct path to `config.js`
- Start ZooKeeper: `zkServer start` (to start in the background: `brew services start zookeeper`)
- Start Kafka: `kafka-server-start /usr/local/etc/kafka/server.properties`
- Create a Kafka topic: `bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic ops-data` (you can start Kafka brokers on additional ports and increase `--replication-factor` up to the number of brokers)
- Start RethinkDB
- The Java project SparkJApp contains four main classes:
  - `OpsDataProducer.java`: the streaming producer
  - `OpsDataStreams.java`: the streaming consumer
  - `OpsResult.java`: a model used to write data to a persistent store
  - `SparkLogisticRegression.java`: a training class that builds a model for the operational data
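`OpsResult.java` is only named above, not shown. A minimal sketch of what such a model class might look like (the field names here, an id, the predicted label, and a timestamp, are assumptions and not necessarily the project's actual fields):

```java
// Hypothetical sketch of an OpsResult-style model; the real class's
// fields may differ. Assumed fields: a record id, the predicted label,
// and the time the prediction was made.
class OpsResult {
    private final String id;
    private final double prediction;
    private final long timestampMillis;

    OpsResult(String id, double prediction, long timestampMillis) {
        this.id = id;
        this.prediction = prediction;
        this.timestampMillis = timestampMillis;
    }

    String getId() { return id; }
    double getPrediction() { return prediction; }
    long getTimestampMillis() { return timestampMillis; }
}
```

An immutable shape like this maps cleanly onto a RethinkDB document with one key per field.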
- Run `SparkLogisticRegression.java` after modifying the path to the training dataset. This will create a model on the filesystem.
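The training file's layout is not documented here. Assuming a simple CSV format with the label in the first column, the per-line parsing that feeds the logistic regression could be sketched as follows (both the format and the class name are assumptions):

```java
// Hypothetical parser for one training line, assuming the layout
// "label,feature1,feature2,..." -- the real dataset may differ.
class TrainingLineParser {
    // First column is assumed to be the label (e.g. 0.0 or 1.0).
    static double parseLabel(String line) {
        return Double.parseDouble(line.split(",")[0]);
    }

    // Remaining columns are assumed to be numeric features.
    static double[] parseFeatures(String line) {
        String[] parts = line.split(",");
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i]);
        }
        return features;
    }
}
```

In the actual Spark job, a label/features pair like this would typically be wrapped in a `LabeledPoint` before training.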
- Run `OpsDataStreams.java`. This starts the consumer for the Kafka topic via Spark Streaming. Configure the batch interval (in milliseconds) based on your needs.
- Run `OpsDataProducer.java` to read data and publish messages to Kafka.
- Start the web app server with `node app.js`
- Browse to `http://localhost:3000`. The app renders a donut chart based on the predicted outcomes written to RethinkDB.
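The donut chart is driven by a count of rows per predicted outcome. The aggregation behind it amounts to a label-to-count map, sketched here in Java (the web app performs the equivalent in JavaScript against RethinkDB; the outcome labels are placeholders):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the per-outcome counting behind the donut chart:
// each slice is the number of predictions sharing a label.
class OutcomeCounter {
    static Map<String, Integer> countByOutcome(Iterable<String> predictions) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : predictions) {
            counts.merge(p, 1, Integer::sum);
        }
        return counts;
    }
}
```

Each map entry becomes one slice of the chart, with the value as its size.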