Distributed deep learning on Spark, with a Keras-like API and Spark integration. See the Spark Package homepage.
import org.apache.spark.ml.dl._
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val data = sqlContext.read.format("libsvm").load("path_to_dataset.txt")
val dataset = data.withColumnRenamed("label", "labels")
// Set up architecture for a convolutional neural network
val model = new Sequential()
model.add(new Convolution2D(8, 1, 3, 3, 28, 28))
model.add(new Activation("relu"))
model.add(new Dropout(0.5))
model.add(new Dense(6272, 10))
model.add(new Activation("softmax"))
model.compile(loss = "categorical_crossentropy",
  optimizer = new Optimizer().adam(lr = 0.001),
  metrics = "Accuracy")
val trained = model.fit(dataset, num_iters = 500)
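If the fitted model follows the standard Spark ML Transformer convention, predictions can then be pulled out with transform. The snippet below is a minimal sketch under that assumption; the exact output columns depend on the library:
// Sketch: assumes the fitted model exposes Spark ML's transform().
val predictions = trained.transform(dataset)
predictions.show(5)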
import org.apache.spark.ml.dl._
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val data = sqlContext.read.format("libsvm").load("path_to_dataset.txt")
val dataset = data.withColumnRenamed("label", "labels")
// Set up architecture for a feedforward neural network
val model = new Sequential()
model.add(new Dense(784, 100))
model.add(new Activation("relu"))
model.add(new Dense(100, 10))
model.add(new Activation("softmax"))
model.compile(loss = "categorical_crossentropy",
  optimizer = new Optimizer().adam(lr = 0.001),
  metrics = "Accuracy")
val trained = model.fit(dataset, num_iters = 500)
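To gauge generalization rather than training fit, you can hold out part of the data before calling fit. The split below uses randomSplit, which is standard Spark DataFrame API; the fit and transform calls assume the same Keras-like interface as the examples above:
// Hold out 20% of the rows for evaluation (randomSplit is core Spark API).
val Array(train, test) = dataset.randomSplit(Array(0.8, 0.2), seed = 42L)
val heldOutModel = model.fit(train, num_iters = 500)
// Assumes the fitted model exposes Spark ML-style transform().
val testPredictions = heldOutModel.transform(test)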
Clone the project with:
git clone https://github.com/JeremyNixon/sparkdl.git
cd into the repository and run
sbt assembly
to build the project.
Next, publish the package locally to Ivy or Maven. To publish to Ivy, run
sbt publishLocal
or, to publish to Maven, run
sbt publishM2
then launch Spark with
./spark-shell --packages default:sparkdl_2.11:0.0.1
to run Spark with sparkdl.
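Once the shell starts, a quick sanity check that the package resolved is to run the same import used in the examples above:
// Inside spark-shell: this import only succeeds if sparkdl is on the classpath.
import org.apache.spark.ml.dl._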
To contribute, you'll need to build and modify your own cloned fork of the project.
Once you've made your changes, run:
sbt assembly
to build. Then you'll need to publish to your local Ivy or Maven repository under a different version number. Modify the version number in both pom.xml and build.sbt, for example from 0.0.1 to 0.0.2.
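In build.sbt, that is a one-line change (a sketch, assuming a standard sbt build definition; 0.0.2 is just an example number):
version := "0.0.2"  // hypothetical bumped version
Then publish to Ivy with: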
sbt publishLocal
(sbt publishM2, which publishes to your local Maven repository instead, will also work, resulting in a different artifact name.) Once your local copy has been published, you can load it in Spark the same way. The call will differ from the original: your Scala version is appended to the artifact name, the root of the coordinate becomes default, and the version number becomes the one you provided. For example:
./spark-shell --packages default:sparkdl_2.11:0.0.2
where you can then run your modified code.