# Spark "Getting Started" Guide

This document walks you through getting started quickly with local development using Apache Spark.

## Prereqs

Before you proceed, you will need a few things, specifically: Git, Docker, Python, and VS Code. Please install these using our Windows Dev Getting Started Guide or the accompanying setup script.

## Launching Spark on your local machine

The `slalom.dataops` library makes it quick and easy to launch a new Spark server:

1. Install the Python library:

   ```bash
   pip install --upgrade slalom.dataops
   ```

2. Launch the Spark server:

   ```bash
   s-spark start_server --with_jupyter
   ```

## Connect to the Spark server

After launching the server, you can connect Spark applications using the following server endpoints (these correspond to the ports published by the Docker command in the FAQ below):
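
| Service | Endpoint |
| ------- | -------- |
| Spark master | `spark://localhost:7077` |
| Spark Web UI | `http://localhost:4040` |
| SparkSQL JDBC (Thrift) | `jdbc:hive2://localhost:10000` |
| Jupyter notebooks | `http://localhost:8888` |

For example, a PySpark application can connect to the standalone master as follows. This is a minimal sketch, assuming `pyspark` is installed (`pip install pyspark`); the app name is a placeholder:

```python
from pyspark.sql import SparkSession

# Connect to the local standalone Spark master launched above.
spark = (
    SparkSession.builder
    .master("spark://localhost:7077")
    .appName("my-local-app")  # placeholder app name
    .getOrCreate()
)

# Smoke test: build a trivial DataFrame and print it.
spark.range(5).show()
```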


## Tutorials

### Tutorial 01: Running SparkSQL as a Virtual Database

This quick tutorial shows you how to run Spark as a SQL database, then connect and run queries using a JDBC-compliant GUI query tool.

1. Download the DBeaver SQL application from Chocolatey by clicking here: choco://dbeaver.

2. After installing the DBeaver app, create a new "Spark" connection and connect to `localhost:10000` with a blank username and password. (If you prefer to connect programmatically rather than through a GUI, see the sketch after these steps.)

3. Open a new query window and try executing the following SQL commands:

   ```sql
   CREATE DATABASE mydb;
   SHOW DATABASES;
   USE mydb;

   CREATE TABLE mytable AS
   SELECT 4 AS the_answer, CAST(null AS string) AS the_question;

   SELECT * FROM mydb.mytable;

   DESCRIBE TABLE mytable;
   SHOW TABLES IN mydb;
   ```
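
If you would rather query the same SparkSQL endpoint from Python instead of DBeaver, here is a minimal sketch using the `pyhive` package. Note that `pyhive` is not part of `slalom.dataops` and must be installed separately (`pip install 'pyhive[hive]'`):

```python
from pyhive import hive

# Connect to the Spark Thrift server on localhost:10000 (blank credentials,
# matching the DBeaver connection settings above).
conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

# Run the same smoke-test query as in the SQL steps above.
cursor.execute("SELECT 4 AS the_answer")
print(cursor.fetchall())  # expected: [(4,)]
```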

### Tutorial 02: Play an ML-Based Dungeon Adventure Game, on Jupyter

This fun tutorial introduces you to Jupyter notebooks by way of an ML-based text adventure game called AIDungeon_2.

1. Follow the instructions above to install the `slalom.dataops` Python library and launch the server with the `--with_jupyter` flag.
2. Open the Jupyter Notebook GUI at https://localhost:8888?token=qwerty123.
3. In the new browser window, navigate to the Samples directory and open `AIDungeon.ipynb`.
4. Select Cell > Run All (or Run > Run All Cells in JupyterLab) from the menu options.
5. Wait for the application to initialize and then start your adventure!

## Additional Info and FAQ


**Q. Can I run Spark without the `slalom.dataops` Python library?**

A. Yes. If you do not have the Python library installed, or if you want additional control over the local Docker container, you can run the following command to manually launch the Spark cluster using Docker:

```bash
docker run -it --rm \
    -p 4040:4040 \
    -p 7077:7077 \
    -p 8888:8888 \
    -p 10000:10000 \
    slalomggp/dataops:latest-dev \
    spark start_server --with_jupyter
```
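
Note that `--rm` deletes the container when the server stops, so any tables created in Tutorial 01 will not survive a restart. To persist them, you would need to mount a Docker volume over the image's Spark warehouse directory; the exact path depends on the `slalomggp/dataops` image, so verify it before relying on this.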