Spark "Getting Started" Guide
This guide walks you through getting started quickly with local development using Apache Spark.
Before you proceed, you will need a few things, specifically: Git, Docker, Python, and VS Code. To install these prerequisites, please follow our Windows Dev Getting Started Guide or use the provided setup script.
The slalom.dataops library makes it quick and easy to launch a new Spark server:
- Install the Python library:
  pip install --upgrade slalom.dataops
- Launch the Spark server:
  s-spark start_server --with_jupyter
After launching the server, you can connect Spark applications and SQL clients using the following endpoints:
- Spark Web GUI: http://localhost:4040
- Jupyter Web GUI: https://localhost:8888
- SQL queries (JDBC): localhost:10000
- Spark applications: localhost:7077
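For example, a Spark application can attach directly to the standalone master endpoint. The snippet below is a minimal sketch using PySpark, assuming you have pyspark installed locally (pip install pyspark) and its version matches the Spark version running on the server:

from pyspark.sql import SparkSession

# Connect to the standalone Spark master exposed on localhost:7077.
spark = (
    SparkSession.builder
    .master("spark://localhost:7077")
    .appName("getting-started-smoke-test")
    .getOrCreate()
)

# Run a trivial job to confirm the connection works; this should print 10.
print(spark.range(10).count())
spark.stop()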
This quick tutorial shows you how to run Spark as a SQL database, connecting and running queries from a JDBC-compliant GUI query tool.
- Download the DBeaver SQL application from Chocolatey by clicking here: choco://dbeaver
  - If you have not yet installed Chocolatey, you can install it now from https://docs.dataops.tk/choco_min.bat or click here for more information.
- After installing the DBeaver app, create a new "Spark" connection and connect to localhost:10000 with a blank username and password.
- Open a new query window and try executing the following SQL commands:
CREATE DATABASE mydb;
SHOW DATABASES;
USE mydb;
CREATE TABLE mytable AS
  SELECT 4 AS the_answer, CAST(NULL AS STRING) AS the_question;
SELECT * FROM mydb.mytable;
DESCRIBE TABLE mytable;
SHOW TABLES IN mydb;
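You can also run the same queries programmatically against the JDBC/Thrift endpoint. The snippet below is a minimal sketch using the PyHive client library; PyHive is just one of several Thrift-compatible clients and is an assumption here, not something the tooling above installs for you:

from pyhive import hive  # pip install 'pyhive[hive]'

# Connect to the Spark Thrift server on localhost:10000.
# As with the DBeaver setup above, no username or password is required.
conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

cursor.execute("SHOW DATABASES")
print(cursor.fetchall())

cursor.execute("SELECT * FROM mydb.mytable")
print(cursor.fetchall())

conn.close()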
This fun tutorial introduces you to Jupyter notebooks by way of an ML-based text adventure game called AIDungeon_2.
- Follow the instructions above to install the slalom.dataops Python library.
- Open the Jupyter Notebook GUI at https://localhost:8888?token=qwerty123
- In the new browser window, navigate to the Samples directory and open AIDungeon.ipynb.
- Select Run > Run All Cells from the menu options.
- Wait for the application to initialize and then start your adventure!
Q. Can I launch the Spark server manually with Docker, without the Python library?

A. Yes. If you do not have the Python library installed, or if you want additional control over the local Docker container, you can run the following command to manually launch the Spark cluster using Docker:
docker run -it --rm \
-p 4040:4040 \
-p 7077:7077 \
-p 8888:8888 \
-p 10000:10000 \
slalomggp/dataops:latest-dev \
spark start_server --with_jupyter
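Once the container is running, you can sanity-check that all four service ports are reachable. The snippet below is a small Python sketch using only the standard library; the port list simply mirrors the -p mappings above:

import socket

# Probe each published port and report whether it is accepting connections.
for name, port in [("Spark Web GUI", 4040), ("Spark master", 7077),
                   ("Jupyter", 8888), ("JDBC/Thrift", 10000)]:
    with socket.socket() as s:
        s.settimeout(2)
        status = "open" if s.connect_ex(("localhost", port)) == 0 else "not reachable"
        print(f"{name} (port {port}): {status}")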