This repo contains the exercises for two lessons from the Data Lakes with Spark course in the ND027 - Data Engineering Nanodegree program:
- Lesson 3: Setting up Spark Clusters using AWS, and
- Lesson 4: Debugging and Optimization.
This repo contains a folder for each lesson:
- Lesson 3: Submitting_spark_scripts
- Lesson 4: Write_to_s3
Lesson 3 includes a demo code folder containing the code and data files used in the classroom.
Each lesson folder contains an exercises folder. This exercises folder should contain all files and instructions necessary for the exercises along with the solution. See the README in the exercises folder for information about folder structure.
[Folder structure diagram: Submitting_spark_scripts, Write_to_s3, Exercise]