DE임태규 - W4M2 #260

Open
wants to merge 22 commits into
base: DE임태규_W4
from
8 changes: 8 additions & 0 deletions .gitignore
@@ -1,3 +1,11 @@
####### Large files used in the missions
*.tar.gz
*tweets.csv
*/ml-20m
*ratings.csv
*amazon_reviews
*.tgz
*NYC_TLC_data
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
1,838 changes: 1,838 additions & 0 deletions W4M2/W4M2.ipynb

Large diffs are not rendered by default.

45 changes: 45 additions & 0 deletions W4M2/docker-compose.yml
@@ -0,0 +1,45 @@
version: '3'
services:
  spark-master:
    image: spark_jupyter_image
    container_name: spark-master
    hostname: spark-master
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"
      - "7077:7077"
      - "8888:8888"
    command: bash -c "/opt/spark/sbin/start-master.sh && jupyter notebook --notebook-dir='/' --allow-root --ip=0.0.0.0 --no-browser --port=8888 --NotebookApp.token=''"
    networks:
      - spark_net

  spark-worker-1:
    image: spark_jupyter_image
    container_name: spark-worker-1
    hostname: spark-worker-1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
    command: /opt/spark/sbin/start-slave.sh spark://spark-master:7077
    networks:
      - spark_net

  spark-worker-2:
    image: spark_jupyter_image
    container_name: spark-worker-2
    hostname: spark-worker-2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
    command: /opt/spark/sbin/start-slave.sh spark://spark-master:7077
    networks:
      - spark_net

networks:
  spark_net:
    driver: bridge
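
Review note: the `sbin` start scripts background the daemon by default, so a worker container whose only command is `start-slave.sh` may exit shortly after launch. It is worth confirming that both workers actually register with the master. The sketch below is one way to check from the host, assuming the standalone master serves its status as JSON at `/json/` on the mapped port 8080, with a `workers` list whose entries carry `id` and `state` fields. The function names are hypothetical and not part of this PR.

```python
import json
from urllib.request import urlopen


def alive_workers(master_state: dict) -> list[str]:
    """Return the ids of workers that the master reports as ALIVE."""
    return [
        worker.get("id", "<unknown>")
        for worker in master_state.get("workers", [])
        if worker.get("state") == "ALIVE"
    ]


def fetch_master_state(url: str = "http://localhost:8080/json/") -> dict:
    """Fetch the master's JSON status (URL assumes the 8080:8080 port mapping)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

With the stack up, `alive_workers(fetch_master_state())` should list two ids if `spark-worker-1` and `spark-worker-2` both stayed up. An empty list would suggest the worker containers stopped, in which case running the worker in the foreground (for example by setting `SPARK_NO_DAEMONIZE`) is a common remedy.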
22 changes: 22 additions & 0 deletions W4M2/spark_image/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
FROM ubuntu:22.04

ENV SPARK_VERSION=3.5.1
ENV SPARK_HOME=/opt/spark
ENV PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

RUN apt update && apt install -y \
    python3 python3-pip \
    openjdk-8-jdk \
    wget sudo

COPY spark-3.5.1-bin-hadoop3.tgz /temp/
RUN tar -xvf /temp/spark-3.5.1-bin-hadoop3.tgz -C /opt/ && \
    mv /opt/spark-3.5.1-bin-hadoop3 /opt/spark && \
    rm -rf /temp/spark-3.5.1-bin-hadoop3.tgz

RUN pip install --upgrade pip
RUN pip install jupyter pandas numpy matplotlib seaborn pyspark

EXPOSE 8080 7077 8888

CMD [ "bash" ]
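
Review note: a quick smoke test run from the Jupyter server in the `spark-master` container can confirm that the image and the cluster work together. This is a minimal sketch, not code from the notebook in this PR. The master URL is taken from the compose file, while the app name, function names, and partition count are assumptions. Because the Dockerfile installs `pyspark` unpinned, it may also be worth checking that the installed version matches `SPARK_VERSION` (3.5.1).

```python
def expected_sum(n: int) -> int:
    """Closed form for 0 + 1 + ... + (n - 1), used to verify the Spark result."""
    return n * (n - 1) // 2


def run_smoke_test(master_url: str = "spark://spark-master:7077", n: int = 1000) -> bool:
    """Sum range(n) on the cluster and compare with the closed form."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url)
        .appName("w4m2-smoke-test")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # Spread the numbers over 4 partitions so the work reaches the executors.
        total = spark.sparkContext.parallelize(range(n), 4).sum()
        return total == expected_sum(n)
    finally:
        spark.stop()
```

If `run_smoke_test()` hangs while waiting for resources, that usually points to no workers being registered with the master.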
33 changes: 0 additions & 33 deletions missions/W1/mtcars.csv

This file was deleted.

Binary file removed slides/W1 Introduction to Data Engineering.pdf
Binary file not shown.