Fall 2021
Jackson Neal, Zoheb Nawaz, Zachary Hillman
The following components need to be installed first:
- JDK 1.8
- Scala 2.12.13
- Hadoop 3.3.0
- Spark 3.1.2 (without bundled Hadoop)
- Maven
- AWS CLI (for EMR execution)
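Once everything is installed, a quick sanity check of the toolchain can look like the sketch below (hypothetical helper script, not part of the project; it only reports what is on the PATH):

```shell
# Sanity check: report which of the required tools are on the PATH.
for tool in java scala hadoop spark-submit mvn aws; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found at $(command -v "$tool")"
  else
    echo "$tool NOT found -- install it or fix PATH"
  fi
done
```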
- Example ~/.bash_profile:
export HADOOP_HOME=/Users/landonneal/jacksonn/CS6240/hadoop-3.3.0
export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
export YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export SCALA_HOME=/Users/landonneal/jacksonn/CS6240/scala-2.12.13
export SPARK_HOME=/Users/landonneal/jacksonn/CS6240/spark-3.1.2-bin-without-hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export PATH=$HADOOP_HOME:$PATH
export PATH=/opt/homebrew/bin:$PATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export PATH=$SCALA_HOME/bin:$PATH
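After editing the profile, reloading it and echoing a few of the variables confirms they resolved (a hedged sketch; the file-existence guard lets it run even before the profile exists):

```shell
# Reload the profile (or open a new terminal) and confirm the variables resolved.
[ -f ~/.bash_profile ] && . ~/.bash_profile
echo "HADOOP_HOME=$HADOOP_HOME"
echo "SPARK_HOME=$SPARK_HOME"
# SPARK_DIST_CLASSPATH is built from `hadoop classpath`, so it stays empty
# until the hadoop command itself works.
echo "SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH"
```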
- Explicitly set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh, pointing at the same JDK as the JAVA_HOME exported above (the Linux path below is only an example):
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
All of the build & execution commands are organized in the Makefile.
- Unzip the project file.
- Open a command prompt.
- Navigate to the directory where the project files were unzipped.
- Edit the Makefile to customize the environment variables at the top. For standalone execution it is sufficient to set hadoop.root, jar.name, and local.input; the other defaults are acceptable.
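For reference, the top of the Makefile typically looks something like the fragment below; the values shown are hypothetical placeholders and must match your own install paths:

```makefile
# Hypothetical Makefile header -- adjust each value to your environment.
# hadoop.root, jar.name, and local.input are all that standalone execution needs.
hadoop.root=/usr/local/hadoop-3.3.0
jar.name=project.jar
local.input=input
```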
- Standalone Hadoop:
- make switch-standalone -- set standalone Hadoop environment (execute once)
- make local
- Pseudo-Distributed Hadoop: (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation)
- make switch-pseudo -- set pseudo-clustered Hadoop environment (execute once)
- make pseudo -- first execution
	- make pseudoq -- subsequent executions, since the namenode and datanode are already running
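Before rerunning with `make pseudoq`, it can help to confirm the HDFS daemons are actually up; a hypothetical check using `jps` (ships with the JDK):

```shell
# List running JVMs and look for the HDFS daemons started by `make pseudo`.
jps | grep -E 'NameNode|DataNode' || echo "daemons not running -- run make pseudo first"
```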
- AWS EMR Hadoop: (you must configure the emr.* config parameters at top of Makefile)
- make upload-input-aws -- only before first execution
- make aws -- check for successful execution with web interface (aws.amazon.com)
	- make download-output-aws -- after successful execution & termination
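The emr.* block at the top of the Makefile needs to be filled in before `make aws` will work. A hypothetical fragment is shown below; every name and value is a placeholder, so use whatever parameters your Makefile actually defines and your own AWS resources:

```makefile
# Hypothetical emr.* settings -- all values are placeholders.
emr.bucket.name=my-cs6240-bucket
emr.region=us-east-1
emr.num.nodes=5
emr.instance.type=m4.large
```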