Here, we'll give some instructions for a simple one-node cluster setup on a Linux environment.
Navigate to the desired install location and download hadoop. You can replace version number with version of your choice:
wget https://httpd-mirror.sergal.org/apache/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Replace <hadoop_install> with desired hadoop install location.
mkdir <hadoop_install>/hadoop
sudo tar -zxvf hadoop-2.7.3.tar.gz -C <hadoop_install>/hadoop
cd <hadoop_install>/hadoop
sudo chmod 750 hadoop-2.9.2
Edit etc/hadoop/hadoop-env.sh with the HADOOP_CONF_DIR variable to your directory. If necessary, you can also set the JAVA_HOME variable here
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/<hadoop_install>/hadoop/hadoop-2.9.2/etc/hadoop"}
export JAVA_HOME=...
Edit etc/hadoop/core-site.xml with the following configuration (fill in your directory):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/<hadoop_install>/hadoop/hadooptmpdata</value>
</property>
</configuration>
and etc/hadoop/hdfs-site.xml with the following configuration (fill in your directory):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file://<hadoop_install>/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file://<hadoop_install>/hadoop/hdfs/datanode</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Finally, set the HADOOP_HOME variable in your .bashrc (of whichever user is running hadoop):
export HADOOP_HOME=<hadoop_install>/hadoop/hadoop-2.9.2
Create the directories referenced above:
cd /<hadoop_install>/hadoop/
mkdir hdfs
mkdir hadooptmpdata
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
and check that this worked:
ssh localhost
bin/hdfs namenode -format
cd /scratch_b/<your username>/hadoop/hadoop-2.9.2
sbin/start-dfs.sh
Each Vertica node needs to have access to a copy of the HDFS configuration. If these are on seperate machines, you can use a command such as rsync to copy the configuration over. This must be done for each Vertica node.
rsync -R --progress <hadoop_install>/hadoop/hadoop-2.9.2/etc/hadoop/hdfs-site.xml arehnby@eng-g9-158:/etc/hadoop/conf/
rsync -R --progress <hadoop_install>/hadoop/hadoop-2.9.2/etc/hadoop/core-site.xml arehnby@eng-g9-158:/etc/hadoop/conf/