Skip to content

Latest commit

 

History

History
80 lines (52 loc) · 4.42 KB

README.md

File metadata and controls

80 lines (52 loc) · 4.42 KB

ConstellationOnYarn

This repo contains some experimental code for putting the Consellation runtime system on YARN/HDFS

The nl.esciencecenter package contains a simple example application that reads an input file and computes a hash of each block.

Prerequisites

To run this example you need a machine which has the following software installed:

  • Java 1.7 or higher
  • ant 1.7 or higher
  • Hadoop 2.5 or higher

Note that Hadoop must be set up properly, i.e., the $HADOOP_HOME environment variable must be set up. In addition, make sure you use the Java version that is used to run Hadoop to compile this example. Using a different version may result in unsupported version exceptions.

Compilation

After cloning this example, change to the directory and compile using:

ant

This compiles the example, and creates a ./dist directory containing a ConstellationOnYARN.jar plus all dependencies.

Running the example

To run the example you will first need to put an input file in HDFS. As usual with Hadoop, bigger is better. You can use an ubuntu source image for example:

wget http://ftp.acc.umu.se/mirror/cdimage.ubuntu.com/releases/xenial/release/source/ubuntu-16.04-src-1.iso

Next put this image into HDFS for example (after replacing /user/jason with your home directory in HDFS):

hdfs dfs -copyFromLocal ubuntu-16.04-src-1.iso /user/jason

Next run the example using (after replacing /user/jason with your home directory in HDFS twice):

java -cp ./dist/ConstellationOnYarn.jar:`yarn classpath` nl.esciencecenter.ConstellationSubmitter /user/jason/ ./dist /user/jason/ubuntu-16.04-src-1.iso true 1

This starts the nl.esciencecenter.ConstellationSubmitter example locally, which connects to the YARN scheduler and submits the example. This class expects the following command-line parameters:

  • The HDFS root directory to which the dependencies should be staged-in, /user/jason in this example.
  • The local directory containing the dependencies that need to be staged in. Use ./dist to run this example.
  • The input file (on HDFS) to process, /user/jason/ubuntu-16.04-src-1.iso in this example.
  • true/false, indicating whether an attempt should be made to create precise activity contexts.
  • The number of YARN Workers to submit, 1 in this example.

The example first copies the ./dist directory to HDFS, as these files are needed on the YARN nodes to run the example. It then stubmits the job to YARN, and waits for it to finish.

The example code

The example code can be found in 'src' and consists of the following packages:

  • nl.esciencecenter the main example code
  • nl.esciencecenter.constellation the constellation application
  • nl.esciencecenter.yarn a somewhat generic library to access YARN

Like most YARN applications, running this application requires three parts:

  • nl.esciencecenter.ConstellationSubmitter: the main application that connects to the YARN scheduler and submits the nl.esciencecenter.ApplicationMaster
  • nl.esciencecenter.ApplicationMaster: the master application that runs on a YARN node, aquires the nodes to run one or more nl.esciencecenter.constellation.ConstellationWorker applications and starts them, starts a nl.esciencecenter.constellation.ConstellationMaster itself, and waits for everything to finish.
  • nl.esciencecenter.constellation.ConstellationWorker: the worker application that is started on the YARN nodes to processes one or more nl.esciencecenter.constellation.SHA1Jobs.

In addition, the following classes are used:

  • nl.esciencecenter.constellation.ConstellationMaster: the Constellation master application which starts a Constellation, submits the jobs, and gathers the results.
  • nl.esciencecenter.constellation.SHA1Job: a job that is created by the ConstellationMaster and executed on one of the ConstellationWorkder. Each of these jobs reads a block of data from a file in HDFS, and calculates the SHA1 hash of this block.
  • nl.esciencecenter.constellation.SHA1Result: a result object that is returned by each SHA1Job to the ConstellationMaster.
  • nl.esciencecenter.yarn.YarnSubmitter: a generic class used by 'ConstellationOnYarn' to submit the application to YARN.
  • nl.esciencecenter.yarn.YarnMaster: a generic class used by 'ApplicationMaster' to submit the workers to YARN.
  • nl.esciencecenter.yarn.Utils: various utility methods used by 'ApplicationSubmitter' and 'ApplicationMaster'.