...add support for new programming language

The system was built to be easily extensible. This is achieved by making use of Docker images stored on the host system: each algorithm is executed in a container based on the image for its respective environment.

Dockerfile

To add a new language to the system, the execution environment has to be available as a Docker image on the host system (or on the execution system, if configured). Example Dockerfiles for generating such images are available in the executionenvironments directory in the source code of this project. The Dockerfile should be written so that the following conditions are met (see the sketch after this list):

  • The entrypoint is set correctly so that the script name can be passed on launch when running the image in a fresh container.
  • The image contains all dependencies after build.
  • The image already contains the script that calls the algorithm.
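
A minimal sketch of such a Dockerfile, assuming a Python-based environment; the base image, requirements.txt, and the script name run_algorithm.py are placeholders, not files from this project:

    FROM python:3.8-slim
    WORKDIR /app

    # Condition 2: the image contains all dependencies after build.
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # Condition 3: the image already contains the script that calls
    # the algorithm.
    COPY run_algorithm.py .

    # Condition 1: the entrypoint is set so that the script name (and
    # its arguments) can be passed on launch.
    ENTRYPOINT ["python"]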

Once the Dockerfile is defined, you can add it to scripts/bootstrap.sh to have it built automatically, or build it manually before redeploying. It is important to tag the Docker image using the -t flag, since this tag is referenced in the algorithm's configuration in algorithms.json (see below).
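
For example, a hypothetical environment kept in executionenvironments/myenv could be built and tagged as follows (the tag mpci-myenv is chosen freely here):

    docker build -t mpci-myenv executionenvironments/myenv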

Executing algorithms

The script or program that is executed in the container should perform the following tasks:

  1. Parse the command line parameters passed to it, including the host of the API
  2. Call the REST API to download the dataset (/dataset/<int:dataset_id>/load)
  3. Execute the algorithm with the data given
  4. Post the results back to the API to mark the job as done (/job/<int:job_id>/result)

All API endpoints are documented in Swagger.

We recommend moving tasks 2 and 4 together with other utility functions into a separate file/module, similar to how executionenvironments/r/mpci_utils.r is implemented. This makes it much easier to add more algorithms within the same environment.
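
A minimal sketch of such a script in Python, assuming the requests library is installed in the image; run_my_algorithm, the --alpha parameter, and the exact structure of the dataset and result payloads are placeholders that should be checked against the Swagger documentation:

    import argparse
    import requests

    def parse_args():
        # Task 1: parse the command line parameters (see the next section)
        parser = argparse.ArgumentParser()
        parser.add_argument('-j', '--job_id', required=True)
        parser.add_argument('-d', '--dataset_id', required=True)
        parser.add_argument('--api_host', required=True)
        parser.add_argument('--send_sepsets', default='0')
        # Algorithm-specific parameters from valid_parameters, e.g. a
        # hypothetical significance level:
        parser.add_argument('--alpha', type=float, default=0.05)
        return parser.parse_args()

    def run_my_algorithm(dataset, alpha):
        # Placeholder: the actual algorithm goes here; the result
        # structure expected by the API is an assumption (see Swagger).
        return {'edges': [], 'sepsets': []}

    if __name__ == '__main__':
        args = parse_args()
        base_url = f'http://{args.api_host}'

        # Task 2: call the REST API to download the dataset
        response = requests.get(f'{base_url}/dataset/{args.dataset_id}/load')
        response.raise_for_status()
        dataset = response.text  # response format is an assumption

        # Task 3: execute the algorithm with the data given
        result = run_my_algorithm(dataset, alpha=args.alpha)

        # Task 4: post the results back to mark the job as done
        requests.post(f'{base_url}/job/{args.job_id}/result',
                      json=result).raise_for_status()

In a real environment, tasks 2 and 4 would live in the shared utility module recommended above, so that each new algorithm script only has to implement task 3.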

Command line parameters

The following command line parameters are passed to the script:

  • -j: The job id (important for submitting results)
  • -d: The dataset id (important when calling the API to load the dataset)
  • --api_host: The host where the REST API is available
  • --send_sepsets: Whether the REST API for result posting accepts separation sets in the results.
  • All parameters specified in valid_parameters in algorithms.json (see below)
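
Inside the container, an invocation therefore looks roughly like this (the script name, host, and parameter values are hypothetical):

    python run_algorithm.py -j 42 -d 7 --api_host backend:5000 --send_sepsets 1 --alpha 0.05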

Adding the algorithm to the configuration

When the image containing the script file is prepared and built with a given tag, the algorithm can be added to the algorithms.json file in the conf directory. To add an algorithm there, add an object to the list with the following parameters set correctly:

  • name: Can be chosen freely and is shown in the UI (must be unique)
  • description: Free text describing the algorithm
  • script_filename: The command to be run in the image to launch the algorithm.
  • docker_image: The tag of the Docker image that should be launched. It needs to be available in a Docker registry.
  • valid_parameters: A dictionary that specifies additional valid command line parameters that the algorithm script accepts. Every key is a parameter, with the value being another object describing its constraints:
      • type: The data type of the parameter, one of 'str', 'enum', 'int', 'float' and 'bool'.
      • values: Required if type is 'enum'; contains all valid enum values.
      • minimum, maximum: Optional if type is 'int' or 'float'; limit the value range.
    The exact definition comes from the validator in models/algorithm.py.
  • docker_parameters: A dictionary containing additional arguments for the Docker API call. The available arguments are listed in the documentation of the Docker client library used by the backend.

For exact definitions, see the model defined in models/algorithm.py. Examples can be found in the current algorithms.json.
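
Purely as an illustration, an entry in this file might look as follows; every value here is a hypothetical example, not part of the shipped configuration:

    {
        "name": "Example PC (Python)",
        "description": "PC algorithm running in a hypothetical Python environment",
        "script_filename": "run_algorithm.py",
        "docker_image": "mpci-myenv",
        "valid_parameters": {
            "alpha": {"type": "float", "minimum": 0.0, "maximum": 1.0},
            "independence_test": {"type": "enum", "values": ["gaussCI", "disCI"]}
        },
        "docker_parameters": {}
    }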

Once the algorithm is added to the file, run

garden deploy

as is done in scripts/update.sh.