Skip to content

Latest commit

 

History

History
245 lines (167 loc) · 9.55 KB

installation-guide.adoc

File metadata and controls

245 lines (167 loc) · 9.55 KB

Smarti Installation Guide

The Smarti installation guide is written for developers, operators and administrators who like to run Smarti on a server, as professional service or in production.

General recommendations

Run Smarti as a service in production

Installation packages are provided for Debian- and RedHat-based systems. As Smarti is a Spring Boot application it can be started by directly launching the executable jar:

$ java -jar smarti-${smarti-version}-exec.jar
Important
To run Smarti in production it is recommended to deploy the deb or rpm package on your target environment instead executing the jar directly. If you like to setup your development client or a non-productive server read section Build and Run explains how to do start Smarti by executing the executable jar.

Run MongoDB as a dedicated service

Important
In a production scenario it is absolutely recommended to run MongoDB on a dedicated system. If you like to run MongoDB on the same machine as the Smarti application 4 GB RAM are insufficient. A developer client should have at least 8 GB RAM.

Deploy / Install

Deploy with Docker

$ docker run -d --name mongodb mongo
$ docker run -d --name smarti --link mongodb:mongo -p 8080:8080 redlinkgmbh/smarti

The provided docker-image does not contain some required (GPL-licensed) dependencies. To build a contained image, use the following snippet:

$ docker build -t smarti-with-stanfordnlp -<EOF
FROM redlinkgmbh/smarti

USER root
ADD [\
    "https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.8.0/stanford-corenlp-3.8.0.jar", \
    "https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.8.0/stanford-corenlp-3.8.0-models-german.jar", \
    "/opt/ext/"]
RUN chmod -R a+r /opt/ext/
USER smarti
EOF

Deploy on Debian

$ sudo dpkg -i ${smarti-source}/dist/target/smarti_${smarti-build-version}.deb

Deploy on Red Hat

$ sudo rpm -i ${smarti-source}/dist/target/smarti_${smarti-build-version}.rpm

Installing Additional Components

To install additional components, e.g. Stanford-NLP, add the respective libraries into the folder /var/lib/smarti/ext.

Stanford-NLP can be installed using the following command:

$ wget -P /var/lib/smarti/ext -nH -nd -c \
    https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.8.0/stanford-corenlp-3.8.0.jar \
    https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.8.0/stanford-corenlp-3.8.0-models-german.jar

Changing the User

When installed using one of the provided packages (deb, rpm), smarti runs as it’s own system user smarti. This user is created during the installation process if it does not already exists.

To run smarti as a different user (assistify in this example), do the following steps:

  1. Create the new working-directory

    $ mkdir -p /data/smarti
    $ chown -R assistify: /data/smarti
  2. Populate the new working-directory with the required configuration files, e.g. by copying the default settings

    $ cp /etc/smarti/* /data/smarti
  3. Update the systemd configuration for smarti

    $ systemctl edit smarti

    and add the following content

    [Service]
    User = assistify
    WorkingDirectory = /data/smarti
  4. Restart the smarti

    $ systemctl try-restart smarti

System Maintenance

Daemon / Service

Configuration
  • /etc/default/smarti

  • JVM and JVM Options (e.g. -Xmx4g)

  • Program Arguments (overwriting settings from the application.properties)

Start / Stop / Restart
$ sudo service smarti [ start | stop | restart  ]
Status

To check the service status use the daemon

$ service smarti status

This should result in something like

● smarti.service - smarti
   Loaded: loaded (/lib/systemd/system/smarti.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2017-08-26 12:12:28 UTC; 6s ago
 Main PID: 4606 (bash)
    Tasks: 22
   Memory: 647.8M
      CPU: 12.518s
   CGroup: /system.slice/smarti.service

Logging

  • Working-Directory at /var/lib/smarti/

  • Log-Files available under /var/lib/smarti/logs/, a symlink to /var/log/smarti/

  • Log-Configuration under /var/lib/smarti/logback.xml, a symlink to /etc/smarti/logback.xml

Note
Please keep in mind, that if smarti runs as a different user it probably also has a custom working directory. In such case, logs are stored in the new working directory (or whatever is configured in the logging-configuration).

To Check if Smarti is started search in /var/log/smarti/main.log for something like

[main] INFO  io.redlink.smarti.Application - smarti started: http://${smarti-host}:${smarti-port}/

Monitoring

System Health

A system-wide health check is available at http://${smarti-host}:${smarti-port}/system/health. In case of any problems you can call this URL in your browser or send a curl in order to check if Smarti is up and running.

$ curl -X GET http://${smarti-host}:${smarti-port}/system/health

On success, if everything is running it returns: {"status":"UP"} else {"status":"DOWN"}.

Note
{"status":"DOWN"} is also reported, if Stanford NLP libraries are not present in the classpath.

System Info

You can get a detailed information about the build version that is running by calling

$ curl -X GET http://${smarti-host}:${smarti-port}/system/info

Third-Party Dependencies and Licenses

When installing via one of the provided packages (rpm, deb) a list of used third-party libraries and their licenses are available under

  • /usr/share/doc/smarti/THIRD-PARTY.txt (backend), and

  • /usr/share/doc/smarti/UI-THIRD-PARTY.json (frontend, UI)

From the running system, similar files are served.

Backend

$ curl http://${smarti-host}:${smarti-port}/THIRD-PARTY.txt

Frontend UI

$ curl http://${smarti-host}:${smarti-port}/3rdPartyLicenses.json

Configuration

Once Smarti is up and running, go ahead and read the next section Application Configuration about Smarti’s powerful configuration options.

Requirements (Memory, CPU)

Note
The default configuration of Smarti is optimized for 4 GByte of Memory (JVM parameter: -Xmx4g) and 2 CPU cores.

Smarti performs text analysis over conversations. Such analysis are based on machine learning models that require a considerable amount of memory and CPU time. The selection of those models and the configuration of those components highly influence the memory requirements.

Analysis are precessed asynchronously in a thread pool with a fixed size. The size of this pool should not exceed the number of available CPU cores. For a full utilization of the CPU resources the number of threads should be close or equals to the number of CPU cores. However every processing thread will also require additional memory. While this number highly depends on the analysis configuration 500 MByte is a good number to start with.

As an example running Smarti with the default Analysis configuration on a host with 16 CPU cores should use a thread pool with 15 threads and will require 3 GByte + 15 * 500 MByte = 10.5 GByte Memory.

Relevant Configuration Options

  • smarti.processing.numThreads = 2: The number of analysis processing threads. This number should be close or equals to the number of CPUs. Each thread will also require additional memory to handle processing information. This amount highly depends on the analysis configuration. Especially of the Stanford NLP Parser component (see below).

  • Standord NLP includes several configuration options that influence the memory (both base and processing) memory requirements.

    • nlp.stanfordnlp.de.nerModel = edu/stanford/nlp/models/ner/german.conll.hgc_175m_600.crf.ser.gz: Name Finder models are the biggest contribution to the base memory requirement. The default model uses about 300 MByte

    • Parser: The parsed component has the highest impact on the memory requirements of Smarti. The defautl configuration uses the Shift Reduce Parser as it requires the least memory. As potential configuration options do have a huge impact it is recommended to look at additional information provided by the Stanford Parser FAQ (NOTE that values provided on this page are for a 32bit JVM. So memory requirements on modern systems will be double those values).

      • nlp.stanfordnlp.de.parseModel = edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz: The parse model to use. Stanford does support three models that have hugely different memory and cpu requirements. Also not that those memory requirement are for processing memory, what means that this memory needs to be provided per analysis thread!

      • nlp.stanfordnlp.de.parseMaxlen = 30: Parsing longer sentences requires square more memory. So restricting the maximum number of tokens of sentences is essential. 30 Tokens seams to be a good tradeoff. In settings where memory is not a concern this value can be increased to 40. If memory consumption is a problem on can decrease this value to 20 tokens.

  • Open NLP provides an additional Named Entity Finder model for German. The model is provided by IXA Nerc and requires about 500 MByte of memory.