
Environment variables


There is a config.py file in the services/python-images/src/master directory that contains all environment variables passed into the backend and scheduler pods. These variables can be used to adjust the configuration of the two services. The deployment.yaml files of each deployment already set them so that they do not need to be modified.
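
As a rough orientation, such a config module typically reads each variable from the environment and falls back to the documented default. The snippet below is a minimal, illustrative sketch; the variable names match the list further down, but the structure is an assumption, not a copy of the actual config.py:

```python
import os

# Illustrative sketch: each setting falls back to the default used by the
# docker-compose setup when the environment variable is not set.
DB_TYPE = os.environ.get('DB_TYPE', 'postgresql')
DB_HOST = os.environ.get('DB_HOST', 'database')
DB_DATABASE = os.environ.get('DB_DATABASE', 'postgres')
DB_USER = os.environ.get('DB_USER', 'admin')
DB_PASSWORD = os.environ.get('DB_PASSWORD', 'admin')
DAEMON_CYCLE_TIME = int(os.environ.get('DAEMON_CYCLE_TIME', 5))
```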

When the services are executed locally, the semi-required variables must be set for the system to work, but they are configured correctly in any default setup. Optional variables can be set to change the system's behavior.

The following variables are available:

  • API_HOST (semi-required) should be set to the name under which the executor scripts can reach the API. It should not be set when the usual docker-compose files are used, the local Docker instance is used, and networking is enabled (see the variables below); in those cases it is set correctly automatically. When the execution environments run on a remote Docker instance, or when the Docker network used by the backend is not reused in the execution environments, this should be set to the IP/hostname under which the webserver is reachable from the outside world.

  • DB_TYPE (semi-required, 'postgresql') The database type used for the configuration database. This should correspond to the DB types used in SQLAlchemy connection strings (https://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine).

  • DB_HOST (semi-required, 'database') The host where the configuration DB is available.

  • DB_DATABASE (semi-required, 'postgres') The name of the database on the DBMS running on DB_HOST.

  • DB_USER (semi-required, 'admin') The user used to log in to the DB.

  • DB_PASSWORD (semi-required, 'admin') The password for the user

  • SQLALCHEMY_DATABASE_URI (semi-required) Generated automatically from the settings above, but can also be set explicitly. Should be an SQLAlchemy-compatible connection string. If set, the settings above are ignored (see the first sketch after this list).

  • DATA_SOURCE_CONNECTIONS (optional) If set, must be a JSON object mapping names to SQLAlchemy-compatible connection strings (see above). An example would be: DATA_SOURCE_CONNECTIONS={"hana":"hana+pyhdb://user@host:port"} (see the parsing sketch after this list).

  • DAEMON_CYCLE_TIME (semi-required, 5 (seconds)) The time in seconds that the background daemon waits between checks for jobs that should be running but have stopped doing so. Decreasing it may increase the load on the scheduler deployment but makes the application more responsive, as dead jobs are marked as failed sooner.

  • RESULT_READ_BUFF_SIZE (semi-required, 16384 (kb)) The buffer size used by the JSON parser when an experiment's result is posted. This can be set according to the available RAM.

  • RESULT_WRITE_BUFF_SIZE (semi-required, 1024 (objects)) The number of objects held in RAM and sent as one bulk insert to the configuration DB when a result is written. Decreasing it lowers the RAM required by the worker handling the request but increases the time needed to process the result, as more DB operations are initiated (see the last sketch after this list).

  • LOAD_SEPARATION_SET (semi-required, default true) Set to 'true' to accept separation sets when results are sent back to the server. Setting it to 'false' (or any value other than 'true') disables parsing on the server side and notifies the script via a command-line parameter that it should not send separation sets with the result.
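
The fallback behavior of SQLALCHEMY_DATABASE_URI could look roughly like the following. The helper name is hypothetical and simplified (e.g. it does not URL-escape the password); the real config.py may differ:

```python
import os

def build_database_uri():
    # Hypothetical helper: an explicitly set SQLALCHEMY_DATABASE_URI wins;
    # otherwise the URI is assembled from the individual DB_* variables.
    uri = os.environ.get('SQLALCHEMY_DATABASE_URI')
    if uri:
        return uri
    return '{}://{}:{}@{}/{}'.format(
        os.environ.get('DB_TYPE', 'postgresql'),
        os.environ.get('DB_USER', 'admin'),
        os.environ.get('DB_PASSWORD', 'admin'),
        os.environ.get('DB_HOST', 'database'),
        os.environ.get('DB_DATABASE', 'postgres'),
    )
```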
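
Since DATA_SOURCE_CONNECTIONS carries a JSON object in a single environment variable, parsing it is a one-liner with the standard library. This is an illustrative sketch, not the backend's actual code:

```python
import json
import os

# DATA_SOURCE_CONNECTIONS holds a JSON object mapping a data-source name
# to an SQLAlchemy-compatible connection string; default to no sources.
raw = os.environ.get('DATA_SOURCE_CONNECTIONS', '{}')
data_sources = json.loads(raw)  # e.g. {"hana": "hana+pyhdb://user@host:port"}

for name, connection_string in data_sources.items():
    print(f'data source {name!r} -> {connection_string!r}')
```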
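
To make the RESULT_WRITE_BUFF_SIZE trade-off concrete, here is a hypothetical sketch of buffered bulk writing with the SQLAlchemy ORM (the function and parameter names are assumptions, not the backend's actual implementation):

```python
import os

RESULT_WRITE_BUFF_SIZE = int(os.environ.get('RESULT_WRITE_BUFF_SIZE', 1024))

def write_results(session, model, rows):
    """Buffer result rows (dicts) and flush them to the DB in bulk.

    A larger buffer means fewer DB round trips but more RAM per worker;
    `model` is the mapped class for the results table.
    """
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= RESULT_WRITE_BUFF_SIZE:
            session.bulk_insert_mappings(model, buffer)  # one bulk INSERT
            buffer.clear()
    if buffer:  # flush the remainder
        session.bulk_insert_mappings(model, buffer)
    session.commit()
```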