
Cylc 8 architecture security model and design decisions

Jacinta Richardson edited this page Feb 21, 2020 · 7 revisions

Cylc 8 architecture

The Cylc 8 architecture involves the following components:

  • Proxy
  • Hub
  • UI Server
  • Workflow hosts
  • Job hosts
  • ZeroMQ

Proxy

A configurable HTTP proxy that provides access to the UI Servers.

Hub

Currently an unmodified Jupyter Hub, the hub exists for the following purposes:

  • Authenticating users and identifying their roles/permissions
  • Re-authenticating users where applicable
  • Spawning UI servers belonging to specific users

UI Server

A custom UI server, inspired by Jupyter Notebook, that runs with the permissions of a regular system user. It provides the HTML+ web UI to the user's workflows. UI Servers may be located on the same host as the Hub or on other hosts. One UI Server exists per user. The UI Server:

  • Lists workflows
  • Allows interaction with specific workflows owned by the UI Server's owner (stop, start, hold, edit triggers, etc.), both by that owner and by anyone authenticated with a role that permits the interaction.
  • Provides access to workflow logs
  • Provides 'rose edit' functionality, to allow editing of workflow parameters.

Workflow host

Host and file system where the workflow files have been installed, and where cylc runs the workflows. A UI Server may have workflows across multiple hosts, but each workflow is only on one host.

A workflow is the same as a "cylc suite" and performs as defined in the workflow's suite.rc.

Job host

Host and file system where a workflow's jobs run. A workflow may have jobs across multiple hosts, including background jobs run on the same host as the workflow is defined on.

ZeroMQ

ZeroMQ is used to provide reliable communication between a workflow and its jobs, and between a workflow and its UI Server. Because messages are queued, they are robust against network hiccoughs.

Architectural considerations

Two primary principles lie behind the decisions made in this architecture:

  1. Workflows have to be able to run, and submit their tasks.
  2. Users have to be able to find, start, stop and edit all of the workflows they have permission to interact with from a single location.

In every case, tried-and-proven technologies have been preferred over custom work, and non-privileged actions have been preferred over privileged actions. Intra-workflow permissions rely on UNIX file system permissions: for example, the UI Server acts on a workflow as its user, workflows run only as their user, and jobs run only as their user. Only files which have the execute bit set for the user can be executed, only files and directories which have the write bit set can be written to, and so forth. Inter-workflow permissions rely on authentication at the hub and authorization at the UI Server.
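As a rough illustration of the file-permission point above, the sketch below inspects the owner permission bits on a file using only the Python standard library. The file name job.sh and the helper permission_summary are invented for this example and are not part of Cylc; the sketch assumes a POSIX file system.

```python
import os
import stat
import tempfile


def permission_summary(path):
    """Return which owner permission bits are set on path."""
    mode = os.stat(path).st_mode
    return {
        "read": bool(mode & stat.S_IRUSR),
        "write": bool(mode & stat.S_IWUSR),
        "execute": bool(mode & stat.S_IXUSR),
    }


# Demonstrate on a throwaway file (a hypothetical stand-in for a job script).
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "job.sh")
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\necho hello\n")

    # Owner may read and write, but not execute.
    os.chmod(script, stat.S_IRUSR | stat.S_IWUSR)
    before = permission_summary(script)

    # Add the owner execute bit.
    os.chmod(script, stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR)
    after = permission_summary(script)
```

Here before reports the execute bit as unset and after reports it as set. The operating system enforces these bits for whichever UNIX user the UI Server, workflow or job runs as, so no Cylc-specific check is needed.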

Component security

User's browser connection to proxy/hub

The user's connection to the proxy and hub will be via HTTPS or WebSockets over SSL/TLS (aka WebSockets over HTTPS, aka WSS) with a signed certificate as arranged by the organisation.

Where the connection can use WSS, the interaction with workflows and the UI will be faster than the equivalent over HTTPS. WebSockets over SSL/TLS is well supported by modern browsers, and HTTPS is available as a secure fallback option.

Proxy connection with (spawned) UI Server

The UI Server will be hosted so that it appears at the same domain, at an address like /usr/{name}. The proxy will forward the established HTTPS/WSS connection through to the UI Server, even where the UI Server is hosted on a different machine from the hub. (? check this)
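The routing idea can be sketched as a lookup from URL path prefix to upstream server. This is a simplified illustration of prefix-based routing, not the proxy's actual implementation; the function resolve_upstream, the user names and all host names and ports are invented for the example.

```python
def resolve_upstream(path, routes, default):
    """Return the upstream for the longest route prefix that matches path."""
    best = None
    for prefix in routes:
        # Match the prefix exactly or at a path-segment boundary, so that
        # "/usr/alice" does not accidentally match "/usr/alicia".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else default


# Hypothetical routing table: one entry per spawned UI Server.
routes = {
    "/usr/alice": "https://uiserver-a.example.org:8001",
    "/usr/bob": "https://uiserver-b.example.org:8002",
}
# Anything that matches no UI Server route goes to the hub (assumed address).
hub = "https://hub.example.org:8081"

alice_target = resolve_upstream("/usr/alice/workflows", routes, hub)
other_target = resolve_upstream("/usr/alicia", routes, hub)
login_target = resolve_upstream("/hub/login", routes, hub)
```

In this sketch a request for /usr/alice/workflows is sent to Alice's UI Server, while /usr/alicia and /hub/login match no UI Server route and fall through to the hub.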

Hub

As the hub is an unedited version of the Jupyter Hub, the Jupyter Hub Security Overview is generally relevant.

Authentication is performed by a Jupyter Hub authentication plugin that connects to the organisation's host or site identity management, e.g. PAM, LDAP, OAuth (GitHub and Google accounts), etc. See Jupyter's Authenticators page for more detail.

Successful authentication will generate a token representing the user, their roles (if applicable) and their session. This is shared with the UI Server. Authentication state (and information) is encrypted with Fernet as per Jupyter's Authenticators page.

Authorization at the UI Server

Cylc UI Servers are independent of each other and cannot share HTML fragments or code with each other. Unlike Jupyter notebooks, the HTML from the UI Server is not generated by users, and indeed all user input displayed by the UI Server (such as workflow and task names) is HTML-escaped before display.
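A minimal sketch of the escaping step, using Python's standard html.escape. The function render_task_name and the surrounding markup are hypothetical and are not taken from the Cylc code base.

```python
import html


def render_task_name(name):
    """Wrap a user-supplied task name in markup, escaping it first."""
    return '<span class="task">{}</span>'.format(html.escape(name, quote=True))


# A task name that tries to smuggle in a script tag.
malicious = '<script>alert("x")</script>'
rendered = render_task_name(malicious)
```

After escaping, the angle brackets and quotes in the task name become character references (&lt;, &gt;, &quot;), so the browser displays the name as text instead of interpreting it as markup.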

With one partial exception, each UI Server provides an independent view of the workflows owned by the UI Server's owner. Any action the UI Server enables is performed by that UI Server's UNIX user.

The partial exception is the gscan-like functionality. It behaves differently from the rest of the UI Server, as it provides a (read-only) view into all of the running (and stopped?) workflows for multiple users.

To perform an action on another user's UI Server, an authenticated user must be authorized to perform that action. Authorization is broken into three concepts:

  • Read-only - a user may view the workflow, its logs and its full state, but make no changes
  • Execute - a user may stop, start, pause/hold and restart the workflow and tasks
  • Write - a user may perform edit triggers, and make other code-related changes to workflow tasks and suite.rc

The precise mechanics for authorization are still under development.
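Purely as an illustration of how the three concepts might be modelled, the sketch below represents them as independent flags and checks an action against the permissions a user has been granted. Everything here is an assumption made for the example: the Permission class, the action names, the mapping from actions to permissions, and the choice to treat the three concepts as independent (not hierarchical) are not decided by this document.

```python
from enum import Flag, auto


class Permission(Flag):
    """The three authorization concepts, modelled as combinable flags."""
    READ = auto()
    EXECUTE = auto()
    WRITE = auto()


# Hypothetical mapping from UI actions to the permission each one requires.
REQUIRED = {
    "view_logs": Permission.READ,
    "stop": Permission.EXECUTE,
    "hold": Permission.EXECUTE,
    "edit_trigger": Permission.WRITE,
}


def is_allowed(granted, action):
    """Return True if the granted permissions cover the given action."""
    required = REQUIRED.get(action)
    if required is None:
        return False  # unknown actions are denied by default
    return (granted & required) == required


# A hypothetical "operator" role: may view and control, but not edit.
operator = Permission.READ | Permission.EXECUTE
can_stop = is_allowed(operator, "stop")
can_edit = is_allowed(operator, "edit_trigger")
```

With these assumptions the operator role may stop a workflow but may not perform an edit trigger. Denying unknown actions by default keeps the sketch in line with the preference for non-privileged behaviour stated earlier.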

Questions:

  • how are the workflow files actually deployed onto the workflow server?
  • if a workflow is started manually, but in an equivalent way to the UI Server starting it, does the "contact" file have to be registered with the UI Server in some way, or will the UI Server just scan over the equivalent of ~/cylc-run/*/ looking for contact files? (Is this how it will find stopped suites? Can we therefore just delete/move old ones when we don't want those suites to show up as existing and stopped?)
  • how are command-line level interactions managed?
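One possible reading of the scan described in the second question is sketched below. It assumes the Cylc 7 convention that a running suite keeps its contact file at .service/contact inside its run directory; whether Cylc 8 keeps this layout is exactly what the question asks, so treat the path and the helper find_workflows as assumptions. The example builds a temporary directory tree and does not touch a real ~/cylc-run.

```python
import tempfile
from pathlib import Path


def find_workflows(run_dir):
    """Map each workflow directory name to whether a contact file exists."""
    found = {}
    for child in sorted(Path(run_dir).iterdir()):
        if child.is_dir():
            # Assumed location of the contact file (Cylc 7 layout).
            found[child.name] = (child / ".service" / "contact").is_file()
    return found


# Build a fake run directory with one "running" and one "stopped" workflow.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "running" / ".service").mkdir(parents=True)
    (root / "running" / ".service" / "contact").write_text("placeholder\n")
    (root / "stopped" / ".service").mkdir(parents=True)
    result = find_workflows(root)
```

Under this reading, a directory with a contact file would be listed as possibly running and one without as stopped, and moving or deleting an old run directory would remove it from the listing. A contact file left behind by a crashed workflow would still need to be detected some other way.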