Restructure architecture section
This commit restructures the architecture section to start
with durable execution and the service invocation flow. The
runtime specific sections are now grouped under "Runtime".
tillrohrmann committed Aug 7, 2023
1 parent cc31ae7 commit 9ffae9f
Showing 4 changed files with 42 additions and 40 deletions.
82 changes: 42 additions & 40 deletions docs/restate/architecture.md
@@ -18,7 +18,44 @@ Services run on *Service Endpoints* which can be deployed independently of the r

![High level architecture](/img/restate-architecture.png)

## Durable execution via journaling

Things always eventually go wrong, in particular at scale when running a distributed application.
Therefore, services that Restate invokes will also fail eventually.
In these cases, the services need to be re-invoked so that they can complete.
Ideally, services resume from the point where they failed without redoing all the previous actions, effectively resulting in the durable execution of services.

The way Restate achieves durable execution is by recording the actions a service takes in a durable journal.
When re-invoking a service, the journal is sent alongside the service state to the service endpoint.
The service endpoint resumes the invocation by replaying the recorded journal entries without redoing the actual operations (e.g. calling another service, storing state, or running a side effect).
A useful by-product of the journal is that Restate can suspend an invocation at any point and resume it later if the need arises.

![Durable execution](/img/durable-execution.png)
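
The following is a minimal TypeScript sketch of the replay idea, not Restate's actual SDK or wire format; all names are made up for illustration. Completed steps are answered from the recorded journal, so the underlying operation runs only once even if the handler is re-invoked after a crash.

```typescript
// Hypothetical sketch of journal-based replay; not Restate's actual API.
type JournalEntry = { op: string; result: unknown };

class ReplayContext {
  private cursor = 0;

  constructor(private journal: JournalEntry[]) {}

  // Runs `action` the first time; on re-invocation, returns the recorded
  // result instead of performing the action again.
  async run<T>(op: string, action: () => Promise<T>): Promise<T> {
    if (this.cursor < this.journal.length) {
      // Replay: this entry was already recorded during a previous attempt.
      const entry = this.journal[this.cursor++];
      return entry.result as T;
    }
    // First execution: perform the action and append it to the journal.
    const result = await action();
    this.journal.push({ op, result });
    this.cursor++;
    return result;
  }
}

// Placeholder operations standing in for real, non-idempotent calls.
async function chargeCard(): Promise<string> {
  return "payment-123";
}
async function sendReceipt(paymentId: string): Promise<void> {
  console.log(`receipt for ${paymentId}`);
}

// Usage: after a crash, the handler is re-invoked with the recorded journal,
// so `chargeCard` is not executed a second time.
async function handler(ctx: ReplayContext) {
  const paymentId = await ctx.run("chargeCard", () => chargeCard());
  await ctx.run("sendReceipt", () => sendReceipt(paymentId));
}
```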

## Service invocation flow

All service invocations need to go through Restate, which acts as a reverse proxy for the registered services.
Restate knows where the services are running and invokes them on behalf of the clients.
By putting Restate in charge of the actual invocation, the system can:

* Enrich the invocation with extra information such as service and journal state
* Guarantee that keyed invocations are executed in order and in isolation
* Retry failed invocations until they complete

The service invocation flow is as follows:

1. New invocations are received by the *Ingress service* running on the *Workers*. The ingress extracts the invocation key, which determines the partition responsible for processing the invocation.
2. Based on the key, the invocation is forwarded to the right *Worker* running the *Partition processor* of the target partition.
3. The *Partition processor* makes sure that invocations for a given key and service are executed sequentially (see the sketch after the figure below).
4. Once all preceding invocations have completed, the invocation request will be enriched with the service's state and sent to the service endpoint where it executes.
5. While executing, the runtime journals the actions the service takes so that it can recover from failures without redoing them.
6. Once the invocation completes with a response, it will be sent back to the *Ingress service* and then to the invoking client.

![Service invocation flow](/img/service-invocation-flow.png)
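
To make steps 1 to 3 more concrete, here is a hedged TypeScript sketch; all names and the partitioning function are illustrative stand-ins rather than Restate's actual code (one possible partitioning scheme is sketched under Scalability below).

```typescript
// Hypothetical sketch of steps 1-3; names and details are illustrative.

// Steps 1-2: the ingress extracts the invocation key and maps it to a partition,
// which in turn determines the owning Worker.
const NUM_PARTITIONS = 64; // assumed value, purely for illustration

function partitionFor(invocationKey: string): number {
  // Simple stand-in: sum of character codes modulo the partition count.
  let acc = 0;
  for (const ch of invocationKey) acc = (acc + ch.charCodeAt(0)) % NUM_PARTITIONS;
  return acc;
}

// Step 3: within a partition, invocations for the same key run strictly one
// after another by chaining them on a per-key promise queue.
class PartitionProcessor {
  private perKeyTail = new Map<string, Promise<unknown>>();

  submit<T>(key: string, invoke: () => Promise<T>): Promise<T> {
    const tail = this.perKeyTail.get(key) ?? Promise.resolve();
    const next = tail.then(invoke, invoke); // start only after the previous invocation settles
    this.perKeyTail.set(key, next.catch(() => undefined)); // keep the chain alive on failure
    return next;
  }
}
```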

## Runtime

### Scalability

Restate is highly scalable because it shards the space of service invocations into partitions, each of which is processed by a dedicated *Partition processor*.
Each *Worker* runs a set of these *Partition processors*.
@@ -30,7 +67,7 @@ In order to react to changing workloads, the partitions can be merged and split
Dynamic partitioning is still under development.
:::
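
As an illustration of what splittable partitions could look like, the sketch below uses a hypothetical key-range scheme; Restate's actual partitioning may differ.

```typescript
// Hypothetical key-range sharding; not necessarily Restate's scheme.
interface Partition {
  startKey: string;   // inclusive lower bound of the key range
  endKey?: string;    // exclusive upper bound; undefined means "no upper bound"
  processorId: string;
}

// Route a key to the partition whose range contains it (partitions sorted by startKey).
function lookupPartition(partitions: Partition[], key: string): Partition {
  const match = partitions.find(
    (p) => key >= p.startKey && (p.endKey === undefined || key < p.endKey)
  );
  if (!match) throw new Error(`no partition covers key ${key}`);
  return match;
}

// Splitting a hot partition divides its range at `splitKey` and hands the upper
// half to a different Partition processor; merging would be the inverse operation.
function splitPartition(
  p: Partition,
  splitKey: string,
  newProcessorId: string
): [Partition, Partition] {
  return [
    { startKey: p.startKey, endKey: splitKey, processorId: p.processorId },
    { startKey: splitKey, endKey: p.endKey, processorId: newProcessorId },
  ];
}
```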

### Consistency and fault-tolerance

Consistency and fault-tolerance are achieved by replicating the *Partition processors* via [Raft](https://raft.github.io/).
All commands for a partition go through the Raft log, which ensures that all partition processor replicas stay consistent and can be recovered in case of failures.
@@ -45,33 +82,19 @@ Such an architecture is known as [Multi-Raft](https://tikv.org/deep-dive/scalabi
Currently, Restate runs as a single process. The distributed implementation is still under development.
:::
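
Independent of the deployment mode, the core idea is a deterministic state machine fed from an ordered command log. The sketch below is a heavily simplified illustration of that idea and is not a Raft implementation.

```typescript
// Heavily simplified illustration; not a Raft implementation. Every replica of
// a partition processor applies the same ordered command log, so all replicas
// deterministically reach the same state, and a failed replica can rebuild its
// state by replaying the log.
type Command =
  | { kind: "setState"; key: string; value: string }
  | { kind: "clearState"; key: string };

class PartitionReplica {
  private state = new Map<string, string>();
  private appliedIndex = -1;

  // In the real system the log would be the replicated (Raft) log; here it is an array.
  apply(log: Command[]): void {
    for (let i = this.appliedIndex + 1; i < log.length; i++) {
      const cmd = log[i];
      if (cmd.kind === "setState") {
        this.state.set(cmd.key, cmd.value);
      } else {
        this.state.delete(cmd.key);
      }
      this.appliedIndex = i;
    }
  }
}
```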

### State storage

The Raft and *Partition processor* state is stored by the *Workers* in [RocksDB](https://github.com/facebook/rocksdb).
Using RocksDB allows for graceful spilling to disk and comparatively fast writes.
Moreover, it supports asynchronous checkpointing which is required to truncate the Raft log.

### State queries

Restate makes its internal state accessible via a SQL interface.
Any client that supports the PostgreSQL wire protocol can be used to issue queries.
Internally, the SQL queries are executed using [DataFusion](https://github.com/apache/arrow-datafusion).
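
For example, a query could be issued from TypeScript with the `pg` package; the port and table name below are placeholders, so check the Restate documentation for the actual connection details and schema.

```typescript
// Querying Restate's state over the PostgreSQL wire protocol using the "pg" package.
// The port and the table/column names are placeholders, not Restate's actual schema.
import { Client } from "pg";

async function queryRestateState(): Promise<void> {
  const client = new Client({ host: "localhost", port: 5432 /* placeholder port */ });
  await client.connect();
  // Placeholder query; the real table names are defined by Restate's SQL schema.
  const result = await client.query("SELECT * FROM state LIMIT 10");
  console.log(result.rows);
  await client.end();
}
```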

### Service registry

All service meta information is maintained by the *Metas* in the service registry.
The service registry contains information about the registered services which includes the address of the service endpoints, the exposed service methods, their signatures and type definitions.
@@ -80,24 +103,3 @@ The service contracts and their type definitions are also exposed via [gRPC refl
Services are added to the registry by discovering service endpoints.
Upon discovery request, the *Metas* reach out to the specified service endpoint and retrieve all registered services, their methods and type definitions.
After discovery, the new service meta information is propagated to the *Workers* which enables invoking these services through Restate.
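
As a rough illustration of the kind of information the registry tracks (a hypothetical shape, not Restate's internal representation):

```typescript
// Hypothetical shape of the per-service information kept in the registry.
interface MethodDescriptor {
  name: string;
  inputType: string;   // fully qualified protobuf message type
  outputType: string;
}

interface RegisteredService {
  name: string;
  endpointAddress: string;     // where the service endpoint is reachable
  methods: MethodDescriptor[]; // discovered methods and their signatures
}

type Registry = Map<string, RegisteredService>;

// Discovery (simplified): the Metas fetch the services exposed by an endpoint and
// record them; afterwards the Workers can route invocations to those services.
function register(registry: Registry, discovered: RegisteredService[]): void {
  for (const svc of discovered) {
    registry.set(svc.name, svc);
  }
}
```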

Binary file modified static/img/durable-execution.png
Binary file modified static/img/service-invocation-flow.png
Binary file modified static/img/sharding.png
