Refactored README.md (#97)
* Refactored README.md

Supersedes #85

Cleared it up a little bit and fixed some bugs.

Co-authored-by: Matei David <[email protected]>
alpeb and mateiidavid authored Mar 9, 2022
1 parent bcc14fe commit da768b5
Showing 3 changed files with 167 additions and 51 deletions.
96 changes: 51 additions & 45 deletions README.md
@@ -8,45 +8,15 @@ The mechanism relies on Linkerd’s traffic-splitting functionality by providing
an operator to alter the backend services' weights in real time depending on
their readiness.

## Failover criteria
## Table of contents

The failover criteria is readiness failures on the targeted Pods. This is
directly reflected on the Endpoints pointing to those Pods: only when Pods are
ready, does the `addresses` field of the relevant Endpoints get populated.

## Services declaration

The primitive used to declare the services to fail over is Linkerd's
`TrafficSplit` CRD. The `spec.service` field contains the service name addressed
by clients, and the `spec.backends` fields contain all the possible services
that apex service might be served by. The service to be considered as primary is
declared in the `failover.linkerd.io/primary-service` annotation. Those backend
services can be located in the current cluster or they can point to mirror
services backed by services in other clusters (through Linkerd's multicluster
functionality).

## Operator

Linkerd-failover is an operator to be installed in the local cluster (there
where the clients consuming the service live), whose responsibility is to watch
over the state of the Endpoints that are associated to the backends of the
`TrafficSplit`, reacting to the failover criteria explained above.

## Failover logic

The following describes the logic used to change the `TrafficSplit` weights:

- Whenever the primary backend is ready, all the weight is set to it, setting
the weights for all the secondary backends to zero.
- Whenever the primary backend is not ready, the following rules apply only if
there is at least one secondary backend that is ready:
- The primary backend’s weight is set to zero
- The weight is distributed equally among all the secondary backends that
are ready
- Whenever a secondary backend changes its readiness, the weight is
redistributed among all the secondary backends that are ready
- Whenever both the primary and secondaries are all unavailable, the connection
will fail at the client-side, as expected.
- [Requirements](#requirements)
- [Configuration](#configuration)
- [Installation](#installation)
- [Example](#example)
- [Implementation details](#implementation-details)
- [Failover criteria](#failover-criteria)
- [Failover logic](#failover-logic)

## Requirements

@@ -60,9 +30,13 @@ The following Helm values are available:
- `selector`: determines which `TrafficSplit` instances to consider for
failover. It defaults to `failover.linkerd.io/controlled-by={{.Release.Name}}`
(the value refers to the release name used in `helm install`).
- `logLevel`, `logFormat`: for configuring the operator's logging.
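
For example, a minimal sketch of an overrides file for these values (the file
name and the `debug`/`json` settings are illustrative; check the chart's
`values.yaml` for the accepted values):

```yaml
# overrides.yaml (hypothetical) -- pass it via `helm install -f overrides.yaml`
selector: failover.linkerd.io/controlled-by=linkerd-failover
logLevel: debug   # assumed log level value
logFormat: json   # assumed log format value
```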

## Installation

The SMI extension and the operator are to be installed in the local cluster
(where the clients consuming the service are located).

Linkerd-smi installation:

```console
@@ -74,21 +48,28 @@ helm install linkerd-smi -n linkerd-smi --create-namespace linkerd-smi/linkerd-s
Linkerd-failover installation:

```console
helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd/linkerd-failover
```

### Running locally for testing
# In case you haven't added the linkerd-edge repo already
helm repo add linkerd-edge https://helm.linkerd.io/edge
helm repo up

```console
cargo run
helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd-edge/linkerd-failover
```
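
To verify that the operator came up (generic Helm and kubectl checks; the
namespace matches the `-n` flag used above):

```console
helm ls -n linkerd-failover
kubectl get pods -n linkerd-failover
```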

## Example

The following `TrafficSplit` serves as the initial state for a failover setup.

Clients should send requests to the apex service `sample-svc`. The primary
service that will serve these requests is declared through the
`failover.linkerd.io/primary-service` annotation, `sample-svc` in this case.

When `sample-svc` starts failing, the weights will be switched over to the
other backends.

Note that the failover services can be located in the local cluster, or they can
point to mirror services backed by services in other clusters (through Linkerd's
multicluster functionality).

```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
@@ -97,7 +78,7 @@ metadata:
annotations:
failover.linkerd.io/primary-service: sample-svc
labels:
app.kubernetes.io/managed-by: linkerd-failover
failover.linkerd.io/controlled-by: linkerd-failover
spec:
service: sample-svc
backends:
@@ -112,3 +93,28 @@ spec:
- service: sample-svc-asia1
weight: 0
```
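
Assuming the manifest above is saved as `sample-ts.yaml` (the file name is
illustrative), it can be applied with:

```console
kubectl apply -f sample-ts.yaml
```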

## Implementation details

### Failover criteria

The failover criterion is readiness failure on the targeted Pods. This is
reflected directly in the Endpoints object associated with those Pods: the
`addresses` field of the relevant Endpoints is populated only when the Pods
are ready.
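
For example, the Endpoints of a backend can be inspected directly (using the
`sample-svc` service from the example above):

```console
# Ready Pod IPs appear under subsets[].addresses; unready Pods are listed
# under subsets[].notReadyAddresses instead, leaving addresses empty.
kubectl get endpoints sample-svc -o yaml
```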

### Failover logic

The following describes the logic used to change the `TrafficSplit` weights:

- Whenever the primary backend is ready, all the weight is set to it, setting
the weights for all the secondary backends to zero.
- Whenever the primary backend is not ready, the following rules apply only if
there is at least one secondary backend that is ready:
- The primary backend’s weight is set to zero.
- The weight is distributed equally among all the secondary backends that
are ready.
- Whenever a secondary backend changes its readiness, the weight is
  redistributed among all the secondary backends that are ready.
- Whenever both the primary and all the secondaries are unavailable, the
  connection will fail on the client side, as expected.
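
As a concrete sketch, applied to the example `TrafficSplit` above: if
`sample-svc` becomes unready while only `sample-svc-central1` and
`sample-svc-east1` are ready, the operator would rewrite the backends roughly
as follows (the exact weight values are illustrative; only the equal split
among ready secondaries matters):

```yaml
backends:
  - service: sample-svc          # primary unready: weight zeroed
    weight: 0
  - service: sample-svc-central1 # ready secondary: equal share of the weight
    weight: 1
  - service: sample-svc-east1    # ready secondary: equal share of the weight
    weight: 1
  - service: sample-svc-east2    # unready secondary: weight zeroed
    weight: 0
  - service: sample-svc-asia1    # unready secondary: weight zeroed
    weight: 0
```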
61 changes: 58 additions & 3 deletions charts/linkerd-failover/README.md
@@ -7,24 +7,79 @@

**Homepage:** <https://linkerd.io>

## Linkerd-smi Required
## Requirements

Besides Linkerd and the operator itself, the `linkerd-smi` extension must be
installed, since the failover mechanism relies on the `TrafficSplit` CRD.

## Configuration

The following Helm values are available:

- `selector`: determines which `TrafficSplit` instances to consider for
failover.
- `logLevel`, `logFormat`: for configuring the operator's logging.

## Installation

The SMI extension and the operator are to be installed in the local cluster
(where the clients consuming the service are located).

Linkerd-smi installation:

```console
helm repo add linderd-smi https://linkerd.github.io/linkerd-smi
helm repo add linkerd-smi https://linkerd.github.io/linkerd-smi
helm repo up
helm install linkerd-smi -n linkerd-smi --create-namespace linkerd-smi/linkerd-smi
```

Linkerd-failover installation:

```console
helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd/linkerd-failover
# In case you haven't added the linkerd-edge repo already
helm repo add linkerd-edge https://helm.linkerd.io/edge
helm repo up

helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd-edge/linkerd-failover
```

## Example

The following `TrafficSplit` serves as the initial state for a failover setup.

Clients should send requests to the apex service `sample-svc`. The primary
service that will serve these requests is declared through the
`failover.linkerd.io/primary-service` annotation, `sample-svc` in this case.

When `sample-svc` starts failing, the weights will be switched over to the
other backends.

Note that the failover services can be located in the local cluster, or they can
point to mirror services backed by services in other clusters (through Linkerd's
multicluster functionality).

```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
name: sample-svc
annotations:
failover.linkerd.io/primary-service: sample-svc
labels:
failover.linkerd.io/controlled-by: linkerd-failover
spec:
service: sample-svc
backends:
- service: sample-svc
weight: 1
- service: sample-svc-central1
weight: 0
- service: sample-svc-east1
weight: 0
- service: sample-svc-east2
weight: 0
- service: sample-svc-asia1
weight: 0
```

## Get involved
61 changes: 58 additions & 3 deletions charts/linkerd-failover/README.md.gotmpl
@@ -8,24 +8,79 @@

{{ template "chart.homepageLine" . }}

## Linkerd-smi Required
## Requirements

Besides Linkerd and the operator itself, the `linkerd-smi` extension must be
installed, since the failover mechanism relies on the `TrafficSplit` CRD.

## Configuration

The following Helm values are available:

- `selector`: determines which `TrafficSplit` instances to consider for
failover.
- `logLevel`, `logFormat`: for configuring the operator's logging.

## Installation

The SMI extension and the operator are to be installed in the local cluster
(where the clients consuming the service are located).

Linkerd-smi installation:

```console
helm repo add linderd-smi https://linkerd.github.io/linkerd-smi
helm repo add linkerd-smi https://linkerd.github.io/linkerd-smi
helm repo up
helm install linkerd-smi -n linkerd-smi --create-namespace linkerd-smi/linkerd-smi
```

Linkerd-failover installation:

```console
helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd/linkerd-failover
# In case you haven't added the linkerd-edge repo already
helm repo add linkerd-edge https://helm.linkerd.io/edge
helm repo up

helm install linkerd-failover -n linkerd-failover --create-namespace --devel linkerd-edge/linkerd-failover
```

## Example

The following `TrafficSplit` serves as the initial state for a failover setup.

Clients should send requests to the apex service `sample-svc`. The primary
service that will serve these requests is declared through the
`failover.linkerd.io/primary-service` annotation, `sample-svc` in this case.

When `sample-svc` starts failing, the weights will be switched over to the
other backends.

Note that the failover services can be located in the local cluster, or they can
point to mirror services backed by services in other clusters (through Linkerd's
multicluster functionality).

```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
name: sample-svc
annotations:
failover.linkerd.io/primary-service: sample-svc
labels:
failover.linkerd.io/controlled-by: linkerd-failover
spec:
service: sample-svc
backends:
- service: sample-svc
weight: 1
- service: sample-svc-central1
weight: 0
- service: sample-svc-east1
weight: 0
- service: sample-svc-east2
weight: 0
- service: sample-svc-asia1
weight: 0
```

## Get involved
