Add HA mode for service-mirror #11047

Merged: 8 commits merged into main from matei/service-mirror-ha on Jul 17, 2023
Conversation

@mateiidavid (Member) commented Jun 21, 2023:

In certain scenarios, the service-mirror may act as a single point of failure. Linkerd's multicluster extension supports an `--ha` mode to increase reliability by adding more replicas; however, it is currently supported only for the gateway.

To avoid making the service-mirror a single point of failure, this change introduces an `--ha` flag for `linkerd multicluster link`. The HA flag applies a set of value overrides that:

  • Configure the service-mirror with affinity and PDB policies to ensure replicas are spread across hosts, protecting against voluntary and involuntary disruptions;
  • Configure the service-mirror to run with 3 replicas;
  • Configure the service-mirror deployment's rolling-update strategy to ensure at least one replica is always available.
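
Enabling it could look like the following (the cluster name is illustrative; the pipe to `kubectl apply` follows the usual `link` workflow):

```
:; linkerd multicluster link --cluster-name target --ha | kubectl apply -f -
```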

Additionally, with the introduction of leader election, `linkerd mc gateways` displays redundant information, since metrics are collected from each pod. This change adds a small lookup table of current lease claimants; metrics are extracted only for claimants.
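
For illustration, a minimal client-go sketch of such a claimant lookup (the function name and surrounding structure are assumptions, not the PR's exact code; the `Leases(...).List(...)` call mirrors the check code quoted later in this thread):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// leaseClaimants builds the lookup table: the set of holder identities of
// all Leases in the given namespace. Assuming the holder identity records
// the pod name, callers can skip metrics for any pod not in the set.
func leaseClaimants(ctx context.Context, client kubernetes.Interface, ns string) (map[string]struct{}, error) {
	leases, err := client.CoordinationV1().Leases(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	claimants := make(map[string]struct{})
	for _, l := range leases.Items {
		if l.Spec.HolderIdentity != nil {
			claimants[*l.Spec.HolderIdentity] = struct{}{}
		}
	}
	return claimants, nil
}
```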

Depends on #11046

@mateiidavid mateiidavid requested a review from a team as a code owner June 21, 2023 15:45
@mateiidavid (Member Author) commented:

Diff:

```
# < -- only in ha.yaml (missing from unha.yaml)
# > -- only in unha.yaml
:; diff ha.yaml unha.yaml
121c121
<   replicas: 3
---
>   replicas: 1
126,128d125
<   strategy:
<     rollingUpdate:
<       maxUnavailable: 1
167,182d163
< ---
< kind: PodDisruptionBudget
< apiVersion: policy/v1
< metadata:
<   name: linkerd-service-mirror-target
<   namespace: linkerd-multicluster
<   labels:
<     component: linkerd-service-mirror
<   annotations:
<     linkerd.io/created-by: linkerd/cli dev-352e404a-matei
< spec:
<   maxUnavailable: 1
<   selector:
<     matchLabels:
<       component: linkerd-service-mirror
<       mirror.linkerd.io/cluster-name: target
# Before
:; linkerd mc gateways
CLUSTER  ALIVE    NUM_SVC      LATENCY
target   False          2            -
target   True           2          3ms
target   False          2            -

# After
:; bin/linkerd mc gateways
CLUSTER  ALIVE    NUM_SVC      LATENCY
target   True           2          3ms
```

From the review thread on `multicluster/cmd/check.go`:

```
@@ -579,9 +579,21 @@ func (hc *healthChecker) checkIfGatewayMirrorsHaveEndpoints(ctx context.Context,
			continue
		}

		leases, err := hc.KubeAPIClient().CoordinationV1().Leases(multiclusterNs.Name).List(ctx, selector)
```
A contributor asked:

For non-HA mode, with only one instance running, will there still be a lease resource?

@mateiidavid (Member Author) replied Jun 22, 2023:

Yup. This check should be safe to run regardless of the deployment model. The same applies for `multicluster gateways`, where we pull metrics from the "leader". We can guarantee there will always be a leader, since a lease is created and claimed under all circumstances.
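
For example, the current claimant can be inspected directly (output omitted; the namespace matches the check code above):

```
:; kubectl get lease -n linkerd-multicluster
```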

As an aside, there might be a smarter way to pull metrics (e.g. aggregate metrics from all pods under a deployment). I thought I'd keep it simple though.

@alpeb (Member) left a comment:

Nice!
Can you also include a podAntiAffinity field like we do for the linkerd-control-plane components? I believe the PDB doesn't cover that aspect.

@mateiidavid (Member Author) commented Jul 12, 2023:

Added the missing pod anti-affinity partial to the deployment. For context:

  • Anti-affinity works by selecting over a group of pods (using a label selector) and a topology key (e.g. kubernetes.io/hostname). Depending on the strategy (preferred vs required), pods will not be scheduled onto a node that already runs a matching pod within the same topology key. This allows us to have separate failure domains for HA.
  • Typically, a component or app label key is used to differentiate between pods (i.e. as the affinity label selector). All service-mirror components, irrespective of the cluster they're linked against, have the same value for component: component=linkerd-service-mirror.
  • Users should be able to turn HA on selectively, on a link-by-link basis. Enforcing affinity using component means all service-mirror pods would be affected by the scheduling policy, not just the service-mirror pods specific to that link.

As a result, I have chosen to use a different label selector. The full list of labels each service-mirror receives is below:

```json
{
  "component": "linkerd-service-mirror",
  "linkerd.io/control-plane-ns": "linkerd",
  "linkerd.io/extension": "multicluster",
  "linkerd.io/proxy-deployment": "linkerd-service-mirror-target",
  "linkerd.io/workload-ns": "linkerd-multicluster",
  "mirror.linkerd.io/cluster-name": "target",
  "pod-template-hash": "c7c6555df"
}
```

The only variable value is the link cluster name. As a result, anti-affinity selects pods using `mirror.linkerd.io/cluster-name=<name>` as a label selector. Another alternative would be to use both `mirror.linkerd.io/cluster-name` and `component`, or better yet, use `component` and append the cluster name to the label value.
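
For reference, a sketch of roughly what the rendered stanza could look like for a link named target (illustrative only; whether scheduling is preferred or required depends on the partial):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: mirror.linkerd.io/cluster-name
          operator: In
          values:
          - target
      topologyKey: kubernetes.io/hostname
```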

@mateiidavid mateiidavid requested a review from alpeb July 12, 2023 14:15
@alpeb (Member) left a comment:

Thanks @mateiidavid, I think the pod anti-affinity scheme is fine enough; people could use Kustomize if they require something more sophisticated 👍

@mateiidavid mateiidavid merged commit d0e837d into main Jul 17, 2023
37 checks passed
@mateiidavid mateiidavid deleted the matei/service-mirror-ha branch July 17, 2023 08:25