stable 2.14.10 #12111

Merged 12 commits on Feb 20, 2024

Commits on Feb 17, 2024

  1. Don't remove a Namespace Index while it still contains AuthorizationPolicy (#12088)
    
    In the inbound policy index, we maintain a per-namespace policy index which holds the various policy resources for that namespace. When a per-namespace index becomes empty, we remove it. However, we were not considering authorization policy resources when determining whether the index is empty. This could result in the index being removed while it still contained authorization policy resources, as long as no resources of any other type existed.
    
    This can lead to incorrect inbound policy responses when the per-namespace index is recreated, since it will no longer contain the authorization policy.
    
    We update the `is_empty()` function to properly consider authorization policies as well.  We also add some generally useful logging at debug and trace level.
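
    To illustrate the bug pattern, here is a minimal Go-flavored sketch (field names are hypothetical; the real index lives in the policy controller): the emptiness check has to cover every resource type the index holds, or the namespace entry can be dropped while it still has authorization policies.

    ```go
    // Sketch only, with hypothetical field names.
    type namespaceIndex struct {
        servers               map[string]struct{}
        serverAuthorizations  map[string]struct{}
        authorizationPolicies map[string]struct{}
    }

    func (ix *namespaceIndex) isEmpty() bool {
        // Before the fix, the authorization-policy map was not consulted here,
        // so an index holding only AuthorizationPolicies looked empty.
        return len(ix.servers) == 0 &&
            len(ix.serverAuthorizations) == 0 &&
            len(ix.authorizationPolicies) == 0
    }
    ```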
    
    Signed-off-by: Alex Leong <[email protected]>
    Co-authored-by: Oliver Gould <[email protected]>
    adleong and olix0r committed Feb 17, 2024
    2c763fa
  2. Fix race condition in Destination's endpoints watcher (#12022)

    Fixes #12010
    
    ## Problem
    
    We're observing crashes in the destination controller in some scenarios, due to a data race described in #12010.
    
    ## Cause
    
    The problem is that the same instance of the `AddressSet.Addresses` map is mutated in the endpoints watcher's Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309) and iterated over in the endpoint translator's [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines, and the map is not guarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue.
    
    ## Solution
    
    Make a shallow copy of `pp.addresses` in the endpoints watcher and pass only that to the listeners. It's a shallow copy because we avoid copying the pod references it contains, knowing they won't get mutated.
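
    Roughly, the idea is this (a hedged sketch with stand-in types, not the actual watcher code):

    ```go
    // Hypothetical stand-ins for the watcher's ID and Address types, for illustration only.
    type ID struct{ Namespace, Name string }
    type Address struct{ IP string }

    // shallowCopy builds a fresh map for listeners so the watcher can keep
    // mutating its own map in its goroutine. Each Address struct is copied by
    // value; any pointers inside (e.g. the pod reference) are shared, which is
    // safe because they are not mutated after being published.
    func shallowCopy(addresses map[ID]Address) map[ID]Address {
        out := make(map[ID]Address, len(addresses))
        for id, addr := range addresses {
            out[id] = addr
        }
        return out
    }
    ```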
    
    ## Repro
    
    Install the linkerd core control plane and an injected emojivoto, then patch the endpoint translator to include a sleep call that helps surface the race (don't apply the patch to the cluster; we only use it locally below):
    
    <details>
      <summary>endpoint_translator.go diff</summary>
    
    ```diff
    diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go
    index d1018d5f9..7d5abd638 100644
    --- a/controller/api/destination/endpoint_translator.go
    +++ b/controller/api/destination/endpoint_translator.go
    @@ -5,6 +5,7 @@ import (
            "reflect"
            "strconv"
            "strings"
    +       "time"
    
            pb "github.com/linkerd/linkerd2-proxy-api/go/destination"
            "github.com/linkerd/linkerd2-proxy-api/go/net"
    @@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) {
     }
    
     func (et *endpointTranslator) add(set watcher.AddressSet) {
            for id, address := range set.Addresses {
    +               time.Sleep(1 * time.Second)
                    et.availableEndpoints.Addresses[id] = address
            }
    ```
    </details>
    
    Then create these two Server manifests:
    
    <details>
      <summary>emoji-web-server.yml</summary>
    
    ```yaml
    apiVersion: policy.linkerd.io/v1beta2
    kind: Server
    metadata:
      namespace: emojivoto
      name: web-http
      labels:
        app.kubernetes.io/part-of: emojivoto
        app.kubernetes.io/name: web
        app.kubernetes.io/version: v11
    spec:
      podSelector:
        matchLabels:
          app: web-svc
      port: http
      proxyProtocol: HTTP/1
    ```
    </details>
    
    <details>
      <summary>emoji-web-server-opaque.yml</summary>
    
    ```yaml
    apiVersion: policy.linkerd.io/v1beta2
    kind: Server
    metadata:
      namespace: emojivoto
      name: web-http
      labels:
        app.kubernetes.io/part-of: emojivoto
        app.kubernetes.io/name: web
        app.kubernetes.io/version: v11
    spec:
      podSelector:
        matchLabels:
          app: web-svc
      port: http
      proxyProtocol: opaque
    ```
    </details>
    
    In separate consoles run the patched destination service and a destination client:
    
    ```bash
    HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
    ```
    
    ```bash
    go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80
    ```
    
    And run this to continuously switch the `proxyProtocol` field:
    
    ```bash
    while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done
    ```
    
    You'll see the following data race report in the Destination controller logs:
    
    <details>
      <summary>destination logs</summary>
    
    ```console
    ==================
    WARNING: DATA RACE
    Write at 0x00c0006d30e0 by goroutine 178:
      github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772
      github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150
      github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173
      github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f
      github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm()
          <autogenerated>:1 +0x64
      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:246 +0x81
      k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
          <autogenerated>:1 +0x1f
      k8s.io/client-go/tools/cache.(*processorListener).run.func1()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:970 +0x1f4
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x41
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xbe
      k8s.io/apimachinery/pkg/util/wait.JitterUntil()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x10a
      k8s.io/apimachinery/pkg/util/wait.Until()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161 +0x9b
      k8s.io/client-go/tools/cache.(*processorListener).run()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:966 +0x38
      k8s.io/client-go/tools/cache.(*processorListener).run-fm()
          <autogenerated>:1 +0x33
      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
          /home/alpeb/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x86
    
    Previous read at 0x00c0006d30e0 by goroutine 360:
      github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab
      github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166
      github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1()
          /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45
    ```
    </details>
    
    ## Extras
    
    This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go.
    alpeb authored and adleong committed Feb 17, 2024
    53b5702
  3. Reflect changes to server selector in opaqueness (#12031)

    Fixes #11995
    
    If a Server is marking a Pod's port as opaque and then the Server's podSelector is updated to no longer select that Pod, then the Pod's port should no longer be marked as opaque. However, this update does not result in any updates from the destination API's Get stream and the port remains marked as opaque.
    
    We fix this by updating the endpoint watcher's handling of Server updates to consider both the old and the new Server.
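
    Conceptually, the decision the watcher now makes per address looks something like this (a hedged sketch; the names are illustrative, not the actual diff):

    ```go
    // recomputeOpaqueness sketches the fix: consider both the old and the new
    // Server so that a pod which falls out of the podSelector has its opaque
    // marking cleared instead of left stale.
    func recomputeOpaqueness(oldSelects, newSelects, marksPortOpaque bool) (opaque, changed bool) {
        switch {
        case newSelects:
            return marksPortOpaque, true // (re)apply the new Server's protocol
        case oldSelects:
            return false, true // deselected: clear the stale opaque marking
        default:
            return false, false // unrelated Server: nothing to do
        }
    }
    ```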
    
    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 17, 2024
    5fbca8a
  4. Return INVALID_ARGUMENT status codes properly from GetProfile (#11980)

    When the destination controller receives a GetProfile request for an ExternalName service, it should return gRPC status code INVALID_ARGUMENT to signal to the proxy that ExternalName services do not have endpoints and service discovery should not be used.  However, the destination controller is returning gRPC status code UNKNOWN instead.  This causes the proxy to retry these requests, resulting in a flurry of warning logs in the destination controller.
    
    We fix the destination controller to properly return INVALID_ARGUMENT instead of UNKNOWN for ExternalName services.
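
    In Go gRPC terms, the change amounts to returning a typed status instead of a plain error (a hedged sketch, not the exact controller code):

    ```go
    import (
        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/status"
    )

    // A plain error reaches the client as codes.Unknown, which the proxy retries;
    // an explicit status tells it the request itself is invalid.
    func externalNameProfileError(svc string) error {
        return status.Errorf(codes.InvalidArgument,
            "ExternalName service %s has no endpoints; service discovery is not supported", svc)
    }
    ```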
    
    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 17, 2024
    2779d11
  5. Templating prometheus url into cronjob heartbeat (#11343)

    An external Prometheus cannot currently be used with the heartbeat. This change adds a new value, `prometheusUrl`, to the linkerd-control-plane chart and uses it in the heartbeat cronjob.
    
    Fixes #11342
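
    For example, with an external Prometheus (a sketch; the URL is a placeholder):

    ```yaml
    # linkerd-control-plane values: the heartbeat cronjob reports against this
    # Prometheus instead of assuming an in-cluster one.
    prometheusUrl: http://prometheus.monitoring.svc.cluster.local:9090
    ```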
    
    Signed-off-by: David ALEXANDRE <[email protected]>
    david972 authored and adleong committed Feb 17, 2024
    22a59ba
  6. Helm linkerd-control-plane: allow custom labels on Pod Monitor (#11222)

    We need to be able to set labels on Pod Monitors.

    This adds a `labels` section to the `podMonitor` values.

    Validated with `helm lint` and `helm template`.

    Fixes #11175
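
    For example (a sketch; it assumes the chart's existing `podMonitor.enabled` flag alongside the new `labels` section):

    ```yaml
    # linkerd-control-plane values: extra labels let a Prometheus Operator
    # instance select the generated PodMonitors.
    podMonitor:
      enabled: true
      labels:
        release: prometheus-operator
    ```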
    
    Signed-off-by: Justin S <[email protected]>
    jseiser authored and adleong committed Feb 17, 2024
    6d49f3b
  7. stable-2.14.10

    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 17, 2024
    e35896d

Commits on Feb 19, 2024

  1. Only process server updates on workloads affected by the server (#12017)

    When the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload. This is because the Server update may have changed opaqueness for that workload. However, this is very CPU intensive for the destination controller, especially during resyncs, when we get Server updates for every Server resource in the cluster.
    
    Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server.  If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.
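
    The guard added by the fix boils down to a selector check like the following (a hedged sketch with stand-in types; the real watcher uses the Kubernetes label-selector machinery):

    ```go
    // Hypothetical stand-ins for the watcher's workload and Server types.
    type workload struct{ labels map[string]string }
    type server struct{ podSelector map[string]string }

    // selects reports whether the server's podSelector matches the workload's labels.
    func selects(s *server, w workload) bool {
        if s == nil {
            return false
        }
        for k, v := range s.podSelector {
            if w.labels[k] != v {
                return false
            }
        }
        return true
    }

    // affectedByServerUpdate: only workloads selected by the old or the new
    // version of the Server need their opaqueness recomputed.
    func affectedByServerUpdate(oldSrv, newSrv *server, w workload) bool {
        return selects(oldSrv, w) || selects(newSrv, w)
    }
    ```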
    
    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 19, 2024
    09adb81
  2. Fix typo

    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 19, 2024
    7867e55
  3. update h2

    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 19, 2024
    c23b490
  4. bump helm dep

    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 19, 2024
    e4b7416
  5. upgrade helm dep

    Signed-off-by: Alex Leong <[email protected]>
    adleong committed Feb 19, 2024
    41628bb