Tracing downsampling #319

Merged
merged 5 commits into from
Oct 7, 2024

@mmkay mmkay commented Sep 27, 2024

Issue

The grafana-agent-k8s charm forwards every trace to the tracing backend because it has no downsampling configuration.

Solution

Add a sampling policy controlled by three config options, which set the sampling strategy for charm traces, workload traces, and error traces respectively.

Context

Grafana Agent uses the tail sampling processor from opentelemetry-collector-contrib. Its reference is here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.96.0/processor/tailsamplingprocessor
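
As a rough illustration of what such a policy set could look like in the tail sampling processor's own configuration format, here is a minimal sketch. The policy names, the service.name pattern used to tell charm traces from workload traces, and the percentages are all assumptions made for the example; the configuration the charm actually renders is in the diff, not in this description.

```yaml
# Sketch only: names, the "-charm" service.name pattern, and all
# percentages are illustrative assumptions, not the charm's rendered config.
processors:
  tail_sampling:
    policies:
      # Keep a share of traces that contain a span with status ERROR.
      - name: error-traces
        type: and
        and:
          and_sub_policy:
            - name: has-error
              type: status_code
              status_code:
                status_codes: [ERROR]
            - name: error-share
              type: probabilistic
              probabilistic:
                sampling_percentage: 100
      # Keep a share of traces emitted by charm code (assumed here to be
      # identifiable by a service.name ending in "-charm").
      - name: charm-traces
        type: and
        and:
          and_sub_policy:
            - name: charm-service
              type: string_attribute
              string_attribute:
                key: service.name
                values: [".+-charm"]
                enabled_regex_matching: true
            - name: charm-share
              type: probabilistic
              probabilistic:
                sampling_percentage: 10
      # Keep a share of the remaining (workload) traces by inverting the match.
      - name: workload-traces
        type: and
        and:
          and_sub_policy:
            - name: not-charm-service
              type: string_attribute
              string_attribute:
                key: service.name
                values: [".+-charm"]
                enabled_regex_matching: true
                invert_match: true
            - name: workload-share
              type: probabilistic
              probabilistic:
                sampling_percentage: 1
```

Each top-level policy pairs a matcher with a probabilistic sub-policy, so one percentage can be tuned per trace category. A trace that matches more than one policy (for example, a charm trace with an error) can be kept by either of them.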

Testing Instructions

Deploy the following two bundles, each in its own model.
COS Lite + Tempo model:

bundle: kubernetes
saas:
  remote-357a6f138c844be58d06481f7cc6a3d0: {}
applications:
  alertmanager:
    charm: alertmanager-k8s
    channel: latest/edge
    revision: 135
    base: [email protected]/stable
    resources:
      alertmanager-image: 98
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
  catalogue:
    charm: catalogue-k8s
    channel: latest/edge
    revision: 63
    base: [email protected]/stable
    resources:
      catalogue-image: 34
    scale: 1
    options:
      description: "Canonical Observability Stack Lite, or COS Lite, is a light-weight,
        highly-integrated, \nJuju-based observability suite running on Kubernetes.\n"
      tagline: Model-driven Observability Stack deployed with a single command.
      title: Canonical Observability Stack
    constraints: arch=amd64
    trust: true
  grafana:
    charm: grafana-k8s
    channel: latest/edge
    revision: 118
    base: [email protected]/stable
    resources:
      grafana-image: 70
      litestream-image: 45
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  loki:
    charm: loki-k8s
    channel: latest/edge
    revision: 170
    base: [email protected]/stable
    resources:
      loki-image: 100
      node-exporter-image: 3
    scale: 1
    constraints: arch=amd64
    storage:
      active-index-directory: kubernetes,1,1024M
      loki-chunks: kubernetes,1,1024M
    trust: true
  prometheus:
    charm: prometheus-k8s
    channel: latest/edge
    revision: 212
    base: [email protected]/stable
    resources:
      prometheus-image: 150
    scale: 1
    constraints: arch=amd64
    storage:
      database: kubernetes,1,1024M
    trust: true
  tempo-k8s:
    charm: tempo-k8s
    channel: latest/edge
    revision: 83
    resources:
      tempo-image: 17
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
  traefik:
    charm: traefik-k8s
    channel: latest/edge
    revision: 208
    base: [email protected]/stable
    resources:
      traefik-image: 160
    scale: 1
    constraints: arch=amd64
    storage:
      configurations: kubernetes,1,1024M
    trust: true
relations:
- - traefik:ingress-per-unit
  - prometheus:ingress
- - traefik:ingress-per-unit
  - loki:ingress
- - traefik:traefik-route
  - grafana:ingress
- - traefik:ingress
  - alertmanager:ingress
- - prometheus:alertmanager
  - alertmanager:alerting
- - grafana:grafana-source
  - prometheus:grafana-source
- - grafana:grafana-source
  - loki:grafana-source
- - grafana:grafana-source
  - alertmanager:grafana-source
- - loki:alertmanager
  - alertmanager:alerting
- - prometheus:metrics-endpoint
  - traefik:metrics-endpoint
- - prometheus:metrics-endpoint
  - alertmanager:self-metrics-endpoint
- - prometheus:metrics-endpoint
  - loki:metrics-endpoint
- - prometheus:metrics-endpoint
  - grafana:metrics-endpoint
- - grafana:grafana-dashboard
  - loki:grafana-dashboard
- - grafana:grafana-dashboard
  - prometheus:grafana-dashboard
- - grafana:grafana-dashboard
  - alertmanager:grafana-dashboard
- - catalogue:ingress
  - traefik:ingress
- - catalogue:catalogue
  - grafana:catalogue
- - catalogue:catalogue
  - prometheus:catalogue
- - catalogue:catalogue
  - alertmanager:catalogue
- - catalogue:catalogue
  - loki:catalogue
- - loki:logging
  - tempo-k8s:logging
- - loki:logging
  - traefik:logging
- - tempo-k8s:tracing
  - alertmanager:tracing
- - tempo-k8s:tracing
  - catalogue:tracing
- - tempo-k8s:grafana-dashboard
  - grafana:grafana-dashboard
- - tempo-k8s:grafana-source
  - grafana:grafana-source
- - tempo-k8s:tracing
  - grafana:tracing
- - tempo-k8s:tracing
  - loki:tracing
- - tempo-k8s:metrics-endpoint
  - prometheus:metrics-endpoint
- - tempo-k8s:tracing
  - prometheus:tracing
- - tempo-k8s:tracing
  - traefik:tracing
- - traefik:grafana-dashboard
  - grafana:grafana-dashboard
- - traefik:traefik-route
  - tempo-k8s:ingress
- - prometheus:receive-remote-write
  - remote-357a6f138c844be58d06481f7cc6a3d0:send-remote-write
- - tempo-k8s:tracing
  - remote-357a6f138c844be58d06481f7cc6a3d0:tracing
--- # overlay.yaml
applications:
  prometheus:
    offers:
      prom:
        endpoints:
        - receive-remote-write
        acl:
          admin: admin
  tempo-k8s:
    offers:
      tracing:
        endpoints:
        - tracing
        acl:
          admin: admin

Grafana-agent model:

bundle: kubernetes
saas:
  prom:
    url: microk8s:admin/test-sampling.prom
  tracing:
    url: microk8s:admin/test-sampling.tracing
applications:
  grafana-agent-k8s:
    charm: local:grafana-agent-k8s-4
    scale: 1
    options:
      workload_traces_sampling_percentage: 100
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
  self-signed-certificates:
    charm: self-signed-certificates
    channel: latest/edge
    revision: 192
    scale: 1
    constraints: arch=amd64
  tempo-coordinator-k8s:
    charm: tempo-coordinator-k8s
    channel: latest/edge
    revision: 12
    resources:
      nginx-image: 5
      nginx-prometheus-exporter-image: 3
    scale: 1
    constraints: arch=amd64
    storage:
      data: kubernetes,1,1024M
    trust: true
relations:
- - grafana-agent-k8s:send-remote-write
  - prom:receive-remote-write
- - grafana-agent-k8s:tracing
  - tracing:tracing
- - self-signed-certificates:tracing
  - grafana-agent-k8s:tracing-provider

You can set charm_traces_sampling_percentage, workload_traces_sampling_percentage, and error_traces_sampling_percentage via juju config grafana-agent-k8s ...
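
The same three options can also be set declaratively in the bundle's options block, the way the Grafana-agent model above already does for workload_traces_sampling_percentage. A minimal sketch of that fragment follows; the values are illustrative assumptions, not the charm's defaults.

```yaml
# Bundle fragment; the percentages are example values only.
applications:
  grafana-agent-k8s:
    options:
      charm_traces_sampling_percentage: 10
      workload_traces_sampling_percentage: 1
      error_traces_sampling_percentage: 100
```

Setting all three to 100 should reproduce the old pass-everything behaviour, which gives a baseline to compare trace volume in Tempo against before lowering the values.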

Upgrade Notes

@mmkay mmkay changed the title Downsampling Tracing downsampling Sep 27, 2024
@mmkay mmkay marked this pull request as ready for review September 27, 2024 10:31
@mmkay mmkay requested a review from a team as a code owner September 27, 2024 10:31

@PietroPasotti PietroPasotti left a comment

Mostly the same comments apply as on the machine charm PR. I'd also copy this test over there.

@mmkay mmkay merged commit 32a2aad into main Oct 7, 2024
13 checks passed
@mmkay mmkay deleted the downsampling branch October 7, 2024 12:10