
field enforce_metric_name not found using latest loki 3.0.0 #12594

Closed
fragolinux opened this issue Apr 12, 2024 · 12 comments
Labels
area/helm, type/bug, upgrade

Comments

@fragolinux

fragolinux commented Apr 12, 2024

Describe the bug
Deployed Loki v3.0.0 and got this error in its pods:

failed parsing config: /etc/loki/config/config.yaml: yaml: unmarshal errors:
  line 14: field enforce_metric_name not found in type validation.plain. Use `-config.expand-env=true` flag if you want to expand environment variables in your config file

To Reproduce
Steps to reproduce the behavior:

  1. Install Loki 3.0.0 via the Helm chart
  2. Pod logs show the error above

Expected behavior
Pods start up

Environment:

  • Infrastructure: Kubernetes cluster 1.28.6
  • Deployment tool: Helm

Screenshots, Promtail config, or terminal output

Used this Flux HelmRelease:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: ${release}
  namespace: ${namespace}
spec:
  interval: 1m
  timeout: 10m
  releaseName: ${release}
  targetNamespace: ${namespace}
  test:
    enable: false # disable tests for now
    timeout: 10m
  chart:
    spec:
      # see https://github.com/grafana/loki/blob/main/production/helm/loki
      chart: loki
      version: "3.0.0"
      sourceRef:
        kind: HelmRepository
        name: loki
      interval: 24h
  install:
    remediation:
      retries: 3
      remediateLastFailure: false
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: false
  values:
    deploymentMode: SimpleScalable

    backend:
      replicas: 3
    read:
      replicas: 3
      persistence:
        storageClass: lab-nfs-csi-test
    write:
      replicas: 3
      persistence:
        storageClass: lab-nfs-csi-test

    # Zero out replica counts of other deployment modes
    singleBinary:
      replicas: 0

    ingester:
      replicas: 0
    querier:
      replicas: 0
    queryFrontend:
      replicas: 0
    queryScheduler:
      replicas: 0
    distributor:
      replicas: 0
    compactor:
      replicas: 0
    indexGateway:
      replicas: 0
    bloomCompactor:
      replicas: 0
    bloomGateway:
      replicas: 0

    monitoring:
      selfMonitoring:
        enabled: false
        grafanaAgent:
          installOperator: false
        podLogs:
          apiVersion: monitoring.grafana.com/v1alpha2

    minio:
      enabled: true
      persistence:
        enabled: true
        storageClass: lab-nfs-csi-test
        size: 16Gi

      replicas: 4
      gateway:
        replicas: 4

    loki:
      image:
        tag: 3.0.0
      schemaConfig:
        configs:
          - from: "2024-04-01"
            store: tsdb
            object_store: s3
            schema: v13
            index:
              prefix: loki_index_
              period: 24h
      ingester:
        chunk_encoding: snappy
      tracing:
        enabled: true
      querier:
        # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
        max_concurrent: 4

Strangely, we also had to specify the following, because Helm previously complained about missing or wrong-version CRDs (it complained about a missing PodLogs v1alpha1, which should be disabled by default, as it sits under the selfMonitoring section)... why is this needed if it's disabled by default?

    monitoring:
      selfMonitoring:
        enabled: false
        grafanaAgent:
          installOperator: false
        podLogs:
          apiVersion: monitoring.grafana.com/v1alpha2

Another problem with the current chart is that it still tries to install Loki with image version 2.6.1, which causes a crash loop... pinning the image to 3.0.0 in the HelmRelease fixed that problem.
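Concretely, the workaround is just this override in the chart values (the same one shown in the HelmRelease above):

loki:
  image:
    # Pin the image tag until the chart default catches up
    tag: 3.0.0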

@andrewgkew

@fragolinux I am getting the same error, guess the field has been removed?

@fragolinux
Author

Let's wait for help

@andrewgkew

Could be related: #2096

@andrewgkew

Loki 2.9:
(screenshot)

Loki 3.0:
(screenshot)

I think it's safe to say the field has been removed.

@JStickler JStickler added the area/helm, type/bug, and upgrade labels on Apr 15, 2024
@JStickler
Contributor

FYI the config option enforce_metric_name was removed in #11225. If you've got it set as part of your Loki configuration, you should remove it and see if that gets you unblocked.
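In a rendered config, the removed option sits under limits_config (the validation.plain type from the error message), so the line to delete looks roughly like this (the exact value may differ):

limits_config:
  # Removed in Loki 3.0 (#11225); delete this line
  enforce_metric_name: false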

@andrewgkew

@JStickler thanks for the confirmation. I can confirm that removing this field does solve the issue. It would be nice if this was part of your helm chart upgrade page.

Are there any other fields like this we should be aware of?

@fragolinux
Author

FYI the config option enforce_metric_name was removed in #11225. If you've got it set as part of your Loki configuration, you should remove it and see if that gets you unblocked.

The problem is that this is a brand-new deploy on a clean cluster... we had to change the Loki image from 2.6.1 to 3.0.0 because 2.6.1 (the chart default) caused a crash loop, and that parameter was put into the config by the chart itself, so how do we "remove" it?

@JStickler
Contributor

JStickler commented Apr 16, 2024

@andrewgkew

I can confirm that removing this field does solve the issue.

Thanks for confirming.

It would be nice if this was part of your helm chart upgrade page.

I will add it to my To Do list.

Are there any other fields like this we should be aware of?

A lot of the work that was done late last summer and autumn involved removing unused or deprecated code. Everything that was thought to be a breaking change was documented in the upgrade guide. This must have been missed somehow.

@fragolinux

that parameter was in the config as put there by the chart, so how to "remove" it?

Can you manually edit your loki.yaml file to remove the enforce_metric_name configuration?

@fragolinux
Author

But since everything is managed by Flux, if I do that, it will be put back on the next reconcile loop...
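For debugging, one way to keep Flux from reverting a manual edit is to temporarily suspend reconciliation of the release; the HelmRelease spec has a suspend field for that (a sketch, meant to be reverted once you are done):

spec:
  # Stop Flux from reconciling (and re-rendering) this release for now
  suspend: true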

@fragolinux
Author

I removed that line from the "loki" ConfigMap, and now I get this in the logs of the read pods, while the write ones seem OK now:

query filtering for deletes requires 'compactor_grpc_address' or 'compactor_address' to be configured
error initialising module: cache-generation-loader
github.com/grafana/dskit/modules.(*Manager).initModule
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:138
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
    /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run
    /src/loki/pkg/loki/loki.go:453
main.main
    /src/loki/cmd/loki/main.go:122
runtime.main
    /usr/local/go/src/runtime/proc.go:267
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1650
level=info ts=2024-04-17T08:42:21.205251478Z caller=main.go:120 msg="Starting Loki" version="(version=3.0.0, branch=HEAD, revision=b4f7181c7a)"
level=error ts=2024-04-17T08:42:21.20541848Z caller=log.go:216 msg="error running loki" err="query filtering for deletes requires 'compactor_grpc_address' or 'compactor_address' to be configured\nerror initialising module: cache-generation-loader\ngithub.com/grafana/dskit/modules.(*Manager).initModule\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:138\ngithub.com/grafana/dskit/modules.(*Manager).InitModuleServices\n\t/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108\ngithub.com/grafana/loki/v3/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:453\nmain.main\n\t/src/loki/cmd/loki/main.go:122\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
Stream closed EOF for loki/loki-read-0 (read)

Still using Loki 3.0.0 with the latest chart, of course... and by the way, I can't just remove that line: I had to suspend Flux to prevent reconciliation, but I can't leave it that way; the field should not be added to that ConfigMap by the chart in the first place...
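For reference, the setting that error asks for lives in Loki's common block when the config is written by hand; roughly something like this (a sketch: the loki-backend service name and port are assumptions based on the simple scalable layout, and a current chart normally fills this in by itself):

common:
  # Assumed compactor endpoint (the compactor runs in the backend target
  # in simple scalable mode); adjust host and port for your deployment
  compactor_address: http://loki-backend:3100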

@fragolinux
Author

Sorry, found the problem... we were using chart v3.0.0 with the values of the latest one... here is our working HelmRelease, with chart v6.2.0... we can close this, thanks!

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: ${release}
  namespace: ${namespace}
spec:
  interval: 1m
  timeout: 10m
  releaseName: ${release}
  targetNamespace: ${namespace}
  test:
    enable: false # disable tests for now
    timeout: 10m
  chart:
    spec:
      # see https://github.com/grafana/loki/blob/main/production/helm/loki
      chart: loki
      version: "6.2.0"
      sourceRef:
        kind: HelmRepository
        name: loki
      interval: 24h
  install:
    remediation:
      retries: 3
      remediateLastFailure: false
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: false
  values:
    deploymentMode: SimpleScalable

    backend:
      replicas: 3
      persistence:
        storageClass: lab-nfs-csi-test
    read:
      replicas: 3
      persistence:
        storageClass: lab-nfs-csi-test
    write:
      replicas: 3
      persistence:
        storageClass: lab-nfs-csi-test

    # Zero out replica counts of other deployment modes
    singleBinary:
      replicas: 0

    ingester:
      replicas: 0
    querier:
      replicas: 0
    queryFrontend:
      replicas: 0
    queryScheduler:
      replicas: 0
    distributor:
      replicas: 0
    compactor:
      replicas: 0
    indexGateway:
      replicas: 0
    bloomCompactor:
      replicas: 0
    bloomGateway:
      replicas: 0

    monitoring:
      selfMonitoring:
        enabled: false
        grafanaAgent:
          installOperator: false
        podLogs:
          apiVersion: monitoring.grafana.com/v1alpha2

    minio:
      enabled: true
      persistence:
        enabled: true
        storageClass: lab-nfs-csi-test
        size: 16Gi

      replicas: 4
      gateway:
        replicas: 4

    loki:
      schemaConfig:
        configs:
          - from: "2024-04-01"
            store: tsdb
            object_store: s3
            schema: v13
            index:
              prefix: loki_index_
              period: 24h
      ingester:
        chunk_encoding: snappy
      tracing:
        enabled: true
      querier:
        # Default is 4, if you have enough memory and CPU you can increase, reduce if OOMing
        max_concurrent: 4

    # old config used into LDO
    # loki:
      # from: https://github.com/grafana/loki/issues/1258#issuecomment-553486832
      # limits_config:
      #   retention_period: 7d # Keep 7 days
      # compactor:
      #   delete_request_cancel_period: 10m # don't wait 24h before processing the delete_request
      #   retention_enabled: true # actually do the delete
      #   retention_delete_delay: 2h # wait 2 hours before actually deleting stuff
      # table_manager:
      #   retention_deletes_enabled: true
      #   retention_period: 168h # one week log retention
      # config:
      #   ingester:
      #     wal:
      #       dir: /data/loki/wal
      #       enabled: true
      #     lifecycler:
      #       address: 127.0.0.1
      #       ring:
      #         kvstore:
      #           store: inmemory
      #         replication_factor: 1
      #       final_sleep: 0s
      #     chunk_idle_period: 24h       # Any chunk not receiving new logs in this time will be flushed
      #     max_chunk_age: 24h           # All chunks will be flushed when they hit this age, default is 1h
      #     chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1MB, flushing first if chunk_idle_period or max_chunk_age is reached first
      #     chunk_retain_period: 5m    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
      #     max_transfer_retries: 0     # Chunk transfers disabled
      # chunk_store_config:
      #   max_look_back_period: 168h

@fragolinux
Author

Solution found above, thanks!
