
prometheus-server pod getting crashloop back off error #2546

Open
prasanthcavli opened this issue Nov 8, 2024 · 1 comment

prasanthcavli commented Nov 8, 2024

This is the pod log:

Defaulted container "prometheus-server-configmap-reload" out of: prometheus-server-configmap-reload, prometheus-server
level=info ts=2024-11-11T05:34:37.595571992Z caller=main.go:137 msg="Starting prometheus-config-reloader" version="(version=0.70.0, branch=refs/tags/v0.70.0, revision=c2c673f7123f3745a2a982b4a2bdc43a11f50fad)"
level=info ts=2024-11-11T05:34:37.595624649Z caller=main.go:138 build_context="(go=go1.21.4, platform=linux/amd64, user=Action-Run-ID-7048794395, date=20231130-15:42:49, tags=unknown)"
level=info ts=2024-11-11T05:34:37.595943074Z caller=reloader.go:246 msg="reloading via HTTP"
level=info ts=2024-11-11T05:34:37.596019966Z caller=reloader.go:282 msg="started watching config file and directories for changes" cfg= out= dirs=/etc/config
level=error ts=2024-11-11T05:37:37.596706711Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:9090/-/reload\": dial tcp 127.0.0.1:9090: connect: connection refused"
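The `Defaulted container` line means `kubectl logs` picked the `prometheus-server-configmap-reload` sidecar, so the log above comes from the config reloader, not from Prometheus itself. The reloader POSTs to its `--reload-url` (`http://127.0.0.1:9090/-/reload`) and, as the message says, retries on the next tick when that fails. `connection refused` on 127.0.0.1 means nothing in the pod was listening on port 9090 at that moment, so this error is most likely a symptom of the `prometheus-server` container not running (or not yet serving); that container's own output, for example via `kubectl logs -n prometheus deploy/prometheus-server -c prometheus-server --previous`, is where the underlying failure would show up. Below is a minimal Python sketch of a single reload attempt, assuming only the URL from the manifest; `try_reload` is a hypothetical helper for illustration, not the reloader's actual Go code.

```python
import urllib.error
import urllib.request

# Value of --reload-url in the Deployment manifest.
RELOAD_URL = "http://127.0.0.1:9090/-/reload"


def try_reload(url: str = RELOAD_URL, timeout: float = 5.0) -> tuple[bool, str]:
    """Send one POST to a Prometheus /-/reload endpoint (hypothetical helper).

    Returns (True, "") on a 2xx response, otherwise (False, reason).
    Only a single attempt is made; the real reloader retries on its own tick.
    """
    # A bytes body (even empty) plus method="POST" makes urllib send a POST.
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            # urlopen raises HTTPError for non-2xx codes, so reaching here means success.
            return True, ""
    except urllib.error.HTTPError as exc:
        # Something answered on the port, but rejected the reload request.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError (an OSError subclass) wraps socket errors such as
        # "Connection refused" when nothing listens on the port; a plain
        # timeout can also surface here while waiting for the response.
        return False, str(getattr(exc, "reason", exc))
```

Against a port with no listener, `try_reload` returns `(False, reason)` where `reason` is the operating system's socket error text, which corresponds to the `connect: connection refused` case in the log.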

What happened?
I upgraded the EKS cluster from 1.28 to 1.29, and since then the prometheus-server pod has been in CrashLoopBackOff.

Did you expect to see something different?

How to reproduce it (as minimally and precisely as possible):

Environment

  • Prometheus Operator version:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "7"
        meta.helm.sh/release-name: prometheus
        meta.helm.sh/release-namespace: prometheus
      creationTimestamp: "2024-11-08T07:03:40Z"
      generation: 9
      labels:
        app.kubernetes.io/component: server
        app.kubernetes.io/instance: prometheus
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: prometheus
        app.kubernetes.io/part-of: prometheus
        app.kubernetes.io/version: v2.48.1
        helm.sh/chart: prometheus-25.8.2
      name: prometheus-server
      namespace: prometheus
      resourceVersion: "156438078"
      uid: 6b46deb7-981a-427a-b30e-08ddfd593fee
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app.kubernetes.io/component: server
          app.kubernetes.io/instance: prometheus
          app.kubernetes.io/name: prometheus
      strategy:
        type: Recreate
      template:
        metadata:
          annotations:
            kubectl.kubernetes.io/restartedAt: "2024-08-21T12:10:15Z"
          creationTimestamp: null
          labels:
            app.kubernetes.io/component: server
            app.kubernetes.io/instance: prometheus
            app.kubernetes.io/managed-by: Helm
            app.kubernetes.io/name: prometheus
            app.kubernetes.io/part-of: prometheus
            app.kubernetes.io/version: v2.48.1
            helm.sh/chart: prometheus-25.8.2
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: hubble.nodeType.Nats
                    operator: In
                    values:
                    - Allowed
                topologyKey: failure-domain.beta.kubernetes.io/zone
          containers:
          - args:
            - --watched-dir=/etc/config
            - --reload-url=http://127.0.0.1:9090/-/reload
            image: quay.io/prometheus-operator/prometheus-config-reloader:v0.70.0
            imagePullPolicy: IfNotPresent
            name: prometheus-server-configmap-reload
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/config
              name: config-volume
              readOnly: true
          - args:
            - --storage.tsdb.retention.time=15d
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.wal-compression
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
            image: quay.io/prometheus/prometheus:v2.51.2
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 3
              httpGet:
                path: /-/healthy
                port: 9090
                scheme: HTTP
              initialDelaySeconds: 90
              periodSeconds: 15
              successThreshold: 1
              timeoutSeconds: 10
            name: prometheus-server
            ports:
            - containerPort: 9090
              protocol: TCP
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /-/ready
                port: 9090
                scheme: HTTP
              initialDelaySeconds: 90
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 4
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/config
              name: config-volume
            - mountPath: /data
              name: storage-volume
          dnsPolicy: ClusterFirst
          enableServiceLinks: true
          nodeSelector:
            hubble.nodeType.Nats: Allowed
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext:
            fsGroup: 65534
            runAsGroup: 65534
            runAsNonRoot: true
            runAsUser: 65534
          serviceAccount: prometheus-server
          serviceAccountName: prometheus-server
          terminationGracePeriodSeconds: 300
          volumes:
          - configMap:
              defaultMode: 420
              name: prometheus-server
            name: config-volume
          - name: storage-volume
            persistentVolumeClaim:
              claimName: prometheus-server
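The liveness probe on the `prometheus-server` container starts checking `/-/healthy` 90 s after the container starts and restarts the container after 3 consecutive failures spaced 15 s apart: if the endpoint never answers and each probe fails immediately (as with a refused connection), that is roughly 90 + (3 - 1) x 15 = 120 s after start, or about 130 s if each probe instead hangs until its 10 s `timeoutSeconds`. Repeated restarts, whether from liveness kills or from the process exiting on its own, are what Kubernetes reports as CrashLoopBackOff. The Python sketch below only reproduces that arithmetic, assuming the standard Kubernetes probe semantics and ignoring kubelet scheduling jitter; `ProbeSpec` and `seconds_until_action` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSpec:
    """Timing fields of a Kubernetes probe (hypothetical helper type)."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int
    timeout_seconds: int


def seconds_until_action(probe: ProbeSpec, probe_hangs: bool = False) -> int:
    """Approximate seconds after container start at which the
    failure_threshold-th consecutive failure is recorded, assuming every
    probe fails.

    probe_hangs=False: each probe fails immediately (e.g. connection refused).
    probe_hangs=True: each probe waits the full timeout before failing.
    """
    # Probes start at initial_delay, then every period_seconds.
    last_probe_start = (
        probe.initial_delay_seconds
        + (probe.failure_threshold - 1) * probe.period_seconds
    )
    return last_probe_start + (probe.timeout_seconds if probe_hangs else 0)


# Values copied from the prometheus-server container in the manifest above.
LIVENESS = ProbeSpec(initial_delay_seconds=90, period_seconds=15,
                     failure_threshold=3, timeout_seconds=10)
READINESS = ProbeSpec(initial_delay_seconds=90, period_seconds=5,
                      failure_threshold=3, timeout_seconds=4)

# A failed liveness probe restarts the container.
print(f"liveness restart after ~{seconds_until_action(LIVENESS)} s")
print(f"liveness restart (hung endpoint) after ~{seconds_until_action(LIVENESS, probe_hangs=True)} s")
# A failed readiness probe only keeps the pod out of Service endpoints; it never restarts it.
print(f"readiness failure recorded after ~{seconds_until_action(READINESS)} s")
```

Only the liveness numbers matter for restarts; the readiness probe affects traffic routing, not the container lifecycle.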

  • Kubernetes version information:

    kubectl version
    Client Version: v1.30.1
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.29.8-eks-a737599

kind: Deployment


  • Prometheus Logs:

    level=error ts=2024-11-08T06:19:48.406640681Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:9090/-/reload\": dial tcp 127.0.0.1:9090: connect: connection refused"


Anything else we need to know?

prasanthcavli (author) commented:

@lilic Can you please help?
