
errors in logs - no graphs showing #2304

Closed
miramar-labs opened this issue Aug 24, 2017 · 39 comments

Comments

@miramar-labs

miramar-labs commented Aug 24, 2017

```
Using in-cluster config to connect to apiserver
Using service account token for csrf signing
No request provided. Skipping authorization header
Successful initial request to the apiserver, version: v1.7.0
No request provided. Skipping authorization header
Creating remote Heapster client for http://heapster.kube-system:80
Could not enable metric client: Health check failed: Get http://heapster.kube-system:80/healthz: dial tcp 10.100.9.151:80: getsockopt: connection refused. Continuing.

...

Getting list of all pods in the cluster
[restful] 2017/08/24 06:43:41 log.go:26: No metric client provided. Skipping metrics.
[restful] 2017/08/24 06:43:41 log.go:26: No metric client provided. Skipping metrics.
```
@maciaszczykm
Member

Which Dashboard version are you using? Can you confirm that Heapster was up and running before Dashboard started?
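A quick way to answer the version question (a sketch, assuming the upstream Deployment name `kubernetes-dashboard` in `kube-system`) is to read the image tag off the Deployment:

```
# Print the image (and therefore the version tag) used by the Dashboard Deployment.
# Assumes the Deployment is named "kubernetes-dashboard" in the kube-system namespace.
kubectl -n kube-system get deployment kubernetes-dashboard -o jsonpath='{.spec.template.spec.containers[0].image}'
```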

@miramar-labs
Author

This is what my script does:

```
# install heapster + influxdb + grafana
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml

# install dashboard
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml
```

@maciaszczykm
Member

Try to restart Dashboard pods.

@miramar-labs
Author

But there are more problems: influxdb/grafana.yaml and influxdb/heapster.yaml reference images that don't exist in the Google repository, so I have to use older local versions.
I will try restarting the dashboard pods... and maybe putting a delay between heapster and dashboard?
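One way to add such a delay (a sketch only; it assumes the upstream heapster.yaml creates a Deployment named `heapster` in `kube-system`) is to wait for the Heapster rollout to finish before installing the Dashboard:

```
# Wait for the Heapster Deployment to finish rolling out (blocks until its pods are available).
# Assumes the Deployment is named "heapster" in the kube-system namespace.
kubectl rollout status deployment/heapster --namespace=kube-system

# Only then install the Dashboard.
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml
```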

@miramar-labs
Author

miramar-labs commented Aug 24, 2017

```
      containers:
      - name: grafana
        image: gcr.io/google_containers/heapster-grafana-amd64:v4.4.3
```

and

```
      containers:
      - name: influxdb
        image: gcr.io/google_containers/heapster-influxdb-amd64:v1.3.3
```

Neither of these images is in the Google repo??

@maciaszczykm
Member

Grafana and Influx are out of our scope. You should go to their repositories and create issues there.

I can tell you what works for me. I am using the following file:

https://github.com/maciaszczykm/k8s-tools/blob/master/yamls/heapster-deployment.yaml

Some versions are pretty old, so you might want to update them.

@maciaszczykm
Member

Is Dashboard connecting to Heapster after the restart?

@miramar-labs
Author

I'm having trouble restarting the dashboard pod: I delete it and then run kubectl create -f https:// ... and it says it already exists.

@maciaszczykm
Member

Just delete it. Kubernetes will restart it on its own.
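Spelled out (a sketch, assuming the upstream manifest's `k8s-app=kubernetes-dashboard` label in `kube-system`):

```
# Delete the Dashboard pod; its Deployment/ReplicaSet will recreate it automatically.
# Assumes the pod carries the upstream label k8s-app=kubernetes-dashboard in kube-system.
kubectl delete pod --namespace=kube-system -l k8s-app=kubernetes-dashboard

# Watch the replacement pod come up.
kubectl get pods --namespace=kube-system -l k8s-app=kubernetes-dashboard -w
```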

@miramar-labs
Author

Tried adding a pause - no joy:

```
Could not enable metric client: Health check failed: an error on the server ("Error: 'dial tcp 10.244.0.8:8082: getsockopt: no route to host'\nTrying to reach: 'http://10.244.0.8:8082/healthz'") has prevented the request from succeeding (get services heapster). Continuing.
```

@maciaszczykm
Member

Networking does not seem to be working. What are you using? Can you paste the results of `kubectl get pods --all-namespaces`?

@miramar-labs
Author

Flannel... plus I have a 3-master HA setup.

@miramar-labs
Author


```
NAME                        STATUS                     AGE       VERSION
master1.lan.foo.com   Ready,SchedulingDisabled   5m        v1.7.4
master2.lan.foo.com   Ready,SchedulingDisabled   2m        v1.7.4
master3.lan.foo.com   Ready,SchedulingDisabled   2m        v1.7.4
node1.lan.foo.com     Ready                      1m        v1.7.4
node2.lan.foo.com     Ready                      1m        v1.7.4
node3.lan.foo.com     Ready                      1m        v1.7.4
node4.lan.foo.com     Ready                      1m        v1.7.4
node5.lan.foo.com     Ready                      1m        v1.7.4
node6.lan.foo.com     Ready                      1m        v1.7.4
node7.lan.foo.com     Ready                      1m        v1.7.4
node8.lan.foo.com     Ready                      1m        v1.7.4

NAMESPACE     NAME                                                READY     STATUS    RESTARTS   AGE       IP             NODE
kube-system   heapster-603813915-mrmc2                            1/1       Running   2          4m        10.244.0.9     master1.lan.foo.com
kube-system   kube-apiserver-master1.lan.foo.com            1/1       Running   2          3m        192.168.0.41   master1.lan.foo.com
kube-system   kube-apiserver-master2.lan.foo.com            1/1       Running   0          1m        192.168.0.42   master2.lan.foo.com
kube-system   kube-apiserver-master3.lan.foo.com            1/1       Running   0          1m        192.168.0.43   master3.lan.foo.com
kube-system   kube-controller-manager-master1.lan.foo.com   1/1       Running   3          4m        192.168.0.41   master1.lan.foo.com
kube-system   kube-controller-manager-master2.lan.foo.com   1/1       Running   0          1m        192.168.0.42   master2.lan.foo.com
kube-system   kube-controller-manager-master3.lan.foo.com   1/1       Running   0          2m        192.168.0.43   master3.lan.foo.com
kube-system   kube-dns-2425271678-qwbwm                           3/3       Running   4          4m        10.244.0.7     master1.lan.foo.com
kube-system   kube-flannel-ds-2cwdz                               2/2       Running   0          2m        192.168.0.43   master3.lan.foo.com
kube-system   kube-flannel-ds-2h7jg                               2/2       Running   1          1m        192.168.0.51   node1.lan.foo.com
kube-system   kube-flannel-ds-31x4l                               2/2       Running   0          1m        192.168.0.54   node4.lan.foo.com
kube-system   kube-flannel-ds-38kgc                               2/2       Running   0          2m        192.168.0.42   master2.lan.foo.com
kube-system   kube-flannel-ds-889h9                               2/2       Running   1          1m        192.168.0.55   node5.lan.foo.com
kube-system   kube-flannel-ds-b7j10                               2/2       Running   3          4m        192.168.0.41   master1.lan.foo.com
kube-system   kube-flannel-ds-fr16n                               2/2       Running   0          1m        192.168.0.57   node7.lan.foo.com
kube-system   kube-flannel-ds-hdw83                               2/2       Running   1          1m        192.168.0.56   node6.lan.foo.com
kube-system   kube-flannel-ds-scmj2                               2/2       Running   0          1m        192.168.0.52   node2.lan.foo.com
kube-system   kube-flannel-ds-v6h1q                               2/2       Running   0          1m        192.168.0.58   node8.lan.foo.com
kube-system   kube-flannel-ds-z9b5j                               2/2       Running   0          1m        192.168.0.53   node3.lan.foo.com
kube-system   kube-proxy-30p2c                                    1/1       Running   0          31s       192.168.0.54   node4.lan.foo.com
kube-system   kube-proxy-4h0hh                                    1/1       Running   0          33s       192.168.0.53   node3.lan.foo.com
kube-system   kube-proxy-71r35                                    1/1       Running   0          33s       192.168.0.55   node5.lan.foo.com
kube-system   kube-proxy-77j1t                                    1/1       Running   0          31s       192.168.0.58   node8.lan.foo.com
kube-system   kube-proxy-7f651                                    1/1       Running   0          32s       192.168.0.56   node6.lan.foo.com
kube-system   kube-proxy-dlzzk                                    1/1       Running   0          33s       192.168.0.51   node1.lan.foo.com
kube-system   kube-proxy-gbcmj                                    1/1       Running   0          33s       192.168.0.57   node7.lan.foo.com
kube-system   kube-proxy-krh3k                                    1/1       Running   0          33s       192.168.0.42   master2.lan.foo.com
kube-system   kube-proxy-mt1ld                                    1/1       Running   0          32s       192.168.0.52   node2.lan.foo.com
kube-system   kube-proxy-r83dc                                    1/1       Running   0          33s       192.168.0.41   master1.lan.foo.com
kube-system   kube-proxy-tqd7m                                    1/1       Running   0          32s       192.168.0.43   master3.lan.foo.com
kube-system   kube-scheduler-master1.lan.foo.com            1/1       Running   3          3m        192.168.0.41   master1.lan.foo.com
kube-system   kube-scheduler-master2.lan.foo.com            1/1       Running   0          1m        192.168.0.42   master2.lan.foo.com
kube-system   kube-scheduler-master3.lan.foo.com            1/1       Running   0          1m        192.168.0.43   master3.lan.foo.com
kube-system   kubernetes-dashboard-3313488171-s9jfq               1/1       Running   2          4m        10.244.0.8     master1.lan.foo.com
kube-system   monitoring-grafana-3781511810-dj9z6                 1/1       Running   1          4m        10.244.0.11    master1.lan.foo.com
kube-system   monitoring-influxdb-1870447071-0gcw2                1/1       Running   1          4m        10.244.0.10    master1.lan.foo.com

member 578f889c8a11ad5d is healthy: got healthy result from http://192.168.0.43:2379
member 5d1c912230424477 is healthy: got healthy result from http://192.168.0.42:2379
member a6e4b27d7a9873b1 is healthy: got healthy result from http://192.168.0.41:2379
cluster is healthy
```

@floreks
Member

floreks commented Aug 24, 2017

How is it possible that you have so many nodes and only 2 flannel replicas? Flannel has to be deployed on every node in order to enable communication between pods on different nodes.

@miramar-labs
Author

miramar-labs commented Aug 24, 2017

Not sure what you mean? There's a kube-flannel pod on every node... (3 masters, 8 workers)

@floreks
Member

floreks commented Aug 24, 2017

Ahh... yes, my fault, I looked at the wrong number. Do you have any pod with a working bash or sh? You could exec into it with `kubectl exec -it -n <namespace> <pod_name> -- /bin/bash` and check whether you can actually curl Heapster: `curl http(s)://heapster.<namespace>.svc.<domain_name>`. The default domain is cluster.local.
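As a concrete sketch (pod name and namespace are placeholders; the URL assumes the Heapster Service lives in `kube-system` and the default `cluster.local` domain):

```
# Open a shell in any pod that has one (name and namespace are placeholders).
kubectl exec -it -n <namespace> <pod_name> -- /bin/sh

# From inside that pod, hit the Heapster health endpoint (requires curl or wget in the image).
curl http://heapster.kube-system.svc.cluster.local/healthz
# or, if only wget is available:
wget -qO- http://heapster.kube-system.svc.cluster.local/healthz
```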

@floreks
Member

floreks commented Aug 24, 2017

Additionally, check out this tutorial. Deploy busybox and check whether nslookup resolves successfully.
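A sketch of that check, along the lines of the DNS debugging tutorial (the busybox image tag and service names are illustrative):

```
# Start a throwaway busybox pod (busybox:1.28 ships a working nslookup).
kubectl run busybox --image=busybox:1.28 --restart=Never -- sleep 3600

# Check that cluster DNS resolves both the API server and the Heapster service.
kubectl exec busybox -- nslookup kubernetes.default
kubectl exec busybox -- nslookup heapster.kube-system

# Clean up afterwards.
kubectl delete pod busybox
```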

@miramar-labs
Author

miramar-labs commented Aug 24, 2017

Going through the tutorial... spotted this:

```
kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c sidecar
```

```
ERROR: logging before flag.Parse: I0824 07:58:21.400542      13 main.go:48] Version v1.14.3-4-gee838f6
ERROR: logging before flag.Parse: I0824 07:58:21.400584      13 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I0824 07:58:21.400608      13 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I0824 07:58:21.400645      13 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
```

@floreks
Member

floreks commented Aug 24, 2017

If it's not working, then I'd recommend reaching out to Kubernetes core. They have better knowledge about networking than me. I can help with checking overall cluster health and whether everything required to run Dashboard is working.

@miramar-labs
Author

I think that ERROR is benign... what follows it each time is just an I**** informational log line.

@miramar-labs
Author

OK, this is really weird... I haven't done anything except go through the busybox stuff, and I just reloaded the dashboard in the browser and the missing CPU/Memory Usage charts are showing up!
What on earth is going on?

@floreks
Member

floreks commented Aug 24, 2017

Well, after starting Dashboard and Heapster you have to wait a few minutes for Heapster to scrape some metrics. Dashboard waits until it has enough data to actually display graphs.

@miramar-labs
Author

My pods are all about 42 minutes old... the graphs showed up about 5 minutes ago. Is that normal?

@floreks
Member

floreks commented Aug 24, 2017

It usually takes up to 5 minutes to get enough data to show graphs. Maybe the Dashboard pod restarted? If the health check to Heapster fails when Dashboard starts, then the metric client is disabled. I don't see how else it could show graphs if nothing was there before.

@miramar-labs
Author

If the health check fails, does it ever re-check during the lifetime of that Dashboard pod? Or is that it, and only a pod restart will re-run the check?

@floreks
Member

floreks commented Aug 24, 2017

Only a pod restart will work for now. We had to change that behavior because some people reported that if Heapster was not accessible and we constantly checked for it, Dashboard load time increased. We will add a settings page that will allow supported integrations to be manually enabled/disabled.

@miramar-labs
Author

Why not just set up a timer and check every few minutes? How could that affect load time?

@floreks
Member

floreks commented Aug 24, 2017

Because with many replicas the timers would have to be synced; otherwise the service could proxy you to one replica where graphs are shown and then to a different replica where graphs are still disabled.

@maciaszczykm
Member

Solved.

@miramar-labs
Author

how is this solved?

@maciaszczykm
Member

> how is this solved?

> and I just reloaded the dashboard in the browser and the missing CPU/Memory Usage charts are showing up

I think it tells us that the issue "no graphs showing" no longer occurs.

It is strange that it took so long, though. How many resources do you have for all these nodes (maybe cluster performance is low)?

@miramar-labs
Author

I'm running everything on an ESXi host, which in turn is a MacPro6,1 with 64GB RAM, 1TB flash storage, and 6TB iSCSI NAS storage. The ESXi host is hardly stressed. I have scripts that generate this cluster from scratch, so I can tear it down and bring it up over and over, and replicate this all day long. It is a mystery to me what tickled what to make those graphs suddenly show up after 30 minutes or so that one time, so from my point of view this is far from closed...

@floreks
Member

floreks commented Aug 24, 2017

I'm sure that they did not just suddenly show up. Something had to trigger it, and it was probably a restart of the Dashboard pod. After the restart it checked again whether Heapster was available, and after a successful connection the metrics showed up.

@floreks
Member

floreks commented Aug 24, 2017

We have opened an issue to improve the Heapster check: #2306.

@miramar-labs
Author

Right, but I didn't delete the pod or restart it, so how did that happen? That's what I'm trying to figure out.
Q: Can the dashboard pod run on any node, or does it have to run on a master?

@floreks
Member

floreks commented Aug 24, 2017

It can run on any node. It might have crashed and restarted automatically.
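One way to confirm that (a sketch, assuming the upstream `k8s-app=kubernetes-dashboard` label; note that the `kubectl get pods` output earlier in this thread already shows RESTARTS 2 for the kubernetes-dashboard pod):

```
# A non-zero RESTARTS count means the Dashboard container was restarted at some point,
# which would have re-run the Heapster health check on startup.
kubectl get pods --namespace=kube-system -l k8s-app=kubernetes-dashboard

# For details on why the previous container exited:
kubectl describe pod --namespace=kube-system -l k8s-app=kubernetes-dashboard
```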

@miramar-labs
Author

FYI: I switched back to a single-master setup - nothing fancy - and I still see the same thing:

```
Could not enable metric client: Health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services heapster). Continuing.
```

@miramar-labs
Author

miramar-labs commented Aug 24, 2017

1 master, 3 workers, flannel:

```
kube-system etcd-master1.lan.foo.com 1/1 Running 0 1m
kube-system heapster-603813915-fvmdg 1/1 Running 0 2m
kube-system kube-apiserver-master1.lan.foo.com 1/1 Running 0 1m
kube-system kube-controller-manager-master1.lan.foo.com 1/1 Running 0 1m
kube-system kube-dns-2425271678-71s6d 3/3 Running 0 2m
kube-system kube-flannel-ds-24386 2/2 Running 0 2m
kube-system kube-flannel-ds-2w579 2/2 Running 0 2m
kube-system kube-flannel-ds-tfxwf 2/2 Running 0 2m
kube-system kube-flannel-ds-vnv2q 2/2 Running 0 2m
kube-system kube-proxy-1dwfc 1/1 Running 0 2m
kube-system kube-proxy-4t2vz 1/1 Running 0 2m
kube-system kube-proxy-7sfrx 1/1 Running 0 2m
kube-system kube-proxy-pj3lp 1/1 Running 0 2m
kube-system kube-scheduler-master1.lan.foo.com 1/1 Running 0 1m
kube-system kubernetes-dashboard-3313488171-1nclh 1/1 Running 0 2m
kube-system monitoring-grafana-3781511810-hfpkq 1/1 Running 0 2m
kube-system monitoring-influxdb-1870447071-1dr4v 1/1 Running 0 2m
```

@yinwoods

yinwoods commented May 10, 2018

@miramar-labs Same problem here; it was solved after I set the heapster Service type to NodePort and then set `- --heapster-host=http://hostip:nodePort` in dashboard.yaml.
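In kubectl terms, a sketch of that workaround (the Service and Deployment names follow the upstream manifests; adjust to your cluster):

```
# Switch the Heapster Service to NodePort (assumes it is named "heapster" in kube-system).
kubectl -n kube-system patch service heapster -p '{"spec": {"type": "NodePort"}}'

# Find out which nodePort was assigned.
kubectl -n kube-system get service heapster -o jsonpath='{.spec.ports[0].nodePort}'

# Then edit the Dashboard Deployment and add, under the container args:
#   - --heapster-host=http://<node-ip>:<nodePort>
kubectl -n kube-system edit deployment kubernetes-dashboard
```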
