Kind cluster creation fails while Waiting for a healthy kubelet during init #3760

Open
fenic-fawkes opened this issue Oct 21, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@fenic-fawkes

fenic-fawkes commented Oct 21, 2024

What happened:

kind create cluster fails while waiting for a healthy kubelet.
Full command: kind create cluster --retain --config kind-config.txt --wait 5m

What you expected to happen:

cluster should be created

Anything else we need to know?:

kind logs

Environment:

  • kind version: (use kind version): kind v0.24.0 go1.22.6 linux/amd64
  • Runtime info: (use docker info, podman info or nerdctl info):
    Client: Docker Engine - Community
     Version: 26.1.3
     Context: default
     Debug Mode: false
     Plugins:
      buildx: Docker Buildx (Docker Inc.)
       Version: v0.14.0
       Path: /usr/libexec/docker/cli-plugins/docker-buildx
      compose: Docker Compose (Docker Inc.)
       Version: v2.27.0
       Path: /usr/libexec/docker/cli-plugins/docker-compose

    Server:
     Containers: 17
      Running: 17
      Paused: 0
      Stopped: 0
     Images: 5
     Server Version: 26.1.3
     Storage Driver: overlay2
      Backing Filesystem: extfs
      Supports d_type: true
      Using metacopy: false
      Native Overlay Diff: true
      userxattr: false
     Logging Driver: json-file
     Cgroup Driver: cgroupfs
     Cgroup Version: 1
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
     Swarm: inactive
     Runtimes: io.containerd.runc.v2 runc
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
     runc version: v1.1.12-0-g51d5e94
     init version: de40ad0
     Security Options:
      seccomp
       Profile: builtin
     Kernel Version: 4.18.0-553.22.1.el8_10.x86_64
     Operating System: Red Hat Enterprise Linux 8.10 (Ootpa)
     OSType: linux
     Architecture: x86_64
     CPUs: 40
     Total Memory: 251.3GiB
     Name: engdev4
     ID: 9e0e4231-b7e3-432b-bc97-6a263007a3b0
     Docker Root Dir: /data/docker
     Debug Mode: false
     Experimental: false
     Insecure Registries:
      127.0.0.0/8
     Live Restore Enabled: false

  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux 8.10 (Ootpa)
  • Kubernetes version: (use kubectl version): 1.31.0 (default image), 1.23.0 works fine
  • Any proxies or other special environment settings?: no
@fenic-fawkes fenic-fawkes added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2024
@stmcginnis
Contributor

Is it possible to upgrade your system? 4.18 is a pretty old kernel version, and cgroupv1 support has been slowly going away. I haven't been able to look yet for a specific failure, but I have a feeling those could be two contributing factors to this.
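
For anyone who wants to check these two factors on their own host, here is a minimal Python sketch. It is not part of kind, and the function names are made up for illustration. It assumes the common convention that a cgroup v2 (unified) host exposes a cgroup.controllers file at the root of /sys/fs/cgroup, while a v1 or hybrid host does not, and it only reads the major.minor prefix of the kernel release string.

```python
from __future__ import annotations

import os
import re


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Return (major, minor) from a release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def detect_cgroup_version(cgroup_root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if cgroup_root looks like a unified (v2) mount, else 1.

    Assumption: a v2 mount has `cgroup.controllers` at its root; a v1 or
    hybrid layout has per-controller directories there instead.
    """
    if os.path.exists(os.path.join(cgroup_root, "cgroup.controllers")):
        return 2
    return 1


if __name__ == "__main__":
    # Report what the current host looks like.
    print("kernel (major, minor):", parse_kernel_release(os.uname().release))
    print("cgroup version:", detect_cgroup_version())
```

The result should agree with the "Cgroup Version" line that docker info reports, as in the issue description above.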

@fenic-fawkes
Author

Unfortunately no, I'm stuck with this setup for the most part.

@stmcginnis
Contributor

Hmm, it does look like it is cgroup related:

err="failed to initialize top level QOS containers: error validating root container [kubelet kubepods] : cgroup [\"kubelet\" \"kubepods\"] has some missing paths: /sys/fs/cgroup/systemd/kubelet.slice/kubelet-kubepods.slice"
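
To pull the missing cgroup paths out of a kubelet log line like the one above, something along these lines may help. This is a rough sketch rather than anything kind or kubelet ships: the helper name is hypothetical, and it assumes that when several paths are missing they are listed comma-separated after "has some missing paths:".

```python
from __future__ import annotations

import re

# Capture everything after the marker up to the closing quote of err="...".
MISSING_PATHS = re.compile(r'has some missing paths: (?P<paths>[^"]+)')

# The log line quoted in this thread, kept verbatim (raw string keeps the \" pairs).
SAMPLE = (
    r'err="failed to initialize top level QOS containers: error validating '
    r'root container [kubelet kubepods] : cgroup [\"kubelet\" \"kubepods\"] '
    r'has some missing paths: '
    r'/sys/fs/cgroup/systemd/kubelet.slice/kubelet-kubepods.slice"'
)


def missing_cgroup_paths(log_line: str) -> list[str]:
    """Return the cgroup paths the kubelet reported as missing, if any."""
    match = MISSING_PATHS.search(log_line)
    if match is None:
        return []
    # Assumption: multiple paths are separated by commas.
    return [p.strip() for p in match.group("paths").split(",") if p.strip()]


if __name__ == "__main__":
    for path in missing_cgroup_paths(SAMPLE):
        print(path)
```

Note that the reported paths are inside the kind node container, not on the host. Since the cluster was created with --retain, the node container should still be around to inspect.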

@fenic-fawkes
Author

what's different about k8s 1.23 vs 1.24? because 1.23.17 works while 1.24.17 does not

@BenTheElder
Member

BenTheElder commented Oct 22, 2024

> what's different about k8s 1.23 vs 1.24? because 1.23.17 works while 1.24.17 does not

It could be something like the runc version in kubelet, hard to say without a lot of digging.
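
If someone does want to dig, one way to narrow it down is to create a throwaway cluster per Kubernetes version and see where it starts failing. The sketch below only builds the command lines and does not run them. It reuses the flags from the command in the issue description plus --name and --image. The helper name, the "bisect-" cluster names, and the assumption that a kindest/node:v<version> tag exists for each version are illustrative and not verified here.

```python
from __future__ import annotations

import shlex


def kind_create_command(
    version: str,
    config: str = "kind-config.txt",
    wait: str = "5m",
) -> list[str]:
    """Build a `kind create cluster` command pinned to one Kubernetes version."""
    # Cluster names stay lowercase alphanumerics and dashes.
    name = "bisect-" + version.replace(".", "-")
    return [
        "kind", "create", "cluster",
        "--name", name,
        "--image", f"kindest/node:v{version}",  # assumed tag format
        "--config", config,
        "--retain",  # keep the node container around for log collection
        "--wait", wait,
    ]


if __name__ == "__main__":
    # The two versions mentioned in this thread: one known good, one known bad.
    for version in ("1.23.17", "1.24.17"):
        print(shlex.join(kind_create_command(version)))
```

Each command can then be run by hand, or passed to subprocess.run, and the kubelet logs of the first failing version compared against the last working one.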

... Both of those versions are old enough to be out of support upstream in Kubernetes, so kind's support for them is best-effort (we cannot backport anything to them since we're not a fork, which really limits our options, and it's a lot to support).

https://kubernetes.io/releases/


Regarding RHEL 8 and 4.18 ... please see #3558

Can you use a VM with a newer OS/kernel if you can't alter the host?

Realistically, the things we depend on, like Kubernetes, containerd, and runc, are focused on cgroups v2 and more current distros for testing, etc. We don't have the resources ourselves to spend a lot of time out-supporting those projects.

https://kubernetes.io/blog/2024/08/14/kubernetes-1-31-moving-cgroup-v1-support-maintenance-mode/

@stmcginnis
Contributor

Looks like this is related to an older distro. Is there anything more to do here from the project side of things, or can we close this issue?
