How to use cgroupfs as the cgroupDriver?
#3700
Comments
Is there something pointing to cgroupfs as the issue here? I'm not 100% sure YAML anchors are supported, or whether you need the config patches. I would start by simplifying things: create a single-node cluster with default settings and see whether there is an issue with your Docker configuration or something else in your environment before adding multiple nodes and extra configuration. If that fails, it would be useful to try again with
The cgroup driver has to match in the CRI implementation (containerd here) and in kubelet. Why are you using cgroupfs? KIND is pretty sensitive to cgroup configurations and we don't test with this.
I thought the line above indicated this.
There is nothing wrong with my Docker environment, I believe. I simply changed:
to
I can do that but I shared a log statement indicating that it thinks
mmm ... I am using the
Are you indicating that this is perhaps the error?
I am deploying Slurm in Kubernetes and it uses |
That's on your host. The configuration in kind nodes has to match for both containerd and kubelet; you're only patching kubelet in kind and Docker on your host. Re: nvidia-container-runtime, check out https://github.com/klueska/nvkind. We're looking into CDI, but there are some complications with kind (#3290), and with the nvkind guide you can use GPUs with kind as-is.
That log statement is useless. It's just kubeadm giving suggestions as to why kubelet might not have started; it doesn't say anything about why it actually didn't start. It's a generic hint. We cannot debug this without the exported logs, but I can already tell you from your configuration that containerd is not being configured for cgroupfs while kubelet is, which will not work. kind uses systemd for the cgroup driver, as recommended by SIG node.
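To make the mismatch described above concrete, here is a minimal Python sketch that compares the cgroup driver implied by a containerd config snippet with the one set in a kubelet config snippet. It is not part of kind or any official tool. The function names and the sample snippets are hypothetical, and the containerd plugin section shown in the sample is only illustrative, since the exact section name depends on the containerd version. The only settings it relies on are containerd's `SystemdCgroup` runc option and the kubelet's `cgroupDriver` field.

```python
import re
from typing import Optional


def containerd_driver(toml_text: str) -> Optional[str]:
    """Return the cgroup driver implied by a containerd config snippet.

    Looks only for runc's `SystemdCgroup` option: true means systemd,
    false means cgroupfs. Returns None if the option is not set at all.
    """
    m = re.search(r"^\s*SystemdCgroup\s*=\s*(true|false)\s*$", toml_text, re.MULTILINE)
    if m is None:
        return None
    return "systemd" if m.group(1) == "true" else "cgroupfs"


def kubelet_driver(yaml_text: str) -> Optional[str]:
    """Return the `cgroupDriver` value from a kubelet config snippet, if set."""
    m = re.search(r"^\s*cgroupDriver:\s*[\"']?(\w+)[\"']?\s*$", yaml_text, re.MULTILINE)
    return m.group(1) if m else None


def drivers_match(toml_text: str, yaml_text: str) -> bool:
    """True only when both snippets set a driver and the two agree."""
    c = containerd_driver(toml_text)
    k = kubelet_driver(yaml_text)
    return c is not None and c == k


# Hypothetical snippets mirroring the situation in this thread:
# containerd is left on systemd while kubelet is patched to cgroupfs.
SAMPLE_CONTAINERD = """
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
"""

SAMPLE_KUBELET = """
kind: KubeletConfiguration
cgroupDriver: cgroupfs
"""
```

With these samples, `drivers_match(SAMPLE_CONTAINERD, SAMPLE_KUBELET)` returns `False`, which is the situation described above: kubelet is patched to cgroupfs while containerd inside the node is not.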
cgroups != cgroupfs; the systemd cgroup driver still uses cgroups. I don't work with Slurm, but skimming that page I don't see anything saying it can't work under systemd. I'd recommend enabling cgroup v2 unified.
There's an example of patching the containerd config here: https://kind.sigs.k8s.io/docs/user/local-registry/. However, we do not test or support cgroupfs mode, so I'm not planning to add a guide for this to the docs. It would increase support issues for something 99.99% of users should not do, and that their applications and Kubernetes usage should not be aware of. kind / Kubernetes / systemd manages the cgroups, and we have to employ some workarounds to make this work properly.
Info
Config
Logs
These are noteworthy logs:
What gives? Why can't I use cgroupfs?