deployment vcluster KO in Kubernetes with noexec for emptyDir #1717

Open
antoinetran opened this issue Apr 24, 2024 · 2 comments

@antoinetran

What happened?

In an environment where every emptyDir is backed by a host partition mounted with noexec, vcluster create fails with:

12:07:17 warn Pod my-vcluster-795748b48b-gzbvb: Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/binaries/vcluster": permission denied: unknown (Failed)

After editing the pod to debug it with strace:

/ # /binaries/vcluster
sh: /binaries/vcluster: Permission denied
/ # strace /binaries/vcluster
execve("/binaries/vcluster", ["/binaries/vcluster"], [/* 27 vars */]) = -1 EACCES (Permission denied)
writev(2, [{iov_base="strace: exec: Permission denied", iov_len=31}, {iov_base="\n", iov_len=1}], 2strace: exec: Permission denied
) = 32
writev(2, [{iov_base="", iov_len=0}, {iov_base=NULL, iov_len=0}], 2) = 0
getpid()                                = 18
exit_group(1)                           = ?
+++ exited with 1 +++

If the binary is copied to /tmp, vcluster runs.

The mount command gives:

# for /tmp
mount | grep "on / "
overlay on / type overlay (rw,seclabel,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26481/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26480/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26558/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/26558/work)

# for /binaries
mount | grep "on /binaries "
/dev/sda7 on /binaries type ext4 (rw,seclabel,nosuid,nodev,noexec,relatime,stripe=64)

This shows that /binaries is mounted noexec, while / (and therefore /tmp) is not.
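
The check above can be scripted. This is a minimal sketch, not part of vcluster: the function name has_noexec is hypothetical, and the option strings are copied from the mount output above (the overlay options for / are abbreviated to the flags that matter here).

```shell
#!/bin/sh
# has_noexec OPTIONS: succeeds if the comma-separated mount options
# contain the noexec flag. (Hypothetical helper, for illustration only.)
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Options taken from the mount output above; the overlay entry is
# abbreviated (lowerdir/upperdir/workdir omitted).
binaries_opts="rw,seclabel,nosuid,nodev,noexec,relatime,stripe=64"
root_opts="rw,seclabel,relatime"

if has_noexec "$binaries_opts"; then
    echo "/binaries: noexec"
else
    echo "/binaries: exec allowed"
fi

if has_noexec "$root_opts"; then
    echo "/: noexec"
else
    echo "/: exec allowed"
fi
```

On a system where util-linux findmnt is available, the options string could be obtained with something like findmnt -n -o OPTIONS -T /binaries instead of being copied by hand; in a minimal container image, reading /proc/mounts may be the only option.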

What did you expect to happen?

vcluster create succeeds.

How can we reproduce it (as minimally and precisely as possible)?

Deploy a Kubernetes cluster and configure it to back every emptyDir with a partition mounted noexec. Then deploy vcluster.

Anything else we need to know?

So far, this behavior seems specific to the Kubernetes environment I am deploying into; in general, emptyDir volumes are not mounted noexec. However, judging from kubernetes/kubernetes#48912, Kubernetes seems to be moving toward stricter security, with emptyDir mounted noexec (by default or as an option).

From my understanding of the code (see https://github.com/loft-sh/vcluster/blob/v0.20.0-beta.1/chart/templates/_init-containers.tpl), the initContainers inject the vcluster binary into the Kubernetes images only to perform a copy (because cp is not present in those images), so that the kube-controller-manager and kube-apiserver binaries end up available to the vcluster container. This requires the emptyDir to be mounted with exec.
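
To tell in advance whether a directory can be used for this copy-then-execute pattern, one could probe it before copying any binaries. The following is only a sketch under assumptions: can_exec_in is a hypothetical helper, not something vcluster provides, and it assumes a POSIX shell with chmod and rm available in the container.

```shell
#!/bin/sh
# can_exec_in DIR: writes a tiny script into DIR and tries to run it.
# Returns 0 if execution works, 2 if the probe could not be written,
# and the shell's failure status (typically 126) if exec is denied.
# (Hypothetical helper, for illustration only.)
can_exec_in() {
    probe="$1/.exec-probe.$$"
    printf '#!/bin/sh\nexit 0\n' > "$probe" 2>/dev/null || return 2
    chmod +x "$probe"
    # Capture the status without tripping "set -e" in the caller.
    "$probe" 2>/dev/null && status=0 || status=$?
    rm -f "$probe"
    return "$status"
}

# Example: probe the temporary directory of the current environment.
if can_exec_in "${TMPDIR:-/tmp}"; then
    echo "exec allowed"
else
    echo "exec blocked or directory not writable"
fi
```

In the environment described here, running can_exec_in /binaries inside the pod would be expected to fail, matching the EACCES seen in the strace output, while can_exec_in /tmp would succeed.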

Host cluster Kubernetes version

$ kubectl version
kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.4

Host cluster Kubernetes distribution

kubespray

vcluster version

$ vcluster --version
vcluster version 0.20.0-beta.1

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

k8s

OS and Arch

OS:  Linux
Arch: amd64
@antoinetran
Author

I could ask the Kubernetes admins whether they can change the emptyDir behavior so that the backing partition is not noexec, but it might be difficult for them to lower security. Moreover, the Kubernetes issue kubernetes/kubernetes#48912 might make this a future problem for vcluster anyway.

What if the vcluster image directly contained the two binaries? I don't know about the licence, but that would make this trick unnecessary, and we could then have noexec both in the image and in emptyDir.

@facchettos
Contributor

@antoinetran Hi, thanks for opening this. To answer your question "What if vcluster image directly contains the two binaries?": the issue is that we default to the Kubernetes version of the host cluster (e.g. if the host cluster is on 1.27, the image will be pulled for k8s 1.27), and this is also configurable. So we would need at least 4 different images just for the k8s distro. The images would also have to include the scheduler and the controller even when not in use, and BYOI would be harder too.
The issue you linked may indeed be a problem for this approach; I will take a look.

5 participants