This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

Directory permission issue when using DaemonSet and PMEM-CSI on OpenShift 4.6.9 #912

Open
Tianyang-Zhang opened this issue Mar 8, 2021 · 15 comments
Labels
OpenShift issues occurring on Red Hat OpenShift

Comments

@Tianyang-Zhang

I created a local PV and PVC with a local storage class (no provisioner) and ReadWriteMany access mode for storage sharing between pods:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-volume
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteMany
  storageClassName: local-storage
  local:
    path: /tmp
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: storage
          operator: In
          values:
          - pmem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-volume-claim
spec:
  storageClassName: local-storage
  volumeName: shared-volume
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 8Gi

Then I created a DaemonSet that mounts this volume (path /tmp/memverge). This DaemonSet uses PMEM-CSI to provision PMEM via a CSI ephemeral volume (I'm using OpenShift 4.6 and generic ephemeral volumes are somehow not supported). Everything works fine and I can attach to my pods (say pod A) and access the mounted directory. But if I create another pod (say pod B, running on the same node as pod A) mounting the same local PV, I am no longer able to access /tmp/memverge in pod A and get this error:

[root@memory-machine-mcz4z /]# ls /tmp/memverge/
ls: cannot open directory '/tmp/memverge/': Permission denied

The permissions inside the container look correct:

[root@memory-machine-mcz4z /]# ls -l /tmp/
total 8
-rwx------.  1 root root 701 Dec  4 17:37 ks-script-esd4my7v
-rwx------.  1 root root 671 Dec  4 17:37 ks-script-eusq_sc5
drwxrwsrwt. 11 root root 520 Mar  5 23:12 memverge

If I create more pods mounting the same local PV, all of these pods work fine and I am able to access the mounted directory. But not pod A.

If I remove the CSI ephemeral volume part of the DaemonSet and redo everything, the issue is gone. The volume spec for PMEM-CSI is as follows:

volumes:
      - name: pmem-csi-ephemeral-volume
        csi:
          driver: pmem-csi.intel.com
          fsType: "xfs"
          volumeAttributes:
            size: "20Gi"

This issue seems to only happen when a DaemonSet is involved. I haven't do

@pohly
Contributor

pohly commented Mar 9, 2021

This smells like an issue in the container runtime, potentially related to SELinux.

Can you reproduce it with SELinux disabled?

Can you reproduce it when replacing PMEM-CSI with some other CSI driver, for example https://github.com/kubernetes-csi/csi-driver-host-path?
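One way to check on the node whether SELinux is the culprit (a sketch; `getenforce` and `ausearch` come from the SELinux/audit packages and may not be installed, and `/tmp` stands in for the actual mounted path):

```shell
# Check whether SELinux is enforcing on the node; getenforce is part of the
# SELinux utilities and may be missing on hosts without SELinux.
command -v getenforce >/dev/null && getenforce || echo "SELinux tools not installed"

# Show the SELinux context of the directory (prints '?' on hosts without
# SELinux support). Substitute the shared volume path, e.g. /tmp/memverge.
ls -dZ /tmp

# Look for recent AVC denials, if the audit tools are available.
command -v ausearch >/dev/null && ausearch -m avc -ts recent || true
```

Mismatched per-pod MCS categories in the `ls -dZ` output would point at the runtime relabeling the volume.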

@pohly
Contributor

pohly commented Mar 9, 2021

I tried to reproduce this on our QEMU cluster, but without success. It worked:

pohly@pohly-desktop:/nvme/gopath/src/github.com/intel/pmem-csi$ kubectl get pods -o wide
NAME          READY   STATUS    RESTARTS   AGE     IP                NODE                         NOMINATED NODE   READINESS GATES
pod-b         1/1     Running   0          13s     192.168.200.68    pmem-csi-pmem-govm-worker3   <none>           <none>
sleep-qkzxr   1/1     Running   0          4m13s   192.168.200.67    pmem-csi-pmem-govm-worker3   <none>           <none>
sleep-rj7qx   1/1     Running   0          4m13s   192.168.133.132   pmem-csi-pmem-govm-worker1   <none>           <none>
sleep-ssrs7   1/1     Running   0          4m13s   192.168.220.67    pmem-csi-pmem-govm-worker2   <none>           <none>
pohly@pohly-desktop:/nvme/gopath/src/github.com/intel/pmem-csi$ kubectl exec pod-b -- ls /tmp/memverge
runc-process670585825
systemd-private-b887230389c949ce9a1d9e64bdcec54b-chronyd.service-xJQLMi
systemd-private-b887230389c949ce9a1d9e64bdcec54b-dbus-broker.service-vVaVSe
systemd-private-b887230389c949ce9a1d9e64bdcec54b-systemd-logind.service-ko8uti
systemd-private-b887230389c949ce9a1d9e64bdcec54b-systemd-resolved.service-jsPHjj
pohly@pohly-desktop:/nvme/gopath/src/github.com/intel/pmem-csi$ kubectl exec sleep-qkzxr -- ls /tmp/memverge
runc-process561773745
systemd-private-b887230389c949ce9a1d9e64bdcec54b-chronyd.service-xJQLMi
systemd-private-b887230389c949ce9a1d9e64bdcec54b-dbus-broker.service-vVaVSe
systemd-private-b887230389c949ce9a1d9e64bdcec54b-systemd-logind.service-ko8uti
systemd-private-b887230389c949ce9a1d9e64bdcec54b-systemd-resolved.service-jsPHjj
pohly@pohly-desktop:/nvme/gopath/src/github.com/intel/pmem-csi$ kubectl exec sleep-qkzxr -- touch /tmp/memverge/foo
pohly@pohly-desktop:/nvme/gopath/src/github.com/intel/pmem-csi$ kubectl exec pod-b -- ls /tmp/memverge
foo
runc-process553107404
systemd-private-b887230389c949ce9a1d9e64bdcec54b-chronyd.service-xJQLMi
systemd-private-b887230389c949ce9a1d9e64bdcec54b-dbus-broker.service-vVaVSe
systemd-private-b887230389c949ce9a1d9e64bdcec54b-systemd-logind.service-ko8uti
systemd-private-b887230389c949ce9a1d9e64bdcec54b-systemd-resolved.service-jsPHjj
pohly@pohly-desktop:/nvme/gopath/src/github.com/intel/pmem-csi$ kubectl exec sleep-qkzxr -- mount
overlay on / type overlay (rw,seclabel,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/55/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/56/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/56/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,seclabel,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
mqueue on /dev/mqueue type mqueue (rw,seclabel,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (ro,seclabel,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,seclabel,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,cpuset)
/dev/ndbus0region0fsdax/csi-66-bcbfbd4fad181ad3a7f1eb7d641e996ba48246a2f8e0ec39bc54b489 on /pmem type xfs (rw,seclabel,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
tmpfs on /tmp/memverge type tmpfs (rw,seclabel,nr_inodes=409600)
/dev/vda1 on /etc/hosts type ext4 (rw,seclabel,relatime)
/dev/vda1 on /dev/termination-log type ext4 (rw,seclabel,relatime)
/dev/vda1 on /etc/hostname type ext4 (rw,seclabel,relatime)
/dev/vda1 on /etc/resolv.conf type ext4 (rw,seclabel,relatime)
shm on /dev/shm type tmpfs (rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k)
tmpfs on /var/run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,seclabel,relatime)
proc on /proc/bus type proc (ro,relatime)
proc on /proc/fs type proc (ro,relatime)
proc on /proc/irq type proc (ro,relatime)
proc on /proc/sys type proc (ro,relatime)
proc on /proc/sysrq-trigger type proc (ro,relatime)
tmpfs on /proc/acpi type tmpfs (ro,seclabel,relatime)
tmpfs on /proc/kcore type tmpfs (rw,seclabel,nosuid,size=65536k,mode=755)
tmpfs on /proc/keys type tmpfs (rw,seclabel,nosuid,size=65536k,mode=755)
tmpfs on /proc/latency_stats type tmpfs (rw,seclabel,nosuid,size=65536k,mode=755)
tmpfs on /proc/timer_list type tmpfs (rw,seclabel,nosuid,size=65536k,mode=755)
tmpfs on /proc/sched_debug type tmpfs (rw,seclabel,nosuid,size=65536k,mode=755)
tmpfs on /proc/scsi type tmpfs (ro,seclabel,relatime)
tmpfs on /sys/firmware type tmpfs (ro,seclabel,relatime)

@pohly
Contributor

pohly commented Mar 9, 2021

Here are the objects that I used. Local volume (same as in description):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-volume
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteMany
  storageClassName: local-storage
  local:
    path: /tmp
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: storage
          operator: In
          values:
          - pmem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-volume-claim
spec:
  storageClassName: local-storage
  volumeName: shared-volume
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 8Gi

Daemonset:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sleep
spec:
  selector:
    matchLabels:
      name: sleep
  template:
    metadata:
      labels:
        name: sleep
    spec:
      containers:
      - name: sleep
        image: busybox
        command:
          - sleep
          - "1000000"
        volumeMounts:
        - name: memverge
          mountPath: /tmp/memverge
        - name: pmem-csi-ephemeral-volume
          mountPath: /pmem
      volumes:
      - name: memverge
        persistentVolumeClaim:
          claimName: shared-volume-claim
      - name: pmem-csi-ephemeral-volume
        csi:
          driver: pmem-csi.intel.com
          fsType: "xfs"
          volumeAttributes:
            size: "100Mi"

Pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-b
spec:
  containers:
  - name: sleep
    image: busybox
    command:
      - sleep
      - "1000000"
    volumeMounts:
    - name: memverge
      mountPath: /tmp/memverge
  volumes:
    - name: memverge
      persistentVolumeClaim:
        claimName: shared-volume-claim

@pohly
Contributor

pohly commented Mar 9, 2021

Does it perhaps matter where volumes are mounted inside the containers?

I would avoid mounting volumes on top of each other, if it can be avoided. I don't have a particular reason, it just seems unnecessarily complicated.
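For example, the volumeMounts could keep the two volumes on unrelated paths (a sketch; /shared is an arbitrary substitute path, not from the original report):

```yaml
# Sketch: keep the shared PV and the ephemeral volume on unrelated paths,
# instead of nesting the /tmp/memverge mount inside the container's /tmp.
volumeMounts:
- name: memverge
  mountPath: /shared   # example path; original report used /tmp/memverge
- name: pmem-csi-ephemeral-volume
  mountPath: /pmem
```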

@pohly
Contributor

pohly commented Mar 9, 2021

For hostpath, distributed provisioning from v1.6.0 would be needed to get all pods of the DaemonSet running. But it looks like I broke CSI ephemeral volume support in that driver when adding capacity simulation in that release. Somehow that didn't show up in tests... because CSI ephemeral volume support is not tested with that driver. Will fix both.

To use hostpath:

  • check out https://github.com/pohly/csi-driver-host-path/commits/volume-size, it has the fix for CSI ephemeral volumes
  • push a custom image to Docker Hub: make push REGISTRY_NAME=pohly IMAGE_TAGS=2021-03-09-2
  • deploy that image in a cluster: HOSTPATHPLUGIN_REGISTRY=pohly HOSTPATHPLUGIN_TAG=2021-03-09-2 /nvme/gopath/src/github.com/kubernetes-csi/csi-driver-host-path/deploy/kubernetes-distributed/deploy.sh

If you don't want to build it yourself, you can also use the image that I pushed and deploy directly.

Then use this DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sleep
spec:
  selector:
    matchLabels:
      name: sleep
  template:
    metadata:
      labels:
        name: sleep
    spec:
      containers:
      - name: sleep
        image: busybox
        command:
          - sleep
          - "1000000"
        volumeMounts:
        - name: memverge
          mountPath: /tmp/memverge
        - name: pmem-csi-ephemeral-volume
          mountPath: /pmem
      volumes:
      - name: memverge
        persistentVolumeClaim:
          claimName: shared-volume-claim
      - name: pmem-csi-ephemeral-volume
        csi:
          driver: hostpath.csi.k8s.io

@Tianyang-Zhang Tianyang-Zhang changed the title Directory permission issue when using k8s DaemonSet and PMEM-CSI Directory permission issue when using DaemonSet and PMEM-CSI on OpenShift 4.6.9 Mar 9, 2021
@Tianyang-Zhang
Author

Thanks for the information. I forgot to mention that the issue was found in the OpenShift environment. I'm not sure if this is caused by OpenShift.

I will try not mounting on the same path.

@pohly
Contributor

pohly commented Apr 9, 2021

@Tianyang-Zhang Did using different paths help?

@Tianyang-Zhang
Author

Sorry about the late update. I tried using a different path (/home/shared) but am still seeing this issue. SELinux was disabled.

[root@memory-machine-28ql5 /]# ls -l /home/shared/
ls: cannot open directory '/home/shared/': Permission denied
[root@memory-machine-28ql5 /]# ls -l /home/
total 0
drwxr-xr-x. 3 root root       22 Apr  9 00:08 etc
drwxr-xr-x. 1 root root       29 Apr  9 00:08 memverge
drwxr-xr-x. 3 root root       22 Apr  9 00:08 opt
drwxrwsr-x. 3 root 1000960000 81 Apr  9 23:15 shared

@pohly
Contributor

pohly commented Apr 10, 2021

Can you reproduce it with the CSI hostpath driver instead of PMEM-CSI? v1.6.2 should work out of the box, i.e. no image building needed.

If yes, then this is something that can be reported to Red Hat.

@Tianyang-Zhang
Author

When I tried to create your DaemonSet example, I got this error:

Normal   Scheduled         29s                default-scheduler  Successfully assigned injection/sleep-lvqhv to osc-5k68w-worker-9c42p
Warning  FailedMount       13s (x6 over 29s)  kubelet            MountVolume.NewMounter initialization failed for volume "pmem-csi-ephemeral-volume" : volume mode "Ephemeral" not supported by driver hostpath.csi.k8s.io (no CSIDriver object)

Should I build an image from source?
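For context, the kubelet reports "no CSIDriver object" when no CSIDriver advertises Ephemeral support for that driver name; the csi-driver-host-path deployment scripts normally create one, but a minimal sketch of such an object looks like this:

```yaml
# Sketch: a CSIDriver object advertising CSI ephemeral inline volume support.
# Shown only to illustrate what the kubelet checks for; the driver's own
# deployment manifests are the authoritative source.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: hostpath.csi.k8s.io
spec:
  volumeLifecycleModes:
  - Persistent
  - Ephemeral
```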

@pohly
Contributor

pohly commented Apr 13, 2021

@Tianyang-Zhang
Author

I rechecked the whole cluster and found that SELinux had been re-enabled. The issue is gone after disabling SELinux. Sorry about the confusion and the extra time you spent!

@pohly
Contributor

pohly commented Apr 14, 2021

But the solution can't be "disable SELinux", right?

It might require some extra work, but ideally it should also work with SELinux enabled - whatever "it" is that was failing.
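One avenue to try (a sketch, not a verified fix): pin the pods that share the volume to the same SELinux MCS categories via the pod security context, so the runtime does not relabel the directory with per-pod categories that lock the other pods out. The level value here is an arbitrary example:

```yaml
# Sketch: give all pods sharing the local PV the same SELinux level so the
# runtime's volume relabeling stays consistent across pods.
# "s0:c123,c456" is an arbitrary example category pair.
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
```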

@pohly pohly reopened this Apr 14, 2021
@Tianyang-Zhang
Author

But the solution can't be "disable SELinux", right?

It might require some extra work, but ideally it should also work with SELinux enabled - whatever "it" is that was failing.

You are right. It might be related to how DaemonSets work? I have only seen this issue when using a DaemonSet.

@pohly pohly added the OpenShift issues occurring on Red Hat OpenShift label Apr 15, 2021
@Tianyang-Zhang
Author

Tianyang-Zhang commented Apr 16, 2021

FYI, we also reproduced this issue without any CSI driver on a Diamanti cluster (k8s). Disabling SELinux also fixed it there.
