After creating the Deployment, checking progress with kubectl get pod shows the Pods stuck in the ContainerCreating state:
[root@k8s-master-01 opt]# kubectl get pod -n ops-test
NAME                              READY   STATUS              RESTARTS   AGE
node1-nfs-test-547c4d7678-j6kwv   0/1     ContainerCreating   0          2m12s
node1-nfs-test-547c4d7678-vwdqg   0/1     ContainerCreating   0          2m12s
[root@k8s-master-01 opt]# kubectl describe pod node1-nfs-test-547c4d7678-j6kwv -n ops-test
Name:           node1-nfs-test-547c4d7678-j6kwv
Namespace:      ops-test
Priority:       0
Node:           node1/10.10.107.214
Start Time:     Thu, 30 Apr 2020 17:00:33 +0800
Labels:         pod-template-hash=547c4d7678
                workload.user.cattle.io/workloadselector=deployment-ops-test-node1-nfs-test
Annotations:    cattle.io/timestamp: 2020-04-30T09:01:05Z
                workload.cattle.io/state: {"bm9kZTE=":"c-rl5jz:machine-wbs6r"}
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/node1-nfs-test-547c4d7678
Containers:
  node1-nfs-test:
    Container ID:
    Image:          alpine
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp from vol1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f6wjj (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  vol1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  nfs-211
    ReadOnly:   false
  default-token-f6wjj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-f6wjj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age    From            Message
  ----     ------       ----   ----            -------
  Warning  FailedMount  8m49s  kubelet, node1  MountVolume.SetUp failed for volume "nfs211" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/cddc94e7-8033-4150-bed5-d141e3b71e49/volumes/kubernetes.io~nfs/nfs211 --scope -- mount -t nfs 10.20.172.211:/nfs /var/lib/kubelet/pods/cddc94e7-8033-4150-bed5-d141e3b71e49/volumes/kubernetes.io~nfs/nfs211
Output: Running scope as unit run-38284.scope.
mount: wrong fs type, bad option, bad superblock on 10.20.172.211:/nfs,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
        need a /sbin/mount.<type> helper program)

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
  Warning  FailedMount  8m48s  kubelet, node1  MountVolume.SetUp failed for volume "nfs211" : mount failed: exit status 32
Running the same mount by hand on a node reproduces the error:

[root@k8s-master-03 ~]# mount -t nfs 10.20.172.211:/nfs /tmp
mount: wrong fs type, bad option, bad superblock on 10.20.172.211:/nfs,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
        need a /sbin/mount.<type> helper program)

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
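The error points at a missing /sbin/mount.<type> helper, which for NFS ships in the nfs-utils package on CentOS/RHEL. A quick way to confirm on the node (these check commands are my own sketch, not from the original session):

# Is the NFS mount helper present on this node?
ls -l /sbin/mount.nfs
# Is the package that provides it installed? (CentOS/RHEL)
rpm -q nfs-utils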
Total download size: 1.5 M
Installed size: 4.3 M
Is this ok [y/d/N]:
One quick pass with yum later, it turned out nfs-utils really wasn't installed 😂.
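The command behind the prompt above was presumably just a plain package install, roughly:

# Install the NFS client tools on every node that may mount NFS volumes
# (CentOS/RHEL; on Debian/Ubuntu the equivalent package is nfs-common)
yum install -y nfs-utils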
After the install, I used kubectl to delete the old Pods; the Deployment controller automatically reconciles the Pod count back to the desired number. Once the NFS client is installed on the Pod's host, kubelet can mount the volume for the Pod again and the Pods run normally.
[root@k8s-master-01 ~]# kubectl delete pod node1-nfs-test-547c4d7678-j6kwv node1-nfs-test-547c4d7678-vwdqg -n ops-test
pod "node1-nfs-test-547c4d7678-j6kwv" deleted
pod "node1-nfs-test-547c4d7678-vwdqg" deleted
[root@k8s-master-01 ~]# kubectl get pod -n ops-test
NAME                              READY   STATUS    RESTARTS   AGE
node1-nfs-test-7589fb4787-cknz4   1/1     Running   0          18s
node1-nfs-test-7589fb4787-l9bt2   1/1     Running   0          22s
Exec into the container and take a look at the mount points inside it:
[root@k8s-master-01 ~]# kubectl exec -it node1-nfs-test-7589fb4787-cknz4 -n ops-test sh
/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  28.9G      4.1G     23.3G  15% /
10.20.172.211:/nfs       28.9G     14.5G     12.9G  53% /tmp
tmpfs                     1.8G         0      1.8G   0% /sys/firmware
/ # mount
rootfs on / type rootfs (rw)
10.20.172.211:/nfs on /tmp type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.20.172.211,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.20.172.211)
10.20.172.211:/nfs on /mnt/nfs type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.20.172.211,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.20.172.211)
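As a quick sanity check (my own addition, the file name is made up), writing through the NFS-backed /tmp inside the Pod should produce a file in the export on the NFS server:

# Write a file through the mount from inside the Pod
kubectl exec -n ops-test node1-nfs-test-7589fb4787-cknz4 -- sh -c 'echo hello > /tmp/from-pod.txt'
# On the NFS server (10.20.172.211) the file should appear under the export
ls -l /nfs/from-pod.txt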
With that, the problem is solved. Now on to the real topic: analyzing the flow and mechanics of how kubelet mounts volumes for Pods 😂
According to Docker's official documentation, Manage data in Docker, Docker provides three ways to mount data from the Docker host into a container (a short docker run sketch follows the quoted descriptions below):
(Image borrowed from the Docker docs 😂)
Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
tmpfs mounts are stored in the host system’s memory only, and are never written to the host system’s filesystem.
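As a rough sketch (container names and host paths below are made up, not taken from the Docker docs), the three approaches map onto docker run options like this:

# 1. Volume: managed by Docker under /var/lib/docker/volumes/
docker run -d --name demo-volume --mount type=volume,source=myvol,target=/data nginx
# 2. Bind mount: any host path, managed by the user
docker run -d --name demo-bind --mount type=bind,source=/opt/appdata,target=/data nginx
# 3. tmpfs: memory only, never written to the host filesystem
docker run -d --name demo-tmpfs --mount type=tmpfs,target=/data nginx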
╭─root@sg-02 /home/ubuntu
╰─# docker volume

Usage:  docker volume COMMAND

Manage volumes

Commands:
  create      Create a volume
  inspect     Display detailed information on one or more volumes
  ls          List volumes
  prune       Remove all unused local volumes
  rm          Remove one or more volumes

Run 'docker volume COMMAND --help' for more information on a command.
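The docker volume subcommand shown above manages the first kind. For example, creating a volume and inspecting it reveals the Docker-managed path under /var/lib/docker/volumes/ mentioned earlier (the volume name is made up):

# Create a named volume and see where Docker stores it on the host
docker volume create myvol
docker volume inspect myvol --format '{{ .Mountpoint }}'
# Typically prints something like /var/lib/docker/volumes/myvol/_data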
On a Node, you can use the mount command to inspect the mount points that kubelet has set up for Pods.
10.10.107.216:/nfs on /var/lib/kubelet/pods/6750b756-d8e4-448a-93f9-8906f9c44788/volumes/kubernetes.io~nfs/nfs-test type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.10.107.216,mountvers=3,mountport=56389,mountproto=udp,local_lock=none,addr=10.10.107.216)
╭─root@k8s-node-3 ~
╰─# mount | grep kubelet
tmpfs on /var/lib/kubelet/pods/45c55c5e-ce96-47fd-94b3-60a334e5a44d/volumes/kubernetes.io~secret/kube-proxy-token-h4dfb type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/3fb63baa-27ec-4d76-8028-39a0a8f91749/volumes/kubernetes.io~secret/calico-node-token-4hks6 type tmpfs (rw,relatime,seclabel)
tmpfs on /var/lib/kubelet/pods/05c75313-f932-4913-b09f-d7bccdfb6e62/volumes/kubernetes.io~secret/nginx-ingress-token-5569x type tmpfs (rw,relatime,seclabel)
10.20.172.211:/nfs on /var/lib/kubelet/pods/c4b1998b-f5c1-440a-b9bc-7fbf87f3c267/volumes/kubernetes.io~nfs/nfs211 type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.20.172.211,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.20.172.211)
tmpfs on /var/lib/kubelet/pods/73fed6f3-4cbe-46a7-af7b-6fd912e6ebd4/volumes/kubernetes.io~secret/default-token-wgfd9 type tmpfs (rw,relatime,seclabel)
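Each path follows the /var/lib/kubelet/pods/<Pod UID>/volumes/kubernetes.io~<plugin>/<volume name> pattern, so a mount can be traced back to its Pod via the Pod UID. A small sketch of that mapping (Pod name reused from the earlier example, UID is a placeholder):

# Get the Pod UID
kubectl get pod node1-nfs-test-7589fb4787-cknz4 -n ops-test -o jsonpath='{.metadata.uid}'
# On the node running the Pod, list the volumes kubelet prepared for it
ls /var/lib/kubelet/pods/<pod-uid>/volumes/
# e.g. kubernetes.io~nfs  kubernetes.io~secret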
Looking back at the FailedMount event from earlier, notice the mounting command kubelet actually ran:

Events:
  Type     Reason       Age    From            Message
  ----     ------       ----   ----            -------
  Warning  FailedMount  8m49s  kubelet, node1  MountVolume.SetUp failed for volume "nfs211" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/cddc94e7-8033-4150-bed5-d141e3b71e49/volumes/kubernetes.io~nfs/nfs211 --scope -- mount -t nfs 10.20.172.211:/nfs /var/lib/kubelet/pods/cddc94e7-8033-4150-bed5-d141e3b71e49/volumes/kubernetes.io~nfs/nfs211
Output: Running scope as unit run-38284.scope.
mount: wrong fs type, bad option, bad superblock on 10.20.172.211:/nfs,
       missing codepage or helper program, or other error
       (for several filesystems (e.g. nfs, cifs) you might
        need a /sbin/mount.<type> helper program)

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
  Warning  FailedMount  8m48s  kubelet, node1  MountVolume.SetUp failed for volume "nfs211" : mount failed: exit status 32
Huh? At the time I wondered what kubelet's volume mounting could possibly have to do with systemd. How did systemd end up in charge of this too? 😂 (I once wrote a post, 《Linux 的小伙伴 systemd 详解》, jokingly calling systemd Linux's little buddy. That label now seems off: systemd is really the building superintendent of Linux 🤣, managing everything from services down to devices and mounts, poking its nose into all of it.) Back to the point: following this error log upstream led me to the PR Run mount in its own systemd scope.
Kubelet needs to run /bin/mount in its own cgroup.
When kubelet runs as a systemd service, “systemctl restart kubelet” may kill all processes in the same cgroup and thus terminate fuse daemons that are needed for gluster and cephfs mounts.
When kubelet runs in a docker container, restart of the container kills all fuse daemons started in the container.
Killing fuse daemons is bad, it basically unmounts volumes from running pods.
This patch runs mount via “systemd-run --scope /bin/mount …”, which makes sure that any fuse daemons are forked in its own systemd scope (= cgroup) and they will survive restart of kubelet’s systemd service or docker container.
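The effect of --scope is easy to see outside of kubelet as well. A minimal sketch of my own (the sleep stands in for a fuse daemon; the actual unit names will differ):

# Start a long-lived process in its own transient scope instead of the
# current service's cgroup
systemd-run --description="demo fuse-like daemon" --scope sleep 3600 &
# It now lives in a run-<id>.scope cgroup of its own ...
systemd-cgls --no-pager | grep -B1 sleep
# ... so restarting the service that launched it (e.g. systemctl restart
# kubelet) no longer kills it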
// doMount runs the mount command. mounterPath is the path to mounter binary if containerized mounter is used.
// sensitiveOptions is an extension of options except they will not be logged (because they may contain sensitive material)
func (mounter *Mounter) doMount(mounterPath string, mountCmd string, source string, target string, fstype string, options []string, sensitiveOptions []string) error {
	mountArgs, mountArgsLogStr := MakeMountArgsSensitive(source, target, fstype, options, sensitiveOptions)
	if len(mounterPath) > 0 {
		mountArgs = append([]string{mountCmd}, mountArgs...)
		mountArgsLogStr = mountCmd + " " + mountArgsLogStr
		mountCmd = mounterPath
	}
	if mounter.withSystemd {
		// Try to run mount via systemd-run --scope. This will escape the
		// service where kubelet runs and any fuse daemons will be started in a
		// specific scope. kubelet service than can be restarted without killing
		// these fuse daemons.
		//
		// Complete command line (when mounterPath is not used):
		// systemd-run --description=... --scope -- mount -t <type> <what> <where>
		//
		// Expected flow:
		// * systemd-run creates a transient scope (=~ cgroup) and executes its
		//   argument (/bin/mount) there.
		// * mount does its job, forks a fuse daemon if necessary and finishes.
		//   (systemd-run --scope finishes at this point, returning mount's exit
		//   code and stdout/stderr - thats one of --scope benefits).
		// * systemd keeps the fuse daemon running in the scope (i.e. in its own
		//   cgroup) until the fuse daemon dies (another --scope benefit).
		//   Kubelet service can be restarted and the fuse daemon survives.
		// * When the fuse daemon dies (e.g. during unmount) systemd removes the
		//   scope automatically.
		//
		// systemd-mount is not used because it's too new for older distros
		// (CentOS 7, Debian Jessie).
		mountCmd, mountArgs, mountArgsLogStr = AddSystemdScopeSensitive("systemd-run", target, mountCmd, mountArgs, mountArgsLogStr)
	} else {
		// No systemd-run on the host (or we failed to check it), assume kubelet
		// does not run as a systemd service.
		// No code here, mountCmd and mountArgs are already populated.
	}
Mount phase: when the container starts, the directory /var/lib/kubelet/pods/<Pod UID>/volumes/kubernetes.io~<volume type>/<volume name> is bind mounted into the container. This step is roughly equivalent to starting a container with docker run -v /var/lib/kubelet/pods/<Pod UID>/volumes/kubernetes.io~<volume type>/<volume name>:<target dir inside the container> my-image.
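This bind mount is also visible from the Docker side. A quick way to check on the node (the container ID below is a placeholder, my own sketch):

# Find the Pod's application container on the node
docker ps | grep node1-nfs-test
# Dump its mounts; the kubelet volume directory shows up as the Source
docker inspect <container-id> --format '{{ json .Mounts }}'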