Why Kubernetes fails to start after a server reboot
1. Symptoms
[root@master-node ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since 五 2021-11-26 13:39:00 CST; 9s ago
Docs: https://kubernetes.io/docs/
Process: 8824 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 8824 (code=exited, status=1/FAILURE)
11月 26 13:39:00 master-node systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
11月 26 13:39:00 master-node systemd[1]: Unit kubelet.service entered failed state.
11月 26 13:39:00 master-node systemd[1]: kubelet.service failed.
[root@master-node ~]#
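2. Reproducing the failure
On a 3-node Kubernetes cluster, the kubelet service failed to start after the master node was rebooted. Neither a manual start nor systemd's automatic restart succeeded, and every attempt ended with the error shown above. At first I could not figure out why.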
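3. Cause
Swap was enabled on the server. By default, the kubelet refuses to start while swap is active.
In fact, df -Th on this node showed no swap partition. That is expected: df lists only mounted filesystems, and a swap area is never mounted as one. After running swapoff -a, however, the kubelet service started normally.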
[root@master-node ~]# swapoff -a
[root@master-node ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since 五 2021-11-26 13:39:00 CST; 9s ago
Docs: https://kubernetes.io/docs/
Process: 8824 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 8824 (code=exited, status=1/FAILURE)
11月 26 13:39:00 master-node systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
11月 26 13:39:00 master-node systemd[1]: Unit kubelet.service entered failed state.
11月 26 13:39:00 master-node systemd[1]: kubelet.service failed.
[root@master-node ~]# systemctl start kubelet
[root@master-node ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 五 2021-11-26 13:39:10 CST; 5s ago
Docs: https://kubernetes.io/docs/
Main PID: 8854 (kubelet)
Tasks: 15
Memory: 44.0M
CGroup: /system.slice/kubelet.service
└─8854 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-co...
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.449196 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.550136 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.650889 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.750974 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.852678 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:15 master-node kubelet[8854]: E1126 13:39:15.953314 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.054236 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.155038 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.256030 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
11月 26 13:39:16 master-node kubelet[8854]: E1126 13:39:16.356340 8854 kubelet.go:2412] "Error getting node" err="node \"master-node\" not found"
[root@master-node ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-node Ready control-plane,master 21d v1.22.3
node-1 Ready <none> 21d v1.22.3
node-2 Ready <none> 21d v1.22.3
[root@master-node ~]#
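Because df cannot show swap, a more direct check is to read /proc/swaps, the kernel file that swapon -s reports from. The following is a minimal sketch, not part of the original troubleshooting session; the function name list_active_swap and its optional file argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the device of every active swap area listed in a /proc/swaps-style file.
# list_active_swap and its optional file argument are illustrative names.
list_active_swap() {
    local swaps_file="${1:-/proc/swaps}"
    # Line 1 is the column header; every later non-empty line is one swap area.
    awk 'NR > 1 && NF > 0 { print $1 }' "$swaps_file"
}

# Report the state of the current machine (Linux only).
if [ -r /proc/swaps ]; then
    active="$(list_active_swap)"
    if [ -n "$active" ]; then
        echo "swap is active on: $active"
    else
        echo "no active swap"
    fi
fi
```

If the script reports an active device, running swapoff -a as root and then starting kubelet again should reproduce the recovery shown above.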
4. Permanent fix
After the master node rebooted, df -Th showed no swap partition, but that does not mean swap was disabled. For example:
[root@master-node ~]# ll /etc/fstab
-rw-r--r--. 1 root root 465 1月 8 2020 /etc/fstab
[root@master-node ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Wed Jan 8 17:41:56 2020
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root / xfs defaults 0 0
UUID=0c810ea4-9c87-4512-a1ed-71dfdc89498b /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
[root@master-node ~]# swapon -s
文件名 类型 大小 已用 权限
/dev/dm-1 partition 8257532 0 -1
[root@master-node ~]# swapoff -a
[root@master-node ~]# swapon -s
[root@master-node ~]#
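The filesystem table (/etc/fstab) contains a swap entry, and swapon -s shows the swap area currently in use, even though df -Th shows nothing about swap. swapoff -a only disables swap until the next boot, when the entry in /etc/fstab turns it back on. To fix the problem permanently, so that Kubernetes does not fail to start after the next reboot, comment out or delete the swap entry in /etc/fstab.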
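Editing /etc/fstab can also be scripted. The sketch below comments out swap entries with sed and keeps a backup copy; it assumes each swap entry has the word swap as a whitespace-delimited field, as in the file shown above. The function name comment_out_swap is made up for illustration, and the function touches only the file passed to it.

```shell
#!/usr/bin/env bash
# Comment out every uncommented swap entry in an fstab-style file.
# comment_out_swap is an illustrative name; the original content is
# saved next to the file with a .bak suffix.
comment_out_swap() {
    local fstab_file="$1"
    # Skip lines that are already comments, then prefix '#' to any line
    # that has "swap" as a whitespace-delimited field.
    sed -i.bak -E '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$fstab_file"
}

# Typical use, as root:
#   comment_out_swap /etc/fstab
#   swapoff -a
```

Running the function a second time leaves the file unchanged, because lines that are already commented are skipped. It is worth reviewing the result against the .bak copy before the next reboot.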
5. References
https://stackoverflow.com/questions/62407918/kubelet-service-is-not-starting