A Summary of startupProbe
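I. When to use a startupProbe
While a container is still starting up, a livenessProbe that begins probing too early is likely to fail, and those failures cause the container to be restarted. The restarted container runs into the same problem, so it can end up in an endless restart loop.
The startupProbe exists to solve exactly this problem. It is aimed at containers that need a long time to start.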
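The restart loop described above can be sketched with a manifest like the following. It is only an illustration: the image example.com/slow-app:1.0, port 8080, the /healthz path and the 60-second startup time are all hypothetical and not taken from a real application.

```yaml
# Hypothetical pod whose application needs about 60 s before it can answer /healthz.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-only-slow-start-demo   # hypothetical name
spec:
  containers:
  - name: slow-app
    image: example.com/slow-app:1.0     # hypothetical image
    livenessProbe:
      httpGet:
        path: /healthz                  # hypothetical endpoint
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
      # No startupProbe is configured: liveness probing begins right away,
      # three probes fail within roughly 30 s, and the container is restarted
      # before the application has had its 60 s to finish starting.
```

Under these assumptions every restart hits the same 30-second limit, which is why the container never reaches a running application.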
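II. What a startupProbe does
The startupProbe begins running as soon as the container starts, and it is used to determine whether the application in the container has finished starting.
If the startupProbe keeps failing until failureThreshold is reached, the kubelet kills the container, which is then restarted according to the pod's restartPolicy. This repeats until the container starts successfully.
Once the startupProbe succeeds, the container is considered started. Only then do the livenessProbe and readinessProbe begin to run, and only then can the container start receiving application requests. This prevents application requests or livenessProbe requests from arriving before the container has finished starting and failing to get the expected response.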
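III. startupProbe types and parameters
1 Types
Like livenessProbe and readinessProbe, a startupProbe comes in three types:
- httpGet request;
- tcpSocket request;
- exec command;
Starting with Kubernetes 1.23, gRPC probes are also supported (the feature was alpha in that release).
For details on each type, see the earlier article: How a Pod keeps its Containers running healthily.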
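As a rough illustration of the gRPC type, a startupProbe could look like the sketch below. The pod name, the image example.com/grpc-app:1.0 and port 50051 are hypothetical. The sketch also assumes that the application implements the standard gRPC Health Checking Protocol, which a grpc probe relies on, and that the cluster version has gRPC probes enabled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-grpc-demo         # hypothetical name
spec:
  containers:
  - name: startup-probe-grpc-demo
    image: example.com/grpc-app:1.0     # hypothetical image serving gRPC health checks
    ports:
    - containerPort: 50051              # hypothetical port
    startupProbe:
      grpc:
        port: 50051                     # must be a port number for grpc probes
      periodSeconds: 5
      failureThreshold: 12              # 5 s * 12 = up to about 60 s to start
```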
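2 Parameter fields
A startupProbe takes the same four parameters: initialDelaySeconds, timeoutSeconds, failureThreshold and periodSeconds. The only difference is that initialDelaySeconds is usually left at its default of 0.
For a description of each field, again see the earlier article: How a Pod keeps its Containers running healthily.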
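To make the four fields concrete, the sketch below spells each of them out with the value Kubernetes uses when the field is omitted. The pod name is hypothetical; the nginx image and port 80 are borrowed from the examples later in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-params-demo       # hypothetical name
spec:
  containers:
  - name: startup-probe-params-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 0   # default: start probing immediately
      periodSeconds: 10        # default: probe every 10 s
      timeoutSeconds: 1        # default: each probe times out after 1 s
      failureThreshold: 3      # default: give up after 3 consecutive failures
```

With these defaults the container has roughly 10 s * 3 = 30 s to start. There is also a successThreshold field, but for a startupProbe it must be 1, so it is normally left out.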
IV. startupProbe examples
1 An exec startupProbe that succeeds
[root@master-node ~]# cat startup-probe-exec-succeed-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-exec-succeed-demo
spec:
  containers:
  - name: startup-probe-exec-succeed-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      periodSeconds: 10
      failureThreshold: 10
[root@master-node ~]#
Create the pod and inspect it:
[root@master-node ~]# kubectl apply -f startup-probe-exec-succeed-demo.yaml
pod/startup-probe-exec-succeed-demo created
[root@master-node ~]# kubectl describe pod startup-probe-exec-succeed-demo
Name:         startup-probe-exec-succeed-demo
Namespace:    default
Priority:     0
Node:         node-1/172.16.11.148
Start Time:   Sun, 29 May 2022 08:34:53 +0800
...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  9s    default-scheduler  Successfully assigned default/startup-probe-exec-succeed-demo to node-1
  Normal  Pulling    8s    kubelet            Pulling image "busybox:latest"
  Normal  Pulled     4s    kubelet            Successfully pulled image "busybox:latest" in 3.715726337s
  Normal  Created    4s    kubelet            Created container startup-probe-exec-succeed-demo
  Normal  Started    4s    kubelet            Started container startup-probe-exec-succeed-demo
[root@master-node ~]#
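The startupProbe in this container is of the exec type. The success criterion is whether the command cat /etc/hosts exits with status 0. Since /etc/hosts exists in the container, the probe succeeds and no probe failure appears in the events.
2 An exec startupProbe that fails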
[root@master-node ~]# cat startup-probe-exec-failure-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-exec-failure-demo
spec:
  containers:
  - name: startup-probe-exec-failure-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/foobar
      periodSeconds: 5
      failureThreshold: 3
[root@master-node ~]#
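The file /etc/foobar does not exist in the container at all, so the startupProbe is bound to fail. With periodSeconds: 5 and failureThreshold: 3, the container is killed and restarted after three consecutive failures, roughly 15 seconds after it starts. The probe then fails again in the new container, which is restarted again, and so on until the error is fixed.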
[root@master-node ~]# kubectl apply -f startup-probe-exec-failure-demo.yaml
pod/startup-probe-exec-failure-demo created
[root@master-node ~]#
In another terminal, run kubectl get events -w; you will see errors similar to the following:
[root@master-node ~]# kubectl get events -w
....
0s   Normal    Scheduled   pod/startup-probe-exec-failure-demo   Successfully assigned default/startup-probe-exec-failure-demo to node-2
0s   Normal    Pulling     pod/startup-probe-exec-failure-demo   Pulling image "busybox:latest"
0s   Normal    Pulled      pod/startup-probe-exec-failure-demo   Successfully pulled image "busybox:latest" in 5.199006765s
0s   Normal    Created     pod/startup-probe-exec-failure-demo   Created container startup-probe-exec-failure-demo
0s   Normal    Started     pod/startup-probe-exec-failure-demo   Started container startup-probe-exec-failure-demo
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Normal    Killing     pod/startup-probe-exec-failure-demo   Container startup-probe-exec-failure-demo failed startup probe, will be restarted
0s   Normal    Pulling     pod/startup-probe-exec-failure-demo   Pulling image "busybox:latest"
0s   Normal    Pulled      pod/startup-probe-exec-failure-demo   Successfully pulled image "busybox:latest" in 3.68318295s
0s   Normal    Created     pod/startup-probe-exec-failure-demo   Created container startup-probe-exec-failure-demo
0s   Normal    Started     pod/startup-probe-exec-failure-demo   Started container startup-probe-exec-failure-demo
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Normal    Killing     pod/startup-probe-exec-failure-demo   Container startup-probe-exec-failure-demo failed startup probe, will be restarted
0s   Normal    Pulling     pod/startup-probe-exec-failure-demo   Pulling image "busybox:latest"
0s   Normal    Pulled      pod/startup-probe-exec-failure-demo   Successfully pulled image "busybox:latest" in 3.667381851s
0s   Normal    Created     pod/startup-probe-exec-failure-demo   Created container startup-probe-exec-failure-demo
0s   Normal    Started     pod/startup-probe-exec-failure-demo   Started container startup-probe-exec-failure-demo
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Warning   Unhealthy   pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s   Normal    Killing     pod/startup-probe-exec-failure-demo   Container startup-probe-exec-failure-demo failed startup probe, will be restarted
3 An httpGet startupProbe that succeeds
[root@master-node ~]# cat startup-probe-httget-succeed-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-succeed-demo
spec:
  containers:
  - name: startup-probe-httpget-succeed-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 3
[root@master-node ~]#
Create the pod, then watch the pod and its events:
[root@master-node ~]# kubectl apply -f startup-probe-httget-succeed-demo.yaml
pod/startup-probe-httpget-succeed-demo created
[root@master-node ~]#
[root@master-node ~]# kubectl get pods -w
....
startup-probe-httpget-succeed-demo   0/1   Pending             0   0s
startup-probe-httpget-succeed-demo   0/1   Pending             0   0s
startup-probe-httpget-succeed-demo   0/1   ContainerCreating   0   0s
startup-probe-httpget-succeed-demo   0/1   Running             0   19s
startup-probe-httpget-succeed-demo   0/1   Running             0   20s
startup-probe-httpget-succeed-demo   1/1   Running             0   21s
[root@master-node ~]# kubectl get events -w
LAST SEEN   TYPE     REASON      OBJECT                                   MESSAGE
0s          Normal   Scheduled   pod/startup-probe-httpget-succeed-demo   Successfully assigned default/startup-probe-httpget-succeed-demo to node-2
0s          Normal   Pulling     pod/startup-probe-httpget-succeed-demo   Pulling image "nginx:latest"
0s          Normal   Pulled      pod/startup-probe-httpget-succeed-demo   Successfully pulled image "nginx:latest" in 17.338565388s
0s          Normal   Created     pod/startup-probe-httpget-succeed-demo   Created container startup-probe-httpget-succeed-demo
0s          Normal   Started     pod/startup-probe-httpget-succeed-demo   Started container startup-probe-httpget-succeed-demo
4 An httpGet startupProbe that fails
The startupProbe.httpGet field supports optional host, scheme, path, and httpHeaders fields to customize the request that’s made. The host defaults to the pod’s internal IP address; the default scheme is http. The following pod manifest includes a startup probe that makes an HTTPS request with a custom header:
[root@master-node ~]# cat startup-probe-httget-failure-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-failure-demo
spec:
  containers:
  - name: startup-probe-httpget-failure-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTPS
        httpHeaders:
        - name: X-Client-Identity
          value: Kubernetes-Startup-Probe
[root@master-node ~]#
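The probe adds a custom header to the HTTP request it sends to NGINX. It also sends the request over HTTPS, but NGINX does not have HTTPS enabled by default, so the startupProbe fails. Since periodSeconds and failureThreshold are not set in this manifest, their defaults (10 and 3) apply.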
[root@master-node ~]# kubectl apply -f startup-probe-httget-failure-demo.yaml
pod/startup-probe-httpget-failure-demo created
[root@master-node ~]#
...
[root@master-node ~]# kubectl get events -w
LAST SEEN   TYPE      REASON                 OBJECT                                   MESSAGE
...
0s          Normal    Scheduled              pod/startup-probe-httpget-failure-demo   Successfully assigned default/startup-probe-httpget-failure-demo to node-1
0s          Normal    Pulling                pod/startup-probe-httpget-failure-demo   Pulling image "nginx:latest"
0s          Normal    Pulled                 pod/startup-probe-httpget-failure-demo   Successfully pulled image "nginx:latest" in 18.746863622s
0s          Normal    Created                pod/startup-probe-httpget-failure-demo   Created container startup-probe-httpget-failure-demo
0s          Normal    Started                pod/startup-probe-httpget-failure-demo   Started container startup-probe-httpget-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
0s          Warning   EvictionThresholdMet   node/master-node                         Attempting to reclaim ephemeral-storage
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
0s          Normal    Killing                pod/startup-probe-httpget-failure-demo   Container startup-probe-httpget-failure-demo failed startup probe, will be restarted
0s          Normal    Pulling                pod/startup-probe-httpget-failure-demo   Pulling image "nginx:latest"
0s          Normal    Pulled                 pod/startup-probe-httpget-failure-demo   Successfully pulled image "nginx:latest" in 3.744738269s
0s          Normal    Created                pod/startup-probe-httpget-failure-demo   Created container startup-probe-httpget-failure-demo
0s          Normal    Started                pod/startup-probe-httpget-failure-demo   Started container startup-probe-httpget-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
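For comparison, a variant of this manifest that keeps the custom header but probes over plain HTTP would be expected to pass, since the nginx image answers HTTP on port 80, as the succeeding httpGet example shows. This is a sketch that was not run as part of this article, and the pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-header-demo   # hypothetical name
spec:
  containers:
  - name: startup-probe-httpget-header-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTP                         # the default scheme; matches what nginx serves on port 80
        httpHeaders:
        - name: X-Client-Identity
          value: Kubernetes-Startup-Probe
```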
5 A tcpSocket startupProbe that succeeds
[root@master-node ~]# cat startup-probe-tcpsocket-succeed-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-tcpsocket-succeed-demo
spec:
  containers:
  - name: startup-probe-tcpsocket-succeed-demo
    image: nginx:latest
    startupProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5
      failureThreshold: 3
[root@master-node ~]# kubectl apply -f startup-probe-tcpsocket-succeed-demo.yaml
pod/startup-probe-tcpsocket-succeed-demo created
[root@master-node ~]#
[root@master-node ~]# kubectl get events -w
LAST SEEN   TYPE     REASON      OBJECT                                     MESSAGE
...
19s         Normal   Scheduled   pod/startup-probe-tcpsocket-succeed-demo   Successfully assigned default/startup-probe-tcpsocket-succeed-demo to node-2
18s         Normal   Pulling     pod/startup-probe-tcpsocket-succeed-demo   Pulling image "nginx:latest"
14s         Normal   Pulled      pod/startup-probe-tcpsocket-succeed-demo   Successfully pulled image "nginx:latest" in 3.686109984s
14s         Normal   Created     pod/startup-probe-tcpsocket-succeed-demo   Created container startup-probe-tcpsocket-succeed-demo
14s         Normal   Started     pod/startup-probe-tcpsocket-succeed-demo   Started container startup-probe-tcpsocket-succeed-demo
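V. Practical recommendations for startupProbe
1 For containers that take a long time to start, configure a startupProbe;
2 A startupProbe is best configured together with a livenessProbe and a readinessProbe;
3 The startupProbe's type and the command it runs should preferably match those of the livenessProbe; this avoids the case where the startupProbe succeeds but the livenessProbe then fails and restarts the container, which is hard to locate and analyze;
4 The startupProbe's periodSeconds * failureThreshold must be greater than the time the container needs to start. Otherwise the container cannot finish starting within that window and is restarted; the next attempt again fails to finish in time and the container is restarted again, ending up in an endless loop;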
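The recommendations above can be combined into one manifest, sketched below under stated assumptions: the image example.com/slow-app:1.0, port 8080 and the /healthz endpoint are hypothetical, and the application is assumed to need up to about 120 seconds to start.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-recommended-demo   # hypothetical name
spec:
  containers:
  - name: slow-app
    image: example.com/slow-app:1.0      # hypothetical image, assumed to start in <= 120 s
    startupProbe:
      httpGet:
        path: /healthz                   # same check as the livenessProbe (recommendation 3)
        port: 8080
      periodSeconds: 5
      failureThreshold: 30               # 5 s * 30 = 150 s, above the assumed 120 s (recommendation 4)
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3                # only counted after the startupProbe has succeeded
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
```

The 150-second budget leaves some headroom over the assumed startup time. How much headroom is appropriate depends on how variable the real startup time is, so these numbers should be treated as placeholders.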
VI. Summary and references
Marko Lukša, Kubernetes in Action, Second Edition, Chapter 6, "Managing the lifecycle of the Pod's containers", pp. 158–160.
https://www.containiq.com/post/kubernetes-startup-probe
Further reading: How do you view and analyze a pod's phase and conditions in Kubernetes, and what are they for?