Kubernetes

A Summary of startupProbe

I. When to Use a startupProbe

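While a container is still starting up, a livenessProbe that begins running too early may fail, and that failure makes the kubelet restart the container. For a container that needs a long time to start, this can happen on every attempt, trapping the container in an endless restart loop.

startupProbe is designed to solve exactly this problem.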

II. What a startupProbe Does

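The startupProbe starts running as soon as the container starts. While a startupProbe is configured and has not yet succeeded, the container's livenessProbe and readinessProbe are disabled.

If the startupProbe fails failureThreshold times in a row, the kubelet kills the container, and the container is restarted according to the Pod's restartPolicy. This repeats on each restart until the container manages to start successfully.

If the startupProbe succeeds, the container is considered started. Only then does the livenessProbe begin to run, and only then can the container start receiving application requests (once its readinessProbe, if any, also passes). This prevents application requests or livenessProbe checks from reaching a container that has not finished starting and therefore cannot respond as expected.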

III. startupProbe Types and Parameters

1 Types

Like livenessProbe and readinessProbe, startupProbe comes in three types:

  1. httpGet request;
  2. tcpSocket connection;
  3. exec command.

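Starting with Kubernetes 1.23, gRPC probes are also supported (alpha in 1.23 behind the GRPCContainerProbe feature gate, enabled by default from 1.24).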
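A minimal sketch of a gRPC startupProbe, adapted from the etcd liveness example in the official Kubernetes documentation. Assumptions: the Pod name and the image tag registry.k8s.io/etcd:3.5.1-0 are illustrative; the server being probed must implement the gRPC Health Checking Protocol (etcd does so on its client port); and on a 1.23 cluster the GRPCContainerProbe feature gate must be enabled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-grpc-demo          # hypothetical name
spec:
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.1-0  # assumed tag, taken from the official docs example
    command:
    - /usr/local/bin/etcd
    - --data-dir
    - /var/lib/etcd
    - --listen-client-urls
    - http://0.0.0.0:2379
    - --advertise-client-urls
    - http://127.0.0.1:2379
    ports:
    - containerPort: 2379
    startupProbe:
      grpc:
        port: 2379          # etcd serves the gRPC health service on its client port
      periodSeconds: 5
      failureThreshold: 6   # allows up to 6 x 5 = 30 seconds for startup
```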

For details on each type, see my earlier article: How a Pod keeps its containers running healthily

2 Parameter fields

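startupProbe takes the same four main parameters as the other probes: initialDelaySeconds, timeoutSeconds, failureThreshold and periodSeconds (there is also successThreshold, which must be 1 for a startupProbe). initialDelaySeconds defaults to 0 and is normally left at 0 for a startupProbe, because the startupProbe itself is what covers the startup period.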
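For reference, here is the exec example from section IV with every timing field written out. The values shown are the Kubernetes defaults, so omitting these fields would behave the same way; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-defaults-demo   # hypothetical name
spec:
  containers:
  - name: startup-probe-defaults-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      initialDelaySeconds: 0   # delay before the first probe (default 0)
      periodSeconds: 10        # interval between probes (default 10)
      timeoutSeconds: 1        # timeout for a single probe (default 1)
      successThreshold: 1      # must be 1 for a startupProbe (default 1)
      failureThreshold: 3      # consecutive failures before the container is killed (default 3)
```

With these defaults the container gets roughly initialDelaySeconds + failureThreshold x periodSeconds = 0 + 3 x 10 = 30 seconds to pass its first startup check.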

For a description of each field, again see the earlier article: How a Pod keeps its containers running healthily

IV. startupProbe Examples

1 An exec startupProbe that succeeds

[root@master-node ~]# cat startup-probe-exec-succeed-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-exec-succeed-demo
spec:
  containers:
  - name: startup-probe-exec-succeed-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      periodSeconds: 10
      failureThreshold: 10
[root@master-node ~]# 

Create the Pod and inspect it:

[root@master-node ~]# kubectl apply -f startup-probe-exec-succeed-demo.yaml 
pod/startup-probe-exec-succeed-demo created
[root@master-node ~]# kubectl describe pod startup-probe-exec-succeed-demo 
Name:         startup-probe-exec-succeed-demo
Namespace:    default
Priority:     0
Node:         node-1/172.16.11.148
Start Time:   Sun, 29 May 2022 08:34:53 +0800
...
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  9s    default-scheduler  Successfully assigned default/startup-probe-exec-succeed-demo to node-1
  Normal  Pulling    8s    kubelet            Pulling image "busybox:latest"
  Normal  Pulled     4s    kubelet            Successfully pulled image "busybox:latest" in 3.715726337s
  Normal  Created    4s    kubelet            Created container startup-probe-exec-succeed-demo
  Normal  Started    4s    kubelet            Started container startup-probe-exec-succeed-demo
[root@master-node ~]# 

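This container's startupProbe is of type exec: the probe succeeds if the command cat /etc/hosts exits with status 0. Because /etc/hosts exists in the container, the probe succeeds and the events show no Unhealthy warnings.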

2 An exec startupProbe that fails

[root@master-node ~]# cat startup-probe-exec-failure-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-exec-failure-demo
spec:
  containers:
  - name: startup-probe-exec-failure-demo
    image: busybox:latest
    args:
    - /bin/sh
    - -c
    - sleep 300
    startupProbe:
      exec:
        command:
        - cat
        - /etc/foobar
      periodSeconds: 5
      failureThreshold: 3
[root@master-node ~]# 

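The file /etc/foobar does not exist in the container at all, so the startupProbe is bound to fail. After 3 consecutive failures (failureThreshold: 3, one probe every 5 seconds), the container is killed and restarted. The next round of probes fails again, the container is restarted again, and this continues until the error is fixed.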

[root@master-node ~]# kubectl apply -f startup-probe-exec-failure-demo.yaml 
pod/startup-probe-exec-failure-demo created
[root@master-node ~]# 

In another terminal, run kubectl get events -w; you will see errors like the following:

[root@master-node ~]# kubectl get events -w
....
0s          Normal    Scheduled              pod/startup-probe-exec-failure-demo   Successfully assigned default/startup-probe-exec-failure-demo to node-2
0s          Normal    Pulling                pod/startup-probe-exec-failure-demo   Pulling image "busybox:latest"
0s          Normal    Pulled                 pod/startup-probe-exec-failure-demo   Successfully pulled image "busybox:latest" in 5.199006765s
0s          Normal    Created                pod/startup-probe-exec-failure-demo   Created container startup-probe-exec-failure-demo
0s          Normal    Started                pod/startup-probe-exec-failure-demo   Started container startup-probe-exec-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Normal    Killing                pod/startup-probe-exec-failure-demo   Container startup-probe-exec-failure-demo failed startup probe, will be restarted
0s          Normal    Pulling                pod/startup-probe-exec-failure-demo   Pulling image "busybox:latest"
0s          Normal    Pulled                 pod/startup-probe-exec-failure-demo   Successfully pulled image "busybox:latest" in 3.68318295s
0s          Normal    Created                pod/startup-probe-exec-failure-demo   Created container startup-probe-exec-failure-demo
0s          Normal    Started                pod/startup-probe-exec-failure-demo   Started container startup-probe-exec-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Normal    Killing                pod/startup-probe-exec-failure-demo   Container startup-probe-exec-failure-demo failed startup probe, will be restarted
0s          Normal    Pulling                pod/startup-probe-exec-failure-demo   Pulling image "busybox:latest"
0s          Normal    Pulled                 pod/startup-probe-exec-failure-demo   Successfully pulled image "busybox:latest" in 3.667381851s
0s          Normal    Created                pod/startup-probe-exec-failure-demo   Created container startup-probe-exec-failure-demo
0s          Normal    Started                pod/startup-probe-exec-failure-demo   Started container startup-probe-exec-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Warning   Unhealthy              pod/startup-probe-exec-failure-demo   Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
0s          Normal    Killing                pod/startup-probe-exec-failure-demo   Container startup-probe-exec-failure-demo failed startup probe, will be restarted

3 An httpGet startupProbe that succeeds

[root@master-node ~]# cat startup-probe-httget-succeed-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-succeed-demo
spec:
  containers:
  - name: startup-probe-httpget-succeed-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 3
[root@master-node ~]# 

Create the Pod, then watch the Pod and the events:

[root@master-node ~]# kubectl apply -f startup-probe-httget-succeed-demo.yaml 
pod/startup-probe-httpget-succeed-demo created
[root@master-node ~]# 
[root@master-node ~]# kubectl get pods -w
....
startup-probe-httpget-succeed-demo   0/1     Pending            0                0s
startup-probe-httpget-succeed-demo   0/1     Pending            0                0s
startup-probe-httpget-succeed-demo   0/1     ContainerCreating   0                0s
startup-probe-httpget-succeed-demo   0/1     Running             0                19s
startup-probe-httpget-succeed-demo   0/1     Running             0                20s
startup-probe-httpget-succeed-demo   1/1     Running             0                21s
[root@master-node ~]# kubectl get events -w
LAST SEEN   TYPE      REASON                 OBJECT                                MESSAGE
0s          Normal    Scheduled              pod/startup-probe-httpget-succeed-demo   Successfully assigned default/startup-probe-httpget-succeed-demo to node-2
0s          Normal    Pulling                pod/startup-probe-httpget-succeed-demo   Pulling image "nginx:latest"
0s          Normal    Pulled                 pod/startup-probe-httpget-succeed-demo   Successfully pulled image "nginx:latest" in 17.338565388s
0s          Normal    Created                pod/startup-probe-httpget-succeed-demo   Created container startup-probe-httpget-succeed-demo
0s          Normal    Started                pod/startup-probe-httpget-succeed-demo   Started container startup-probe-httpget-succeed-demo

4 An httpGet startupProbe that fails

The startupProbe.httpGet field supports optional host, scheme, path, and httpHeaders fields to customize the request that’s made. The host defaults to the pod’s internal IP address; the default scheme is http. The following pod manifest includes a startup probe that makes an HTTPS request with a custom header:

[root@master-node ~]# cat startup-probe-httget-failure-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-failure-demo
spec:
  containers:
  - name: startup-probe-httpget-failure-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTPS
        httpHeaders: 
        - name: X-Client-Identity
          value: Kubernetes-Startup-Probe
[root@master-node ~]# 

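This probe adds a custom header to the request sent to NGINX and uses the HTTPS scheme. By default the nginx image does not serve HTTPS (port 80 speaks plain HTTP), so the startupProbe fails.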

[root@master-node ~]# kubectl apply -f startup-probe-httget-failure-demo.yaml 
pod/startup-probe-httpget-failure-demo created
[root@master-node ~]# 
...
[root@master-node ~]# kubectl get events -w
LAST SEEN   TYPE      REASON                 OBJECT                                   MESSAGE
...
0s          Normal    Scheduled              pod/startup-probe-httpget-failure-demo   Successfully assigned default/startup-probe-httpget-failure-demo to node-1
0s          Normal    Pulling                pod/startup-probe-httpget-failure-demo   Pulling image "nginx:latest"
0s          Normal    Pulled                 pod/startup-probe-httpget-failure-demo   Successfully pulled image "nginx:latest" in 18.746863622s
0s          Normal    Created                pod/startup-probe-httpget-failure-demo   Created container startup-probe-httpget-failure-demo
0s          Normal    Started                pod/startup-probe-httpget-failure-demo   Started container startup-probe-httpget-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
0s          Warning   EvictionThresholdMet   node/master-node                         Attempting to reclaim ephemeral-storage
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
0s          Normal    Killing                pod/startup-probe-httpget-failure-demo   Container startup-probe-httpget-failure-demo failed startup probe, will be restarted
0s          Normal    Pulling                pod/startup-probe-httpget-failure-demo   Pulling image "nginx:latest"
0s          Normal    Pulled                 pod/startup-probe-httpget-failure-demo   Successfully pulled image "nginx:latest" in 3.744738269s
0s          Normal    Created                pod/startup-probe-httpget-failure-demo   Created container startup-probe-httpget-failure-demo
0s          Normal    Started                pod/startup-probe-httpget-failure-demo   Started container startup-probe-httpget-failure-demo
0s          Warning   Unhealthy              pod/startup-probe-httpget-failure-demo   Startup probe failed: Get "https://10.244.1.58:80/": http: server gave HTTP response to HTTPS client
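A sketch of one way to make the failing probe pass while keeping the custom header: use the HTTP scheme (the default), so the kubelet speaks plain HTTP to port 80 as NGINX expects. The Pod name is hypothetical; keeping HTTPS instead would require configuring TLS in NGINX and probing its TLS port, which is not shown here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-httpget-header-demo   # hypothetical name
spec:
  containers:
  - name: startup-probe-httpget-header-demo
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTP            # default; matches the plain HTTP that NGINX serves on port 80
        httpHeaders:            # the custom header is still sent with every probe request
        - name: X-Client-Identity
          value: Kubernetes-Startup-Probe
      periodSeconds: 5
      failureThreshold: 3
```

An httpGet probe counts as successful when the response status code is at least 200 and below 400, so the default NGINX welcome page at / is enough here.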

5 A tcpSocket startupProbe that succeeds

[root@master-node ~]# cat startup-probe-tcpsocket-succeed-demo.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-tcpsocket-succeed-demo
spec:
  containers:
  - name: startup-probe-tcpsocket-succeed-demo
    image: nginx:latest
    startupProbe:
      tcpSocket:
        port: 80
      periodSeconds: 5
      failureThreshold: 3
[root@master-node ~]# kubectl apply -f startup-probe-tcpsocket-succeed-demo.yaml 
pod/startup-probe-tcpsocket-succeed-demo created
[root@master-node ~]# 
[root@master-node ~]# kubectl get events -w
LAST SEEN   TYPE      REASON                 OBJECT                                     MESSAGE
...
19s         Normal    Scheduled              pod/startup-probe-tcpsocket-succeed-demo   Successfully assigned default/startup-probe-tcpsocket-succeed-demo to node-2
18s         Normal    Pulling                pod/startup-probe-tcpsocket-succeed-demo   Pulling image "nginx:latest"
14s         Normal    Pulled                 pod/startup-probe-tcpsocket-succeed-demo   Successfully pulled image "nginx:latest" in 3.686109984s
14s         Normal    Created                pod/startup-probe-tcpsocket-succeed-demo   Created container startup-probe-tcpsocket-succeed-demo
14s         Normal    Started                pod/startup-probe-tcpsocket-succeed-demo   Started container startup-probe-tcpsocket-succeed-demo

V. startupProbe Best Practices

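1 Configure a startupProbe for containers that take a long time to start;

2 Configure the startupProbe together with a livenessProbe and a readinessProbe;

3 Keep the startupProbe's type and command consistent with the livenessProbe's. Otherwise the startupProbe may succeed while the livenessProbe then fails and restarts the container, which makes the problem hard to locate and analyze;

4 The startupProbe's periodSeconds × failureThreshold must be longer than the time the container needs to start. Otherwise the container is restarted before it finishes starting, fails to start within the window again on the next attempt, is restarted again, and ends up in an endless loop.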
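A sketch that applies advice 2 to 4 to the nginx example from section IV. Assumptions: the startup budget of 30 x 10 = 300 seconds is an illustrative figure that should be sized to your container's measured startup time, probing / for readiness is only a placeholder for a real readiness endpoint, and the Pod and port names are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-liveness-readiness-demo   # hypothetical name
spec:
  containers:
  - name: startup-liveness-readiness-demo
    image: nginx:latest
    ports:
    - name: http                # hypothetical port name, referenced by the probes below
      containerPort: 80
    # Runs first; the liveness and readiness probes are disabled until it succeeds.
    # Startup budget: failureThreshold x periodSeconds = 30 x 10 = 300 seconds.
    startupProbe:
      httpGet:
        path: /
        port: http
      periodSeconds: 10
      failureThreshold: 30
    # Same check as the startupProbe (advice 3), but a much shorter tolerance once
    # the container is up: about 3 x 10 = 30 seconds of failures before a restart.
    livenessProbe:
      httpGet:
        path: /
        port: http
      periodSeconds: 10
      failureThreshold: 3
    # Decides whether the Pod receives Service traffic; failing it removes the Pod
    # from the Service endpoints but does not restart the container.
    readinessProbe:
      httpGet:
        path: /
        port: http
      periodSeconds: 5
      failureThreshold: 3
```

The design choice here is to give the startupProbe a generous window and the livenessProbe a tight one: slow startup is tolerated once, while a container that hangs after starting is still restarted quickly.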

VI. Summary and References

Kubernetes in Action, Second Edition, Marko Lukša, Chapter 6 "Managing the lifecycle of the Pod's containers", pp. 158-160.

https://www.containiq.com/post/kubernetes-startup-probe

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

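Further reading: How to view and analyze a Pod's phase and conditions in Kubernetes, and what are they for?

A summary of container states in a Pod

What is a Pod's restartPolicy, and how are containers restarted automatically?

How does a Pod keep its containers running healthily?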
