Docker,  Linux

A case study: a Docker failure caused by umask=0027

1 Symptoms

[root@peixun71-middleware ~]# uname -rm
3.10.0-1160.62.1.el7.x86_64 x86_64
[root@peixun71-middleware ~]# cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)
[root@peixun71-middleware ~]# docker ps
CONTAINER ID        IMAGE                                                     COMMAND                  CREATED             STATUS                            PORTS                                          NAMES
4ab79d4a7310        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqbroker -c /o..."   2 hours ago         Restarting (255) 24 minutes ago                                                  rmqbroker
ce6b77e9f9d7        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqnamesrv"           2 hours ago         Up 2 hours                        10909/tcp, 0.0.0.0:9876->9876/tcp, 10911/tcp   rmqnamesrv
3f0cb32edcc0        10.248.206.105:5000/redis:latest                          "docker-entrypoint..."   7 days ago          Up 2 hours                                                                       redis
7956f7770cf4        memcached                                                 "docker-entrypoint..."   6 weeks ago         Up 2 hours                                                                       memcached
fca6aac1f8ab        zookeeper:3.4.11                                          "/docker-entrypoin..."   6 weeks ago         Up 2 hours                                                                       zookeeper
[root@peixun71-middleware ~]# docker logs -f rmqbroker|head
java.io.FileNotFoundException: /opt/rocketmq-4.2.0/conf/broker.conf (Permission denied)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at org.apache.rocketmq.broker.BrokerStartup.createBrokerController(BrokerStartup.java:119)
        at org.apache.rocketmq.broker.BrokerStartup.main(BrokerStartup.java:56)
java.io.FileNotFoundException: /opt/rocketmq-4.2.0/conf/broker.conf (Permission denied)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
[root@peixun71-middleware ~]# 

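On a CentOS 7.9 host, starting the RocketMQ broker container as root failed with: /opt/rocketmq-4.2.0/conf/broker.conf (Permission denied). The broker container never stayed up and kept restarting, as the Restarting (255) status above shows. The commands used to prepare the configuration and start the containers were: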

#1 Create the directories
mkdir -p /data/rocketmq/data/namesrv/logs
mkdir -p /data/rocketmq/data/namesrv/store
mkdir -p /data/rocketmq/data/broker/logs
mkdir -p /data/rocketmq/data/broker/store
mkdir -p /data/rocketmq/conf

#2 Create the configuration file
cat <<EOF >/data/rocketmq/conf/broker.conf
brokerClusterName = DefaultCluster
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
brokerIP1 = your_host_ip_addr
EOF

#3 Start rmqnamesrv
docker run -itd -p 9876:9876 --restart=always -v /data/rocketmq/data/namesrv/logs:/root/logs -e "TZ=Asia/Shanghai" -v /data/rocketmq/data/namesrv/store:/root/store --name rmqnamesrv -e "MAX_POSSIBLE_HEAP=100000000" registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0 sh mqnamesrv

#4 Start rmqbroker
docker run -itd --restart=always -p 10911:10911 -p 10909:10909 -v /data/rocketmq/data/broker/logs:/root/logs -v /data/rocketmq/data/broker/store:/root/store -e "TZ=Asia/Shanghai" -v /data/rocketmq/conf/broker.conf:/opt/rocketmq-4.2.0/conf/broker.conf --name rmqbroker --link rmqnamesrv:namesrv -e "NAMESRV_ADDR=namesrv:9876" -e "MAX_POSSIBLE_HEAP=2000000000" registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0 sh mqbroker -c /opt/rocketmq-4.2.0/conf/broker.conf

The odd part: the same commands with the same configuration, also run as root, started the containers on other machines without any error.

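2 Root cause analysis

Tracking down the cause took a good deal of time and back-and-forth.

The first clue came from comparing the mounted configuration file on a machine where both containers start normally with the same file on the failing machine:

# Healthy machine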
[root@middleware1 ~]# docker ps|grep ro
5c40feb3dcb5        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq-console-ng   "sh -c 'java $JAVA..."   2 months ago        Up 7 days           0.0.0.0:18080->8080/tcp                                        rocket-console
a6d9251d18d9        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0        "sh mqbroker -c /o..."   2 months ago        Up 7 days           0.0.0.0:10909->10909/tcp, 9876/tcp, 0.0.0.0:10911->10911/tcp   rmqbroker
0ccb86360d98        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0        "sh mqnamesrv"           2 months ago        Up 7 days           10909/tcp, 0.0.0.0:9876->9876/tcp, 10911/tcp                   rmqnamesrv
[root@middleware1 ~]# ll /data/rocketmq/conf/
total 4
-rw-r--r-- 1 root root 189 Mar  4 11:44 broker.conf
[root@middleware1 ~]# 

# Failing machine
[root@peixun71-middleware ~]# docker ps|grep ro
4ab79d4a7310        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqbroker -c /o..."   2 hours ago         Restarting (255) 38 minutes ago                                                  rmqbroker
ce6b77e9f9d7        registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqnamesrv"           2 hours ago         Up 2 hours                        10909/tcp, 0.0.0.0:9876->9876/tcp, 10911/tcp   rmqnamesrv
[root@peixun71-middleware ~]# ll /data/rocketmq/conf/
total 4
-rw-r----- 1 root root 188 May 24 11:50 broker.conf
[root@peixun71-middleware ~]# 

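So the difference is the mode of /data/rocketmq/conf/broker.conf: 644 (-rw-r--r--) on the healthy machine, 640 (-rw-r-----) on the failing one. Why does that matter?

A look inside the healthy container explains it: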

[root@middleware1 ~]# docker exec -it rmqbroker /bin/bash
[rocketmq@a6d9251d18d9 bin]$ ls -l /opt/rocketmq-4.2.0/conf/broker.conf 
-rw-r--r-- 1 root root 189 Mar  4 11:44 /opt/rocketmq-4.2.0/conf/broker.conf
[rocketmq@a6d9251d18d9 bin]$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
rocketmq     1  0.0  0.0  11680  1412 ?        Ss+  May17   0:00 sh mqbroker -c /opt/rocketmq-4.2.0/conf/broker.conf
rocketmq     8  0.0  0.0  11680  1480 ?        S+   May17   0:00 sh /opt/rocketmq-4.2.0/bin/runbroker.sh org.apache.rocketmq.broker.BrokerStartup -c /opt/rocketmq-4.2.0/conf/b
rocketmq    11  1.6  3.5 12415856 2298948 ?    Sl+  May17 174:13 /bin/java -server -Xms2000000000 -Xmx2000000000 -Xmn1000000000 -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1Res
rocketmq   301  0.0  0.0  11820  1936 ?        Ss   14:20   0:00 /bin/bash
rocketmq   322  0.0  0.0  51740  1736 ?        R+   14:21   0:00 ps aux
[rocketmq@a6d9251d18d9 bin]$ id
uid=3000(rocketmq) gid=3000(rocketmq) groups=3000(rocketmq)
[rocketmq@a6d9251d18d9 bin]$ 

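A bind-mounted file keeps its host owner and mode inside the container. On the healthy machine, /data/rocketmq/conf/broker.conf therefore shows up at /opt/rocketmq-4.2.0/conf/broker.conf as root:root with mode 644, which any user can read. The processes in the container, however, run as the rocketmq user (uid 3000), not as root. On the failing machine the file is root:root with mode 640, so rocketmq falls into the "other" class, has no read permission, and the broker exits with Permission denied.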
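The permission check described above can be reproduced on the host without starting a container. The following is a minimal sketch, not part of the original troubleshooting: can_read is a hypothetical helper that looks only at the classic owner/group/other bits, and it assumes GNU stat (as shipped with CentOS 7). It deliberately ignores ACLs, supplementary groups, the root override and parent-directory permissions.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): decide from the classic
# owner/group/other bits whether uid $1 with primary gid $2 may read file $3.
# Simplified on purpose: ignores ACLs, supplementary groups, the root
# override and the permissions of parent directories.
can_read() {
    want_uid=$1
    want_gid=$2
    meta=$(stat -c '%u %g %a' "$3") || return 2   # GNU stat
    set -- $meta                                  # $1=owner uid, $2=group gid, $3=octal mode
    perm=$((0$3))                                 # leading 0 makes the shell parse it as octal
    if [ "$want_uid" -eq "$1" ]; then
        bit=$(( (perm >> 8) & 1 ))                # owner read bit (0400)
    elif [ "$want_gid" -eq "$2" ]; then
        bit=$(( (perm >> 5) & 1 ))                # group read bit (0040)
    else
        bit=$(( (perm >> 2) & 1 ))                # other read bit (0004)
    fi
    [ "$bit" -eq 1 ]
}

# 3000:3000 is the rocketmq user reported by `id` inside the container.
conf=/data/rocketmq/conf/broker.conf
if [ -e "$conf" ]; then
    if can_read 3000 3000 "$conf"; then
        echo "uid 3000 can read $conf"
    else
        echo "uid 3000 cannot read $conf"
    fi
fi
```

For a root:root file with mode 640, uid 3000 matches neither the owner nor the group, so only the "other" bits apply and the check fails, which matches the Permission denied seen in the broker log.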

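That raises the next question: if root created /data/rocketmq/conf/broker.conf on both machines, why did the file end up with different permissions?

The answer is that the two machines have different umask values: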

# Healthy machine
[root@middleware1 ~]# umask
0022
[root@middleware1 ~]#

# Failing machine
[root@peixun71-middleware ~]# umask
0027
[root@peixun71-middleware ~]# 
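To see how the two umask values lead to the two modes, recall that a newly created regular file gets mode 0666 with the umask bits cleared: 0666 & ~0022 = 0644 and 0666 & ~0027 = 0640. The sketch below illustrates this in a throwaway directory. The function name mode_under_umask is made up for this example, and stat -c is the GNU coreutils form.

```shell
#!/bin/sh
# Create one empty file under a given umask and print the mode it received.
mode_under_umask() {
    # $1 = umask to apply, $2 = path of the file to create (must not exist yet)
    # The subshell keeps the umask change from leaking into the caller.
    (
        umask "$1"
        : > "$2"
        stat -c '%a' "$2"   # GNU stat, as shipped with CentOS 7
    )
}

tmpdir=$(mktemp -d)
mode_under_umask 0022 "$tmpdir/conf_0022"   # prints 644, like the healthy machine
mode_under_umask 0027 "$tmpdir/conf_0027"   # prints 640, like the failing machine
rm -rf "$tmpdir"
```

The umask only applies when a file is created. Changing it later does not alter the mode of a file that already exists.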

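3 Summary

The fix itself is trivial: change the umask before creating the configuration file, or fix the mode of the existing file (for example, chmod 644 /data/rocketmq/conf/broker.conf). Finding the cause, though, cost me a lot of effort.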
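One way to make the setup step independent of the host's umask is sketched below. This is an illustration rather than the procedure used in the original incident: write_broker_conf is a hypothetical wrapper around the same heredoc as in step #2, run in a subshell with umask 0022 and followed by an explicit chmod. The demonstration writes into a temporary directory, not into /data/rocketmq/conf.

```shell
#!/bin/sh
# Write broker.conf so that it ends up world-readable (644) regardless of the
# caller's umask. $1 = destination path.
write_broker_conf() {
    (
        umask 0022   # affects only this subshell
        cat > "$1" <<'EOF'
brokerClusterName = DefaultCluster
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
brokerIP1 = your_host_ip_addr
EOF
    )
    # umask has no effect on a file that already existed (e.g. as 640),
    # so set the mode explicitly as well.
    chmod 644 "$1"
}

# Demonstration in a temporary directory, simulating the failing host's umask.
demo_dir=$(mktemp -d)
( umask 0027; write_broker_conf "$demo_dir/broker.conf" )
stat -c '%a' "$demo_dir/broker.conf"   # prints 644 despite the 0027 umask
rm -rf "$demo_dir"
```

On the real host the call would be write_broker_conf /data/rocketmq/conf/broker.conf. Because rmqbroker was started with --restart=always, it should pick up the readable file on its next restart attempt.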

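In the end, the problem came down to the basics.

I have written two earlier articles about Unix/Linux file permissions:

Why do files and directories created by an ordinary user get different permissions from those created by root?

Using the special permission bits SUID, SGID and SBIT on Linux

This was a good occasion to review them. This area rarely comes up in daily work and gets less attention than it should, yet it is exactly the kind of solid fundamentals that matter.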
