A case study: a Docker failure caused by umask=0027
1. Symptoms
[root@peixun71-middleware ~]# uname -rm
3.10.0-1160.62.1.el7.x86_64 x86_64
[root@peixun71-middleware ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@peixun71-middleware ~]# docker ps
CONTAINER ID   IMAGE                                                     COMMAND                  CREATED       STATUS                            PORTS                                          NAMES
4ab79d4a7310   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqbroker -c /o..."   2 hours ago   Restarting (255) 24 minutes ago                                                  rmqbroker
ce6b77e9f9d7   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqnamesrv"           2 hours ago   Up 2 hours                        10909/tcp, 0.0.0.0:9876->9876/tcp, 10911/tcp   rmqnamesrv
3f0cb32edcc0   10.248.206.105:5000/redis:latest                          "docker-entrypoint..."   7 days ago    Up 2 hours                                                                       redis
7956f7770cf4   memcached                                                 "docker-entrypoint..."   6 weeks ago   Up 2 hours                                                                       memcached
fca6aac1f8ab   zookeeper:3.4.11                                          "/docker-entrypoin..."   6 weeks ago   Up 2 hours                                                                       zookeeper
[root@peixun71-middleware ~]# docker logs -f rmqbroker|head
java.io.FileNotFoundException: /opt/rocketmq-4.2.0/conf/broker.conf (Permission denied)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at org.apache.rocketmq.broker.BrokerStartup.createBrokerController(BrokerStartup.java:119)
    at org.apache.rocketmq.broker.BrokerStartup.main(BrokerStartup.java:56)
java.io.FileNotFoundException: /opt/rocketmq-4.2.0/conf/broker.conf (Permission denied)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
[root@peixun71-middleware ~]#
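On a CentOS 7.9 host, the RocketMQ broker container was started as root and immediately failed with: /opt/rocketmq-4.2.0/conf/broker.conf (Permission denied). The commands used to prepare the configuration and start the containers were: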
#1 Create the directories
mkdir -p /data/rocketmq/data/namesrv/logs
mkdir -p /data/rocketmq/data/namesrv/store
mkdir -p /data/rocketmq/data/broker/logs
mkdir -p /data/rocketmq/data/broker/store
mkdir -p /data/rocketmq/conf
#2 Create the configuration file
cat <<EOF >/data/rocketmq/conf/broker.conf
brokerClusterName = DefaultCluster
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
brokerIP1 = your_host_ip_addr
EOF
#3 Start rmqnamesrv
docker run -itd -p 9876:9876 --restart=always \
  -v /data/rocketmq/data/namesrv/logs:/root/logs \
  -e "TZ=Asia/Shanghai" \
  -v /data/rocketmq/data/namesrv/store:/root/store \
  --name rmqnamesrv \
  -e "MAX_POSSIBLE_HEAP=100000000" \
  registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0 sh mqnamesrv
#4 Start rmqbroker
docker run -itd --restart=always -p 10911:10911 -p 10909:10909 \
  -v /data/rocketmq/data/broker/logs:/root/logs \
  -v /data/rocketmq/data/broker/store:/root/store \
  -e "TZ=Asia/Shanghai" \
  -v /data/rocketmq/conf/broker.conf:/opt/rocketmq-4.2.0/conf/broker.conf \
  --name rmqbroker --link rmqnamesrv:namesrv \
  -e "NAMESRV_ADDR=namesrv:9876" \
  -e "MAX_POSSIBLE_HEAP=2000000000" \
  registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0 sh mqbroker -c /opt/rocketmq-4.2.0/conf/broker.conf
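The strange part: with the same commands and the same configuration, also run as root, the containers start on other machines without any error.
2. Root cause analysis
Tracking down the cause took a good deal of time and effort, with a lot of going back and forth.
Eventually, comparing the mounted configuration file on a machine where both containers start normally with the one on the failing machine gave the first clue: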
# Healthy machine
[root@middleware1 ~]# docker ps|grep ro
5c40feb3dcb5   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq-console-ng   "sh -c 'java $JAVA..."   2 months ago   Up 7 days   0.0.0.0:18080->8080/tcp                                        rocket-console
a6d9251d18d9   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0        "sh mqbroker -c /o..."   2 months ago   Up 7 days   0.0.0.0:10909->10909/tcp, 9876/tcp, 0.0.0.0:10911->10911/tcp   rmqbroker
0ccb86360d98   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0        "sh mqnamesrv"           2 months ago   Up 7 days   10909/tcp, 0.0.0.0:9876->9876/tcp, 10911/tcp                   rmqnamesrv
[root@middleware1 ~]# ll /data/rocketmq/conf/
total 4
-rw-r--r-- 1 root root 189 Mar 4 11:44 broker.conf
[root@middleware1 ~]#
# Failing machine
[root@peixun71-middleware ~]# docker ps|grep ro
4ab79d4a7310   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqbroker -c /o..."   2 hours ago   Restarting (255) 38 minutes ago                                                  rmqbroker
ce6b77e9f9d7   registry.cn-hangzhou.aliyuncs.com/onlyou/rocketmq:4.2.0   "sh mqnamesrv"           2 hours ago   Up 2 hours                        10909/tcp, 0.0.0.0:9876->9876/tcp, 10911/tcp   rmqnamesrv
[root@peixun71-middleware ~]# ll /data/rocketmq/conf/
total 4
-rw-r----- 1 root root 188 May 24 11:50 broker.conf
[root@peixun71-middleware ~]#
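So the difference lies in the permissions of /data/rocketmq/conf/broker.conf: 644 (-rw-r--r--) on the healthy machine, 640 (-rw-r-----) on the failing one. Why does that matter?
Taking a closer look inside the container that runs normally: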
[root@middleware1 ~]# docker exec -it rmqbroker /bin/bash
[rocketmq@a6d9251d18d9 bin]$ ls -l /opt/rocketmq-4.2.0/conf/broker.conf
-rw-r--r-- 1 root root 189 Mar 4 11:44 /opt/rocketmq-4.2.0/conf/broker.conf
[rocketmq@a6d9251d18d9 bin]$ ps aux
USER       PID %CPU %MEM      VSZ     RSS TTY  STAT START    TIME COMMAND
rocketmq     1  0.0  0.0    11680    1412 ?    Ss+  May17    0:00 sh mqbroker -c /opt/rocketmq-4.2.0/conf/broker.conf
rocketmq     8  0.0  0.0    11680    1480 ?    S+   May17    0:00 sh /opt/rocketmq-4.2.0/bin/runbroker.sh org.apache.rocketmq.broker.BrokerStartup -c /opt/rocketmq-4.2.0/conf/b
rocketmq    11  1.6  3.5 12415856 2298948 ?    Sl+  May17  174:13 /bin/java -server -Xms2000000000 -Xmx2000000000 -Xmn1000000000 -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1Res
rocketmq   301  0.0  0.0    11820    1936 ?    Ss   14:20    0:00 /bin/bash
rocketmq   322  0.0  0.0    51740    1736 ?    R+   14:21    0:00 ps aux
[rocketmq@a6d9251d18d9 bin]$ id
uid=3000(rocketmq) gid=3000(rocketmq) groups=3000(rocketmq)
[rocketmq@a6d9251d18d9 bin]$
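When /data/rocketmq/conf/broker.conf is bind-mounted into the container as /opt/rocketmq-4.2.0/conf/broker.conf, it keeps the owner and mode it has on the host: root:root and 644 on the healthy machine. The processes in the container, however, run as the rocketmq user (uid 3000), not as root, so only the "other" permission bits apply to them. With mode 644 the rocketmq user can read the file. On the failing machine the file is 640, the rocketmq user has no read permission on /opt/rocketmq-4.2.0/conf/broker.conf, and the container fails with Permission denied.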
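The permission check described above can be sketched in shell. This is only an illustration and not part of the original troubleshooting: other_can_read is a hypothetical helper that tests the read bit of the "other" class, which is the class that applies to uid 3000 (rocketmq) for a file owned by root:root.

```shell
# other_can_read MODE: succeed if the "other" class has read permission
# in the given octal mode (e.g. 644 or 640). Hypothetical helper.
other_can_read() {
    # A leading 0 makes the shell read the value as octal;
    # bit value 4 in the last octal digit is "read" for other.
    [ $(( 0$1 & 4 )) -ne 0 ]
}

for mode in 644 640; do
    if other_can_read "$mode"; then
        echo "$mode: readable by other"
    else
        echo "$mode: not readable by other"
    fi
done
# prints:
# 644: readable by other
# 640: not readable by other
```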
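That raises the next question: why does /data/rocketmq/conf/broker.conf, created by root in both cases, end up with different permissions on different machines?
The answer is that the two machines have different umask settings: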
# Healthy machine
[root@middleware1 ~]# umask
0022
[root@middleware1 ~]#
# Failing machine
[root@peixun71-middleware ~]# umask
0027
[root@peixun71-middleware ~]#
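The effect of the two umask values can be reproduced without touching the real configuration. The sketch below assumes GNU stat (stat -c), as shipped with CentOS. mode_under_umask is a hypothetical helper that creates an empty file in a scratch directory under the given umask and prints the resulting mode. A file created by shell redirection is requested with mode 666, and the umask bits are then cleared from it.

```shell
# mode_under_umask MASK: create a file under the given umask and print
# its octal mode. Hypothetical helper; assumes GNU stat (stat -c).
mode_under_umask() {
    tmpdir=$(mktemp -d) || return 1
    # The subshell keeps the umask change from leaking into the caller.
    ( umask "$1" && : > "$tmpdir/broker.conf" )
    stat -c '%a' "$tmpdir/broker.conf"
    rm -rf "$tmpdir"
}

for mask in 0022 0027; do
    echo "umask $mask -> mode $(mode_under_umask "$mask")"
done
# prints:
# umask 0022 -> mode 644
# umask 0027 -> mode 640
```

With umask 0027 the "other" class loses every bit, which matches the -rw-r----- seen on the failing machine.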
3. Summary
The fix itself is trivial: change the umask, or change the permissions of the configuration file. Analyzing the problem, however, cost me a lot of effort.
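A minimal sketch of the second option, changing the file permissions, might look like the following. fix_conf_mode is a hypothetical helper, and the demonstration runs on a scratch file rather than the real configuration. It assumes GNU stat for printing the mode.

```shell
# fix_conf_mode FILE: add read permission for group and other,
# leaving the remaining bits untouched (640 becomes 644).
# Hypothetical helper.
fix_conf_mode() {
    chmod go+r "$1"
}

# Demonstration on a scratch file that mimics the failing host.
demo=$(mktemp)
chmod 640 "$demo"
fix_conf_mode "$demo"
stat -c '%a' "$demo"    # prints 644
rm -f "$demo"

# On the affected host the equivalent steps would be:
#   fix_conf_mode /data/rocketmq/conf/broker.conf
#   docker restart rmqbroker
```

The other option is to create the file under a umask of 0022, for example inside a subshell so that the host-wide setting stays as it is.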
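In the end, the problem came down to the basics.
I have previously written two articles about Unix and Linux file permissions:
Why do files and directories created by an ordinary user get different permissions from those created by root?
Using the special permission bits SUID, SGID and SBIT on Linux files
This was also a good occasion to review what I wrote back then. This area does not come up often in daily work, so it gets less attention than it deserves. Yet these are solid fundamentals.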