I. Symptoms of a Docker container failing to start:
1. The status is stuck at Restarting; check it with:
$docker ps -a
container id image command created status ports names
21c09be88c11 docker.xxxx.cn:5000/xxx-tes/xxx_tes:1.0.6 "/usr/local/tomcat..." 9 days ago restarting (1) less than a second ago xxx10
2. The Docker logs show obvious errors:
$docker logs [container name/container ID]
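Alongside the log output, `docker inspect` can report how often the container has restarted and how its last run ended. The following is only a minimal sketch: `restart_summary` is a made-up helper name, and `xxx10` is the container name taken from the listing above.

```shell
# restart_summary is a hypothetical helper name used for illustration.
# The template fields (.RestartCount, .State.ExitCode, .State.OOMKilled)
# are part of the JSON that `docker inspect` prints for a container.
restart_summary() {
    docker inspect \
        --format 'restarts={{.RestartCount}} exit={{.State.ExitCode}} oom_killed={{.State.OOMKilled}}' \
        "$1"
}

# Usage:
#   restart_summary xxx10
```

A high restart count together with a non-zero exit code points at the application failing on every start, which is the case examined in the rest of this note.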
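II. Possible causes of the startup failure:
2.1. Insufficient memory
Starting this container needs at least 2 GB of free memory, so first run free -mh to check whether enough memory is left.
Check memory directly
$free -mh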
total used free shared buff/cache available
mem: 15g 14g 627m 195m 636m 726m
swap: 0b 0b 0b
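The manual check above can be turned into a small pre-start test. This is a sketch under assumptions: it expects the procps-ng layout of `free`, where "available" is the 7th field of the `Mem:` row, and the helper names and the 2048 MB default are illustrative only.

```shell
# Read `free -m` output on stdin and print the "available" value (MB)
# from the Mem: row. Assumes "available" is the 7th field.
mem_available_mb() {
    awk '/^Mem:/ { print $7 }'
}

# Succeed when at least $1 MB are available. The 2048 default mirrors the
# 2 GB figure mentioned above; adjust it to the service being started.
has_enough_memory() {
    local required_mb="${1:-2048}"
    local available_mb
    # LC_ALL=C keeps the row label as "Mem:" regardless of the system locale.
    available_mb="$(LC_ALL=C free -m | mem_available_mb)"
    [ "${available_mb:-0}" -ge "$required_mb" ]
}

# Usage:
#   has_enough_memory 2048 || echo "not enough memory to start the container"
```

Older versions of `free` have no "available" column, in which case the field index would need to change.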
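Analyze the logs
Sometimes memory is exhausted only for a moment and some processes get killed, so memory looks sufficient afterwards, yet the container still keeps restarting. In that case, analyze further using the Docker logs and the system logs:
Analyze the Docker logs
The Docker logs contain the out-of-memory messages, but they are not necessarily at the very end, so page through carefully to find them.
$docker logs [container name/container ID]|less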
java hotspot(tm) 64-bit server vm warning: info: os::commit_memory(0x0000000769990000, 1449590784, 0) failed; error='cannot allocate memory' (errno=12)
#
# there is insufficient memory for the java runtime environment to continue.
# native memory allocation (malloc) failed to allocate 1449590784 bytes for committing reserved memory.
# an error report file with more information is saved as:
# //hs_err_pid1.log
java hotspot(tm) 64-bit server vm warning: info: os::commit_memory(0x0000000769990000, 1449590784, 0) failed; error='cannot allocate memory' (errno=12)
#
# there is insufficient memory for the java runtime environment to continue.
# native memory allocation (malloc) failed to allocate 1449590784 bytes for committing reserved memory.
# an error report file with more information is saved as:
# /tmp/hs_err_pid1.log
java hotspot(tm) 64-bit server vm warning: info: os::commit_memory(0x0000000769990000, 1449590784, 0) failed; error='cannot allocate memory' (errno=12)
#
# there is insufficient memory for the java runtime environment to continue.
# native memory allocation (malloc) failed to allocate 1449590784 bytes for committing reserved memory.
# can not save log file, dump to screen..
#
# there is insufficient memory for the java runtime environment to continue.
# native memory allocation (malloc) failed to allocate 1449590784 bytes for committing reserved memory.
# possible reasons:
# the system is out of physical ram or swap space
# in 32 bit mode, the process size limit was hit
# possible solutions:
# reduce memory load on the system
# increase physical memory or swap space
# check if swap backing store is full
# use 64 bit java on a 64 bit os
# decrease java heap size (-Xmx/-Xms)
# decrease number of java threads
# decrease java thread stack sizes (-Xss)
# set larger code cache with -XX:ReservedCodeCacheSize=
# this output file may be truncated or incomplete.
#
# out of memory error (os_linux.cpp:2756), pid=1, tid=140325689620224
#
# jre version: (7.0_79-b15) (build )
# java vm: java hotspot(tm) 64-bit server vm (24.79-b02 mixed mode linux-amd64 compressed oops)
# core dump written. default location: //core or core.1
#
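Because these messages can sit far from the end of the log, filtering for them may be quicker than paging. A minimal sketch, assuming the three phrases below are the ones of interest; `find_memory_errors` is a hypothetical helper name.

```shell
# Read log text on stdin and print, with line numbers, every line that
# mentions a failed memory allocation. Matching is case-insensitive.
find_memory_errors() {
    grep -n -i -E 'cannot allocate memory|insufficient memory|out of memory'
}

# Usage (2>&1 also captures what the container wrote to stderr):
#   docker logs xxx10 2>&1 | find_memory_errors | head -n 20
```

The printed line numbers show where to jump to in the full log to read the surrounding context.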
Analyze the system logs
The system log contains many records of processes that were killed because memory ran out
$grep -i 'out of memory' /var/log/messages
apr 7 10:04:02 centos106 kernel: out of memory: kill process 1192 (java) score 54 or sacrifice child
apr 7 10:08:00 centos106 kernel: out of memory: kill process 2301 (java) score 54 or sacrifice child
apr 7 10:09:59 centos106 kernel: out of memory: kill process 28145 (java) score 52 or sacrifice child
apr 7 10:20:40 centos106 kernel: out of memory: kill process 2976 (java) score 54 or sacrifice child
apr 7 10:21:08 centos106 kernel: out of memory: kill process 3577 (java) score 47 or sacrifice child
apr 7 10:21:08 centos106 kernel: out of memory: kill process 3631 (java) score 47 or sacrifice child
apr 7 10:21:08 centos106 kernel: out of memory: kill process 3634 (java) score 47 or sacrifice child
apr 7 10:21:08 centos106 kernel: out of memory: kill process 3640 (java) score 47 or sacrifice child
apr 7 10:21:08 centos106 kernel: out of memory: kill process 3654 (java) score 47 or sacrifice child
apr 7 10:27:27 centos106 kernel: out of memory: kill process 6998 (java) score 51 or sacrifice child
apr 7 10:27:28 centos106 kernel: out of memory: kill process 7027 (java) score 52 or sacrifice child
apr 7 10:28:10 centos106 kernel: out of memory: kill process 7571 (java) score 42 or sacrifice child
apr 7 10:28:10 centos106 kernel: out of memory: kill process 7586 (java) score 42 or sacrifice child
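To see at a glance which programs the kernel keeps killing, the records can be counted per process name. This is a sketch that assumes the message format shown above; `count_oom_kills` is a hypothetical helper, and the log path may differ on other distributions.

```shell
# Summarize kernel OOM-killer records per process name.
# Prints "<name> <count>" for each name found in the log file.
count_oom_kills() {
    local log_file="${1:-/var/log/messages}"
    grep -i 'out of memory: kill process' "$log_file" |
        sed -E 's/.*[Kk]ill process [0-9]+ \(([^)]*)\).*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   count_oom_kills /var/log/messages
```

Process names that contain spaces would be cut at the first space by the final `awk` step, which is acceptable for names such as java.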
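2.2. Port conflict
The port this container listens on is already occupied by another process. This usually happens with a newly deployed service, or when a new backend service is deployed on an existing machine. Check whether the port is occupied before deploying; if the conflict is only found after going live, switch to a free port and restart.
Check command: $netstat -nltp|grep [planned port number]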
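A plain `grep 80` also matches 8080, so an exact match on the port is safer. The sketch below assumes the usual column layout, in which the local address is the 4th field in both `netstat -nlt` and `ss -nlt` output; `port_in_use` is a hypothetical helper name.

```shell
# Read `netstat -nlt` or `ss -nlt` output on stdin and succeed only if a
# listener's local address (4th column) ends with exactly ":PORT".
port_in_use() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Usage:
#   netstat -nlt | port_in_use 8080 && echo "port 8080 is already taken"
```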
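III. Countermeasures
3.1. Countermeasures for insufficient memory:
Countermeasure 1:
A SaltStack minion that has been running for a long time may occupy a lot of memory and needs to be restarted. The restart command sometimes has no effect, so check the running state afterwards; if the old process did not stop, restart it again.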
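The stop, verify and start sequence can be scripted so that a restart that silently failed gets noticed. This is only a sketch: it assumes a systemd host where the unit is named salt-minion, and both the helper name and the 10-second wait are arbitrary choices.

```shell
# Stop salt-minion, confirm the old processes are really gone, then start it.
restart_salt_minion() {
    local tries=0
    systemctl stop salt-minion
    # Wait up to about 10 seconds for leftover minion processes to exit.
    while pgrep -f salt-minion >/dev/null && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    if pgrep -f salt-minion >/dev/null; then
        echo "salt-minion still running after stop; inspect it before retrying" >&2
        return 1
    fi
    systemctl start salt-minion && systemctl is-active salt-minion
}

# Usage (as root):
#   restart_salt_minion
```

Note that `pgrep -f` matches against full command lines, so it would also match a wrapper script whose own name contains salt-minion.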
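Countermeasure 2:
If the ELK log collector or other Java processes use too much memory, investigate with top and ps, determine carefully what each process does, and stop the relevant processes only when doing so does not affect the business.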
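For the ps part of that investigation, sorting by resident memory puts the heaviest processes first. A minimal sketch, assuming the procps-ng `ps` found on Linux (the `--sort` option is specific to it); the helper name is made up.

```shell
# List the processes using the most resident memory (RSS, in KiB).
# Prints the header line plus the top N entries (default 10).
top_memory_processes() {
    local count="${1:-10}"
    ps -eo pid,user,rss,pmem,comm --sort=-rss | head -n "$((count + 1))"
}

# Usage:
#   top_memory_processes 5
```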
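Countermeasure 3:
Release the memory held by buff/cache:
$sync #write dirty data in memory to disk
$echo 3 > /proc/sys/vm/drop_caches #drop page cache, dentries and inodes (must be run as root)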
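The two commands can be wrapped so the effect is easy to compare before and after. This is a sketch: `drop_caches` is a hypothetical helper, and the path is a parameter only so the function can be tried against a scratch file first.

```shell
# Flush dirty pages, then ask the kernel to drop page cache, dentries and inodes.
# On a real host the default target requires root.
drop_caches() {
    local target="${1:-/proc/sys/vm/drop_caches}"
    sync
    echo 3 > "$target"
}

# Usage (as root):
#   free -m; drop_caches; free -m
```

With sudo, the redirection in `sudo echo 3 > /proc/sys/vm/drop_caches` is performed by the unprivileged shell and fails; `echo 3 | sudo tee /proc/sys/vm/drop_caches` is the usual workaround.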
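Countermeasure 4:
Sometimes the shortage is not caused by a high buff/cache; the memory really is consumed by many necessary processes. That has to be addressed at the level of how the machine's resources are allocated and used.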
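3.2. Countermeasures for port conflict
Countermeasure 1:
As described in 2.2, check whether the port is already occupied before deploying; if the conflict is only found after going live, switch the service to a free port and restart it.
Check command: $netstat -nltp|grep [planned port number]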