• 1. Viewing system events
    • 1.1. Pod
    • 1.2. NODE
    • 1.3. RC
    • 1.4. NAMESPACE
    • 1.5. Service
  • 2. Viewing container logs
  • 3. Viewing Kubernetes component logs
    • 3.1. journalctl
    • 3.2. Log files
  • 4. Common problems
    • 4.1. Pod stuck in Pending
    • 4.2. Pod keeps restarting after creation

    title: "[Kubernetes] Kubernetes Cluster Troubleshooting"
    catalog: true
    date: 2017-09-20 10:50:57
    type: "categories"
    subtitle:
    header-img:
    tags:
      - Kubernetes
    categories:
      - Kubernetes

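    1. Viewing system events

    kubectl describe pod <PodName> --namespace=<NAMESPACE>

    This command shows the Pod's configuration as created, its status, and its most recent events, and those events are the first thing to check when troubleshooting. For example, when a Pod stays in Pending, its events reveal why. The usual causes are:

    • No Node is available to schedule the Pod on.
    • Resource quota management is enabled and the Pod's target node happens to have no free resources.
    • The image is still being pulled (the pull takes too long) or the pull has failed.

    kubectl describe can also show other Kubernetes objects: Node, RC (ReplicationController), Service, Namespace and Secret.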
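    When a Pod has a long history, it can help to list only its events. A minimal sketch, assuming a kubectl client that supports --field-selector on kubectl get (newer than the v1.1 cluster shown in this article); pod_events is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: list only the events whose involved object is the given Pod.
# pod_events is a hypothetical helper; --field-selector on "kubectl get"
# assumes a newer kubectl than the v1.1 cluster in this article.
pod_events() {
  local pod="$1"
  local ns="${2:-default}"   # namespace defaults to "default"
  kubectl get events --namespace="$ns" --field-selector="involvedObject.name=$pod"
}

# Usage:
#   pod_events yangsc-1-0-0-index0 test
```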

    1.1. Pod

    kubectl describe pod <PodName> --namespace=<NAMESPACE>

    The events below come from a container whose start command was non-blocking: the container exited as soon as it started, and Kubernetes kept restarting it.

    Events:
    FirstSeen LastSeen Count From SubobjectPath Reason Message
    ───────── ──────── ───── ──── ───────────── ────── ───────
    7m 7m 1 {scheduler } Scheduled Successfully assigned yangsc-1-0-0-index0 to 10.8.216.19
    7m 7m 1 {kubelet 10.8.216.19} containers{infra} Pulled Container image "gcr.io/kube-system/pause:0.8.0" already present on machine
    7m 7m 1 {kubelet 10.8.216.19} containers{infra} Created Created with docker id 84f133c324d0
    7m 7m 1 {kubelet 10.8.216.19} containers{infra} Started Started with docker id 84f133c324d0
    7m 7m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id 3f9f82abb145
    7m 7m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id 3f9f82abb145
    7m 7m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id fb112e4002f4
    7m 7m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id fb112e4002f4
    6m 6m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id 613b119d4474
    6m 6m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id 613b119d4474
    6m 6m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id 25cb68d1fd3d
    6m 6m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id 25cb68d1fd3d
    5m 5m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id 7d9ee8610b28
    5m 5m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id 7d9ee8610b28
    3m 3m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id 88b9e8d582dd
    3m 3m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id 88b9e8d582dd
    7m 1m 7 {kubelet 10.8.216.19} containers{yangsc0} Pulling Pulling image "gcr.io/test/tcp-hello:1.0.0"
    1m 1m 1 {kubelet 10.8.216.19} containers{yangsc0} Started Started with docker id 089abff050e7
    1m 1m 1 {kubelet 10.8.216.19} containers{yangsc0} Created Created with docker id 089abff050e7
    7m 1m 7 {kubelet 10.8.216.19} containers{yangsc0} Pulled Successfully pulled image "gcr.io/test/tcp-hello:1.0.0"
    6m 7s 34 {kubelet 10.8.216.19} containers{yangsc0} Backoff Back-off restarting failed docker container
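    The repeated Created/Started pairs with new docker ids, together with the Backoff line (Count 34), show the kubelet restarting the same container over and over. The restart counter can also be read directly. A sketch, assuming a kubectl client with -o jsonpath range support (newer than the v1.1 client in this article); restart_counts is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: print "<container name><TAB><restart count>" for each container in a Pod.
# restart_counts is a hypothetical helper; the jsonpath range syntax assumes
# a newer kubectl than the one used in this article.
restart_counts() {
  local pod="$1"
  local ns="${2:-default}"
  kubectl get pod "$pod" --namespace="$ns" \
    -o 'jsonpath={range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# Usage (run it a few times; a steadily growing number means a crash loop):
#   restart_counts yangsc-1-0-0-index0 test
```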

    1.2. NODE

    kubectl describe node 10.8.216.20

    [root@FC-43745A-10 ~]# kubectl describe node 10.8.216.20
    Name: 10.8.216.20
    Labels: kubernetes.io/hostname=10.8.216.20,namespace/bcs-cc=true,namespace/myview=true
    CreationTimestamp: Mon, 17 Apr 2017 11:32:52 +0800
    Phase:
    Conditions:
    Type Status LastHeartbeatTime LastTransitionTime Reason Message
    ──── ────── ───────────────── ────────────────── ────── ───────
    Ready True Fri, 18 Aug 2017 09:38:33 +0800 Tue, 02 May 2017 17:40:58 +0800 KubeletReady kubelet is posting ready status
    OutOfDisk False Fri, 18 Aug 2017 09:38:33 +0800 Mon, 17 Apr 2017 11:31:27 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
    Addresses: 10.8.216.20,10.8.216.20
    Capacity:
      cpu: 32
      memory: 67323039744
      pods: 40
    System Info:
      Machine ID: 723bafc7f6764022972b3eae1ce6b198
      System UUID: 4C4C4544-0042-4210-8044-C3C04F595631
      Boot ID: da01f2e3-987a-425a-9ca7-1caaec35d1e5
      Kernel Version: 3.10.0-327.28.3.el7.x86_64
      OS Image: CentOS Linux 7 (Core)
      Container Runtime Version: docker://1.13.1
      Kubelet Version: v1.1.1-xxx2-13.1+79c90c68bfb72f-dirty
      Kube-Proxy Version: v1.1.1-xxx2-13.1+79c90c68bfb72f-dirty
    ExternalID: 10.8.216.20
    Non-terminated Pods: (6 in total)
    Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
    ───────── ──── ──────────── ────────── ─────────────── ─────────────
    bcs-cc bcs-cc-api-0-0-1364-index0 1 (3%) 1 (3%) 4294967296 (6%) 4294967296 (6%)
    bcs-cc bcs-cc-api-0-0-1444-index0 1 (3%) 1 (3%) 4294967296 (6%) 4294967296 (6%)
    fw fw-demo2-0-0-1519-index0 1 (3%) 1 (3%) 4294967296 (6%) 4294967296 (6%)
    myview myview-api-0-0-1362-index0 1 (3%) 1 (3%) 4294967296 (6%) 4294967296 (6%)
    myview myview-api-0-0-1442-index0 1 (3%) 1 (3%) 4294967296 (6%) 4294967296 (6%)
    qa-ts-dna ts-dna-console3-0-0-1434-index0 1 (3%) 1 (3%) 4294967296 (6%) 4294967296 (6%)
    Allocated resources:
    (Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
    CPU Requests CPU Limits Memory Requests Memory Limits
    ──────────── ────────── ─────────────── ─────────────
    6 (18%) 6 (18%) 25769803776 (38%) 25769803776 (38%)
    No events.
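    The percentages under Allocated resources are the summed requests divided by the node's capacity. Here the six Pods request 6 × 4294967296 = 25769803776 bytes out of 67323039744, about 38.3%, and 6 of 32 CPUs, which is 18.75%; since 18.75% is printed as 18%, the figures appear to be truncated rather than rounded. A small sketch that reproduces them with integer shell arithmetic; pct is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: reproduce the "Allocated resources" percentages of kubectl describe node.
# pct is a hypothetical helper; bash arithmetic is 64-bit integer and truncates.
pct() {
  local used="$1" capacity="$2"
  echo $(( used * 100 / capacity ))
}

cpu_capacity=32              # "cpu: 32" under Capacity
mem_capacity=67323039744     # "memory: 67323039744" (bytes)
per_pod_mem=4294967296       # each Pod requests 4294967296 bytes (4 GiB)
pods=6                       # "Non-terminated Pods: (6 in total)"

echo "CPU requests:    $(pct "$pods" "$cpu_capacity")%"
echo "Memory requests: $(pct $(( pods * per_pod_mem )) "$mem_capacity")%"
```

    Running it prints 18% for CPU and 38% for memory, matching the node output above.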

    1.3. RC

    kubectl describe rc mytest-1-0-0 --namespace=test

    [root@FC-43745A-10 ~]# kubectl describe rc mytest-1-0-0 --namespace=test
    Name: mytest-1-0-0
    Namespace: test
    Image(s): gcr.io/test/mywebcalculator:1.0.1
    Selector: app=mytest,appVersion=1.0.0
    Labels: app=mytest,appVersion=1.0.0,env=ts,zone=inner
    Replicas: 1 current / 1 desired
    Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
    No volumes.
    Events:
    FirstSeen LastSeen Count From SubobjectPath Reason Message
    ───────── ──────── ───── ──── ───────────── ────── ───────
    20h 19h 9 {replication-controller } FailedCreate Error creating: Pod "mytest-1-0-0-index0" is forbidden: limited to 10 pods
    20h 17h 7 {replication-controller } FailedCreate Error creating: pods "mytest-1-0-0-index0" already exists
    20h 17h 4 {replication-controller } SuccessfulCreate Created pod: mytest-1-0-0-index0

    1.4. NAMESPACE

    kubectl describe namespace test

    [root@FC-43745A-10 ~]# kubectl describe namespace test
    Name: test
    Labels: <none>
    Status: Active
    Resource Quotas
    Resource Used Hard
    --- --- ---
    cpu 5 20
    memory 1342177280 53687091200
    persistentvolumeclaims 0 10
    pods 4 10
    replicationcontrollers 8 20
    resourcequotas 1 1
    secrets 3 10
    services 8 20
    No resource limits.
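    In the quota table, Hard is the limit and Used is current consumption, so Hard minus Used is what the namespace can still create; a pods quota of this kind is what rejects Pod creation in the RC event "is forbidden: limited to 10 pods" above. A sketch that computes the remaining headroom from this output, assuming the three-column layout shown here and plain integer values; quota_left is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: print Hard - Used for one resource from "kubectl describe namespace" output.
# quota_left is a hypothetical helper; it assumes the "Resource Used Hard" layout
# shown above and exits with status 1 if the resource is not listed.
quota_left() {
  local resource="$1"
  awk -v r="$resource" '
    $1 == r { print $3 - $2; found = 1; exit }
    END     { if (!found) exit 1 }'
}

# Usage:
#   kubectl describe namespace test | quota_left pods
```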

    1.5. Service

    kubectl describe service xxx-containers-1-1-0 --namespace=test

    [root@FC-43745A-10 ~]# kubectl describe service xxx-containers-1-1-0 --namespace=test
    Name: xxx-containers-1-1-0
    Namespace: test
    Labels: app=xxx-containers,appVersion=1.1.0,env=ts,zone=inner
    Selector: app=xxx-containers,appVersion=1.1.0
    Type: ClusterIP
    IP: 10.254.46.42
    Port: port-dna-tcp-35913 35913/TCP
    Endpoints: 10.0.92.17:35913
    Port: port-l7-tcp-8080 8080/TCP
    Endpoints: 10.0.92.17:8080
    Session Affinity: None
    No events.
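    When a Service cannot be reached, the Endpoints lines are worth checking first: if no ready Pod matches the Selector, kubectl shows the endpoints as <none>. A sketch that flags this from the describe output; has_endpoints is a hypothetical helper name, and it treats the Service as broken if any Endpoints line is empty:

```bash
#!/usr/bin/env bash
# Sketch: succeed only if every "Endpoints:" line in "kubectl describe service"
# output lists at least one address. has_endpoints is a hypothetical helper.
has_endpoints() {
  awk '
    $1 == "Endpoints:" { seen = 1; if (NF < 2 || $2 == "<none>") empty = 1 }
    END                { if (seen && !empty) exit 0; exit 1 }'
}

# Usage:
#   kubectl describe service xxx-containers-1-1-0 --namespace=test | has_endpoints \
#     || echo "no endpoints: compare the Service selector with the Pod labels"
```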

    2. Viewing container logs

    1) View the logs of a given Pod

    kubectl logs <pod_name>
    kubectl logs -f <pod_name>  # keep streaming new output, similar to tail -f

    2) View the logs of the previous (terminated) container instance in a Pod

    kubectl logs -p <pod_name>

    3) View the logs of a specific container in a Pod

    kubectl logs <pod_name> -c <container_name>

    4) kubectl logs --help

    [root@node5 ~]# kubectl logs --help
    Print the logs for a container in a pod. If the pod has only one container, the container name is optional.
    Usage:
    kubectl logs [-f] [-p] POD [-c CONTAINER] [flags]
    Aliases:
    logs, log
    Examples:
    # Return snapshot logs from pod nginx with only one container
    $ kubectl logs nginx
    # Return snapshot of previous terminated ruby container logs from pod web-1
    $ kubectl logs -p -c ruby web-1
    # Begin streaming the logs of the ruby container in pod web-1
    $ kubectl logs -f -c ruby web-1
    # Display only the most recent 20 lines of output in pod nginx
    $ kubectl logs --tail=20 nginx
    # Show all logs from pod nginx written in the last hour
    $ kubectl logs --since=1h nginx
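    For a Pod with several containers, -c has to be given once per container. A sketch that loops over all of them, assuming a kubectl client with -o jsonpath support; all_container_logs is a hypothetical helper name, and extra arguments such as -p or --tail=20 are passed through to kubectl logs:

```bash
#!/usr/bin/env bash
# Sketch: print the logs of every container in a Pod, one section per container.
# all_container_logs is a hypothetical helper; arguments after the namespace
# are passed unchanged to "kubectl logs".
all_container_logs() {
  local pod="$1"
  local ns="$2"
  shift 2
  local c
  for c in $(kubectl get pod "$pod" --namespace="$ns" -o 'jsonpath={.spec.containers[*].name}'); do
    echo "===== $pod/$c ====="
    kubectl logs "$pod" -c "$c" --namespace="$ns" "$@"
  done
}

# Usage:
#   all_container_logs web-1 default --tail=20
#   all_container_logs web-1 default -p    # previous instances after a crash
```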

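    3. Viewing Kubernetes component logs

    3.1. journalctl

    On Linux, the Kubernetes services are managed by systemd, and the systemd journal captures each service's output, so a component's logs can be viewed with systemctl status <unit> or journalctl -u <unit> -f.

    The Kubernetes components and what their logs cover:

    | k8s component | Log content | Notes |
    |---|---|---|
    | kube-apiserver | | |
    | kube-controller-manager | Pod scaling, RC-related | |
    | kube-scheduler | Pod scheduling (e.g. new Pods created when an RC scales) | |
    | kubelet | Pod lifecycle: creation, stopping, etc. | |
    | etcd | | |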
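    To check all the components listed above in one pass, journalctl can be called per unit with a time window. A sketch; the systemd unit names are an assumption that depends on how the cluster was installed (check them with systemctl list-units), and k8s_unit_logs is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: show recent journal entries for each Kubernetes systemd unit.
# The unit names are assumptions; adjust them to your installation.
K8S_UNITS="kube-apiserver kube-controller-manager kube-scheduler kubelet etcd"

k8s_unit_logs() {
  local since="${1:-1 hour ago}"   # any time spec accepted by journalctl --since
  local unit
  for unit in $K8S_UNITS; do
    echo "===== $unit ====="
    journalctl -u "$unit" --since "$since" --no-pager
  done
}

# Usage:
#   k8s_unit_logs                # last hour
#   k8s_unit_logs "10 min ago"   # shorter window
```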

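    3.2. Log files

    The components can also write their logs to files in a given directory, controlled by these flags:

    • --logtostderr=false: write logs to files instead of stderr
    • --log-dir=/var/log/kubernetes: directory in which the log files are stored
    • --alsologtostderr=false: when set to true, logs go to stderr as well as to the log files
    • --v=0: glog verbosity level
    • --vmodule=gfs=2,test=4: per-module (source file) glog verbosity, here level 2 for gfs and level 4 for test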
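    With --logtostderr=false and a --log-dir set, glog keeps a symlink named <program>.<SEVERITY> (for example kubelet.INFO or kube-apiserver.ERROR) pointing at the current log file, so the newest entries can be read without knowing the full file name. A sketch; k8s_log is a hypothetical helper name and the default directory simply matches the --log-dir value above:

```bash
#!/usr/bin/env bash
# Sketch: show the last 50 lines of a component's current glog file.
# k8s_log is a hypothetical helper; it relies on glog's <program>.<SEVERITY>
# symlink inside --log-dir.
k8s_log() {
  local component="$1"
  local severity="${2:-INFO}"            # INFO, WARNING, ERROR or FATAL
  local dir="${3:-/var/log/kubernetes}"  # same as --log-dir above
  local file="$dir/$component.$severity"
  if [ ! -e "$file" ]; then
    echo "no log file: $file" >&2
    return 1
  fi
  tail -n 50 "$file"
}

# Usage:
#   k8s_log kubelet ERROR
```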

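    4. Common problems

    4.1. Pod stuck in Pending

    kubectl describe pod <pod_name> --namespace=<NAMESPACE>

    Check the Pod's events. The usual causes are:

    • The image is being pulled but the pull never completes (it takes too long); in practice this is usually the cause
    • No Node is available to schedule the Pod on
    • Resource quota management is enabled and the Pod's target node happens to have no free resources

    Solutions:

    1. Check whether the network between the Pod's host and the image registry is working; you can try pulling the image manually on that host.
    2. Delete the Pod so that it gets scheduled onto another host (the RC managing it creates a replacement).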
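    4.2. Pod keeps restarting after creation

    In kubectl get pods, the Pod's status keeps switching between Running and other states, and its RESTARTS count keeps growing.

    The usual cause is that the container's start command does not block, so the container exits right after it starts.

    A start command is non-blocking when:

    • the command specified by CMD is itself non-blocking, or
    • the service is configured to start in the background.

    Solutions:

    1) Change the command to a blocking one that runs in the foreground, for example: zkServer.sh start-foreground

    2) In the Java startup script, remove the nohup and the trailing & from nohup xxx &, for example change:

    nohup $JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main &

    to:

    $JAVA_HOME/bin/java $JAVA_OPTS -cp $CLASSPATH com.cnc.open.processor.Main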
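    When the service is launched from a wrapper script used as the container's entrypoint, a common convention (not taken from the original article) is to end the script by exec-ing the service, so that it stays in the foreground as the container's main process. A minimal sketch; run_foreground is a hypothetical helper, and JAVA_HOME, JAVA_OPTS and CLASSPATH are assumed to be set in the image:

```bash
#!/usr/bin/env bash
# Sketch of an entrypoint helper (hypothetical, not the author's script).
# exec replaces the shell with the service process: the service stays in the
# foreground, and the SIGTERM sent when the Pod is stopped reaches it directly.
run_foreground() {
  exec "$@"
}

# Usage at the end of the entrypoint script (no nohup, no trailing &;
# $JAVA_OPTS is left unquoted on purpose so multiple options split into words):
#   run_foreground "$JAVA_HOME/bin/java" $JAVA_OPTS -cp "$CLASSPATH" com.cnc.open.processor.Main
```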

    Reference: 《Kubernetes权威指南》 (The Definitive Guide to Kubernetes)