
Debugging Kubernetes Applications: How To Guide

Do your Kubernetes pods or deployments sometimes crash or behave in unexpected ways?

Without knowing how to inspect and debug them, Kubernetes developers and administrators will struggle to identify the reasons for application failures. Fortunately, Kubernetes ships with powerful built-in debugging tools that allow you to inspect cluster-level, node-level, and application-level issues.

In this article, we focus on several application-level issues you might face when creating your pods and deployments. We’ll show several examples of using the kubectl CLI to debug pending, waiting, or terminated pods in your Kubernetes cluster. By the end of this tutorial, you’ll be able to identify the causes of pod failures in a fraction of the time, making debugging Kubernetes applications much easier. Let’s get started!

Tutorial

To complete the examples in this tutorial, you need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • The kubectl command-line tool installed and configured to communicate with the cluster. See how to install kubectl here.

One of the most common reasons a pod fails to start is that Kubernetes can’t find a node on which to schedule it. The scheduling failure might be caused by excessive resource requests from the pod’s containers. If you have lost track of how many resources are available in your cluster, the pod’s failure to start might confuse and puzzle you. Kubernetes’ built-in pod inspection and debugging functionality comes to the rescue, though. Let’s see how. Below is a Deployment spec that creates 5 replicas of the Apache HTTP Server (5 pods), each requesting 0.3 CPU and 500 Mi of memory (check this article to learn more about the Kubernetes resource model).
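A minimal sketch of such a spec (the metadata names, labels, and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-deployment
spec:
  replicas: 5
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2.4-alpine   # illustrative tag; any valid httpd image works here
        resources:
          requests:
            cpu: "0.3"
            memory: "500Mi"
```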

Let’s save this spec as httpd-deployment.yaml and create the deployment with the following command:
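With the file saved under that name:

```bash
kubectl create -f httpd-deployment.yaml
```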

Now, if we check the replicas created we’ll see the following output:
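The listing will look roughly like this (pod names, hashes, and ages will differ in your cluster):

```
$ kubectl get pods
NAME                                READY     STATUS    RESTARTS   AGE
httpd-deployment-74f78d7d5c-4dl6r   1/1       Running   0          1m
httpd-deployment-74f78d7d5c-5xjcp   0/1       Pending   0          1m
httpd-deployment-74f78d7d5c-8rp2v   1/1       Running   0          1m
httpd-deployment-74f78d7d5c-kvjw9   0/1       Pending   0          1m
httpd-deployment-74f78d7d5c-z6dqp   1/1       Running   0          1m
```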

As you see, only 3 of the 5 replicas are Ready and Running, while the other 2 are in the Pending state. If you are a Kubernetes newbie, you’ll probably wonder what all these statuses mean. Some of them are quite easy to understand (e.g., Running) while others are not.

As a reminder, a pod’s lifecycle includes a number of phases defined in the PodStatus object. Possible values for phase include the following:

  • Pending: The pod has already been accepted by the system, but one or several of its container images have not yet been downloaded or installed.
  • Running: The pod has been scheduled to a specific node, and all its containers are already running.
  • Succeeded: All containers in the pod were successfully terminated and will not be restarted.
  • Failed: At least one container in the pod was terminated with a failure. This means that one of the containers in the pod either exited with a non-zero status or was terminated by the system.
  • Unknown: The state of the pod cannot be obtained for some reason, typically due to a communication error.

Two replicas of our deployment are Pending, which means that the pods have not yet been scheduled by the system. The next logical question is: why is that the case? Let’s use the main inspection tool at our disposal, kubectl describe. Run this command against one of the pods that are in the Pending status:
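Abridged, approximate output for one of the Pending replicas (pod names, node counts, and timings will differ in your cluster):

```
$ kubectl describe pod httpd-deployment-74f78d7d5c-5xjcp
Name:           httpd-deployment-74f78d7d5c-5xjcp
Namespace:      default
Node:           <none>
Labels:         app=httpd
Status:         Pending
Conditions:
  Type           Status
  PodScheduled   False
QoS Class:       Burstable
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  1m (x15 over 3m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
```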

Let’s discuss some fields of this description that are useful for debugging:

Namespace  — A Kubernetes namespace in which the pod was created. You might sometimes forget the namespace in which the deployment and pod were created and then be surprised to find no pods when running kubectl get pods . In this case, check all available namespaces by running kubectl get namespaces  and access pods in the needed namespace by running kubectl get pods --namespace <your-namespace> .

Status  — A pod’s lifecycle phase defined in the PodStatus  object (see the discussion above).

Conditions: PodScheduled — A Boolean value that indicates whether the pod has been scheduled. In our case, the value is False, which means the pod was not scheduled.

QoS Class — Resource guarantees for the pod defined by its quality of service (QoS) class. Based on QoS, pods can be Guaranteed, Burstable, or BestEffort (see the image below).

Pod Classes

Events  — pod events emitted by the system. Events are very informative about the potential reasons for the pod’s issues. In this example, you can find the event with a FailedScheduling  Reason and the informative message indicating that the pod was not scheduled due to insufficient CPU and insufficient memory. Events such as these are stored in etcd  to provide high-level information on what is going on in the cluster. To list all events, we can use the following command:
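In its simplest form, this lists the events in the current namespace:

```bash
kubectl get events
```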

Please remember that all events are namespaced, so you should indicate the namespace you are searching by typing: kubectl get events --namespace=my-namespace

As you see, the kubectl describe pod <pod-name> command is very powerful for identifying pod issues. It allowed us to find out that the pod was not scheduled due to insufficient memory and CPU. Another way to retrieve extra information about a pod is passing the -o yaml format flag to kubectl get pod:
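For example, using the same illustrative pod name as above:

```bash
kubectl get pod httpd-deployment-74f78d7d5c-5xjcp -o yaml
```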

This command will output all information that Kubernetes has about this pod. It will contain the description of all spec options and fields you specified including any annotations, restart policy, statuses, phases, and more. The abundance of pod-related data makes this command one of the best tools for debugging pods in Kubernetes.

That’s it! To fix the scheduling issue, you’ll need to request an amount of CPU and memory your nodes can actually provide. While doing so, please keep in mind that Kubernetes starts with some default daemons and services like kube-proxy. Therefore, you can’t request a node’s full capacity (e.g., a full 1.0 CPU on a single-core node) for your apps.

Scheduling failures are only one of the common reasons for pods getting stuck in the Pending stage. Let’s create another deployment to illustrate other potential scenarios.
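A sketch of such a spec (the RollingUpdate parameters, names, and labels are illustrative; the image tag will become important in a moment):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-deployment-2
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: httpd-2
  template:
    metadata:
      labels:
        app: httpd-2
    spec:
      containers:
      - name: httpd
        image: httpd:23-alpine
```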

All we need to know about this deployment is that it creates 3 replicas of the Apache HTTP Server and specifies a custom RollingUpdate strategy.

Let’s save this spec as httpd-deployment-2.yaml and create the deployment by running the following command:
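With the file name above:

```bash
kubectl create -f httpd-deployment-2.yaml
```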

Let’s check whether all replicas were successfully created:
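The listing will look roughly like this (names, exact statuses, and timings will vary as Kubernetes retries the image pulls):

```
$ kubectl get pods
NAME                                  READY     STATUS             RESTARTS   AGE
httpd-deployment-2-6bf54c8484-9hmr2   0/1       ImagePullBackOff   0          1m
httpd-deployment-2-6bf54c8484-kq8jc   0/1       ErrImagePull       0          1m
httpd-deployment-2-6bf54c8484-x7vnk   0/1       ImagePullBackOff   0          1m
```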

Oops! As you see, none of the three pods is Ready, and they have ImagePullBackOff and ErrImagePull statuses. These statuses indicate that something went wrong while pulling the httpd image from the Docker Hub repository. Let’s describe one of the pods in the deployment to find out more information:
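For example (the pod and node names are illustrative; the relevant part of the output is the Events section, shown abridged):

```
$ kubectl describe pod httpd-deployment-2-6bf54c8484-9hmr2
...
Events:
  Type     Reason  Age              From             Message
  ----     ------  ----             ----             -------
  Warning  Failed  1m (x4 over 2m)  kubelet, node-1  Failed to pull image "httpd:23-alpine": rpc error: code = Unknown desc = Error response from daemon: manifest for httpd:23-alpine not found
```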

If you scroll down this description to the bottom, you’ll see details on why the pod failed: "Failed to pull image "httpd:23-alpine": rpc error: code = Unknown desc = Error response from daemon: manifest for httpd:23-alpine not found" . This means that we specified a container image that does not exist. Let’s update our deployment with the right httpd container image version to fix the issue:
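One way to do this is with kubectl set image (the deployment and container names below match the sketch above; httpd:2.4-alpine is one valid tag):

```bash
kubectl set image deployment/httpd-deployment-2 httpd=httpd:2.4-alpine
```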

Then, let’s check the deployment’s pods again:
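After the rollout completes, the listing should look something like this:

```
$ kubectl get pods
NAME                                  READY     STATUS    RESTARTS   AGE
httpd-deployment-2-7d7c9b9d76-2xkzf   1/1       Running   0          40s
httpd-deployment-2-7d7c9b9d76-8fjnl   1/1       Running   0          35s
httpd-deployment-2-7d7c9b9d76-t5vq4   1/1       Running   0          30s
```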

Awesome! The Deployment controller has managed to pull the new image and all Pod replicas are now Running.

Finding the Reasons Your Pod Crashed

Sometimes, your pod might crash due to syntax errors in the commands and arguments of the container. In this case, kubectl describe pod <PodName> will provide you only with the error name, but not an explanation of its cause. Let’s create a new pod to illustrate this scenario.
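A minimal sketch of such a pod spec (the variable values are made up; the stray ; inside the arithmetic expression is the bug we are about to chase down):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-crash
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c']
    # the ';' before '))' is an intentional syntax error
    args: ['MIN=5; SEC=30; echo "$(( MIN*60 + SEC + ; ))"']
```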

This pod uses the BusyBox sh command to calculate an arithmetic expression involving two variables.

Let’s save the spec as pod-crash.yaml and create the pod by running the following command:
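Assuming that file name:

```bash
kubectl create -f pod-crash.yaml
```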

Now, if you check the pod, you’ll see the following output:
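Something like this (the restart count grows with every crash):

```
$ kubectl get pods
NAME        READY     STATUS             RESTARTS   AGE
pod-crash   0/1       CrashLoopBackOff   3          2m
```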

The CrashLoopBackOff status means that you have a pod starting, crashing, starting again, and then crashing again. Kubernetes keeps restarting the pod because the default restartPolicy: Always is in effect. If we had set the policy to Never, the pod would not be restarted.

The status above, however, does not indicate the precise reason for the pod’s crash. Let’s try to find more details:
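Abridged, approximate kubectl describe output for the crashing pod (field values will differ in your cluster):

```
$ kubectl describe pod pod-crash
...
Containers:
  busybox:
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
    Ready:          False
...
Events:
  Type     Reason   Age               From             Message
  ----     ------   ----              ----             -------
  Warning  BackOff  1m (x10 over 3m)  kubelet, node-1  Back-off restarting failed container
```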

The description above indicates that the pod is not Ready  and that it was terminated because of the Back-Off Error. However, the description does not provide any further explanation of why the error occurred. Where should we search then?

We are most likely to find the reason for the pod’s crash in the BusyBox container logs. You can check them by running kubectl logs ${POD_NAME} ${CONTAINER_NAME}. Note that ${CONTAINER_NAME} can be omitted for pods that contain only a single container (as in our case):
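The exact wording depends on the BusyBox version, but the log will point at the broken expression, for example:

```
$ kubectl logs pod-crash
sh: arithmetic syntax error
```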

Awesome! There must be something wrong with our command or arguments syntax. Indeed, we made a typo by inserting ; into the expression echo "$(( MIN*60 + SEC + ; ))". Just fix that typo and you are good to go!

Pod Fails Due to the ‘Unknown Field’ Error

In earlier versions of Kubernetes, a pod could be created even if an error was made in a spec’s field name or value. In this case, the error would be silently ignored if the pod was created with the --validate flag set to false. In newer Kubernetes versions (we are using Kubernetes 1.10.0), the --validate option is set to true by default, so an error for an unknown field is always printed, which makes debugging much easier. Let’s create a pod with a wrong field value to illustrate this.
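A minimal example of such a broken spec (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-field-error
spec:
  containers:
  - name: busybox
    image: busybox
    # 'command' is misspelled on purpose
    comand: ['sh', '-c', 'echo Hello!; sleep 3600']
```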

Let’s save this spec as pod-field-error.yaml and create the pod with the following command:
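On Kubernetes 1.10, the command fails immediately with a validation error similar to this (the exact wording varies between versions):

```
$ kubectl create -f pod-field-error.yaml
error: error validating "pod-field-error.yaml": error validating data: ValidationError(Pod.spec.containers[0]): unknown field "comand" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false
```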

As you see, the pod start was blocked because of the unknown comand field (we made a typo in this field intentionally). If you are using an older version of Kubernetes and the pod is created with the error silently ignored, delete the pod and re-create it with kubectl create --validate -f pod-field-error.yaml. This command will print the same validation error and help you find the reason for the failure.

Cleaning Up

This tutorial is over, so let’s clean up after ourselves.

Delete Deployments:
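Assuming the deployment names used in the sketches above (substitute your own if they differ):

```bash
kubectl delete deployment httpd-deployment httpd-deployment-2
```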

Delete Pods:
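Again assuming the names from the sketches above:

```bash
# pod-field-error exists only if your Kubernetes version created it despite the invalid field
kubectl delete pod pod-crash pod-field-error
```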

You may also want to delete files with spec definitions if you don’t need them anymore.

Conclusion

As this tutorial demonstrated, Kubernetes ships with great debugging tools that help identify the reasons for pod failures or unexpected behavior in a fraction of the time. The rule of thumb for Kubernetes debugging is to first find out the pod’s status and events, and then check the event messages using kubectl describe or kubectl get events. If your pod crashes and a detailed error message is not available, you can check the containers’ logs to find container-level errors and exceptions. These simple tools will dramatically increase your debugging speed and efficiency, freeing up time for more productive work.

Stay tuned to upcoming blogs to learn more about node-level and cluster-level debugging in Kubernetes.