Supergiant Blog

Product releases, new features, announcements, and tutorials.

What is Kubernetes?

Posted by Kirill Goltsman on September 19, 2018

Kubernetes (or K8s) is an open source platform for deploying and managing containerized applications at scale. It is designed to automate many processes and tasks, such as the deployment, scaling, updating, scheduling, and communication of containerized applications. With Kubernetes, it is simple to group multiple hosts running Linux containers into a working cluster controlled and managed by an intelligent control plane that maintains the desired state of your deployments.

Kubernetes was originally designed and developed by Google for the management of its huge computer clusters and application deployments. Google generated more than 2 billion container deployments per week in 2014, meaning that every second Google was launching an average of 3,300 containers. All these containers were managed by its internal orchestration platform, Borg, the predecessor of Kubernetes. Google open-sourced Kubernetes in 2014. This move continued Google's track record of successful contributions to the container ecosystem and the cloud-native movement, including the development of cgroups, which ultimately made Docker possible, and Borgmon, which became the inspiration for Prometheus.

To make a long story short, since 2014 a number of companies and developers have contributed to the Kubernetes project, building dozens of integrations with popular cloud providers, storage systems, and networking infrastructures. Kubernetes is supported by a growing ecosystem and community and is currently the most popular container orchestration tool around.

Why Do Companies Need Kubernetes?

Container runtimes were designed as an alternative to running immutable virtual machine (VM) images. VM images are much heavier (they require more resources) than containers and thus need more servers for deployment. In contrast, modern container technologies make it simple to run thousands of lightweight containers on a single host, which radically reduces the computing resources required. Also, containers let you disentangle applications from the underlying infrastructure and isolate them from the host environment using an autonomous filesystem, virtual networks, and bundled dependencies. This isolation makes containers much more portable and easier to deploy than VMs.

In itself, however, a container technology like Docker does not fully address the challenge of running containerized applications in production. Docker did offer Swarm for orchestration (and still does), but Swarm ultimately didn't offer as much as K8s. Think about it for a moment: real production applications include multiple containers deployed across many server hosts. When you manage hundreds or thousands of containers across multiple nodes in production, you need the ability to scale them with the application's load, enable communication and external access (e.g., via microservices), provision storage, run regular health checks, manage updates, and handle many other tasks. These are all orchestration tasks that do not come out of the box with container runtimes. Developing your own orchestration framework to perform them would be an unnecessary overhead for your business.

Fortunately, though, Google open-sourced Kubernetes, so you can get all these handy features and tools without writing a single line of code! With Kubernetes, you get features such as manual and automatic scaling, multi-level networking and service discovery, native support for co-located and tightly coupled applications, and many more. Let's discuss some of these features.

Scalability Is Not a Problem 

Scalability is among the major concerns of modern production-grade applications exposed to millions of potential users. When it comes to running applications in clusters of hundreds or even thousands of servers, scaling becomes a complex administration task that involves many prerequisites and caveats. Here are some of the questions that might arise along the way:

  • Is the desired number of application instances running?
  • How many of those instances are healthy and ready to serve the traffic?
  • What is the current application load? How many replicas are needed to service that load?
  • How many nodes are currently available for scheduling? How many resources are available in the cluster?

These are just a few of the questions that could make a cluster administrator without the appropriate tools go crazy. The obvious solution is automation, and this is where Kubernetes truly shines.

For example, the platform's controllers, such as deployments, monitor how many replicas of your application are running in the cluster, and if for some reason this number differs from the desired state, Kubernetes scales the deployment up or down to reach it. Under the hood, Kubernetes also keeps a record of available nodes and resources to intelligently schedule your applications during the scaling process. At the same time, you can always scale manually using the Kubernetes command line tool (kubectl).
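
For instance, a quick sketch of manual scaling with kubectl (the deployment name here is illustrative):

kubectl scale deployment my-app --replicas=5

kubectl updates the deployment's desired replica count, and the deployment controller then creates or removes pods to match it.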

How about auto-scaling? This feature is critical for managing an ever-changing application load. It's also important because it comes down to managing your infrastructure costs: normally, you don't need more application instances running than the current demand for your services requires. Kubernetes ships with a Horizontal Pod Autoscaler that scales the number of pods in a replication controller, deployment, or replica set based on observed CPU utilization.
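
As a minimal sketch, assuming a hypothetical my-app deployment, an autoscaler can be created imperatively (the thresholds are illustrative):

kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=50

This keeps between 2 and 10 replicas running, adding or removing pods to hold average CPU utilization around 50%. You can inspect the autoscaler's state with kubectl get hpa.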

Our new Supergiant Kubernetes-as-a-Service platform extends this auto-scaling even further. Its cost-reduction algorithm, based on machine learning and real-time analysis of traffic and resources, ensures that you use precisely the storage and memory your applications need to work properly. In this way, Supergiant complements Kubernetes' native horizontal pod scaling with node-level auto-scaling while introducing immense cost savings.

Efficient Multi-Layer Networking and Service Discovery

The purpose of Kubernetes networking is to turn containers into bona fide "virtual hosts" that can communicate with each other across nodes, thereby combining the benefits of VMs, containers, and microservices. Kubernetes networking is based on several layers, all serving this final goal:

  • Container-to-container communication via localhost within a pod's shared network namespace. This layer lets tightly coupled containers communicate with each other on specified ports, much like conventional applications running on the same host.
  • Pod-to-pod communication, which enables pods to communicate across nodes. Kubernetes turns the cluster into a virtual network where all pods can communicate with each other no matter which nodes they land on.
  • Services. A Service is an abstraction that defines a policy for accessing a logical set of pods (a microservice) from other applications. Services act as load balancers, distributing requests across the backend pods they manage (see the sketch after this list).
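
For illustration, a minimal Service manifest might look like this (the name, label, and ports are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app        # route traffic to pods carrying this label
  ports:
  - protocol: TCP
    port: 80           # port the Service exposes
    targetPort: 8080   # port the backend containers listen on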

Kubernetes is very flexible about networking solutions that you can use. However, the platform imposes the following principles for the cluster-level network interfaces:

  • All containers can communicate with each other without NAT.
  • All nodes can communicate with all containers (and vice versa) without NAT.
  • The IP address a container sees itself as is the same IP address that other containers see it as.

There are a number of powerful networking implementations of this model, including Cilium, Contiv, Flannel, and others.

Enabling Co-Located and Tightly Coupled Applications

Kubernetes packages containers in abstractions called pods, which provide Kubernetes infrastructure resources and services to containers. Pods work as wrappers around containers, providing interfaces for sharing resources and for communication between them.

In particular, containers can communicate via localhost on the pod network and share resources via volumes assigned to the pod. This makes it possible to implement various tightly coupled application designs where one container serves as the main application and another container (a sidecar) helps it process data or consume logs. Containers in a pod also share fate and behavior, which dramatically simplifies the deployment of co-located applications.
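
A bare-bones sketch of the sidecar pattern, assuming a hypothetical application image that writes logs to a shared emptyDir volume:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  volumes:
  - name: shared-logs
    emptyDir: {}                 # scratch volume shared by both containers
  containers:
  - name: main-app
    image: my-app:latest         # hypothetical application image
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
  - name: log-sidecar
    image: busybox
    command: ["sh", "-c", "tail -F /var/log/app/app.log"]   # follow the app's log
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

Both containers are scheduled together on the same node and share the volume's lifecycle with the pod.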

Fast, Zero Downtime Updates

Users expect your applications to be up all the time. However, if your app runs thousands of containers in production, manual updates can introduce a lot of risk and cause downtime. The last thing you want is to make your application unavailable while it's being updated.

Fortunately, Kubernetes ships with native support for rolling updates, which allow updates to complete with zero downtime. Kubernetes controllers incrementally update running pods according to the parameters you specify, ensuring that old versions of your application keep running until the new replicas are launched. Also, with the Kubernetes API, you have fine-grained control over the maximum number of pods that may be unavailable and the maximum surge of new pods above the desired number specified in your deployment. As an added benefit, Kubernetes stores all updates you make in the revision history, so you can always roll back the deployment to an earlier revision if an update goes wrong for some reason (e.g., an image pull error).
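
As a sketch, this behavior is tuned in the deployment spec (the numbers are illustrative):

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod below the desired count during the update
      maxSurge: 1         # at most one extra pod above the desired count

If a rollout misbehaves, kubectl rollout undo deployment <name> returns the deployment to its previous revision.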

Efficient Management of Hardware Resources in Your Cluster

When you run thousands of applications in production and have multiple teams working on dozens of projects at the same time, cluster resource management becomes essential for the efficient distribution of your limited cloud budget. 

Kubernetes was built with this practical concern in mind. It is based on an efficient resource management model that spans from the lowest level of containers and pods to the level of the whole cluster.

At the container level, Kubernetes allows assigning resource requests and limits, which control how many resources a container requests and set the upper boundary of its resource usage. By setting different request/limit ratios, you can create diverse classes of pods -- best-effort, guaranteed, and burstable -- depending on your application's needs.
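
As a sketch, the request/limit ratio in a container spec determines the pod's QoS class (the values are illustrative; this fragment sits under a container entry in the pod spec):

resources:
  requests:
    cpu: "250m"       # what the scheduler reserves for the container
    memory: "256Mi"
  limits:
    cpu: "500m"       # hard ceiling enforced at runtime
    memory: "512Mi"   # requests below limits make this pod burstable

Setting requests equal to limits yields a guaranteed pod; omitting both yields a best-effort pod.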

Kubernetes also allows managing resources efficiently at the namespace level. For example, you can define default resource requests and limits that are automatically applied to containers, resource constraints (minimum and maximum resource requests and limits), and resource quotas for all containers running in a given namespace. These features enable efficient resource utilization by the applications in your cluster, and they help divide resources productively between different teams. For example, with namespace resource constraints you can control the share of cluster resources assigned to production and development workloads, ensuring an efficient distribution of budget across workload types. With all these features, you can ensure that your cluster always has resources available for running your applications and dramatically decrease cloud infrastructure costs.
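
For example, a sketch of a namespace-level ResourceQuota (the namespace name and numbers are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: development
spec:
  hard:
    requests.cpu: "4"      # total CPU all pods in this namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"             # cap on the number of pods in the namespace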

Native Support for Stateful Applications

Kubernetes comes with native support for stateful applications such as databases and key-value stores. In particular, its persistent volumes subsystem provides an API that abstracts the details of the underlying storage infrastructure (e.g., AWS EBS, Azure Disk, etc.), allowing users and administrators to focus on the storage capacity and storage types their applications will consume rather than the subtle details of each storage provider's API.

Persistent volumes allow reserving the needed amount of storage using persistent volume claims. A claim is automatically bound to a volume that matches the storage type, capacity, and other requirements specified in the claim. If no matching volume exists, the claim remains unbound until one appears. This mechanism lets applications running in your Kubernetes cluster reserve storage efficiently.
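
A minimal PersistentVolumeClaim sketch (the name and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
  - ReadWriteOnce          # the volume can be mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi        # the claim binds to a volume with at least this capacity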

Another great feature for stateful applications is dynamic storage provisioning. In Kubernetes, administrators can describe the various storage types available in the cluster along with their specific reclaim, mounting, and backup policies. Once such storage classes are defined, Kubernetes can automatically provision the requested amount of storage from the underlying provider such as AWS EBS or Azure Disk.
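
For instance, a StorageClass for AWS EBS might be sketched like this (the class name and parameters are illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs   # in-tree AWS EBS provisioner
parameters:
  type: gp2                          # EBS volume type to provision
reclaimPolicy: Delete                # delete the volume when the claim is released

A persistent volume claim then simply references the class by name (storageClassName: fast-ssd), and Kubernetes provisions a matching EBS volume on demand.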

Finally, one of the best statefulness features in Kubernetes is StatefulSets, which are relevant for applications that require stable and unique network identifiers, stable persistent storage, ordered deployment and scaling, and ordered, automated rolling updates. Using the APIs described above, you can deploy production-grade stateful apps of any complexity on Kubernetes.
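
A bare-bones StatefulSet sketch, assuming a hypothetical web workload with per-replica persistent storage:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web            # headless service providing stable network identities
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: httpd:latest
        volumeMounts:
        - name: www
          mountPath: /usr/local/apache2/htdocs
  volumeClaimTemplates:       # each replica gets its own persistent volume claim
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

The replicas are created in order (web-0, web-1, web-2), and each keeps its storage and network identity across rescheduling.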

Seamless Integration with Cloud and Container Ecosystem

Kubernetes works well with all major container runtimes, cloud environments, and cloud native applications. In particular, Kubernetes has:

  • Support for popular container runtimes. The Kubernetes 1.5 release introduced the Container Runtime Interface (CRI), a plugin interface that allows using a wide variety of container runtimes without the need to recompile. Since that release, Kubernetes has simplified the use of any container runtime compatible with the CRI.
  • Support for multiple volume types. Kubernetes allows creating custom volume plugins that help abstract any external volume infrastructure and use it inside the Kubernetes cluster. Kubernetes currently supports over 25 volume plugins, which include volumes of cloud service providers (AWS EBS, GCE Persistent Disk), object storage systems (CephFS), network filesystems (NFS, Gluster), data center filesystems (Quobyte), and more.
  • Easy integration with cloud providers and use of their native services. Kubernetes makes it easy to deploy clusters on popular cloud platforms and to use their native infrastructure and networking tools. For example, Kubernetes supports external load balancers provided by major cloud providers. They give externally-accessible IP addresses that send traffic to the specified ports on your cluster nodes.

Extensibility and Pluggability

Kubernetes emphasizes the philosophy of extensibility and pluggability, which means that the platform preserves user choice and flexibility where those matter. Kubernetes aims to support the widest variety of workloads and application types possible and to be easy to integrate with any environment and tool.

Some plugin frameworks supported by Kubernetes include:

  • Container Network Interface (CNI) plugins: these implement the CNI networking model and are designed for interoperability.
  • The out-of-tree volume plugins such as the Container Storage Interface (CSI) and FlexVolume. They enable storage vendors to create custom storage plugins without adding them to the Kubernetes repository.

How Does Supergiant Add to Kubernetes?

Supergiant simplifies the deployment and management of applications on Kubernetes for developers and administrators. In addition to easing the configuration and deployment of Helm charts, Supergiant facilitates running clusters on multiple cloud providers, striving for truly agnostic infrastructure. It achieves this with an autoscaling system designed to increase efficiency and reduce costs: by downscaling unused nodes and packing resources tightly, it ensures you don't pay for idle infrastructure. In addition, Supergiant implements abstraction layers for load balancing, application deployment, basic monitoring, and node deployment and destruction, all behind a highly usable UI.


Configuring Kubernetes Apps Using ConfigMaps

Posted by Kirill Goltsman on September 16, 2018

We previously discussed how to use the Secrets API to populate containers and pods with sensitive data and enhance the security of your Kubernetes application. Secrets are handy for detaching passwords and other credentials from pod manifests and for preventing bad actors from ever getting at them.

Kubernetes ConfigMaps apply the same approach to configuration data, which can be easily detached from pod specs using a simple API.

In this tutorial, we'll discuss various ways to create ConfigMaps and expose them to your pods and containers. By the end of this article, you'll be able to inject configuration details into pods and containers without hardcoding them as literal values. This pattern enables better isolation and extensibility of your Kubernetes applications and makes your deployment code easier to maintain. You'll see for yourself soon. Let's get started!

Definition of a ConfigMap

In a nutshell, a ConfigMap is a Kubernetes API object designed to detach configuration from container images. The basic rationale behind using ConfigMaps is to make Kubernetes applications portable, maintainable, and extensible. A ConfigMap API resource stores configuration data as key-value pairs. This data can be easily converted to files and environmental variables accessible inside the container runtime and/or volumes mounted to the containers.

Unlike Kubernetes Secrets, ConfigMap data is not obfuscated with base64 and is consumed as plaintext. This is because you are not supposed to store sensitive information in your ConfigMaps. If you need to inject credentials into your Kubernetes application, use the Secrets API instead.

There are several ways to create ConfigMaps in Kubernetes: from literal values, from files, and using ConfigMap manifests. A general pattern for creating ConfigMaps using kubectl looks like:

kubectl create configmap <map-name> <data-source>

where map-name is the name of the ConfigMap and data-source corresponds to a key-value pair in the ConfigMap, where

  • Key is the file name or the key you provided on the CLI, and
  • Value is the file contents or the literal value you provided on the CLI.

In what follows, we'll see how to use this pattern to create ConfigMaps in Kubernetes.

Tutorial

To complete the examples in this tutorial, you'll need:

  • A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Creating ConfigMaps from Files

If you have a lot of configuration settings, the most viable option is storing them in files and creating ConfigMaps from those files. To illustrate this, let's first create a text file with some configuration. We came up with a simple front-end configuration that contains the following settings:

color.primary=purple
color.brand=yellow
font.size.default = 14px

Save this configuration in a file (e.g., front-end) and use the --from-file argument to ship the file to a ConfigMap:

kubectl create configmap front-end-config --from-file=front-end
configmap "front-end-config" created

To see detailed information about the new ConfigMap, you can run:

kubectl describe configmap front-end-config

You should see the following output:

Name:         front-end-config
Namespace:    default
Labels:       <none>
Annotations:  <none>
Data
====
front-end:
----
color.primary=purple
color.brand=yellow
font.size.default = 14px
Events:  <none>

As you see, the name of the file with the configuration settings became the ConfigMap's key, and the contents of that file (all the configuration fields) became the value of that key.

Alternatively, if you want to provide a key name different from the filename, you can use the following pattern:

kubectl create configmap front-end-config --from-file=ui-config=front-end

In this case, the key of the ConfigMap would be ui-config instead of front-end.

Creating ConfigMaps from Literal Values

If you only have a few configuration settings, you can create a ConfigMap from literal values using the --from-literal argument. For example,

kubectl create configmap some-config --from-literal=font.size=14px --from-literal=color.default=green
configmap "some-config" created

will create a ConfigMap with the two key-value pairs specified in the --from-literal arguments. As you see, this argument can be repeated to pass in multiple key-value pairs.

You can get a detailed description of the new ConfigMap using the following command:

kubectl get configmap some-config -o yaml

which will output something like this:

apiVersion: v1
data:
  color.default: green
  font.size: 14px
kind: ConfigMap
metadata:
  creationTimestamp: 2018-07-23T08:21:02Z
  name: some-config
  namespace: default
  resourceVersion: "414488"
  selfLink: /api/v1/namespaces/default/configmaps/some-config
  uid: 56360207-8e51-11e8-9c6c-0800270c281a

As you see, each key-value pair of your configuration is represented as a separate entry in the data section of the ConfigMap.

Creating ConfigMaps With a ConfigMap Manifest

Defining a ConfigMap manifest is useful when you want to create multiple configuration key-value pairs that can be accessed as environmental variables or as files in volumes mounted to the container(s) in your pod.

ConfigMap manifests look similar to the API resources we've already discussed but have their own distinct fields. Below is an example of a simple ConfigMap that stores three key-value pairs:

apiVersion: v1
kind: ConfigMap
metadata:
  name: trading-strategy
  namespace: default
data:
  strategy.type: HFT
  strategy.maxVolume: "5000"
  strategy.risk: high

As you see, the main differences between this manifest and other API resources are the ConfigMap kind and the special data field that stores key-value pairs.

Save this spec in the trading-strategy.yaml and create a ConfigMap running the following command:

kubectl create -f trading-strategy.yaml
configmap "trading-strategy" created

Check if the ConfigMap was successfully created:

kubectl get configmap trading-strategy -o yaml
apiVersion: v1
data:
  strategy.maxVolume: "5000"
  strategy.risk: high
  strategy.type: HFT
kind: ConfigMap
metadata:
  creationTimestamp: 2018-07-20T09:27:44Z
  name: trading-strategy
  namespace: default
  resourceVersion: "395247"
  selfLink: /api/v1/namespaces/default/configmaps/trading-strategy
  uid: 283cacd6-8bff-11e8-a2b0-0800270c281a

That's it! The new ConfigMap can now be used in pods. For example, one option is to expose the configuration data as the container's environmental variables:

apiVersion: v1
kind: Pod
metadata:
  name: demo-envr
spec:
  containers:
  - name: envtest
    image: supergiantkir/k8s-liveliness
    ports:
    - containerPort: 8080
    env:
    - name: STRATEGY_RISK
      valueFrom:
        configMapKeyRef:
          name: trading-strategy
          key: strategy.risk 
    - name: STRATEGY_TYPE
      valueFrom:
        configMapKeyRef:
          name: trading-strategy
          key: strategy.type

Let's briefly discuss key configuration fields of this pod spec:

  • spec.containers[].env[].name -- the name of the environmental variable to map the ConfigMap key to.
  • spec.containers[].env[].valueFrom.configMapKeyRef.name -- the name of ConfigMap to use for this environmental variable.
  • spec.containers[].env[].valueFrom.configMapKeyRef.key -- a ConfigMap key to use for this environmental variable.

Save this spec in the demo-envr.yaml and create the pod:

kubectl create -f demo-envr.yaml
pod "demo-envr" created

Once the pod is ready and running, get a shell to the container:

kubectl exec -it demo-envr -- /bin/bash

From within the container, you can access the configuration as environmental variables using the printenv command:

printenv STRATEGY_TYPE
HFT
printenv STRATEGY_RISK
high

Awesome, isn't it? Now, you can access environmental variables inside your container. For example, the container's scripts could use environmental variables defined in the ConfigMap to set up your application.

If you have many ConfigMap keys, it might be more viable to format those keys as POSIX environmental variable names and expose them to the pod using the envFrom field of the spec. Let's create a new ConfigMap to see how it works:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ui-config
  namespace: default
data:
  FONT_DEFAULT_COLOR: green
  FONT_DEFAULT_SIZE: "14px"

Notice that ConfigMap keys are now formatted as POSIX environmental variable names. Now, you can save the spec in the ui-config.yaml and create a ConfigMap running the following command:

kubectl create -f ui-config.yaml
configmap "ui-config" created

Next, let's create a new pod that will use this ConfigMap:

apiVersion: v1
kind: Pod
metadata:
  name: demo-from-env
spec:
  containers:
  - name: envtest
    image: supergiantkir/k8s-liveliness
    ports:
    - containerPort: 8080
    envFrom:
    - configMapRef:
        name: ui-config

Notice that the spec.containers[].envFrom[].configMapRef field takes only the name of our ConfigMap (i.e., we need not specify all the key-value pairs). Save this spec in the demo-from-env.yaml and create the pod by running the following command:

kubectl create -f demo-from-env.yaml
pod "demo-from-env" created

Check if the pod was created:

kubectl get pod demo-from-env
NAME            READY     STATUS    RESTARTS   AGE
demo-from-env   1/1       Running   0          2m

Once the pod is up and running, get a shell to the active container

kubectl exec -it demo-from-env -- /bin/bash

and print the environmental variables using the env command:

env
FONT_DEFAULT_COLOR=green
FONT_DEFAULT_SIZE=14px
SHLVL=1
HOME=/root
YARN_VERSION=1.6.0
....

As you see, the configuration variables defined in the ConfigMap were successfully populated into the container's environmental variables. Using envFrom is less verbose because you don't define individual environmental variables. This benefit, however, comes with the requirement of properly formatting variable names (see the note below).

Note: If you use envFrom instead of env to create environmental variables in the container, the variable names will be created from the ConfigMap's keys. If a ConfigMap key has an invalid environment variable name, it will be skipped, but the pod will be allowed to start. Kubernetes uses the same conventions as POSIX for checking the validity of environmental variables, but that might change. According to POSIX:

Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names.

If an environmental variable name does not pass the check, the InvalidVariableNames event will be fired, and a message listing the invalid keys that were skipped will be generated.

Injecting ConfigMaps into the Container's Volume

As you remember from the previous tutorial, Kubernetes supports a configMap volume type that can be used to inject configuration defined in a ConfigMap object for use by the containers in your pod. This option is useful when you want to populate configuration files inside the container with the configuration key-value pairs defined in your ConfigMap.

To illustrate this use case, let's populate the container's volume with the ConfigMap data defined in the example above.

apiVersion: v1
kind: Pod
metadata:
  name: demo-config-volume
spec:
  containers:
  - name: demo-cont
    image: supergiantkir/k8s-liveliness
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: trading-strategy
  restartPolicy: Never

In this pod spec, we define a configMap volume type for the pod and mount it to the /etc/config path inside the container.

Save this spec in the demo-config-volume.yaml and create the pod running the following command:

kubectl create -f demo-config-volume.yaml
pod "demo-config-volume" created

Once the pod is ready and running, get a shell to the container

kubectl exec -it demo-config-volume -- /bin/bash

and check the /etc/config folder:

ls /etc/config/
strategy.maxVolume  strategy.risk  strategy.type

As you see, Kubernetes created three files in that folder. Each file's name is derived from the key name, and the file's contents are the key's value. You can verify this easily:

cat /etc/config/strategy.risk
high

If you want to map the ConfigMap keys to different file names, you can slightly adjust the pod spec above using volumes.configMap.items field.

apiVersion: v1
kind: Pod
metadata:
  name: demo-config-volume
spec:
  containers:
  - name: demo-cont
    image: supergiantkir/k8s-liveliness
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config
  volumes:
  - name: config-volume
    configMap:
      name: trading-strategy
      items:
      - key: strategy.risk
        path: risk
  restartPolicy: Never

Now, the strategy.risk configuration will be stored under the path /etc/config/risk instead of /etc/config/strategy.risk as in the example above. Please note that the default paths won't be used when items is specified: each key you want from the ConfigMap must be listed explicitly. Also, take note that if a nonexistent ConfigMap key is specified, the volume will not be created.

Update note: once a ConfigMap is consumed by a volume, Kubernetes runs periodic checks on the configuration. If the ConfigMap is updated, Kubernetes ensures that the projected keys are updated as well. The update may take some time, depending on the kubelet sync period. However, if your container uses the ConfigMap as a subPath volume mount, the configuration won't be updated.

Cleaning Up

Let's delete all assets and objects created during this tutorial.

Delete ConfigMaps:

kubectl delete configmap trading-strategy
configmap "trading-strategy" deleted
kubectl delete configmap ui-config
configmap "ui-config" deleted

Delete pods:

kubectl delete pod demo-config-volume
pod "demo-config-volume" deleted
kubectl delete pod demo-envr
pod "envr" deleted
kubectl delete pod demo-from-env
pod "demo-from-env"

Finally, delete all files with resource manifests if you don't need them anymore.

Conclusion

One of the main rules of good application development and containerized application deployment is to separate configuration from the rest of your application. This keeps deployments maintainable and extensible by different developer teams. Keeping configuration in ConfigMaps and exposing it to your containers when needed embodies this vision for your Kubernetes applications. Instead of injecting configuration directly into the container image, you can leverage the power of ConfigMaps and pods to mount configuration key-value pairs to specific volume paths or environmental variables inside the container's runtime. This approach dramatically simplifies configuration management in your Kubernetes applications, ensuring their maintainability and extensibility.


Defining Privileges and Access Control Settings for Pods and Containers in Kubernetes

Posted by Kirill Goltsman on September 6, 2018

In a recent tutorial, we discussed the Secrets API, which is designed to encode sensitive data and expose it to pods in a controlled way, enabling secret encapsulation and sharing between containers.

However, Secrets are only one component of pod- and container-level security in Kubernetes. Another important dimension is the security context, which facilitates the management of access rights, privileges, and permissions for processes and filesystems in Kubernetes.

In this tutorial, we'll discuss how to set access rights and privileges for container processes within a pod using discretionary access control (DAC) and how to ensure proper isolation of container processes from the host using Linux capabilities. By the end of this tutorial, you'll know how to limit the ability of containers to negatively impact your infrastructure and other containers, and how to limit users' access to sensitive data and mission-critical programs in your Kubernetes environment. Let's get started!

Defining Security Context

A security context can be defined as a set of constraints applied to a container in order to achieve the following goals:

  • Enforce a distinct isolation between a container and the host/node it runs on. Many users of containers underestimate this task and assume that containers are properly isolated from hosts the way virtual machines (VMs) are. The reality is different, though. Privileged processes (e.g., those running as root) in a container are identical to privileged processes running on the host. Therefore, merely running an application in a container does not isolate it from the host. Running containers as root can cause serious problems if Docker images from untrusted sources are used.
  • Prevent containers from negatively impacting the infrastructure or other containers.

These basic goals necessitate the following best practices for using security contexts in Kubernetes:

  • Drop process privileges in containers as quickly as possible or be aware of them.
  • Run services as non-root whenever possible.
  • Don't use random Docker images in your system.

Security contexts in Kubernetes facilitate the implementation of these practices and help protect your system against various security risks. Below, we'll discuss how to achieve the goals outlined above by using PodSecurityContext and SecurityContext in your pods and containers.

Tutorial

To complete the examples in this tutorial, you'll need:

  • A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Using Security Contexts in Pods and Containers

Security context settings implement the basic philosophy of discretionary access control (DAC). This is a type of access control in which a given user has complete control over all programs it owns and executes. This user can also determine the permissions of other users for accessing and modifying these files or programs. DAC contrasts with mandatory access control (MAC), by which the operating system (OS) constrains the ability of a subject or initiator (e.g., a process) to access or perform some operation on computing objects (e.g., files).

In Kubernetes, using DAC implies that you, as a user or administrator, can set access and permission constraints on files and processes run in your pods and containers. Security contexts can be specified for entire pods and/or for individual containers.

Let's first start with the pod-level security context. To specify security settings for a pod, you need to include the securityContext field in the pod manifest. This field is a PodSecurityContext object that stores the security context in the Kubernetes API. Let's create a pod with a security context using the example below. This pod runs a simple Node.js application that we wrote and saved in a public Docker Hub repository.

apiVersion: v1
kind: Pod
metadata:
  name: security-context-pod
spec:
  securityContext:
    runAsUser: 2500
    fsGroup: 2000
  volumes:
  - name: security-context-vol
    emptyDir: {}
  containers:
  - name: security-context-cont
    image: supergiantkir/k8s-liveliness
    volumeMounts:
    - name: security-context-vol
      mountPath: /data/test
    securityContext:
      allowPrivilegeEscalation: false

As you can see, we have two security contexts in this pod. The first one is a pod-level security context defined by the PodSecurityContext object, and the second one is a SecurityContext defined for the individual container. The pod-level security context applies to all containers in the pod, but field values of container.securityContext take precedence over field values of PodSecurityContext. In other words, if a container-level security context is defined, it overrides the pod-level security context.

You now have a basic understanding of how security contexts work, so let's discuss key settings available for the PodSecurityContext:

.spec.securityContext.runAsUser -- This field specifies the user ID (UID) with which to run the entrypoint (the default executable of the image) of the container process. If the field value is not specified, it defaults to the UID defined in the image metadata. The field can also be used in spec.containers[].securityContext, in which case it takes precedence over the same field in the PodSecurityContext. In our example, the field specifies that for any container in the pod, the container process runs with user ID 2500.

.spec.securityContext.fsGroup -- This field defines a special supplemental group that assigns a group ID (GID) to all containers in the pod. This group ID is also associated with the emptyDir volume mounted at /data/test and with any files created in that volume. Keep in mind that only certain volume types allow the kubelet to change the ownership of a volume to the pod. If the volume type allows it (as the emptyDir type does), the owning GID will be the fsGroup.

.spec.securityContext.runAsGroup -- This field is useful when you want the entrypoint of the container process to run with a specific group ID rather than a user ID. In this case, you can specify the GID of that group using this field. If the field is not set, the image default will be used. If the field is set in both SecurityContext and PodSecurityContext, the value specified in the container's SecurityContext takes precedence.

.spec.securityContext.runAsNonRoot -- This field determines whether the pod's containers must run as a non-root user. If set to true, the kubelet validates the image at runtime to make sure it does not run as UID 0 (root) and refuses to start the container if it does. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence. This field is very important for preventing privileged processes in containers from accessing the system and the host.

Now that you understand the key options of PodSecurityContext, save the spec above in security-context-demo.yaml and create the pod:

kubectl create -f security-context-demo.yaml
pod "security-context-pod" created

Now, verify that the pod is running:

kubectl get pod security-context-pod
NAME                  READY         STATUS         RESTARTS       AGE
security-context-pod   1/1          Running            0          16s

Next, we will check the ownership of processes run within the Node.js container. First, get a shell to the running container:

kubectl exec -it security-context-pod -- /bin/bash

Inside the container, list all running processes:

ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
2500         1  0.7  2.0 983564 41352 ?        Ssl  11:24   0:00 npm            
2500        16  0.0  0.0   4340   736 ?        S    11:24   0:00 sh -c node serv
2500        17  0.4  1.7 882368 35848 ?        Sl   11:24   0:00 node server.js
2500        23  0.0  0.1  20252  3252 pts/0    Ss   11:24   0:00 /bin/bash
2500        28  0.0  0.1  17500  2056 pts/0    R+   11:24   0:00 ps aux

Awesome! The output above shows that all processes in the container are run by the UID 2500 as we expected.

Remember that we set a GID for all containers and volumes in our pod? Let's check how that worked out. Go to the /data directory in the container's filesystem root and list the permissions of the test directory inside it:

cd /data
ls -l 

You should see something like this:

drwxrwsrwx 2 root 2000 4096 Jul 19 11:23 test

The output shows that the /data/test directory has group ID 2000, which is the value of fsGroup.

If everything works as intended, all new files and directories will also receive the GID defined by fsGroup. Let's check whether this is true:

cd test
echo This file has the same GID as the parent directory > demofile

Now, check the file's ownership:

ls -l 
-rw-r--r-- 1 2500 2000 51 Jul 19 11:30 demofile

As you see, the demofile has a group ID 2000, which is the value of fsGroup. As simple as that!

Overriding Pod Security Context in the Container

As we've already mentioned, a container's SecurityContext takes precedence over the PodSecurityContext. Therefore, you can set a pod-level security context for all containers in the pod and override it if needed by modifying a SecurityContext for individual containers. Let's create a new pod to see how this works:

apiVersion: v1
kind: Pod
metadata:
  name: override-security-demo
spec:
  securityContext:
    runAsUser: 3000
  containers:
  - name: override-security-cont
    image: supergiantkir/k8s-liveliness
    securityContext:
      runAsUser: 2000
      allowPrivilegeEscalation: false

This pod runs a container with the same Docker image as in the example above, but this time the UID to run the process with is specified both for the pod and for the container inside it.

Before creating this Pod, let's discuss key options available in the container's SecurityContext:

.spec.containers[].securityContext.runAsUser -- The same as in the PodSecurityContext.

.spec.containers[].securityContext.runAsGroup -- The same as in the PodSecurityContext.

.spec.containers[].securityContext.runAsNonRoot -- The same as in the PodSecurityContext.

.spec.containers[].securityContext.allowPrivilegeEscalation -- This field controls whether a process can gain more privileges than its parent process. More specifically, it controls whether the no_new_privs flag is set on the container process. AllowPrivilegeEscalation is always true when the container is (1) run as privileged or (2) has the CAP_SYS_ADMIN Linux capability enabled.

.spec.containers[].securityContext.privileged -- The field tells kubelet to run the container in the privileged mode. Processes in privileged containers are essentially identical to root processes on the host. The default value is false.

.spec.containers[].securityContext.readOnlyRootFilesystem -- Defines whether a container has a read-only root filesystem. The default value is false.

.spec.containers[].securityContext.seLinuxOptions -- The SELinux context to be applied to the container. If the value is unspecified, the container runtime (e.g., Docker) will assign a random SELinux context for each container in a pod. If the value is set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.

Save this spec in the override-security-demo.yaml and create the pod running the following command:

kubectl create -f override-security-demo.yaml
pod "override-security-demo" created 

Next, verify that the pod is running:

kubectl get pod override-security-demo 
NAME                     READY     STATUS    RESTARTS   AGE
override-security-demo   1/1       Running   0          45s

Then, as in the first example, get a shell to the running container to check the ownership of container processes:

kubectl exec -it override-security-demo -- /bin/bash

Inside the container, show the list of running processes:

ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
2000         1  0.2  2.0 983532 40992 ?        Ssl  10:14   0:00 npm            
2000        16  0.0  0.0   4340   800 ?        S    10:14   0:00 sh -c node serv
2000        17  0.1  1.7 883392 36084 ?        Sl   10:14   0:00 node server.js
2000        23  0.0  0.1  20252  3232 pts/0    Ss   10:16   0:00 /bin/bash
2000        28  0.0  0.1  17500  2060 pts/0    R+   10:16   0:00 ps aux

As you see, all the processes run with UID 2000, which is the value of runAsUser specified for the container. It overrides the UID of 3000 specified for the pod.

Using Linux Capabilities

If you want fine-grained control over process privileges, you can use Linux capabilities. To understand how they work, we need a basic introduction to Unix/Linux processes. In a nutshell, traditional Unix implementations have two classes of processes: (1) privileged processes (whose user ID is 0, referred to as root or the superuser) and (2) unprivileged processes (which have a non-zero UID).

In contrast to privileged processes, which bypass all kernel permission checks, unprivileged processes are subject to full permission checking based on the process's credentials (such as its effective UID, GID, and supplementary group list). Starting with kernel 2.2, Linux divides the privileges of privileged processes into distinct units, known as capabilities. These units can be independently assigned and enabled for unprivileged processes, granting them a subset of root privileges. Kubernetes users can use Linux capabilities to grant certain privileges to a process without giving it all the privileges of the root user. This is helpful for improving container isolation from the host, since containers no longer need to run as root -- you can grant just the specific root privileges they need and that's it.

To add or remove Linux capabilities for a container, you can include the capabilities field in the securityContext section of the container manifest. Let's see an example:

apiVersion: v1
kind: Pod
metadata:
  name: linux-cpb-demo
spec:
  securityContext:
    runAsUser: 3000
  containers:
  - name: linux-cpb-cont
    image: supergiantkir/k8s-liveliness
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]

In this example, we assigned the CAP_NET_ADMIN capability to the container. This Linux capability allows a process to perform various network-related operations such as interface configuration, administration of the IP firewall, modifying routing tables, enabling multicasting, etc. For the full list of available capabilities, see the official Linux documentation.

Note: Linux capabilities have the form CAP_XXX. However, when you list capabilities in your container manifest, you must omit the CAP_ prefix. For example, to add the CAP_NET_ADMIN capability, include NET_ADMIN in your list of capabilities.

Cleaning Up

Now that this tutorial is over, let's clean up after ourselves.

Don't forget to delete all pods:

kubectl delete pod security-context-pod
pod "security-context-pod" deleted
kubectl delete pod override-security-demo
pod "override-security-demo" deleted
kubectl delete pod linux-cpb-demo
pod "linux-cpb-demo" deleted

Also, you may wish to delete all files with the pod manifests if you don't need them anymore.

Conclusion

In this article, we have discussed how to use Kubernetes security contexts in your pods and containers. Security contexts are a powerful tool for controlling the access rights and privileges of processes running in a pod's containers. Kubernetes allows setting a pod-level security context for all containers and overriding it for individual containers with their own SecurityContext.

Kubernetes security contexts are also helpful if you want to isolate container processes from the host. In particular, you learned how to use Linux capabilities to grant specific root privileges to processes, allowing them to run as non-root while keeping just the privileges they need to work. All these features make security contexts a powerful addition to Kubernetes Secrets, helping you improve the security of your Kubernetes applications and properly isolate container environments from other users and the underlying nodes.


Debugging Kubernetes Applications: How To Guide

Posted by Kirill Goltsman on September 2, 2018

Do your Kubernetes pods or deployments sometimes crash or begin to behave in ways you didn't expect?

Without knowledge of how to inspect and debug them, Kubernetes developers and administrators will struggle to identify the reasons for the application failure. Fortunately, Kubernetes ships with powerful built-in debugging tools that allow inspecting cluster-level, node-level, and application-level issues. 

In this article, we focus on several application-level issues you might face when creating your pods and deployments. We'll show several examples of using the kubectl CLI to debug pending, waiting, or terminated pods in your Kubernetes cluster. By the end of this tutorial, you'll be able to identify the causes of pod failures in a fraction of the time, making debugging Kubernetes applications much easier. Let's get started!

Tutorial

To complete the examples in this tutorial, you need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

One of the most common reasons for a pod being unable to start is that Kubernetes can't find a node on which to schedule it. The scheduling failure might be caused by the pod's containers requesting more resources than any node can offer. If for some reason you've lost track of how many resources are available in your cluster, the pod's failure to start might confuse and puzzle you. Kubernetes' built-in pod inspection and debugging functionality comes to the rescue, though. Let's see how.

Below is a Deployment spec that creates 5 replicas of Apache HTTP Server (5 pods), each requesting 0.3 CPU and 500Mi of memory (check this article to learn more about the Kubernetes resource model):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-deployment
  labels:
    app: httpd
spec:
  replicas: 5
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "0.3"
            memory: "500Mi"

Let's save this spec in the httpd-deployment.yaml and create the deployment with the following command:

kubectl create -f httpd-deployment.yaml
deployment.extensions "httpd-deployment" created

Now, if we check the replicas created we'll see the following output:

kubectl get pods
NAME                               READY     STATUS      RESTARTS   AGE
httpd-deployment-b644c8654-54fpq   0/1       Pending     0          38s
httpd-deployment-b644c8654-82brr   1/1       Running     0          38s
httpd-deployment-b644c8654-h9cj2   1/1       Running     0          38s
httpd-deployment-b644c8654-jsl85   0/1       Pending     0          38s
httpd-deployment-b644c8654-wkqqx   1/1       Running     0          38s

As you see, only 3 of the 5 replicas are Ready and Running, and the other 2 are in the Pending state. If you are a Kubernetes newbie, you'll probably wonder what all these statuses mean. Some of them are quite easy to understand (e.g., Running) while others are not.

Just to remind the readers, a pod's life cycle includes a number of phases defined in the PodStatus object. Possible values for phase include the following:

  • Pending: Pods with a pending status have already been accepted by the system, but one or several container images have not yet been downloaded or installed.
  • Running: The pod has been scheduled to a specific node, and all its containers are already running.
  • Succeeded: All containers in the pod were successfully terminated and will not be restarted.
  • Failed: At least one container in the pod was terminated with a failure. This means that one of the containers in the pod either exited with a non-zero status or was terminated by the system.
  • Unknown: The state of the pod cannot be obtained for some reason, typically due to a communication error.

Two replicas of our deployment are Pending, which means that these pods have not yet been scheduled by the system. The next logical question is: why is that the case? Let's use the main inspection tool at our disposal -- kubectl describe. Run this command against one of the pods with a Pending status:

kubectl describe pod  httpd-deployment-b644c8654-54fpq
Name:           httpd-deployment-b644c8654-54fpq
Namespace:      default
Node:           <none>
Labels:         app=httpd
                pod-template-hash=620074210
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/httpd-deployment-b644c8654
Containers:
  httpd:
    Image:      httpd:latest
    Port:       80/TCP
    Host Port:  0/TCP
    Requests:
      cpu:        300m
      memory:     500Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9wdtd (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-9wdtd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9wdtd
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  4m (x37 over 15m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.

Let's discuss some fields of this description that are useful for debugging:

Namespace -- The Kubernetes namespace in which the pod was created. You might sometimes forget the namespace in which a deployment and its pods were created and then be surprised to find no pods when running kubectl get pods. In this case, check all available namespaces by running kubectl get namespaces and access the pods in the needed namespace by running kubectl get pods --namespace <your-namespace>.

Status -- A pod's lifecycle phase defined in the PodStatus object (see the discussion above).

Conditions: PodScheduled -- A Boolean value that tells if the pod was scheduled. The value of this field indicates that our pod was not scheduled.

QoS Class -- Resource guarantees for the pod defined by its quality of service (QoS) class. According to QoS, pods can be Guaranteed, Burstable, or Best-Effort.


Events -- Pod events emitted by the system. Events are very informative about the potential causes of a pod's issues. In this example, you can find an event with the FailedScheduling reason and an informative message indicating that the pod was not scheduled due to insufficient CPU and memory. Events such as these are stored in etcd to provide high-level information on what is going on in the cluster. To list all events, we can use the following command:

kubectl get events
LAST SEEN   FIRST SEEN   COUNT     NAME                                              KIND      SUBOBJECT                   TYPE      REASON                    SOURCE                 MESSAGE
3h          3h           1         apache-server-558f6f49f6-8bjnc.1541c8c4e84a9d6c   Pod                                   Normal    SuccessfulMountVolume     kubelet, minikube      MountVolume.SetUp succeeded for volume "default-token-9wdtd" 
3h          3h           1         apache-server-558f6f49f6-8bjnc.1541c8c4f1c64e87   Pod                                   Normal    SandboxChanged            kubelet, minikube      Pod sandbox changed, it will be killed and re-created.
3h          3h           1         apache-server-558f6f49f6-8bjnc.1541c8c500f534e9   Pod       spec.containers{httpd}      Normal    Pulled                    kubelet, minikube      Container image "httpd:2-alpine" already present on machine
3h          3h           1         apache-server-558f6f49f6-8bjnc.1541c8c503df3cb5   Pod       spec.containers{httpd}      Normal    Created                   kubelet, minikube      Created container
3h          3h           1         apache-server-558f6f49f6-8bjnc.1541c8c50a061e37   Pod       spec.containers{httpd}      Normal    Started                   kubelet, minikube      Started container
3h          3h           1         apache-server-558f6f49f6-p7mkl.1541c8c4711915e3   Pod                                   Normal    SuccessfulMountVolume     kubelet, minikube      MountVolume.SetUp succeeded for volume "default-token-9wdtd" 
3h          3h           1         apache-server-558f6f49f6-p7mkl.1541c8c475d37603   Pod                                   Normal    SandboxChanged            kubelet, minikube      Pod sandbox changed, it will be killed and re-created.

Please remember that all events are namespaced, so you should indicate the namespace you are searching by typing kubectl get events --namespace=my-namespace

As you see, the kubectl describe pod <pod-name> command is very powerful for identifying pod issues. It allowed us to find out that the pod was not scheduled due to insufficient memory and CPU. Another way to retrieve extra information about a pod is passing the -o yaml output format flag to kubectl get pod:

kubectl get pod httpd-deployment-b644c8654-54fpq -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-07-13T10:33:58Z
  generateName: httpd-deployment-b644c8654-
  labels:
    app: httpd
    pod-template-hash: "620074210"
  name: httpd-deployment-b644c8654-54fpq
  namespace: default
  ownerReferences:
  - apiVersion: extensions/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: httpd-deployment-b644c8654
    uid: 40329209-8688-11e8-bf09-0800270c281a
  resourceVersion: "297148"
  selfLink: /api/v1/namespaces/default/pods/httpd-deployment-b644c8654-54fpq
  uid: 40383a20-8688-11e8-bf09-0800270c281a
spec:
  containers:
  - image: httpd:latest
    imagePullPolicy: Always
    name: httpd
    ports:
    - containerPort: 80
      protocol: TCP
    resources:
      requests:
        cpu: 300m
        memory: 500Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-9wdtd
      readOnly: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-9wdtd
    secret:
      defaultMode: 420
      secretName: default-token-9wdtd
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-07-13T10:33:58Z
    message: '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable

This command will output all information that Kubernetes has about this pod. It will contain the description of all spec options and fields you specified including any annotations, restart policy, statuses, phases, and more. The abundance of pod-related data makes this command one of the best tools for debugging pods in Kubernetes.
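
If you need just a single field from this output, the -o jsonpath output format supported by kubectl can extract it; for example, the pod's phase:

kubectl get pod httpd-deployment-b644c8654-54fpq -o jsonpath='{.status.phase}'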

That's it! To fix the scheduling issue, you'll need to request an amount of CPU and memory that the node can actually provide. While doing so, keep in mind that Kubernetes itself starts some default daemons and services like kube-proxy that consume node resources. Therefore, you can't request the full 1.0 CPU of a single-core node for your apps.
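
For reference, here is a sketch of more modest requests that a small single-node cluster can typically satisfy (the values are illustrative and depend on your node's capacity):

resources:
  requests:
    cpu: 100m
    memory: 128Mi

After applying such values to the deployment spec, the scheduler should be able to place the pod.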

Scheduling is only one of the common issues that leave pods stuck in the Pending phase. Let's create another deployment to illustrate other potential scenarios:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-server
  labels:
    app: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpd
  strategy:
    type: RollingUpdate
    rollingUpdate: 
      maxSurge: 40%
      maxUnavailable: 40%
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:23-alpine
        ports:
        - containerPort: 80

All we need to know about this deployment is that it creates 3 replicas of the Apache HTTP server and specifies a custom RollingUpdate strategy.
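
A quick note on the percentages: Kubernetes rounds maxSurge up and maxUnavailable down to whole pods. With 3 replicas, maxSurge: 40% allows up to ceil(3 * 0.4) = 2 extra pods (5 in total) during an update, while maxUnavailable: 40% allows at most floor(3 * 0.4) = 1 pod to be unavailable.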

Let's save this spec in httpd-deployment-2.yaml and create the deployment by running the following command:

kubectl create -f httpd-deployment-2.yaml
deployment.apps "apache-server" created

Let's check whether all replicas were successfully created:

kubectl get pods
NAME                            READY     STATUS             RESTARTS   AGE
apache-server-dc9bf8469-bblb4   0/1       ImagePullBackOff   0          53s
apache-server-dc9bf8469-x2wwq   0/1       ErrImagePull       0          53s
apache-server-dc9bf8469-xhmm7   0/1       ImagePullBackOff   0          53s

Oops! As you see, all three pods are not Ready and have ImagePullBackOff and ErrImagePull statuses. These statuses indicate that something went wrong while pulling the httpd image from the Docker Hub repository. Let's describe one of the pods in the deployment to find out more information:

kubectl describe pod apache-server-dc9bf8469-bblb4
Name:           apache-server-dc9bf8469-bblb4
Namespace:      default
Node:           minikube/10.0.2.15
Start Time:     Fri, 13 Jul 2018 15:25:02 +0300
Labels:         app=httpd
                pod-template-hash=875694025
Annotations:    <none>
Status:         Pending
IP:             172.17.0.6
Controlled By:  ReplicaSet/apache-server-dc9bf8469
Containers:
  httpd:
    Container ID:   
    Image:          httpd:23-alpine
    Image ID:       
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9wdtd (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-9wdtd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9wdtd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age              From               Message
  ----     ------                 ----             ----               -------
  Normal   Scheduled              3m               default-scheduler  Successfully assigned apache-server-dc9bf8469-bblb4 to minikube
  Normal   SuccessfulMountVolume  3m               kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-9wdtd"
  Normal   Pulling                2m (x4 over 3m)  kubelet, minikube  pulling image "httpd:23-alpine"
  Warning  Failed                 2m (x4 over 3m)  kubelet, minikube  Failed to pull image "httpd:23-alpine": rpc error: code = Unknown desc = Error response from daemon: manifest for httpd:23-alpine not found
  Warning  Failed                 2m (x4 over 3m)  kubelet, minikube  Error: ErrImagePull
  Normal   BackOff                1m (x6 over 3m)  kubelet, minikube  Back-off pulling image "httpd:23-alpine"
  Warning  Failed                 1m (x6 over 3m)  kubelet, minikube  Error: ImagePullBackOff

If you scroll down this description to the bottom, you'll see details on why the pod failed: "Failed to pull image "httpd:23-alpine": rpc error: code = Unknown desc = Error response from daemon: manifest for httpd:23-alpine not found". This means that we specified a container image tag (23-alpine) that does not exist. Let's update our deployment with the right httpd container image version to fix the issue:

kubectl set image deployment/apache-server httpd=httpd:2-alpine
deployment.apps "apache-server" image updated
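
You can also watch the fix being rolled out with the standard kubectl rollout status command:

kubectl rollout status deployment/apache-server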

Then, let's check the deployment's pods again:

kubectl get pods
NAME                             READY     STATUS             RESTARTS   AGE
apache-server-558f6f49f6-8bjnc   1/1       Running            0          36s
apache-server-558f6f49f6-p7mkl   1/1       Running            0          36s
apache-server-558f6f49f6-q8gj5   1/1       Running            0          36s

Awesome! The Deployment controller has managed to pull the new image and all Pod replicas are now Running.

Finding the Reasons Your Pod Crashed

Sometimes, your pod might crash due to some syntax errors in commands and arguments for the container. In this case, kubectl describe pod <PodName> will provide you only with the error name but not the explanation of its cause. Let's create a new pod to illustrate this scenario:

apiVersion: v1
kind: Pod
metadata:
  name: pod-crash
  labels:
    app: demo
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh']
    args: ['-c', 'MIN=5 SEC=45; echo "$(( MIN*60 + SEC + ; ))"']

This pod uses the BusyBox sh command to evaluate an arithmetic expression with two variables.

Let's save the spec in pod-crash.yaml and create the pod by running the following command:

kubectl create -f pod-crash.yaml
pod "pod-crash" created

Now, if you check the pod, you'll see the following output:

kubectl get pods
NAME                             READY     STATUS             RESTARTS   AGE
pod-crash                        0/1       CrashLoopBackOff   1          11s

The CrashLoopBackOff status means that you have a pod starting, crashing, starting again, and then crashing again. Kubernetes attempts to restart this pod because the default restartPolicy: Always is enabled. If we had set the policy to Never, the pod would not be restarted.
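
For comparison, here is a minimal sketch of a pod with restarts disabled (the pod name and the failing command are illustrative); with restartPolicy: Never, a failed pod ends up in the Failed phase instead of cycling through CrashLoopBackOff:

apiVersion: v1
kind: Pod
metadata:
  name: pod-crash-once
spec:
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'exit 1']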

The status above, however, does not indicate the precise reason for the pod's crash. Let's try to find more details:

kubectl describe pod pod-crash
Containers:
  busybox:
    Container ID:  docker://f9a67ec6e37281ff16b114e9e5a1f1c0adcd027bd1b63678ac8d09920a25c0ed
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
    Args:
      -c
      MIN=5 SEC=45; echo "$(( MIN*60 + SEC + ; ))"
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Mon, 16 Jul 2018 10:32:35 +0300
      Finished:     Mon, 16 Jul 2018 10:32:35 +0300
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9wdtd (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-9wdtd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9wdtd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From               Message
  ----     ------                 ----              ----               -------
  Normal   Scheduled              5m                default-scheduler  Successfully assigned pod-crash to minikube
  Normal   SuccessfulMountVolume  5m                kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-9wdtd"
  Normal   Pulled                 4m (x4 over 5m)   kubelet, minikube  Successfully pulled image "busybox"
  Normal   Created                4m (x4 over 5m)   kubelet, minikube  Created container
  Normal   Started                4m (x4 over 5m)   kubelet, minikube  Started container
  Normal   Pulling                3m (x5 over 5m)   kubelet, minikube  pulling image "busybox"
  Warning  BackOff                2s (x23 over 4m)  kubelet, minikube  Back-off restarting failed container

The description above indicates that the pod is not Ready and that its container terminated with an error, after which kubelet began backing off restarting it. However, the description does not provide any further explanation of why the error occurred. Where should we search then?

We are most likely to find the reason for the pod's crash in the BusyBox container logs. You can check them by running kubectl logs ${POD_NAME} ${CONTAINER_NAME}. Note that ${CONTAINER_NAME} can be omitted for pods that contain only a single container (as in our case):

kubectl logs pod-crash
sh: arithmetic syntax error

Awesome! There must be something wrong with our command or argument syntax. Indeed, we made a typo by inserting a stray ; into the expression echo "$(( MIN*60 + SEC + ; ))". Just fix that typo and you are good to go!
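
For reference, here is a corrected fragment of the spec (the stray + ; removed):

command: ['sh']
args: ['-c', 'MIN=5 SEC=45; echo "$(( MIN*60 + SEC ))"']

With this fix, the container prints 345 (5 * 60 + 45) and exits successfully.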

Pod Fails Due to the 'Unknown Field' Error

In earlier versions of Kubernetes, a pod could be created even if an error was made in a spec field name or value: the error would be silently ignored if the pod was created with the --validate flag set to false. In newer Kubernetes versions (we are using Kubernetes 1.10.0), the --validate option is set to true by default, so the error for the unknown field is always printed and debugging becomes much easier. Let's create a pod with a wrong field name to illustrate this:

apiVersion: v1
kind: Pod
metadata:
  name: pod-field-error
  labels:
    app: demo
spec:
  containers:
  - name: busybox
    image: busybox
    comand: ['sh']
    args: ['-c', 'MIN=5 SEC=45; echo "$(( MIN*60 + SEC))"']

Let's save this spec in pod-field-error.yaml and create the pod with the following command:

kubectl create -f pod-field-error.yaml
error: error validating "pod-field-error.yaml": error validating data: ValidationError(Pod.spec.containers[0]): unknown field "comand" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false

As you see, the pod creation was blocked because of the unknown 'comand' field (we made a typo in this field intentionally). If you are using an older version of Kubernetes and the pod is created with the error silently ignored, delete the pod and re-create it with kubectl create --validate -f pod-field-error.yaml. This command will help you find the reason for the error:

kubectl create --validate -f pod-field-error.yaml
I0805 10:43:25.129850   46757 schema.go:126] unknown field: comand
I0805 10:43:25.129973   46757 schema.go:129] this may be a false alarm, see https://github.com/kubernetes/kubernetes/issues/6842
pods/pod-field-error

Cleaning Up

This tutorial is over, so let's clean up after ourselves.

Delete Deployments:

kubectl delete deployment httpd-deployment
deployment.extensions "httpd-deployment" deleted
kubectl delete deployment apache-server 
deployment.extensions "apache-server" deleted

Delete Pods:

kubectl delete pod pod-crash
pod "pod-crash" deleted
kubectl delete pod pod-field-error
pod "pod-field-error" deleted

You may also want to delete files with spec definitions if you don't need them anymore.

Conclusion

As this tutorial demonstrated, Kubernetes ships with great debugging tools that help identify the reasons for pod failure or unexpected behavior in a fraction of the time. The rule of thumb for Kubernetes debugging is first to find out the pod status and pod events and then check event messages by using kubectl describe or kubectl get events. If your pod crashes and a detailed error message is not available, you can check the containers' logs to find container-level errors and exceptions. These simple tools will dramatically increase your debugging speed and efficiency, freeing up time for more productive work.

Stay tuned to upcoming blogs to learn more about node-level and cluster-level debugging in Kubernetes.

Keep reading

Using Kubernetes Cron Jobs to Run Automated Tasks

Posted by Kirill Goltsman on August 29, 2018

In a previous tutorial, you learned how to use Kubernetes Jobs to perform some tasks sequentially or in parallel. However, Kubernetes goes even further with task automation: cron jobs create Jobs that perform finite, time-related tasks repeatedly at any time you specify. Cron jobs can be used to automate a wide variety of common computing tasks such as creating database backups and snapshots, sending emails, or upgrading Kubernetes applications. Before you learn how to run cron jobs, make sure to consult our earlier tutorial about Kubernetes Jobs. If you are ready, let's delve into the basics of cron jobs, where we'll show you how they work and how to create and manage them. Let's get started!

Definition of Cron Jobs

Cron (whose name originated from χρόνος, the Greek word for time) was initially a time-based job scheduler utility in Unix-like operating systems. At the OS level, cron files are used to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals. They are useful for automating system maintenance, administration, or scheduled interaction with remote services (software and repository updates, emails, etc.). First used in Unix-like operating systems, cron job implementations have become ubiquitous today. The CronJob API became a standard feature in Kubernetes 1.8 and is widely supported by the Kubernetes ecosystem for automated backups, synchronization with remote services, system and application maintenance (upgrades, updates, cleaning the cache), and more. Read on, because we will show you a basic example of a cron job used to perform a mathematical operation.

Tutorial

To complete examples in this tutorial, you need the following prerequisites:

  • A running Kubernetes cluster at version >= 1.8 (required for cron jobs). For previous versions of Kubernetes (< 1.8), you need to explicitly turn on the batch/v2alpha1 API by passing --runtime-config=batch/v2alpha1=true to the API server (see how to do this in this tutorial) and then restart both the API server and the controller manager component. See the Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Let's assume we have a simple Kubernetes Job that calculates π to 3000 places using Perl and prints the result to stdout.

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(3000)"]
      restartPolicy: Never
  backoffLimit: 4

We can easily turn this simple job into a cron job. In essence, a cron job is an API resource that creates a standard Kubernetes job executed at a specified date or interval. The following template turns our π job into a full-fledged cron job:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: pi-cron
spec:
  schedule: "*/1 * * * *"
  startingDeadlineSeconds: 20
  successfulJobsHistoryLimit: 5
  jobTemplate:
    spec:
      completions: 2
      template:
        metadata:
          name: pi
        spec:
          containers:
          - name: pi
            image: perl
            command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(3000)"]
          restartPolicy: Never

Let's look closely at the key fields of this spec:

.spec.schedule -- a scheduled time for the cron job to be created and executed. The field takes a cron format string, such as 0 * * * * or @hourly. The cron format string uses the format of the standard crontab (cron table) file -- a configuration file that specifies shell commands to run periodically on a given schedule. See the format in the example below:

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │                                       7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * *  command to execute

Each field, from left to right, corresponds to the minute, hour, day of month, month, and day of week on which to perform the cron job, followed by the command to execute.

In this example, we combined a slash (/) with the minutes field to specify a step (interval) at which to perform the job: */1 runs the job every minute. Correspondingly, */5 written in the minutes field would cause the cron job to calculate π every 5 minutes, and 0 */1 * * * would perform the job hourly.
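
A few more schedule strings for illustration:

# 0 0 * * *     daily at midnight
# 30 2 * * 1    at 02:30 every Monday
# 0 */6 * * *   every 6 hours
# @daily        shorthand for 0 0 * * *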

Format Note: The question mark (?) in the schedule field has the same meaning as an asterisk (*). That is, it stands for any available value in a given field.

.spec.jobTemplate -- a cron job's template. It has exactly the same schema as a job but is nested into a cron job and does not require an apiVersion or kind.

.spec.startingDeadlineSeconds -- a deadline in seconds for starting the cron job if it misses its schedule for some reason (e.g., node unavailability). A cron job that does not meet its deadline is regarded as failed. Cron jobs do not have any deadlines by default.

.spec.concurrencyPolicy -- specifies how to treat concurrent executions of a Job created by the cron job. The following concurrency policies are allowed:

  1. Allow (default): the cron job supports concurrently running jobs.
  2. Forbid: the cron job does not allow concurrent job runs. If the current job has not finished yet, a new job run will be skipped.
  3. Replace: if the previous job has not finished yet and the time for a new job run has come, the previous job will be replaced by a new one.

In this example, we are using the default Allow policy. Computing π to 3000 places and printing it out takes more than a minute. Therefore, we expect our cron job to run a new job even if the previous one has not yet completed.

.spec.suspend -- if the field is set to true, all subsequent job executions are suspended. This setting does not apply to executions which already began. The default value is false.

.spec.successfulJobsHistoryLimit -- the field specifies how many successfully completed jobs should be kept in job history. The default value is 3.

.spec.failedJobsHistoryLimit -- the field specifies how many failed jobs should be kept in job history. The default value is 1. Setting this limit to 0 means that no jobs will be kept after completion.
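
For example, to keep the last 5 successful and the last 2 failed jobs (the values are illustrative), you would add to the cron job spec:

spec:
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 2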

That's it! Now you have a basic understanding of available cron job settings and options. 

Let's continue with the tutorial. Open two terminal windows. In the first one, you are going to watch the jobs created by the cron job:

kubectl get jobs --watch

Let's save the spec above in cron-job.yaml and create the cron job by running the following command in the second terminal:

kubectl create -f cron-job.yaml
cronjob.batch "pi-cron" created

In a minute, you should see in the first terminal window that a π job with two desired completions (as per the completions value) was successfully created:

kubectl get jobs --watch
NAME              DESIRED   SUCCESSFUL   AGE
pi-cron-1531219740   2         0         0s
pi-cron-1531219740   2         0         0s

You can also check that the cron job was successfully created by running:

kubectl get cronjobs
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
pi-cron   */1 * * * *   False     1         57s             1m

Computing π to 3000 places is computationally intensive and takes more time than our cron job schedule (1 minute). Since we used the default concurrency policy ("allow"), you'll see that the cron job will start new jobs even though the previous ones have not yet completed:

kubectl get jobs --watch
NAME              DESIRED   SUCCESSFUL   AGE
pi-cron-1531219740   2         0         0s
pi-cron-1531219740   2         0         0s
pi-cron-1531219800   2         0         0s
pi-cron-1531219800   2         0         0s
pi-cron-1531219860   2         0         0s
pi-cron-1531219860   2         0         0s
pi-cron-1531219740   2         1         2m
pi-cron-1531219800   2         1         1m
pi-cron-1531219860   2         1         57s
pi-cron-1531219920   2         0         0s
pi-cron-1531219920   2         0         0s
pi-cron-1531219740   2         2         3m
pi-cron-1531219800   2         2         2m
pi-cron-1531219860   2         2         1m
pi-cron-1531219920   2         1         20s
pi-cron-1531219920   2         2         35s
pi-cron-1531219740   2         2         3m
pi-cron-1531219740   2         2         3m
pi-cron-1531219740   2         2         3m
pi-cron-1531219980   2         0         0s

As you see, some old jobs are still in progress, and new ones are created without waiting for them to finish. That's how the Allow concurrency policy works!

Now, let's check whether these jobs are computing π correctly. To do this, simply find one pod created by the job:

kubectl get pods
NAME                       READY     STATUS             RESTARTS   AGE
pi-cron-1531220100-sbqrx   0/1       Completed          0          3m
pi-cron-1531220100-t8l2v   0/1       Completed          0          3m
pi-cron-1531220160-bqcqf   0/1       Completed          0          2m
pi-cron-1531220160-mqg7t   0/1       Completed          0          2m
pi-cron-1531220220-dzmfp   0/1       Completed          0          1m
pi-cron-1531220220-zrh85   0/1       Completed          0          1m
pi-cron-1531220280-k2ttw   0/1       Completed          0          23s

Next, select one pod from the list and check its logs:

kubectl logs pi-cron-1531220220-dzmfp 

You'll see π calculated to 3000 places after the decimal point (that's pretty impressive):

3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564856692346034861045432664821339360726024914127372458700660631558817488152092096282925409171536436789259036001133053054882046652138414695194151160943305727036575959195309218611738193261179310511854807446237996274956735188575272489122793818301194912983367336244065664308602139494639522473719070217986094370277053921717629317675238467481846766940513200056812714526356082778577134275778960917363717872146844090122495343014654958537105079227968925892354201995611212902196086403441815981362977477130996051870721134999999837297804995105973173281609631859502445945534690830264252230825334468503526193118817101000313783875288658753320838142061717766914730359825349042875546873115956286388235378759375195778185778053217122680661300192787661119590921642019893809525720106548586327886593615338182796823030195203530185296899577362259941389124972177528347913151557485724245415069595082953311686172785588907509838175463746493931925506040092770167113900984882401285836160356370766010471018194295559619894676783744944825537977472684710404753464620804668425906949129331367702898915210475216205696602405803815019351125338243003558764024749647326391419927260426992279678235478163600934172164121992458631503028618297455570674983850549458858692699569092721079750930295532116534498720275596023648066549911988183479775356636980742654252786255181841757467289097777279380008164706001614524919217321721477235014144197356854816136115735255213347574184946843852332390739414333454776241686251898356948556209921922218427255025425688767179049460165346680498862723279178608578438382796797668145410095388378636095068006422512520511739298489608412848862694560424196528502221066118630674427862203919494504712371378696095636437191728746776465757396241389086583264599581339047802759009946576407895126946839835259570982582262052248940772671947826848260147699090264013639443745530506820349625245174939965143142980919065925093722169646151570985838741059788595977297549893016175392846813826868386894277415599185592524595395943104997252468084598727364469584865383673622262609912460805124388439045124413654976278079771569143599770012961608944169486855584840635342207222582848864815845602850601684273945226746767889525213852254995466672782398645659611635488623057745649803559363456817432411251507606947945109659609402522887971089314566913686722874894056010150330861792868092087476091782493858900971490967598526136554978189312978482168299894872265880485756401427047755513237964145152374623436454285844479526586782105114135473573952311342716610213596953623144295248493718711014576540359027993440374200731057853906219838744780847848968332144571386875194350643021845319104848100537061468067491927819119793995206141966342875444064374512371819217999839101591956181467514269123974894090718649423196

Awesome! Our cron job works as expected. You can imagine how this functionality might be useful for making regular database backups, upgrading applications, and automating many other tasks. When it comes to automation, cron jobs are gold!

Cleaning Up

If you don't need a cron job anymore, delete it with kubectl delete cronjob:

$ kubectl delete cronjob pi-cron
cronjob "pi-cron" deleted

Deleting the cron job will remove all the jobs and pods it created and stop it from spawning additional jobs.

Conclusion

Hopefully, you now have a better understanding of how cron jobs can help you automate tasks in your Kubernetes application. We used a simple example that can kickstart your thought process. However, when working with real-world Kubernetes cron jobs, please be aware of the following limitations.

A cron job creates a job object approximately once per execution time of its schedule, but there are certain scenarios where two jobs are created or no job is created at all. Therefore, to avoid side effects, jobs should be idempotent, which means they should not change the data consumed by other scheduled jobs. Also, note that if .spec.startingDeadlineSeconds is left unset (the default) or set to a large value, and .spec.concurrencyPolicy is set to Allow, the jobs will always run at least once. If starting your job late is better than not starting it at all, set a longer .spec.startingDeadlineSeconds so that the job can still start notwithstanding the delay. If you keep these limitations and best practices in mind, your cron jobs will never let your application down.
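
As a sketch, a more defensive configuration for a job that must not overlap with itself but may start late could look like this (the values are illustrative):

spec:
  schedule: "*/1 * * * *"
  startingDeadlineSeconds: 200
  concurrencyPolicy: Forbid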

Keep reading

Making Sense of Kubernetes Jobs

Posted by Kirill Goltsman on August 22, 2018

A Kubernetes Job is a special controller that can create one or more pods and manage them in the process of doing some finite work. Jobs ensure the pod's successful completion and allow rescheduling pods if they fail or terminate due to node hardware failure or node reboot. Kubernetes comes with a native support for parallel jobs that allow distributing workloads between multiple worker pods or performing the same task multiple times until reaching the completions count. The ability to reschedule failed pods and built-in parallelism make Kubernetes Jobs a great solution for parallel and batch processing and managing work queues in your applications.

In this article, we're going to discuss the architecture of and use cases for Kubernetes Jobs and walk you through simple examples demonstrating how to create and run your custom Jobs. Let's get started!

Why Do Kubernetes Jobs Matter?

Let's assume you have a task of calculating all prime numbers between 1 and 110 using a bash script and Kubernetes. The algorithm for calculating prime numbers is not that difficult, and we could easily create a pod with a bash command implementing it. However, using a bare pod for this kind of operation might run us into several problems.

First, the node on which your pod is running may suddenly shut down due to hardware failure or connection issues. Consequently, the pod running on this node will also cease to exist.

Second, if we were to calculate all prime numbers ranging from 1 to 10000, for example, doing this in a single bash instance would be very slow. The alternative would be to split this range into several batches and assign those to multiple pods. To take a real-world example, we could create a work queue in some key-value store like Redis and make our worker pods process items in this queue until it is empty. Using bare pods to accomplish that would be no big deal if we just needed 3-4 pods, but it would be much harder if our work queue were large (e.g., thousands of emails, files, and messages to process). And even if this could be done with manually created pods, the first problem would remain unsolved.

So what is the solution? Enter Kubernetes Jobs! They elegantly solve the above-mentioned problems. On the one hand, Jobs allow rescheduling pods to another node if the one they were running on fails. On the other hand, Kubernetes Jobs support pod parallelism with multiple pods performing connected tasks in parallel. In what follows, we will walk you through a simple tutorial that will teach you how to leverage these features of Kubernetes jobs.

Tutorial

To complete examples in this tutorial, you'll need the following prerequisites:

  • a running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

In the example below, we create a Job to calculate prime numbers between 0 and 110. Let's define the Job spec first:

apiVersion: batch/v1
kind: Job
metadata:
  name: primes
spec:
  template:
    metadata:
      name: primes
    spec:
      containers:
      - name: primes
        image: ubuntu
        command: ["bash"]
        args: ["-c",  "current=0; max=110; echo 1; echo 2; for((i=3;i<=max;)); do for((j=i-1;j>=2;)); do if [  `expr $i % $j` -ne 0 ] ; then current=1; else current=0; break; fi; j=`expr $j - 1`; done; if [ $current -eq 1 ] ; then echo $i; fi; i=`expr $i + 1`; done"]
      restartPolicy: Never
  backoffLimit: 4

As you see, the Job uses the batch/v1 apiVersion, which is the first major difference from bare pods and Deployments. However, Jobs use the same PodTemplateSpec as Deployments and other controllers. In our case, we defined a pod running the ubuntu container from the public Docker Hub repository. Inside the container, we use the bash command provided by the image with a script that calculates prime numbers.

Also, we set the spec.template.spec.restartPolicy parameter to Never to prevent the pod from restarting once the operation is completed. Finally, the field .spec.backoffLimit specifies the number of retries before the Job is considered failed. This might be useful when you want to fail a Job after some number of retries due to a logical error in the configuration, etc. The default value for .spec.backoffLimit is 6.

Let's save this spec in job-prime.yaml and create the job by running the following command:

kubectl create -f job-prime.yaml
job.batch "primes" created

Next, let's check the status of the running job:

kubectl describe jobs/primes
Name:           primes
Namespace:      default
Selector:       controller-uid=2415e4c1-802d-11e8-a389-0800270c281a
Labels:         controller-uid=2415e4c1-802d-11e8-a389-0800270c281a
                job-name=primes
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Thu, 05 Jul 2018 11:26:40 +0300
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=2415e4c1-802d-11e8-a389-0800270c281a
           job-name=primes
  Containers:
   primes:
    Image:      ubuntu
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
    Args:
      -c
      current=0; max=110; echo 1; echo 2; for((i=3;i<=max;)); do for((j=i-1;j>=2;)); do if [  `expr $i % $j` -ne 0 ] ; then current=1; else current=0; break; fi; j=`expr $j - 1`; done; if [ $current -eq 1 ] ; then echo $i; fi; i=`expr $i + 1`; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  46s   job-controller  Created pod: primes-bwdt7

Pay attention to several important fields in this description. In particular, the key Parallelism has a value of 1 (the default), indicating that only one pod was started to do this job. In turn, the key Completions tells us that the job made one successful completion of the task (i.e., the prime numbers calculation). Since the pod successfully completed this task, the job was completed as well. Let's verify this by running:

kubectl get jobs
NAME      DESIRED   SUCCESSFUL   AGE
primes    1         1            5m

You can also easily check the prime numbers calculated by the bash script. At the bottom of the job description, find the name of the pod created by the Job (in our case, it is a pod named primes-bwdt7; pod names are formatted as [JOB_NAME]-[HASH]). Let's check the logs of this pod:

kubectl logs primes-bwdt7
1
2
3
5
7
11
13
17
19
23
29
31
37
41
....

That's it! The pod created by our Job has successfully calculated all prime numbers between 0 and 110. The example above represents a non-parallel Job: just one pod is started unless it fails, and the Job is completed as soon as the pod completes successfully.

However, the Job controller also supports parallel Jobs, which can create several pods working on the same task. There are two types of parallel jobs in Kubernetes: jobs with a fixed completions count and parallel jobs with a work queue. Let's discuss both of them.

Jobs with a Fixed Completions Count

Jobs with a fixed completions count create one or more pods sequentially, and each pod has to complete its work before the next one is started. This type of Job needs to specify a non-zero positive value for .spec.completions, which refers to the number of successful pod completions required. A Job is considered complete when there is one successful pod for each value in the range 1 to .spec.completions (in other words, each pod started should complete the task). Jobs of this type may or may not specify the .spec.parallelism value; if the field is not specified, the Job creates 1 pod at a time. Let's test how this type of job works using the same spec as above with some slight modifications:

apiVersion: batch/v1
kind: Job
metadata:
  name: primes-parallel
  labels:
     app: primes
spec:
  completions: 3
  template:
    metadata:
      name: primes
      labels:
        app: primes
    spec:
      containers:
      - name: primes
        image: ubuntu
        command: ["bash"]
        args: ["-c",  "current=0; max=110; echo 1; echo 2; for((i=3;i<=max;)); do for((j=i-1;j>=2;)); do if [  `expr $i % $j` -ne 0 ] ; then current=1; else current=0; break; fi; j=`expr $j - 1`; done; if [ $current -eq 1 ] ; then echo $i; fi; i=`expr $i + 1`; done"]
      restartPolicy: Never

The only major change we made is adding the .spec.completions field set to 3, which asks Kubernetes to start 3 pods to perform the same task. Also, we set the app: primes label on our pods to access them in kubectl later.

Now, let's open two terminal windows.

In the first terminal, we are going to watch the pods created:

kubectl get pods -l app=primes -w

Save this spec in job-prime-2.yaml and create the job by running the following command in the second terminal:

kubectl create -f job-prime-2.yaml
job.batch "primes-parallel" created

Next, let's watch what's happening in the first terminal window:

kubectl get pods -l app=primes -w
NAME                    READY     STATUS    RESTARTS   AGE
primes-parallel-7gsl8   0/1       Pending   0          0s
primes-parallel-7gsl8   0/1       Pending   0         0s
primes-parallel-7gsl8   0/1       ContainerCreating   0         0s
primes-parallel-7gsl8   1/1       Running   0         3s
primes-parallel-7gsl8   0/1       Completed   0         14s
primes-parallel-nsd7k   0/1       Pending   0         0s
primes-parallel-nsd7k   0/1       Pending   0         0s
primes-parallel-nsd7k   0/1       ContainerCreating   0         0s
primes-parallel-nsd7k   1/1       Running   0         4s
primes-parallel-nsd7k   0/1       Completed   0         14s
primes-parallel-ldr7x   0/1       Pending   0         0s
primes-parallel-ldr7x   0/1       Pending   0         0s
primes-parallel-ldr7x   0/1       ContainerCreating   0         0s
primes-parallel-ldr7x   1/1       Running   0         4s
primes-parallel-ldr7x   0/1       Completed   0         14s

As you see, Kubernetes started three pods sequentially, waiting for the current pod to complete its operation before starting the next one.

For more details, let's check the status of the Job again:

kubectl describe jobs/primes-parallel
Name:           primes-parallel
Namespace:      default
Selector:       controller-uid=2ec4494f-8035-11e8-a389-0800270c281a
Labels:         controller-uid=2ec4494f-8035-11e8-a389-0800270c281a
                job-name=primes-parallel
Annotations:    <none>
Parallelism:    1
Completions:    3
Start Time:     Thu, 05 Jul 2018 12:24:14 +0300
Pods Statuses:  0 Running / 3 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=2ec4494f-8035-11e8-a389-0800270c281a
           job-name=primes-parallel
  Containers:
   primes:
    Image:      ubuntu
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
    Args:
      -c
      current=0; max=70; echo 1; echo 2; for((i=3;i<=max;)); do for((j=i-1;j>=2;)); do if [  `expr $i % $j` -ne 0 ] ; then current=1; else current=0; break; fi; j=`expr $j - 1`; done; if [ $current -eq 1 ] ; then echo $i; fi; i=`expr $i + 1`; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  1m    job-controller  Created pod: primes-parallel-b7d4s
  Normal  SuccessfulCreate  1m    job-controller  Created pod: primes-parallel-rwww4
  Normal  SuccessfulCreate  54s   job-controller  Created pod: primes-parallel-kfpqc

As you see, all three pods successfully completed the task and exited. We can verify that the Job was completed as well by running:

kubectl get jobs/primes-parallel
NAME              DESIRED   SUCCESSFUL   AGE
primes-parallel   3         3            6m

If you look into the logs of each pod, you'll see that each of them completed the prime numbers calculation successfully (to check logs, take the pod name from the Job description):

kubectl logs primes-parallel-b7d4s
1
2
3
5
7
11
13
17
19
23
29
31
...
kubectl logs primes-parallel-rwww4
1
2
3
5
7
11
13
17
19
23
29
31
...

That's it! Parallel jobs with a fixed completions count are very useful when you want to perform the same task multiple times. However, what about a scenario where we want one task to be completed in parallel by several pods? Enter parallel jobs with a work queue!

Parallel Jobs with a Work Queue

Parallel jobs with a work queue can create several pods that coordinate among themselves, or with some external service, to determine which part of the job each of them should work on. If your application has a work queue implementation backed by some remote data store, for example, this type of Job can create several parallel worker pods that independently access the work queue and process it. Parallel jobs with a work queue come with the following features and requirements:

  • for this type of Job, you should leave .spec.completions unset.
  • each worker pod created by the Job is capable of assessing whether or not all its peers are done and, thus, whether the entire Job is done (e.g., each pod can check if the work queue is empty and exit if so).
  • when any pod terminates with success, no new pods are created.
  • once at least one pod has exited with success and all pods are terminated, the job completes with success as well.
  • once any pod has exited with success, other pods should not be doing any work and should also start exiting.

Let's add parallelism to the previous Job spec to see how this type of job works:

apiVersion: batch/v1
kind: Job
metadata:
  name: primes-parallel-2
  labels:
    app: primes
spec:
  parallelism: 3
  template:
    metadata:
      name: primes
      labels:
        app: primes
    spec:
      containers:
      - name: primes
        image: ubuntu
        command: ["bash"]
        args: ["-c",  "current=0; max=110; echo 1; echo 2; for((i=3;i<=max;)); do for((j=i-1;j>=2;)); do if [  `expr $i % $j` -ne 0 ] ; then current=1; else current=0; break; fi; j=`expr $j - 1`; done; if [ $current -eq 1 ] ; then echo $i; fi; i=`expr $i + 1`; done"]
      restartPolicy: Never

The only difference from the previous spec is that we omitted the .spec.completions field and added the .spec.parallelism field with a value of 3.

Now, let's open two terminal windows as in the previous example. In the first terminal, watch the pods:

kubectl get pods -l app=primes -w

Let's save the spec in job-prime-3.yaml and create the job in the second terminal:

kubectl create -f job-prime-3.yaml
job.batch "primes-parallel-2" created

Next, let's see what's happening in the first terminal window:

kubectl get pods -l app=primes -w
NAME                      READY     STATUS    RESTARTS   AGE
primes-parallel-2-b2whq   0/1       Pending   0          0s
primes-parallel-2-b2whq   0/1       Pending   0         0s
primes-parallel-2-vhvqm   0/1       Pending   0         0s
primes-parallel-2-cdfdx   0/1       Pending   0         0s
primes-parallel-2-vhvqm   0/1       Pending   0         0s
primes-parallel-2-cdfdx   0/1       Pending   0         0s
primes-parallel-2-b2whq   0/1       ContainerCreating   0         0s
primes-parallel-2-vhvqm   0/1       ContainerCreating   0         0s
primes-parallel-2-cdfdx   0/1       ContainerCreating   0         0s
primes-parallel-2-b2whq   1/1       Running   0         4s
primes-parallel-2-cdfdx   1/1       Running   0         7s
primes-parallel-2-vhvqm   1/1       Running   0         10s
primes-parallel-2-b2whq   0/1       Completed   0         17s
primes-parallel-2-cdfdx   0/1       Completed   0         21s
primes-parallel-2-vhvqm   0/1       Completed   0         23s

As you see, Kubernetes created three pods simultaneously. Each pod calculated the prime numbers in parallel, and once each of them completed the task, the Job was successfully completed as well.

Let's see more details in the Job description:

kubectl describe jobs/primes-parallel
Name:           primes-parallel
Namespace:      default
Selector:       controller-uid=d8bfbf9c-8038-11e8-a389-0800270c281a
Labels:         controller-uid=d8bfbf9c-8038-11e8-a389-0800270c281a
                job-name=primes-parallel
Annotations:    <none>
Parallelism:    3
Completions:    <unset>
Start Time:     Thu, 05 Jul 2018 12:50:27 +0300
Pods Statuses:  0 Running / 3 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=d8bfbf9c-8038-11e8-a389-0800270c281a
           job-name=primes-parallel
  Containers:
   primes:
    Image:      ubuntu
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
    Args:
      -c
      current=0; max=110; echo 1; echo 2; for((i=3;i<=max;)); do for((j=i-1;j>=2;)); do if [  `expr $i % $j` -ne 0 ] ; then current=1; else current=0; break; fi; j=`expr $j - 1`; done; if [ $current -eq 1 ] ; then echo $i; fi; i=`expr $i + 1`; done
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  1m    job-controller  Created pod: primes-parallel-8ggnq
  Normal  SuccessfulCreate  1m    job-controller  Created pod: primes-parallel-gbwgm
  Normal  SuccessfulCreate  1m    job-controller  Created pod: primes-parallel-66w65

As you see, all three pods succeeded in performing the task. You can also check the pods' logs to see the calculation results:

kubectl logs primes-parallel-8ggnq
1
2
3
5
7
11
...
kubectl logs primes-parallel-gbwgm
1
2
3
5
7
11
...
kubectl logs primes-parallel-66w65
1
2
3
5
7
11
...

From the logs above, it may look like our Job with a work queue acted the same way as the job with a fixed completions count. Indeed, all three pods created by the Job calculated prime numbers in the range of 1-110. The difference, however, is that in this example all three pods did their work in parallel. If we had created a work queue for our worker pods and a script to process items in that queue, we could make the pods access different batches of numbers (or messages, emails, etc.) in parallel until no items were left in the queue. In this example, we don't have a work queue or a script to process it, which is why all three pods created by the Job did the same task to completion. Still, this example is enough to illustrate the main feature of this type of Job: parallelism.


In a real-world scenario, we could imagine a Redis list with some work items (e.g., messages or emails) in it and three parallel worker pods created by the Job. Each pod could run a script that requests a new message from the list, processes it, and checks whether there are more work items left. If no more work items exist in the list, the pod accessing it would exit with success, telling the controller that the work was successfully done. This notification would cause other pods to exit as well and the entire job to complete. Given this functionality, parallel jobs with a work queue are extremely powerful for processing large volumes of data with multiple workers doing their tasks in parallel.
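
As a rough sketch of such worker-pod logic (assuming a hypothetical Redis service reachable at host redis with a list named work-queue; the redis-cli client ships with the public redis image), each worker could run a loop like this and exit with success once the queue is drained:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-workers
spec:
  parallelism: 3
  template:
    spec:
      containers:
      - name: worker
        image: redis
        command: ["sh", "-c", "while item=$(redis-cli -h redis lpop work-queue) && [ -n \"$item\" ]; do echo \"processing $item\"; done"]
      restartPolicy: Never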

Cleaning Up

As our tutorial is over, let's clean up all resources:

Delete the Jobs

Deleting a Job will cause all associated pods to be deleted as well.

kubectl delete job primes-parallel-2
job.batch "primes-parallel-2" deleted
kubectl delete job primes-parallel
job.batch "primes-parallel" deleted
kubectl delete job primes
job.batch "primes" deleted

Also, delete all files with the Job specs if you don't need them anymore.

Conclusion

As you have learned, Kubernetes Jobs are extremely powerful for parallel computation and batch processing of diverse workloads. However, one should remember that the Job object does not support closely communicating parallel processes commonly found in scientific computing. The Job's basic use case is parallel or sequential processing of independent but related work items such as messages, emails, numbers, or files. Whenever you need batch processing functionality in your Kubernetes apps, Jobs will help you implement it, but you'll need to design your own work queue and a script to process it. In the next tutorial, we'll walk you through several job design patterns that will help you address a number of real-world scenarios for batch processing.

 

Keep reading