Supergiant Blog

Product releases, new features, announcements, and tutorials.

Managing Memory and CPU Resources for Kubernetes Namespaces

Posted by Kirill Goltsman on July 14, 2018

From the previous tutorials, you already know that Kubernetes allows specifying CPU and RAM requests and limits for containers running in a pod, a feature that is very useful for managing the resource consumption of individual pods.

However, if you are a Kubernetes cluster administrator, you might also want to control global consumption of resources in your cluster and/or configure default resource requirements for all containers.

Fortunately, Kubernetes supports cluster resource management at the namespace level. As you might already know, Kubernetes namespaces provide scopes for names and resource quotas, which allow you to divide cluster resources efficiently between multiple users, projects, and teams. In Kubernetes, you can define default resource requests and limits, resource constraints (minimum and maximum resource requests and limits), and resource quotas for all containers running in a given namespace. These features enable efficient resource utilization by applications in your cluster and help divide resources productively between different teams. For example, using resource constraints for namespaces allows you to control how resources are used by your production and development workloads, letting each consume its fair share of the limited cluster resources. This can be achieved by creating separate namespaces for production and development workloads and assigning different resource constraints to them.
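For instance, a minimal sketch of such a split could look like the following (the namespace names production and development are just placeholders; the LimitRange and ResourceQuota objects shown later in this tutorial would then be created separately in each of them):

apiVersion: v1
kind: Namespace
metadata:
  name: production
---
apiVersion: v1
kind: Namespace
metadata:
  name: development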

In this tutorial, we show you three strategies for the efficient management of your cluster resources: setting default resource requests and limits for containers, defining minimum and maximum resource constraints, and setting resource quotas for all containers in the namespace. These strategies will help you address a wide variety of use cases leveraging the full power of Kubernetes namespaces and resource management.

Tutorial

To complete examples in this tutorial, you'll need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Example #1: Defining Default Resource Requests and Limits for Containers in a Namespace

In this example, we're going to define default requests and limits for containers in your namespace. These default values will be automatically applied to containers that do not specify their own resource requests and limits. In this way, default resource requests and limits establish a baseline resource usage policy for containers in your namespace.

As you already know, default resource requests and limits are defined at the namespace level, so we need to create a new namespace:

kubectl create namespace default-resources-config
namespace "default-resources-config" created

Default values for resource requests and limits for a namespace must be defined in a LimitRange object. We chose to use the following spec:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-requests-and-limits
spec:
  limits:
  - default:
      memory: 512Mi
      cpu: 0.8
    defaultRequest:
      memory: 256Mi
      cpu: 0.4
    type: Container

The spec.limits.default field of this spec sets default resource limits and the spec.limits.defaultRequest field sets the default requests for the containers running in our namespace.

Save this spec in the limit-range-1.yaml file and create the LimitRange by running the following command:

kubectl create -f limit-range-1.yaml --namespace=default-resources-config
limitrange "default-requests-and-limits" created

Now, if we create a pod in the default-resources-config namespace and omit memory or CPU requests and limits for its container, it will be assigned the default values defined for the LimitRange above. Let's create a pod to see how this works:

apiVersion: v1
kind: Pod
metadata:
  name: default-resources-demo
spec:
  containers:
  - name: default-resources-cont
    image: httpd:2.4

Let's save this pod spec in the default-resources-demo-pod.yaml file and create the pod in our namespace:

kubectl create -f default-resources-demo-pod.yaml --namespace default-resources-config
pod "default-resources-demo" created

As you see, the Apache HTTP server container in the pod has no resource requests or limits. However, since we have specified default resources for the namespace, they will be assigned to the container automatically.

kubectl get pod default-resources-demo --output=yaml --namespace=default-resources-config

As you see in the output below, the default resource requests and limits were automatically applied to our container:

containers:
  - image: httpd:2.4
    imagePullPolicy: IfNotPresent
    name: default-resources-cont
    resources:
      limits:
        cpu: 800m
        memory: 512Mi
      requests:
        cpu: 400m
        memory: 256Mi

It's as simple as that!

However, what happens if we specify only requests or limits but not both? Let's create a new pod with only resource limits specified to check this:

apiVersion: v1
kind: Pod
metadata:
  name: default-resources-demo-2
spec:
  containers:
  - name: default-resources-cont
    image: httpd:2.4
    resources:
      limits:
        memory: "1Gi"
        cpu: 1

Let's save this spec in default-resources-demo-pod-2.yaml and create the pod in our namespace:

kubectl create -f default-resources-demo-pod-2.yaml --namespace default-resources-config
pod "default-resources-demo-2" created

Now, check the container resources assigned:

kubectl get pod default-resources-demo-2 --output=yaml --namespace default-resources-config

The response should be:

containers:
  - image: httpd:2.4
    imagePullPolicy: IfNotPresent
    name: default-resources-cont
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi

As you see, Kubernetes automatically set the resource requests equal to the limits specified by the container. Note that the namespace default requests (256Mi and 0.4 CPU) were not applied here: when a container specifies only its limits, its requests are set to match those limits.

Next, let's see what happens if memory and CPU requests are specified and resource limits are omitted. Create a spec for the third pod:

apiVersion: v1
kind: Pod
metadata:
  name: default-resources-demo-3
spec:
  containers:
  - name: default-resources-cont
    image: httpd:2.4
    resources:
      requests:
        memory: "0.4Gi"
        cpu: 0.6

Let's save the spec in the default-resources-demo-pod-3.yaml and create the pod in our namespace:

kubectl create -f default-resources-demo-pod-3.yaml --namespace default-resources-config
pod "default-resources-demo-3" created

After the pod has been created, check the container resources assigned:

kubectl get pod default-resources-demo-3 --output=yaml --namespace default-resources-config

You should get the following output in your terminal:

containers:
  - image: httpd:2.4
    imagePullPolicy: IfNotPresent
    name: default-resources-cont
    resources:
      limits:
        cpu: 800m
        memory: 512Mi
      requests:
        cpu: 600m
        memory: 429496729600m

As you see, the container was assigned the default limits while keeping the resource requests it specified. (The 0.4Gi memory request is simply displayed as 429496729600m, i.e. the same quantity expressed in Kubernetes' "milli" notation.)

Note: if the container's memory or CPU request is greater than the corresponding default limit, the pod won't be created, because a request may not exceed its limit.
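For instance, a hypothetical pod spec like the one below (the name default-resources-demo-4 is ours, not part of the tutorial files) would be rejected in the default-resources-config namespace: the default 512Mi memory limit would be applied to the container, and its 1Gi request would exceed that limit.

apiVersion: v1
kind: Pod
metadata:
  name: default-resources-demo-4
spec:
  containers:
  - name: default-resources-cont
    image: httpd:2.4
    resources:
      requests:
        memory: "1Gi"   # greater than the default 512Mi limit, so pod creation fails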

Cleaning Up

Let's clean up after this example is completed.

Delete the namespace:

kubectl delete namespace default-resources-config
namespace "default-resources-config" deleted

Example #2: Setting Min and Max Resource Constraints for the Namespace

In this example, we're going to create resource constraints for the namespace. These constraints are the minimum and maximum amounts that containers' resource requests and limits must fall within. Let's see how this works!

As in the previous example, create a namespace first:

kubectl create namespace resource-constraints-demo
namespace "resource-constraints-demo" created

Next, you are going to create a LimitRange for this namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: resource-constraints-lr
spec:
  limits:
  - max:
      memory: 1Gi
      cpu: 0.8
    min:
      memory: 500Mi
      cpu: 0.3
    type: Container

Save this LimitRange in the limit-range-2.yaml and create it:

kubectl create -f limit-range-2.yaml --namespace resource-constraints-demo
limitrange "resource-constraints-lr" created

Now that the LimitRange has been created, let's check whether our minimum and maximum resource constraints were applied to the namespace:

kubectl get limitrange resource-constraints-lr --namespace resource-constraints-demo --output=yaml 

The response should be:

spec:
  limits:
  - default:
      cpu: 800m
      memory: 1Gi
    defaultRequest:
      cpu: 800m
      memory: 1Gi
    max:
      cpu: 800m
      memory: 1Gi
    min:
      cpu: 300m
      memory: 500Mi
    type: Container

As you see, the default resource requests and limits for your namespace were automatically set equal to the max resource constraint specified in the LimitRange. Now, when we create containers in the resource-constraints-demo namespace, the following rules automatically apply:

  • If the container does not specify its resource request and limit, the default resource request and limit are applied.
  • All containers in the namespace need to have resource requests greater than or equal to 300m for CPU and 500Mi for memory.
  • All containers in the namespace need to have resource limits less than or equal to 800m for CPU and 1Gi for memory.

Let's create a pod to illustrate how namespace resource constraints are applied to containers:

apiVersion: v1
kind: Pod
metadata:
  name: resource-constraints-pod
spec:
  containers:
  - name: resource-constraints-ctr
    image: httpd:2.4
    resources:
      limits:
        memory: "900Mi"
        cpu: 0.7
      requests:
        memory: "600Mi"
        cpu: 0.4

This spec requests 600Mi of RAM and 0.4 CPU and sets a limit of 900Mi RAM and 0.7 CPU for the httpd container within this pod. These resource requirements meet the minimum and maximum constraints for the namespace.

Let's save this spec in the resource-constraints-pod.yaml and create the pod in our namespace:

kubectl create -f resource-constraints-pod.yaml --namespace resource-constraints-demo
pod "resource-constraints-pod" created

Next, check the resources assigned to the container in the pod:

kubectl get pod resource-constraints-pod --namespace resource-constraints-demo --output=yaml

You should get the following output:

containers:
  - image: httpd:2.4
    imagePullPolicy: IfNotPresent
    name: resource-constraints-ctr
    resources:
      limits:
        cpu: 700m
        memory: 900Mi
      requests:
        cpu: 400m
        memory: 600Mi

That's it! The pod was successfully created because the container's request and limit are within the minimum and maximum constraints for the namespace.

Now, let's see what happens if we specify requests and limits beyond the minimum and maximum values defined for the namespace. Let's create a new pod with new requests and limits:

apiVersion: v1
kind: Pod
metadata:
  name: resource-constraints-pod-2
spec:
  containers:
  - name: resource-constraints-ctr-2
    image: httpd:2.4
    resources:
      limits:
        memory: "1200Mi"
        cpu: 1.2
      requests:
        memory: "200Mi"
        cpu: 0.2

Save this spec in the resource-constraints-pod-2.yaml and create the pod in our namespace:

kubectl create -f resource-constraints-pod-2.yaml --namespace resource-constraints-demo

Since the resource requests are below the minimum values and the resource limits are above the maximum values defined for this namespace, the pod will not be created, and the command returns an error, as expected:

Error from server (Forbidden): error when creating "resource-constraints-pod-2.yaml": pods "resource-constraints-pod-2" is forbidden: [minimum memory usage per Container is 500Mi, but request is 200Mi., minimum cpu usage per Container is 300m, but request is 200m., maximum cpu usage per Container is 800m, but limit is 1200m., maximum memory usage per Container is 1Gi, but limit is 1200Mi.]

Cleaning Up

This example is over, so let's delete the namespace with all associated pods and other resources:

kubectl delete namespace resource-constraints-demo
namespace "resource-constraints-demo" deleted

Example #3: Setting Memory and CPU Quotas for a Namespace

In the previous example, we set resource constraints for individual containers running within a namespace. However, it is also possible to restrict the total resource requests and limits across all containers running in a namespace. This can be easily achieved with a ResourceQuota object defined for the namespace.

To illustrate how resource quotas work, let's first create a new namespace so that resources created in this exercise are isolated from the rest of your cluster:

kubectl create namespace resource-quota-demo
namespace "resource-quota-demo" created

Next, let's create a ResourceQuota object with resource quotas for our namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: resource-quota
spec:
  hard:
    requests.cpu: "1.4"
    requests.memory: 2Gi
    limits.cpu: "2"
    limits.memory: 3Gi

This ResourceQuota sets the following requirements for the namespace:

  • ResourceQuota imposes the requirement for each container to define its memory and CPU requests and limits (see the sketch after this list).
  • The memory request total for all containers must not exceed 2Gi.
  • The CPU request total for all containers in the namespace should not exceed 1.4 CPU.
  • The memory limit total for all containers in the namespace should not exceed 3Gi.
  • The CPU limit total for all containers in the namespace should not exceed 2 CPU.
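Note that the first requirement means any pod whose containers omit their requests or limits will be rejected in this namespace. If you don't want that behavior, one common approach (sketched below, assuming you combine the quota with the defaults technique from Example #1; the name and values here are arbitrary) is to add a LimitRange with defaults to the same namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: resource-quota-defaults
spec:
  limits:
  - default:
      memory: 512Mi
      cpu: 0.5
    defaultRequest:
      memory: 256Mi
      cpu: 0.25
    type: Container

We won't use such a LimitRange in this example, because every pod below specifies its requests and limits explicitly.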

Save the ResourceQuota spec above in the resource-quota.yaml file and create it by running the following command:

kubectl create -f resource-quota.yaml --namespace resource-quota-demo
resourcequota "resource-quota" created

The ResourceQuota object was created in our namespace and is ready to control the total requests and limits of all containers in that namespace. Let's see the ResourceQuota description:

kubectl get resourcequota --namespace resource-quota-demo --output=yaml

The response should be:

  spec:
    hard:
      limits.cpu: "2"
      limits.memory: 3Gi
      requests.cpu: 1400m
      requests.memory: 2Gi
  status:
    hard:
      limits.cpu: "2"
      limits.memory: 3Gi
      requests.cpu: 1400m
      requests.memory: 2Gi
    used:
      limits.cpu: "0"
      limits.memory: "0"
      requests.cpu: "0"
      requests.memory: "0"
kind: List

This output shows that no memory or CPU has been consumed in the namespace yet. Let's create two pods to change this situation.

The first pod will request 1.3Gi of RAM and 0.8 CPU and have a resource limit of 1.2 CPU and 2Gi of RAM.

apiVersion: v1
kind: Pod
metadata:
  name: resource-quota-pod-1
spec:
  containers:
  - name: resource-quota-ctr-1
    image: httpd:2.4
    resources:
      limits:
        memory: "2Gi"
        cpu: 1.2
      requests:
        memory: "1.3Gi"
        cpu: 0.8

Save this spec in the resource-quota-pod-1.yaml and create the pod in our namespace:

kubectl create -f resource-quota-pod-1.yaml --namespace resource-quota-demo
pod "resource-quota-pod-1" created

The pod was successfully created because the container's requests and limits are within the resource quota set for the namespace. Let's verify this by checking the current amount of used resources in the ResourceQuota object:

kubectl get resourcequota --namespace resource-quota-demo --output=yaml

The response should be:

status:
    hard:
      limits.cpu: "2"
      limits.memory: 3Gi
      requests.cpu: 1400m
      requests.memory: 2Gi
    used:
      limits.cpu: 1200m
      limits.memory: 2Gi
      requests.cpu: 800m
      requests.memory: 1395864371200m

As you see, the first pod has consumed some of the resources available under the ResourceQuota (the 1.3Gi memory request is displayed as 1395864371200m). Let's create another pod to increase the consumption of available resources even further:

apiVersion: v1
kind: Pod
metadata:
  name: resource-quota-pod-2
spec:
  containers:
  - name: resource-quota-ctr-2
    image: httpd:2.4
    resources:
      limits:
        memory: "1.3Gi"
        cpu: 0.9
      requests:
        memory: "1Gi"
        cpu: 0.8

Save this spec in the resource-quota-pod-2.yaml and create the pod:

kubectl create -f resource-quota-pod-2.yaml --namespace resource-quota-demo

Running this command will cause the following error:

Error from server (Forbidden): error when creating "resource-quota-pod-2.yaml": pods "resource-quota-pod-2" is forbidden: exceeded quota: resource-quota, requested: limits.cpu=900m,limits.memory=1395864371200m,requests.cpu=800m,requests.memory=1Gi, used: limits.cpu=1200m,limits.memory=2Gi,requests.cpu=800m,requests.memory=1395864371200m, limited: limits.cpu=2,limits.memory=3Gi,requests

As you see, Kubernetes does not allow us to create this pod, because adding the container's CPU and RAM requests and limits to those already used would exceed the ResourceQuota for this namespace.

Cleaning Up

This example is completed, so let's clean up:

Delete the namespace:

kubectl delete namespace resource-quota-demo
namespace "resource-quota-demo" deleted

Conclusion

That's it! We have discussed how to set default resource requests and limits, and how to create resource constraints and resource quotas for containers in Kubernetes namespaces.

As you've seen, by setting default requests and limits for containers in your namespace, you can impose namespace-wide resource policies that are automatically applied to all containers that do not specify their own resource requests and limits.

In addition, you learned how to use resource constraints to limit the amount of resources consumed by individual containers in your namespace. This feature facilitates the efficient management of resources by different application classes and teams and helps keep free resources available in your cluster. The same effect (but at a larger scale) can be achieved with resource quotas, which constrain the total consumption of resources by all containers in the namespace.

 

Keep reading

Introduction to Init Containers in Kubernetes

Posted by Kirill Goltsman on July 10, 2018

You may be familiar with the concept of init scripts -- programs that configure the runtime, environment, dependencies, and other prerequisites for an application to run. Kubernetes implements similar functionality with Init containers that run before application containers are started. For the main app to start, all commands and requirements specified in the Init containers must complete successfully. Otherwise, the pod is restarted, terminated, or stays in the pending state until the Init container completes.

In this article, we discuss the peculiarities of Init containers compared to regular containers, look into their basic use cases, and walk you through a simple tutorial to create your own Init containers in Kubernetes. Let's start!

How Are Init Containers Different?

In a nutshell, Init containers are much like regular Kubernetes containers. They have all of the fields of an app container and are defined within the same pod spec. However, there are several design differences you should know before you build your Init containers. These are the most important:

  • Init containers always run to completion. In contrast, regular application containers can be either always running (e.g., Nginx or Apache HTTP server) or running to completion.

  • If multiple Init containers are specified, they run sequentially. That is, each of the Init containers must successfully complete before the next one is started.

  • Init containers cannot have readiness probes, which are designed to tell whether application containers are ready to accept traffic. An Init container, by definition, runs because the pod is not ready yet and needs extra work to be done. Giving it a readiness probe could declare the pod ready (based on the probe's response from the Init container) before the main container is running, which would be a false positive.

  • Resource requests and limits for Init containers are managed differently than for regular containers. In particular, the following resource usage rules apply:

    • The effective init request/limit for a resource is the highest request/limit among all Init containers, because they run one at a time rather than concurrently.
    • The pod is scheduled using the higher of this effective init request/limit and the sum of the app containers' requests/limits, which means Init containers can reserve resources that are not used during the rest of the pod's lifecycle (see the sketch after this list).
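To make these rules concrete, here is a sketch (not part of the tutorial's examples; the names and values are arbitrary) of a pod with two Init containers and one app container:

apiVersion: v1
kind: Pod
metadata:
  name: init-resources-demo
spec:
  initContainers:
  - name: init-a
    image: busybox
    command: ['sh', '-c', 'echo step one']
    resources:
      requests:
        cpu: 100m               # runs first and exits
  - name: init-b
    image: busybox
    command: ['sh', '-c', 'echo step two']
    resources:
      requests:
        cpu: 300m               # the highest init request, so the effective init request is 300m
  containers:
  - name: app
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
    resources:
      requests:
        cpu: 200m               # the app containers together request 200m

The pod is scheduled using the higher of the two values, i.e. 300m of CPU, so 100m of the reserved capacity is never used once the Init containers finish.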

How Can You Use Init Containers?

There are a number of use cases for Init containers that can be leveraged in your applications:

  • Init containers can run utilities and/or commands that you don't want to include in the app container image for security reasons.
  • Init containers can contain setup code or utilities that are not present in the app image. Instead of building the app image FROM another image just to include this code or these utilities, you can run them in an Init container. This keeps your app containers more lightweight and modular. For example, you can use tools like awk or python in your Init containers to set up the environment for your app containers.
  • Init containers can run with a different view of the pod's filesystem than app containers (for example, with different volume mounts). You can leverage this to give Init containers access to Secrets that the app containers cannot access.
  • Since Init containers run sequentially, you can easily use them to block or delay the startup of app containers until certain prerequisites are met.

We illustrate some of these use cases for Init containers in the tutorial, so let's start!

Tutorial

To complete examples in this tutorial, you'll need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Example #1: Using Init Container to Clone a GitHub Repository into a Volume

You can use Init containers as a convenient alternative to the gitRepo volume type (which is deprecated) for cloning a GitHub repository into a volume and making the repo accessible to app containers. Unlike gitRepo volumes, where the kubelet shells out to git on the host (which may raise security issues with untrusted repos), in this approach the git command runs directly inside the Init container. To illustrate how this works, let's first define a pod spec with our Init container:

apiVersion: v1
kind: Pod
metadata:
  name: init-demo
  labels:
    app: init
spec:
  containers:
  - name: app-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-clone-repo
    image: alpine/git
    args:
        - clone
        - --single-branch
        - --
        - https://github.com/supergiant/supergiant.git
        - /Users/kirillgoltsman/repo 
    volumeMounts:
    - name: git-repo
      mountPath: /Users/kirillgoltsman/repo
  volumes:
  - name: git-repo
    hostPath:
         path: /Users/kirillgoltsman/repo

Note: replace the directory name in spec.initContainers[0].args, spec.initContainers[0].volumeMounts, and spec.volumes[0].hostPath.path with a directory owned by your own user (no root permissions required). The path should be /home/<user-name>/repo on Linux and /Users/<user-name>/repo on Mac. Also, note that the directory must either not exist or be empty.

The spec above will create a pod with two containers. The first one is the application container running the BusyBox image; its sh command echoes a greeting message to stdout. The second one is the Init container that clones a GitHub repository and saves it to the hostPath volume. Let's describe the most important parameters of this spec in more detail:

spec.initContainers[0].image -- the container image pulled for the Init container. We use the alpine/git image from Docker Hub, but any image containing the git command will do.

spec.initContainers[0].args -- arguments for the container's command. The default command (EntryPoint) of the alpine/git image is git. Remember that when we do not specify a command for a container and define only the args field, our arguments are passed to the image's default EntryPoint and replace its Cmd (i.e., the image's default arguments). Since we don't specify a command, the args run with the image's default git command. The arguments we defined tell git to clone Supergiant's GitHub repository and save it to our repo directory. As simple as that!
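For clarity, here is the fully explicit equivalent of the same Init container, with the image's EntryPoint spelled out as command (the behavior should be identical to the spec above, since git is already the default EntryPoint of alpine/git):

  initContainers:
  - name: init-clone-repo
    image: alpine/git
    command: ['git']            # overrides the image's EntryPoint (which is git anyway)
    args:                       # overrides the image's Cmd (its default arguments)
        - clone
        - --single-branch
        - --
        - https://github.com/supergiant/supergiant.git
        - /Users/kirillgoltsman/repo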

spec.initContainers[0].volumeMounts -- we mount the hostPath volume defined in spec.volumes at the repo mountPath of our Init container. This is the directory the cloned GitHub repo will be saved to.

Let's check how this setup works. Save the full pod spec above in the init-demo.yaml file, and create the pod by running the following command:

kubectl create -f init-demo.yaml
pod "init-demo" created

Let's see the pod's status immediately after creation:

kubectl get pod init-demo 
NAME        READY     STATUS     RESTARTS   AGE
init-demo   0/1       Init:0/1   0          7s

As you see, the Status field indicates that the Init container has not yet completed, and, hence, our main application pod is not Ready yet.

You can see more details by running kubectl describe pod init-demo:

kubectl describe pod init-demo
...
Events:
  Type    Reason                 Age   From               Message
  ----    ------                 ----  ----               -------
  Normal  Scheduled              3m    default-scheduler  Successfully assigned init-demo to minikube
  Normal  SuccessfulMountVolume  3m    kubelet, minikube  MountVolume.SetUp succeeded for volume "git-repo"
  Normal  SuccessfulMountVolume  3m    kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-9wdtd"
  Normal  Pulling                3m    kubelet, minikube  pulling image "alpine/git"
  Normal  Pulled                 3m    kubelet, minikube  Successfully pulled image "alpine/git"
  Normal  Created                3m    kubelet, minikube  Created container
  Normal  Started                3m    kubelet, minikube  Started container
  Normal  Pulling                3m    kubelet, minikube  pulling image "busybox"
  Normal  Pulled                 3m    kubelet, minikube  Successfully pulled image "busybox"
  Normal  Created                3m    kubelet, minikube  Created container
  Normal  Started                3m    kubelet, minikube  Started container

Scroll down to the pod events to see the pod's history. As you might have noticed, the kubelet first created and started the Init container. Only after the Init container completed did it create and start the main application container.

To verify what the Init container was doing, let's check its logs:

kubectl logs init-demo  -c init-clone-repo
Cloning into '/Users/kirillgoltsman/repo'...

As you see, the container was cloning the Supergiant repo into our /Users/<user-name>/repo directory. Let's verify that the cloning was successful by checking the host directory created on your node:

ls /Users/<user-name>/repo
CODE_OF_CONDUCT.md    cmd            scripts
LICENSE            config            test
Makefile        docker-compose.yml    tmp
README.md        docs            ui
build            pkg            vendor

Great! The Init container has successfully cloned the Supergiant repo to the repo folder on your host.

Example #2: Creating Init Containers that Wait for the Service to be Created

In the second example, we'll look into a scenario where the Init container waits for a service to be created. Until the service exists, the main application container stays pending. Once the service is created, the Init container completes, and the kubelet starts the application container. This setup is useful when your pod depends on certain services to run. Let's define a spec:

apiVersion: v1
kind: Pod
metadata:
  name: init-pod
  labels:
    app: init-pod
spec:
  containers:
  - name: app-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup init-service; do echo waiting for init-service; sleep 2; done;']

This spec creates a pod with two containers. As in the first example, the first one runs a BusyBox shell command that echoes a custom message. The second is the Init container, which uses the nslookup command from the BusyBox image to periodically look up your service. The command keeps waiting until the service is created and becomes resolvable through DNS; after that, the Init container completes.

Let's save this spec in the init-pod.yaml and create the Pod as usual:

kubectl create -f  init-pod.yaml
pod "init-pod" created

Now, let's check the status of this pod:

kubectl get pod init-pod 
NAME       READY     STATUS     RESTARTS   AGE
init-pod   0/1       Init:0/1   0          42s

As you see, the pod is not ready yet because the Init container has not completed. In fact, the main application container will never run unless we create the service that the nslookup command is looking for.

Let's see a detailed description of the pod status:

kubectl describe pod init-pod
Name:         init-pod
Namespace:    default
Node:         minikube/10.0.2.15
Start Time:   Fri, 22 Jun 2018 15:53:00 +0300
Labels:       app=init-pod
Annotations:  <none>
Status:       Pending

You see that the status of the pod is Pending. Our Init container is still running because the service has not been created yet, so the app container is not ready.

Now, let's create a simple service that our Init container is waiting for:

kind: Service
apiVersion: v1
metadata:
  name: init-service
spec:
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 4301

Save this spec in the init-service.yaml and create it with the following command:

kubectl create -f init-service.yaml
service "init-service" created

Now, if you check our pod, you'll see that its status has changed to Running. That is because our nslookup command finally found the init-service.

kubectl get pod init-pod
NAME            READY     STATUS             RESTARTS   AGE
init-pod        1/1       Running            0          25s

If you look into the logs of the init-myservice container, you'll see what our Init container was doing:

kubectl logs init-pod -c init-myservice 
nslookup: can't resolve 'init-service'
waiting for init-service
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'init-service'
waiting for init-service
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name:      init-service
Address 1: 10.111.178.37 init-service.default.svc.cluster.local

As you see, the nslookup command could not resolve the service until it was finally created and assigned the DNS name init-service.default.svc.cluster.local. Once the service was found, our Init container completed successfully and terminated.

Cleaning Up

Our tutorial is over, so let's clean up after ourselves.

Delete the pods:

kubectl delete pod init-pod
pod "init-pod" deleted
kubectl delete pod init-demo
pod "init-demo" deleted

Delete /repo directory:

rm -r /Users/<user-name>/repo

Delete our Init service:

kubectl delete service init-service
service "init-service" deleted

Finally, delete all files with the specs we created if you don't need them anymore.

Conclusion

That's it! You have learned how Init containers can be easily used to set up or download necessary prerequisites for your Kubernetes application. You can use Init containers to download repositories or files required by your application containers or block the pod initialization process until certain requirements are met. Instead of integrating your Init scripts into applications, you can opt for Init containers to ensure separation of concerns (SoC) and modularity of your deployment code. That's the most powerful advantage of Init containers! Stay tuned for new content about managing Kubernetes containers to find out more.

Keep reading

Creating Liveness Probes for your Node JS application in Kubernetes

Posted by Kirill Goltsman on July 7, 2018

Kubernetes is extremely powerful in tracking the health of running applications and intervening when something goes wrong. In this tutorial, we teach you how to create liveness probes that test the health and availability of your applications. Liveness probes can catch situations when the application is no longer responding or is unable to make progress, and restart the container. We address HTTP liveness probes, which send a request to the application's back end (e.g., some server) and decide whether the application is healthy based on its response. We'll show examples of both successful and failed liveness probes. Let's start!

Benefits of Liveness Probes

Normally, when Kubernetes notices that your application has crashed, kubelet will simply restart it.

However, there are situations when the application has crashed or deadlocked without actually terminating. That's exactly the kind of situation where liveness probes shine! With a few lines in your pod or deployment spec, liveness probes can turn your Kubernetes application into a self-healing organism, providing:

  • zero downtime deployments
  • simple and efficient health monitoring implemented in any way you prefer
  • identification of potential bugs and deficiencies in your application

Now, we are going to show these benefits in action by walking you through examples of a successful and a failed liveness probe. Let's start!

Tutorial

In this tutorial, we create a liveness probe for a simple Node JS server. The liveness probe will send HTTP requests to certain server routes and responses from the server will tell Kubernetes whether the liveness probe has passed or failed.

Prerequisites

To complete examples in this tutorial, you'll need:

  • a running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step 1: Creating a Node JS App Prepared for Liveness Probes

To implement a working liveness probe, we need a containerized application capable of responding to it. For this tutorial, we containerized a simple Node JS web server with routes configured to process requests from liveness probes. The application was containerized with Docker and pushed to a public Docker Hub repository. The code that implements the basic server functionality and routing is located in the server.js file:

'use strict';
const express = require('express');
// Constants
const PORT = 8080;
const HOST = '0.0.0.0';
// App
const app = express();
app.get('/', (req, res) => {
  res.send('Hello world');
});
app.get('/health-check',(req,res)=> {
 res.send ("Health check passed");
});
app.get('/bad-health',(req,res)=> {
    res.status(500).send('Health check did not pass');
});
app.listen(PORT, HOST);
console.log(`Running on http://${HOST}:${PORT}`);

In this file, we've configured three server routes that respond to client GET requests. The first one serves requests to the server's web root path / and sends a basic greeting:

app.get('/', (req, res) => {
  res.send('Hello world');
});

The second route, /health-check, returns a 200 HTTP success status, telling the liveness probe that our application is healthy and running. By default, any HTTP status code greater than or equal to 200 and less than 400 indicates success; any other code indicates failure.

app.get('/health-check',(req,res)=> {
 res.send ("Health check passed");
}); 

Finally, if a liveness probe accesses the third route, /bad-health, the server responds with a 500 status code, telling the kubelet that the application has crashed or deadlocked.

app.get('/bad-health',(req,res)=> {
    res.status(500).send('Health check did not pass');
});

This application is just a simple example to illustrate how you can configure your server to respond to liveness probes. All you need to implement HTTP liveness probes is to allocate some paths in your application and expose your server's port to Kubernetes. As simple as that!

Step 2: Configure your Pod to use Liveness Probes

Let's create a pod spec defining a liveness probe for our Node JS application:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: supergiantkir/k8s-liveliness
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health-check
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
      failureThreshold: 2 

Let's discuss key fields of this spec related to liveness probes:

  • spec.containers.livenessProbe.httpGet.path -- a path on the HTTP server that processes a liveness probe. Note: by default, spec.livenessProbe.httpGet.host is set to the pod's IP. Since we will access our application from within the cluster, we don't need to specify the external host.
  • spec.containers.livenessProbe.httpGet.port -- a name or a number of the port to access the HTTP server on. A port's number must be in the range of 1 to 65535.
  • spec.containers.livenessProbe.initialDelaySeconds -- the number of seconds after the container has started before the first liveness probe is performed.
  • spec.containers.livenessProbe.periodSeconds -- how often (in seconds) to perform the liveness probe. The default value is 10 seconds and the minimum value is 1.
  • spec.containers.livenessProbe.failureThreshold -- the number of consecutive probe failures after which Kubernetes gives up; for a liveness probe, giving up means restarting the container. The default value is 3 and the minimum value is 1. (Two more optional probe fields are sketched after this list.)
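For reference, here is a sketch of the same probe with two more standard probe fields you may meet in practice (they are not part of the original spec; the values shown are the Kubernetes defaults):

    livenessProbe:
      httpGet:
        path: /health-check
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
      failureThreshold: 2
      timeoutSeconds: 1      # how long to wait for a response before the probe counts as failed
      successThreshold: 1    # consecutive successes needed after a failure; must be 1 for liveness probes

For the rest of the tutorial, though, we'll keep the original spec as it is.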

Let's save the pod spec above in liveness.yaml and create the pod by running the following command:

kubectl create -f liveness.yaml
pod "liveness-http" created

As you see, we defined /health-check as the server path for our liveness probe. In this case, our Node JS server will always return the success 200 status code. This means that the liveness probe will always succeed, and the pod will continue running.

Let's get a shell to our application container to see responses sent by the server:

kubectl exec -it liveness-http -- /bin/bash

When inside the container, install cURL to send GET requests to the server:

apt-get update
apt-get install curl

Now, we can try to access the server to check a response from the /health-check route (Don't forget that the server is listening on port 8080):

curl localhost:8080/health-check
Health check passed

If the liveness probe passes (as in this example), the pod will continue running without any errors and restarts triggered. However, what happens when the liveness probe fails?

To illustrate that, let's change the server path in the livenessProbe.httpGet.path field to /bad-health. First, exit the container shell by typing exit, and then update the path in the liveness.yaml file.
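The modified livenessProbe section of liveness.yaml should look like this:

    livenessProbe:
      httpGet:
        path: /bad-health
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
      failureThreshold: 2

Once the change is made, delete the pod: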

kubectl delete pod liveness-http
pod "liveness-http" deleted

Then, let's create the pod one more time.

kubectl create -f liveness.yaml
pod "liveness-http" created

Now, our liveness probe will be sending requests to the /bad-health path, which returns a 500 HTTP error. This error will make the kubelet restart the container. Since our liveness probe always fails, the container will keep being killed and recreated. Let's verify that the liveness probe actually fails:

kubectl describe pod liveness-http

Check pod events at the end of the pod description:

Events:
  Type     Reason                 Age                From               Message
  ----     ------                 ----               ----               -------
  Normal   Scheduled              1m                 default-scheduler  Successfully assigned liveness-http to minikube
  Normal   SuccessfulMountVolume  1m                 kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-9wdtd"
  Normal   Started                1m (x3 over 1m)    kubelet, minikube  Started container
  Warning  Unhealthy              57s (x4 over 1m)   kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing                57s (x3 over 1m)   kubelet, minikube  Killing container with id docker://liveness:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff                56s (x2 over 57s)  kubelet, minikube  Back-off restarting failed container
  Normal   Pulling                42s (x4 over 1m)   kubelet, minikube  pulling image "supergiantkir/k8s-liveliness"
  Normal   Pulled                 40s (x4 over 1m)   kubelet, minikube  Successfully pulled image "supergiantkir/k8s-liveliness"
  Normal   Created                40s (x4 over 1m)   kubelet, minikube  Created container

First, as you might have noticed, the first liveness probe ran after the three seconds specified in spec.containers.livenessProbe.initialDelaySeconds. Afterward, the probe failed with status code 500, which triggered killing and recreating the container.

That's it! Now you know how to create liveness probes to check the health of your Kubernetes applications.

Note: In this tutorial, we used two server routes that always return either a success or an error status code. This is enough to illustrate how liveness probes work; in production, however, you'll want a single route that actually evaluates the health of your application and returns a success or failure response accordingly.

Step 3: Cleaning Up

Our tutorial is over, so let's clean up after ourselves.

  1. Delete the pod:
kubectl delete pod liveness-http
pod "liveness-http" deleted

  2. Delete the liveness.yaml file where you saved it.

Conclusion

As you saw, liveness probes are extremely powerful for keeping your applications healthy and minimizing downtime. In the next tutorial, we'll learn about readiness probes -- another important health check in Kubernetes, which the kubelet uses to decide when a container is ready to start accepting traffic. Stay tuned for our blog updates to find out more!

Keep reading

Managing Kubernetes Deployments

Posted by Kirill Goltsman on July 2, 2018

In "Introduction to Kubernetes Pods", we discussed how to create pods using the deployment controller. As you might remember, deployments ensure that the desired number of pod replicas (ReplicaSet) you specified in the deployment spec always matches the actual state. In addition to this basic functionality, the deployment controller also offers a variety of options for managing your deployments on the fly. In this tutorial, we discuss some of the most common ways to manage deployments: executing rolling updates, rollouts, rollbacks, and scaling your applications. By the end of this tutorial, you'll have everything you need to properly manage stateless apps both in test and production environments. Let's start!

Prerequisites

To complete examples in this tutorial, you'll need:

  • a running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • a kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step 1: Create the Deployment

In this example, we're going to define a deployment that creates a ReplicaSet of 3 Apache HTTP servers (httpd container) pulled from the public Docker Hub repository. Correspondingly, three pod replicas are the initial desired state of the deployment. Let's take a look at the deployment resource object:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-server
  labels:
    app: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpd
  strategy:
    type: RollingUpdate
    rollingUpdate: 
      maxSurge: 40%
      maxUnavailable: 40%
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2-alpine
        ports:
        - containerPort: 80

Let's discuss key fields of this spec:

.metadata.name -- the name of the deployment. The name must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g., 'apache-server')

.metadata.labels -- the deployment's label. 

.spec.replicas -- a number of pod replicas for the deployment. The default value is 1.

.spec.selector -- specifies a label selector for the pods targeted by the deployment. This field must match spec.template.metadata.labels in the PodTemplateSpec of the deployment. Note: if you have multiple controllers with identical selectors, they might not behave correctly and conflicts might arise, so by all means avoid overlapping selectors in your controllers. Also, in the apps/v1 API version, a deployment's label selector is immutable after it is created; even in earlier API versions, changing the deployment's selector labels is not recommended.

.spec.strategy -- a strategy used to replace old pods with new ones. Values for this field can be 'Recreate' or 'RollingUpdate',  the latter being the default value.

  • Recreate: when the recreate update strategy is specified, all existing pods are terminated before new ones are created. This strategy is appropriate for testing purposes, but it is not recommended when you are building highly available applications that need to be always running.
  • RollingUpdate: Rolling updates allow updates to take place with no downtime by incrementally replacing pod instances with new ones. If the RollingUpdate strategy is selected, you can set the maxSurge and maxUnavailable parameters (see the discussion below).

.spec.strategy.rollingUpdate.maxUnavailable -- specifies the maximum number of pods that can be unavailable during the update. You can set the value as an absolute number (e.g., 4) or as a percentage of the desired pods (e.g., 20%); a percentage is rounded down to an absolute number. The default value is 25%. For example, if you set maxUnavailable to 40%, the old ReplicaSet can be scaled down to 60% of the desired pods immediately when the rolling update begins. The controller then starts scaling up the new ReplicaSet, ensuring that throughout the process the total number of unavailable pods is at most 40% of the desired count.

.spec.strategy.rollingUpdate.maxSurge -- specifies the maximum number of pods that can be created over the desired number. As with maxUnavailable, you can define maxSurge as an absolute number or as a percentage of the desired pods; a percentage is rounded up to an absolute number. The default value is 25%. For example, if you set maxSurge to 40%, the deployment controller can immediately scale the deployment up to 140% of the desired pods. As old pods are killed and new ones are created, the controller always ensures that the total number of pods running is at most 140% of the desired count.

Note: the Deployment spec supports a number of other fields such as .spec.revisionHistoryLimit discussed later in the course of this tutorial.

Now, as you understand the deployment spec, let's save the deployment object in deployment.yaml and create the deployment running the following command:

kubectl create -f deployment.yaml --record
deployment.apps "apache-server" created

Please note that we are using the --record flag, which records the command the deployment was created with and simplifies tracking the deployment history later on.

Let's verify that the deployment was created:

kubectl get deployment apache-server
NAME            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
apache-server   3         3         3            3           45s

As the output shows, 3 Pod replicas are currently running, up-to-date and available to users. Thus, the deployment is currently in the desired state as we expected.

It might be useful to read a detailed description of the deployment by running kubectl describe deployment apache-server

Name:                   apache-server
Namespace:              default
CreationTimestamp:      Mon, 18 Jun 2018 14:31:14 +0300
Labels:                 app=httpd
Annotations:            deployment.kubernetes.io/revision=1
Selector:               app=httpd
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  40% max unavailable, 40% max surge
Pod Template:
  Labels:  app=httpd
  Containers:
   httpd:
    Image:        httpd:2-alpine
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   apache-server-558f6f49f6 (3/3 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  5m    deployment-controller  Scaled up replica set apache-server-558f6f49f6 to 3

This description contains the synopsis of all parameters defined in the deployment spec along with the deployment status and actions performed by the deployment controller such as scaling up our first ReplicaSet. As you see in the last line of this output, the ReplicaSet created is named apache-server-558f6f49f6. A ReplicaSet's name is formatted as [DEPLOYMENT-NAME]-[POD-TEMPLATE-HASH-VALUE].

You can verify that the ReplicaSet was created by running:

kubectl get replicasets
NAME                       DESIRED   CURRENT   READY     AGE
apache-server-558f6f49f6   3         3         3         11m

Pods in this ReplicaSet have their names formatted as [DEPLOYMENT-NAME]-[POD-TEMPLATE-HASH-VALUE]-[POD-ID]. To see the pods running, let's filter kubectl get pods command by our pod label selector:

kubectl get pods -l app=httpd
NAME                             READY     STATUS    RESTARTS   AGE
apache-server-558f6f49f6-6xhll   1/1       Running   0          13m
apache-server-558f6f49f6-mvfn8   1/1       Running   0          13m
apache-server-558f6f49f6-pvwb4   1/1       Running   0          13m

That's it! Your deployment is ready to be managed. Let's first try to scale it.

Step 2: Scaling the Deployment

Assuming that your application's load has increased, you can easily scale up the deployment by running the following command:

kubectl scale deployment apache-server --replicas=5
deployment "apache-server" scaled

You can now verify that the deployment has 5 pod replicas in it running the following command:

kubectl get pods -l app=httpd
NAME                             READY     STATUS    RESTARTS   AGE
apache-server-558f6f49f6-4pck5   1/1       Running   0          15s
apache-server-558f6f49f6-bxkgg   1/1       Running   0          19m
apache-server-558f6f49f6-lmskm   1/1       Running   0          25m
apache-server-558f6f49f6-tmwx8   1/1       Running   0          25m
apache-server-558f6f49f6-vdpnl   1/1       Running   0          15s

Scaling down the deployment works the same way: we just specify fewer replicas than we did before. For example, to go back to three replicas again, just run

kubectl scale deployment apache-server --replicas=3
deployment "apache-server" scaled

Note: Scaling operations are not saved in the deployment revision history, so you can't return to a previous number of replicas by using rollbacks. Decoupling scaling from deployment updates is a deliberate design choice that lets manual scaling and autoscaling work alongside rolling updates. Horizontal pod autoscaling is a big topic that will be discussed in future tutorials; if you need more details, please consult the official documentation.

Step 3: Updating the Deployment

As you remember from the spec, our deployment is defined with the rolling update strategy, which allows updating applications with zero downtime. Let's test how this strategy works with our maxSurge and maxUnavailable parameters.

Open two terminal windows. In the first window, you are going to watch the Deployment's pods as they are created and terminated:

kubectl get pods -l app=httpd --watch

When you run this command, you'll see the current number of pods in your deployment:

NAME                             READY     STATUS    RESTARTS   AGE
apache-server-558f6f49f6-6xhll   1/1       Running   0          28m
apache-server-558f6f49f6-mvfn8   1/1       Running   0          28m
apache-server-558f6f49f6-pvwb4   1/1       Running   0          28m

In the second terminal, let's update the Apache HTTP server's container image (remember we initially defined the deployment with the httpd:2-alpine image):

kubectl set image deployment/apache-server  httpd=httpd:2.4

This command updates the httpd container image to the 2.4 version. Now, let's look into the first terminal to see what the deployment controller is doing:

kubectl get pods -l app=httpd --watch
NAME                             READY     STATUS    RESTARTS   AGE
apache-server-558f6f49f6-587k4   1/1       Running   0          26s
apache-server-558f6f49f6-hf2qx   1/1       Running   0          26s
apache-server-558f6f49f6-tcgcr   1/1       Running   0          26s
apache-server-5949cdc484-bhz4x   0/1       Pending   0         0s
apache-server-5949cdc484-bhz4x   0/1       Pending   0         0s
apache-server-5949cdc484-fltjn   0/1       Pending   0         0s
apache-server-5949cdc484-fltjn   0/1       Pending   0         0s
apache-server-558f6f49f6-587k4   1/1       Terminating   0         44s
apache-server-5949cdc484-bhz4x   0/1       ContainerCreating   0         0s
apache-server-5949cdc484-ptpwq   0/1       Pending   0         0s
apache-server-5949cdc484-fltjn   0/1       ContainerCreating   0         0s
apache-server-5949cdc484-ptpwq   0/1       Pending   0         0s
apache-server-5949cdc484-ptpwq   0/1       ContainerCreating   0         0s
apache-server-5949cdc484-fltjn   1/1       Running   0         1s
apache-server-5949cdc484-ptpwq   1/1       Running   0         1s
apache-server-558f6f49f6-hf2qx   1/1       Terminating   0         45s
apache-server-558f6f49f6-tcgcr   1/1       Terminating   0         45s
apache-server-558f6f49f6-587k4   0/1       Terminating   0         45s
apache-server-5949cdc484-bhz4x   1/1       Running   0         2s
apache-server-558f6f49f6-587k4   0/1       Terminating   0         48s
apache-server-558f6f49f6-587k4   0/1       Terminating   0         48s
apache-server-558f6f49f6-tcgcr   0/1       Terminating   0         48s
apache-server-558f6f49f6-hf2qx   0/1       Terminating   0         49s
apache-server-558f6f49f6-hf2qx   0/1       Terminating   0         49s
apache-server-558f6f49f6-hf2qx   0/1       Terminating   0         49s
apache-server-558f6f49f6-tcgcr   0/1       Terminating   0         50s
apache-server-558f6f49f6-tcgcr   0/1       Terminating   0         50s

As you see, the deployment controller immediately instantiated two new pods (apache-server-5949cdc484-bhz4x and apache-server-5949cdc484-fltjn) in a new ReplicaSet. Afterward, it started terminating the pod named apache-server-558f6f49f6-587k4, scaling the old ReplicaSet down to 2 pods. Next, the controller scaled the new ReplicaSet up to 3 replicas and gradually scaled the old ReplicaSet down to 0.

You can find a more detailed description of this process by running kubectl describe deployment apache-server:

kubectl describe deployment apache-server
.....
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled up replica set apache-server-5949cdc484 to 2
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled down replica set apache-server-558f6f49f6 to 2
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled up replica set apache-server-5949cdc484 to 3
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled down replica set apache-server-558f6f49f6 to 1
  Normal  ScalingReplicaSet  4m    deployment-controller  Scaled down replica set apache-server-558f6f49f6 to 0

Now, we can verify that, when scaling the ReplicaSets up and down, the deployment controller followed the rules specified in the maxSurge and maxUnavailable fields. At first glance, the output seems to contradict them. Indeed, as the deployment description shows, the controller initially scaled the new ReplicaSet up to 2 pods, which means we briefly had 5 pods (3 old ones and 2 new ones), or roughly 166% of the desired state, even though our maxSurge value is only 40%. That's weird, right? The explanation lies in how percentages are converted to pod counts: maxSurge is rounded up, so 40% of 3 replicas (1.2) allows 2 extra pods, or 5 pods in total, during the rollout. maxUnavailable, in contrast, is rounded down, which limits how many old pods the controller may terminate before replacement pods become ready.

Let's experiment by updating the maxSurge to 70% and the maxUnavailable to 30%. Run kubectl edit deployment/apache-server to open the deployment in the vim editor. Once inside the editor, press "a" to enter insert mode, change the maxSurge field to 70% and maxUnavailable to 30%, and save your changes by pressing ESC and then typing :x. You should see the following success message:

deployment.extensions "apache-server" edited
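As a side note, you could apply the same change non-interactively with kubectl patch; a sketch of the equivalent command:

kubectl patch deployment apache-server -p '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":"70%","maxUnavailable":"30%"}}}}'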

Now, let's try to update our deployment once again:

kubectl set image deployment/apache-server  httpd=httpd:2-alpine

The deployment's description now looks like this:

kubectl describe deployment apache-server
Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  
  Normal  ScalingReplicaSet  10s   deployment-controller  Scaled up replica set apache-server-5949cdc484 to 3
  Normal  ScalingReplicaSet  8s    deployment-controller  Scaled down replica set apache-server-558f6f49f6 to 2
  Normal  ScalingReplicaSet  8s    deployment-controller  Scaled down replica set apache-server-558f6f49f6 to 0

As you see, the controller created a new ReplicaSet and scaled it up to three instances immediately. That is because our maxSurge value is very high (70%). Then, the controller gradually scaled down the old ReplicaSet to 0.

Step 4: Rolling Back the Deployment

Sometimes, when a deployment update fails, you need to roll the deployment back to an earlier version. In this case, you can take advantage of the deployment's rollback functionality. It works as follows: each time a new rollout is made, a new deployment revision is created. Revisions form a rollout history that provides access to the previous versions of your deployment. By default, all past ReplicaSets are kept, enabling rollbacks to any point in the revision history. However, if you set .spec.revisionHistoryLimit to 3, for example, the controller will keep only the 3 latest revisions. Correspondingly, if you set this field to 0, all old ReplicaSets will be deleted automatically on each new update and you'll be unable to roll back.
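For example, keeping only the three latest revisions takes a single extra field in the deployment spec (a minimal sketch of the relevant fragment):

spec:
  revisionHistoryLimit: 3   # retain only the 3 most recent old ReplicaSets for rollbacks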

Note: Revisions are not triggered by manual scaling or auto-scaling. A deployment revision is created only when the pod template is changed (e.g., when updating the labels or container image). Therefore, when you roll back to the previous version of your deployment, only the pod's template part is rolled back.

Let's show how to roll back using a practical example. As in the previous examples, let's open two terminal windows and watch the deployment's pods in the first one:

kubectl get pods -l app=httpd --watch

In the second terminal, let's update the container image of your deployment and intentionally make a typo in the container image tag (httpd:23 version does not exist):

kubectl set image deployment/apache-server  httpd=httpd:23

Let's watch what happens in the first terminal window:

kubectl get pods -l app=httpd --watch
apache-server-7864679b97-6sgjx   0/1       Pending   0         0s
apache-server-7864679b97-6sgjx   0/1       Pending   0         0s
apache-server-7864679b97-sdvw2   0/1       Pending   0         0s
apache-server-7864679b97-zrmdc   0/1       Pending   0         0s
apache-server-7864679b97-zrmdc   0/1       Pending   0         0s
apache-server-7864679b97-sdvw2   0/1       Pending   0         0s
apache-server-7864679b97-6sgjx   0/1       ContainerCreating   0         0s
apache-server-7864679b97-zrmdc   0/1       ContainerCreating   0         0s
apache-server-7864679b97-sdvw2   0/1       ContainerCreating   0         0s
apache-server-7864679b97-zrmdc   0/1       ErrImagePull   0         4s
apache-server-7864679b97-sdvw2   0/1       ErrImagePull   0         6s
apache-server-7864679b97-6sgjx   0/1       ErrImagePull   0         8s
apache-server-7864679b97-zrmdc   0/1       ImagePullBackOff   0         18s
apache-server-7864679b97-sdvw2   0/1       ImagePullBackOff   0         21s
apache-server-7864679b97-6sgjx   0/1       ImagePullBackOff   0         22s

Since our container image tag is wrong, we are getting ErrImagePull and ImagePullBackOff errors for all three pod replicas. The controller will keep retrying the image pull, but the same errors will arise: our rollout is stuck in an image pull loop. We can also verify this by running:

kubectl rollout status deployments apache-server
Waiting for rollout to finish: 3 old replicas are pending termination...

Press Ctrl-C to exit the above rollout status watch.

You can also see that two ReplicaSets now co-exist: the old ReplicaSet apache-server-558f6f49f6 has two pods, because one was already terminated according to the maxUnavailable policy, and the new ReplicaSet has 0 ready replicas because of the image pull loop discussed above.

kubectl get replicasets
NAME                       DESIRED   CURRENT   READY     AGE
apache-server-558f6f49f6   2         2         2         2m
apache-server-7864679b97   3         3         0         1m

Now, it's evident that the deployment's update process is stuck. How can we fix this issue? Rolling back to the previous stable revision is the most obvious solution.

To see the available revisions, let's check the revision history. It's available because we used the --record flag when creating the deployment:

kubectl rollout history deployment/apache-server
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true
2         kubectl set image deployment/apache-server httpd=httpd:2.4
3         kubectl set image deployment/apache-server httpd=httpd:2-alpine
4         kubectl set image deployment/apache-server httpd=httpd:23

You can also see a detailed description of each revision by specifying the --revision parameter:

kubectl rollout history deployment/apache-server --revision=4
deployments "apache-server" with revision #4
Pod Template:
  Labels:    app=httpd
    pod-template-hash=3420235653
  Annotations:    kubernetes.io/change-cause=kubectl set image deployment/apache-server httpd=httpd:23
  Containers:
   httpd:
    Image:    httpd:23
    Port:    80/TCP
    Host Port:    0/TCP
    Environment:    <none>
    Mounts:    <none>
  Volumes:    <none>

Now, let's roll back to the previous stable revision with the httpd:2-alpine image.

kubectl rollout undo deployment/apache-server --to-revision=3
deployment "apache-server" rolled back

Alternatively, to roll back to the previous version, you can just run:

kubectl rollout undo deployment/apache-server
deployment "apache-server" rolled back

Check the revision history again and you'll see that a new entry was added:

kubectl rollout history deployment/apache-server
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true
2         kubectl set image deployment/apache-server httpd=httpd:2.4
3         kubectl set image deployment/apache-server httpd=httpd:2-alpine
4         kubectl set image deployment/apache-server httpd=httpd:23
5         kubectl set image deployment/apache-server httpd=httpd:2-alpine

Also, the rollback generated a DeploymentRollback event that can be seen in the Deployment's description:

kubectl describe deployment apache-server
....
Events:
  ....
  Normal  DeploymentRollback  3m                deployment-controller  Rolled back deployment "apache-server" to revision 3
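As a quick sanity check (not part of the original walkthrough), you can also watch the rollout status again to confirm that the rolled-back revision finished rolling out:

kubectl rollout status deployment/apache-server

Once all replicas are updated, the command reports that the deployment was successfully rolled out and exits.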

That's it! Now you know what to do if your deployment update fails: choose the preferred revision and roll back to it. As simple as that!

Conclusion

In this tutorial, we've walked you through basic options for managing Kubernetes deployments. The deployment controller is extremely powerful in scaling, updating, and rolling back your stateless applications. It also includes a number of other useful features not covered in this article, such as pausing and resuming the deployment. 

If you want to learn more about deployments, check out the official Kubernetes documentation. The deployment management techniques covered in this article, however, are already enough to prepare you for effective management of stateless applications in production, ensuring zero downtime and efficient version control at any scale.

Keep reading

Working with Kubernetes Containers

Posted by Kirill Goltsman on June 30, 2018

Kubernetes offers a broad variety of features for running and managing containers in your pods. In addition to specifying container images, pull policies, container ports, volume mounts, and other container-level settings, users can define commands and arguments that change the default behavior of running containers. 

Kubernetes also exposes container start and termination lifecycle events to user-defined handlers that can be triggered by these events. In this article, we discuss available options for running containers in Kubernetes and walk you through basic steps to define commands for your containers and attach lifecycle event handlers to them. Let's start!

Containers in Kubernetes

When we think about Kubernetes containers, Docker container runtime immediately springs to mind. Indeed, until the 1.5 release, Kubernetes supported only two container runtimes: the popular Docker and rkt, which were deeply integrated into the kubelet source code. 

However, as more container runtimes have appeared over the past few years, Kubernetes developers decided to abstract the underlying container architecture away from the deeper layers of the Kubernetes platform. That's why the 1.5 release came out with the Container Runtime Interface (CRI), a plugin interface that allows using a wide variety of container runtimes without the need to recompile. Since that release, Kubernetes has simplified the usage of various container runtimes compatible with the CRI. Docker containers, however, remain one of the most popular options among Kubernetes users, so we implicitly refer to them when discussing operations with Kubernetes containers in this article.

Kubernetes wraps the underlying container runtime to provide basic functionality for containers such as pulling container images. As you might already know, container settings are defined in pods -- Kubernetes abstractions that act as the interface between Kubernetes orchestration services and running applications.

Each container defined within a pod has the image property that supports the same syntax as the Docker command does. Other basic fields of the container spec are ports and image pull policy (see the example below).

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: httpd:2-alpine
    ports:
    - containerPort: 80
    imagePullPolicy: Always

Note: Since pods can have many containers, the latter are represented as an array of objects with indices starting from 0 (see the syntax below). Keeping this in mind, we discuss the container properties specified in the example above:

spec.containers[0].name -- the container's name. This value must consist of lower case alphanumeric characters or '-', and it must start and end with an alphanumeric character (e.g., 'my-name' or '123-abc').

spec.containers[0].image -- the container's image pulled from a container registry. In this example, we pull the httpd:2-alpine image from the public Docker Hub registry. Kubernetes also comes with native support for private registries. In particular, the platform supports Google Container Registry (GCR), AWS EC2 Container Registry, and Azure Container Registry (ACR). For registry-specific prerequisites and configuration, see the official Kubernetes documentation.

spec.containers[0].ports[0].containerPort -- port(s) to open for the container. In this example, port 80 is opened for our Apache HTTP Server.

spec.containers[0].imagePullPolicy -- a policy that defines how the container image is pulled from the registry. The default pull policy is IfNotPresent, which makes the kubelet skip pulling an image if it already exists on the node. If you want to always pull an image, you can set this field to Always, as we did in this example. Kubernetes also always pulls images tagged :latest, although relying on this behavior is discouraged: you should avoid the :latest tag in production because it makes it difficult to track which version of the image is running. Please note that if you don't specify an image tag, :latest is assumed, implicitly triggering the Always pull policy. (See the official documentation for more details about best practices for configuring containers in production.)

The discussed pod spec illustrates basic container settings that satisfy many use cases. However, Kubernetes offers even more options for managing containers in your pod. In our tutorial below, we walk you through a simple process of defining commands and arguments for containers and demonstrate how to use container lifecycle hooks to control the container behavior when it starts or terminates. 

Tutorial

To complete examples in this tutorial, you'll need the following prerequisites:

  • a running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Part 1: Defining Command and Arguments for Containers

Kubernetes allows you to define commands, and arguments for those commands, for the containers in a pod's resource object. Here's how this works, using a simple example with a BusyBox container.

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    app: demo
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh']
    args: ['-c', 'MIN=5 SEC=45; echo "$(( MIN*60 + SEC ))"']

In this example, we:

  • create a pod named "demo" with a single container running the BusyBox image from the Docker Hub registry. BusyBox combines tiny versions of many common UNIX utilities into a single small executable.
  • specify a command for the running container to use. The command is the 'sh' utility, which in BusyBox is the Almquist shell (ash).
  • specify arguments for the shell command to use. In our case, the args property takes an array of two elements. The first one is the -c flag (which tells sh to execute the commands that follow), and the second one is the script for that flag to execute. In the script, we set two variables, MIN and SEC, perform an arithmetic operation with them, and echo the result to stdout.

Let's save this spec in the command-demo.yaml file and create the pod using kubectl.

kubectl create -f command-demo.yaml
pod "demo" created

Now, if you run kubectl get pod demo, you'll find out that the pod has successfully completed:

NAME            READY     STATUS      RESTARTS   AGE
demo            0/1       Completed   0          7s

You can check the output produced by the command defined above by looking into the pod's logs.

kubectl logs demo
345

As you see, the BusyBox shell computed the correct value (5*60 + 45 = 345) and echoed it to stdout. It's as simple as that!

In the example above, we defined command and arguments for that command as two separate spec fields. However, Kubernetes supports merging both commands and arguments into a single array of values like this:

spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh','-c', 'MIN=5 SEC=45; echo "$(( MIN*60 + SEC ))"']

This will produce the same result as above.

Also, instead of defining variables inside the script itself, you can put them in the container's environment variables. We can use the spec.containers.env field with a set of name/value pairs like this:

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    app: demo
spec:
  containers:
  - name: busybox
    image: busybox
    env:
    - name: one
      value: 'This is the first sentence'
    - name: two
      value: ',but it ends'
    command: ['sh']
    args: ['-c', 'echo "$(one)$(two)"']
  

Note: environment variable references must be written as $(VARIABLE) -- with the parentheses -- for Kubernetes to expand them in the command or args fields.

This simple command concatenates two strings stored in the environment variables. Environment variables offer a convenient way to store arbitrary data separately from the command execution context.
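If you want to try this variant yourself, delete the earlier demo pod first (the name is reused), save the spec to a file such as env-demo.yaml (a hypothetical file name), and check the logs as before:

kubectl delete pod demo
kubectl create -f env-demo.yaml
kubectl logs demo

The logs should contain the two strings joined into a single sentence.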

You may be wondering how the commands and arguments you define interact with the container's default behavior. The process is quite straightforward. The default command run by a Docker container is defined by the image's Entrypoint field, and the default arguments for that command are defined by its Cmd field. Depending on whether you override them with the command and args container settings, the following rules apply:

  • If no command or args are supplied for a container, the defaults in the Docker image are used.
  • If a command is supplied but no args are supplied, only the supplied command is used. The default Entrypoint and the default Cmd defined in the Docker image are ignored (see the sketch after this list).
  • If only args are supplied, the default EntryPoint is run with the args that you supplied.
  • If both args and a command are supplied, they override the default Entrypoint and the default Cmd of the Docker image.
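To make the second rule concrete, here is a hypothetical sketch (not one of the original examples) of a pod that supplies only a command. The busybox image's default Cmd (sh) is ignored, and the supplied command runs as-is:

apiVersion: v1
kind: Pod
metadata:
  name: command-only-demo
spec:
  restartPolicy: Never        # let the pod stay in the Completed state once the command exits
  containers:
  - name: busybox
    image: busybox
    # command without args: overrides both the image's Entrypoint and its default Cmd
    command: ['echo', 'hello from the command field']

Running kubectl logs command-only-demo after the pod completes should print the echoed string.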

Part 2: Attaching Container Lifecycle Hooks

As you know, containers have a finite lifecycle. It might be useful to attach handlers (functions) to various events of the container lifecycle to make them aware of these events and run code triggered by them. Kubernetes implements this functionality with container lifecycle hooks.

The platform exposes two lifecycle hooks to containers: PostStart and PreStop. The first hook executes immediately after the container is created. However, there is no guarantee that the handler in the PostStart hook will execute before the container's EntryPoint or user-defined command.

The PreStop hook, in turn, is called immediately before a container is terminated. It is a synchronous hook, so it must complete before the call to delete the container can be sent.

Containers can access lifecycle hooks by implementing and registering a handler (function) for that hook. There are two types of hook handlers in Kubernetes:

  • Exec -- this handler executes a specific command inside the cgroups and namespaces of the container. Resources consumed by the command in the handler are counted against available container resources.
  • HTTP -- this handler executes an HTTP request against a specific endpoint on the container (see the sketch below).
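The tutorial below uses Exec handlers, but for comparison, an HTTP PostStart handler attached to an httpd container might look like the hypothetical sketch below (the path and port are assumptions for illustration). Keep in mind that PostStart runs in parallel with the container's entrypoint, so the server may not be ready yet when the request is sent:

lifecycle:
  postStart:
    httpGet:
      path: /        # assumed endpoint served by the container
      port: 80
      scheme: HTTP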

This is the basic theory behind lifecycle hooks. Now, let's see how to actually attach handlers to PostStart and PreStop Lifecycle Hooks.

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle
spec:
  containers:
  - name: httpd
    image: httpd:2-alpine
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/local/apache2/htdocs/index.html"]
      preStop:
        exec:
          command: ["/bin/sh", "-c", "for i in `seq 1 15`; do echo $i `date` preStop handler  >> /Users/kirillgoltsman/tmp/tmp; sleep 1; done"]
    volumeMounts:
        - mountPath: "/Users/kirillgoltsman/tmp"
          name: test-volume
  volumes:
      - name: test-volume
        hostPath:
         path: "/Users/kirillgoltsman/tmp"

In this pod spec, we create PostStart and PreStop hook handlers for the container running the Apache HTTP server. Both handlers are of the Exec type because they execute a specific command in the container environment. To illustrate how the PreStop handler works, we created a hostPath Volume on the local node. Before saving this spec, you'll need to replace the hostPath Volume path with a directory your user can write to without root permissions. For example, you can use /home/<user-name>/tmp on Linux or /Users/<user-name>/tmp on Mac.

Once that's done, save the spec above in the lifecycle-demo.yaml and create the pod running the following command:

kubectl create -f lifecycle-demo.yaml
pod "lifecycle" created 

Let's start with the analysis of the PostStart handler. As you see from the spec, it creates the index.html file containing a custom response from Apache HTTP server. Let's get a shell to our pod's httpd container to verify that the PostStart event fired and the handler executed the code.

kubectl exec -it lifecycle -- /bin/bash

Note: If we had two containers in the pod, we would also have to specify the container name in the command above, because a shell always attaches to a specific container. For example, assuming we had a Ruby container along with the httpd container, we could get a shell to the Ruby container by running the following command:

kubectl exec -it lifecycle -c ruby-container -- /bin/bash

However, since we have only one container running in the pod, we don't need to specify the container name.

This command will get you into the httpd container's file system and network environment: root@lifecycle:/usr/local/apache2. To verify this, you can run the Linux ls command, which lists the contents of the current directory:

root@lifecycle:/usr/local/apache2# ls
bin  build  cgi-bin  conf  error  htdocs  icons  include  logs    modules

Now, let's see if our server returns the custom greeting written by the PostStart handler. To do that, we'll need to install cURL inside the container and send a GET request to the server.

apk update
apk add curl

Once cURL is installed, we can access the server on localhost (as you remember, all containers in a pod are addressable via localhost):

curl localhost
Hello from the postStart handler

Great! As you see, the PostStart handler created the index.html file with our custom greetings.

Let's move on to the PreStop handler. It is an Exec handler that runs a 15-iteration loop, echoing the current date on each step and appending the output to a file in the hostPath directory we've mounted. Let's verify that it works.

First, let's exit the container by typing exit since we don't need to be inside it anymore:

root@lifecycle:/usr/local/apache2/htdocs#  exit
command terminated with exit code 130

Next, let's delete the pod to trigger the PreStop handler:

kubectl delete pod lifecycle
pod "lifecycle" deleted

Finally, let's check if the handler wrote the output of the above command to the file inside our /Users/<user-name>/tmp directory (remember to use your own path to that file). Open the file in your favorite text editor:

Open /Users/kirillgoltsman/tmp/tmp
1 Tue Jun 19 11:21:56 UTC 2018 preStop handler
2 Tue Jun 19 11:21:57 UTC 2018 preStop handler
3 Tue Jun 19 11:21:58 UTC 2018 preStop handler
4 Tue Jun 19 11:21:59 UTC 2018 preStop handler
5 Tue Jun 19 11:22:00 UTC 2018 preStop handler
6 Tue Jun 19 11:22:01 UTC 2018 preStop handler
7 Tue Jun 19 11:22:02 UTC 2018 preStop handler
8 Tue Jun 19 11:22:03 UTC 2018 preStop handler
9 Tue Jun 19 11:22:04 UTC 2018 preStop handler
10 Tue Jun 19 11:22:05 UTC 2018 preStop handler
11 Tue Jun 19 11:22:06 UTC 2018 preStop handler
12 Tue Jun 19 11:22:07 UTC 2018 preStop handler
13 Tue Jun 19 11:22:08 UTC 2018 preStop handler
14 Tue Jun 19 11:22:09 UTC 2018 preStop handler
15 Tue Jun 19 11:22:10 UTC 2018 preStop handler

Great! The handler worked as expected. Now you know how to attach handlers to the container lifecycle events. One important thing to remember is that, at the moment, it's not so easy to debug these handlers if they fail: the logs for a hook handler are not exposed in pod events. However, failed handlers broadcast their own error events. If a PostStart handler fails, it sends a FailedPostStartHook event, and if a PreStop handler fails, it sends a FailedPreStopHook event. You can see these details by running kubectl describe pod <pod_name>.

Conclusion

That's it! As we've seen, Kubernetes offers a powerful API for working with containers including configuring image pull policies and container images. You also learned how to define commands and arguments for your containers to change their default behavior. In addition, we found out how to use container lifecycle hooks and container runtime environment to manage various events in the container lifecycle and interact with the running containers.

Keep reading

Managing Stateful Apps with Kubernetes StatefulSets

Posted by Kirill Goltsman on June 25, 2018

In the first part of the StatefulSets series, we discussed key purposes and concepts of StatefulSets and walked you through the process of creating a working StatefulSet. We saw how a pod's sticky UID and stable network identity can be leveraged to create apps that are stateful by design. Like deployments, StatefulSets also offer ways of managing your applications.

In the second part of the series, we look deeper into how to use StatefulSets to scale and update stateful applications, harnessing the power of ordered pod creation and controlled updates. Let's begin!

To complete examples from this article, you'll need:

  • a running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • a kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.
  • Before using examples from this article, you'll also need to create a working StatefulSet following simple steps described in the first part.

Scaling a StatefulSet

As you might already know, deployments and ReplicationControllers allow users to dynamically scale their applications. For example, once a deployment is running, you can easily adjust the number of pod replicas in it to match application needs. Kubernetes offers similar functionality for StatefulSets. To illustrate how scaling works, let's open two terminal windows. We will use the first window to watch pods being created and terminated. In the second one, we will scale our application.

In the first terminal window run:

kubectl get pods -w -l app=httpd

You'll see the current state of your StatefulSet, which is something like this:

NAME            READY     STATUS    RESTARTS   AGE
apache-http-0   1/1       Running   4          3d
apache-http-1   1/1       Running   4          3d
apache-http-2   1/1       Running   5          3d

In the second window, let's scale up our StatefulSet from three to six replicas running the following command:

 kubectl scale sts apache-http --replicas=6
statefulset.apps "apache-http" scaled

Now, if you look into the first terminal window, you'll notice that when scaling up, the order of pod creation looks identical to creating a StatefulSet from scratch. New pod replicas are created sequentially, with Kubernetes always waiting until the previous pod is running and ready before starting the next one. In this way, Kubernetes manages the ordered creation of pods to prevent conflicts and ensure high availability of your application.

apache-http-3   0/1       Pending   0         0s
apache-http-3   0/1       Pending   0         0s
apache-http-3   0/1       Pending   0         1s
apache-http-3   0/1       ContainerCreating   0         1s
apache-http-3   1/1       Running   0         5s
apache-http-4   0/1       Pending   0         0s
apache-http-4   0/1       Pending   0         0s
apache-http-4   0/1       Pending   0         1s
apache-http-4   0/1       ContainerCreating   0         1s
apache-http-4   1/1       Running   0         6s
apache-http-5   0/1       Pending   0         0s
apache-http-5   0/1       Pending   0         0s
apache-http-5   0/1       Pending   0         1s
apache-http-5   0/1       ContainerCreating   0         1s
apache-http-5   1/1       Running   0         5s

Scaling down a StatefulSet looks similar. Suppose that now your application's load has decreased and you don't need six replicas anymore. To address this, let's scale our application down:

kubectl scale sts apache-http --replicas=3
statefulset.apps "apache-http" scaled

As you see, there is no standalone command for scaling down your StatefulSet: again, we just specify the number of replicas and Kubernetes tries to achieve the desired state. To scale up and down, you can also use the kubectl patch command, which updates the StatefulSet resource object with the desired number of pod replicas:

kubectl patch sts apache-http -p '{"spec":{"replicas":3}}'
statefulset "apache-http" patched

As you might have noticed, when scaling down, pods are also deleted sequentially but in the reverse order (from the sixth pod to the fourth).

apache-http-5   1/1       Terminating   0         8m
apache-http-5   0/1       Terminating   0         8m
apache-http-5   0/1       Terminating   0         8m
apache-http-5   0/1       Terminating   0         8m
apache-http-4   1/1       Terminating   0         8m
apache-http-4   0/1       Terminating   0         8m
apache-http-4   0/1       Terminating   0         8m
apache-http-4   0/1       Terminating   0         8m
apache-http-3   1/1       Terminating   0         8m
apache-http-3   0/1       Terminating   0         8m
apache-http-3   0/1       Terminating   0         8m
apache-http-3   0/1       Terminating   0         8m

That's it! You have learned how to scale applications up and down. But what happens to the PersistentVolumes and PersistentVolumeClaims attached to the pods when those pods are terminated, as they were during the scale-down? One of the StatefulSet's limitations is that deleting a pod or scaling the StatefulSet down does not result in the deletion of volumes bound to the StatefulSet. According to the official documentation, "This is done to ensure data safety, which is generally more valuable than an automatic purge of all related StatefulSet resources."

We can easily verify that the PVs and PVCs associated with the pods in the StatefulSet were not deleted by running the following command:

kubectl get pvc -l app=httpd

The response should be:

NAME                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
www-apache-http-0   Bound     pvc-a340c764-6a66-11e8-a50f-0800270c281a   2Gi        RWO            fast           3d
www-apache-http-1   Bound     pvc-b50cfb0b-6a66-11e8-a50f-0800270c281a   2Gi        RWO            fast           3d
www-apache-http-2   Bound     pvc-b8b451f6-6a66-11e8-a50f-0800270c281a   2Gi        RWO            fast           3d
www-apache-http-3   Bound     pvc-6fe2bb0f-6d57-11e8-8adb-0800270c281a   2Gi        RWO            fast           25m
www-apache-http-4   Bound     pvc-72e071e3-6d57-11e8-8adb-0800270c281a   2Gi        RWO            fast           25m
www-apache-http-5   Bound     pvc-76871cc1-6d57-11e8-8adb-0800270c281a   2Gi        RWO            fast           25m

As you see, there are still six PVCs in our StatefulSet even though we scaled it down in the previous step.

Updating a StatefulSet

Automated updates of StatefulSets are supported in Kubernetes 1.7 and later. The platform supports two update strategies: RollingUpdate and OnDelete. You can specify one of them in the spec.updateStrategy field of your StatefulSet resource object.
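In the StatefulSet manifest, the field looks roughly like this (a minimal sketch):

spec:
  updateStrategy:
    type: RollingUpdate   # the default; set to OnDelete to disable automatic rollouts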

Let's first illustrate how the RollingUpdate strategy works by upgrading the container images used by the StatefulSet's pods. RollingUpdate is the default strategy and is what we used in the previous tutorial, so we don't need to change anything in our spec.

Let's upgrade the container images in our StatefulSet's pods:

kubectl patch statefulset apache-http --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"httpd:2-alpine"}]'
statefulset "apache-http" patched

Now, if you look into the output of the first terminal window, you'll notice that pods are terminated and started sequentially in the reverse order. This is how a rolling update works.

apache-http-2   1/1       Terminating   0         1m
apache-http-2   0/1       Terminating   0         1m
apache-http-2   0/1       Terminating   0         1m
apache-http-2   0/1       Terminating   0         1m
apache-http-2   0/1       Pending   0         0s
apache-http-2   0/1       Pending   0         0s
apache-http-2   0/1       ContainerCreating   0         0s
apache-http-2   1/1       Running   0         4s
apache-http-1   1/1       Terminating   0         1m
apache-http-1   0/1       Terminating   0         1m
apache-http-1   0/1       Terminating   0         1m
apache-http-1   0/1       Terminating   0         1m
apache-http-1   0/1       Terminating   0         1m
apache-http-1   0/1       Pending   0         0s
apache-http-1   0/1       Pending   0         0s
apache-http-1   0/1       ContainerCreating   0         0s
apache-http-1   1/1       Running   0         5s
apache-http-0   1/1       Terminating   0         1m
apache-http-0   0/1       Terminating   0         1m
apache-http-0   0/1       Terminating   0         1m
apache-http-0   0/1       Terminating   0         1m
apache-http-0   0/1       Pending   0         0s
apache-http-0   0/1       Pending   0         0s
apache-http-0   0/1       ContainerCreating   0         0s
apache-http-0   1/1       Running   0         5s

Kubernetes implements a rolling update as a way to keep applications available all the time. Rolling updates allow updates to take place with zero downtime by incrementally replacing pod instances with new ones. Unlike deployments, however, StatefulSets do not support proportional scaling. As a side note, proportional scaling allows a deployment to maintain a desired number of available replicas during a rollout via the maxSurge and maxUnavailable parameters. For example, if we set maxUnavailable=2, the deployment controller will not allow the number of unavailable replicas to be greater than 2. This may be useful if the specified container image is not reachable for some reason: Kubernetes will simply pause the upgrade once the maxUnavailable threshold is reached.

That said, StatefulSets do implement fault tolerance. The StatefulSet controller will ensure that if a container image upgrade fails, the pod with the previous container image version is restored. In this way, the controller attempts to keep the application healthy in the presence of failures.

Let's go back to our example. We can now check whether the container images were updated:

for p in 0 1 2; do kubectl get po apache-http-$p --template '{{range $i, $c := .spec.containers}}{{$c.image}}{{end}}'; echo; done
httpd:2-alpine
httpd:2-alpine
httpd:2-alpine

Great! As you see, the container image was upgraded to the new version and all three pods in our StatefulSet are now using it.

Staging an Update

In the previous example, we saw how to update all containers in a StatefulSet. However, what if we needed to update only some pods while leaving the container images in the others unchanged? To achieve this, Kubernetes allows setting a partition that prevents updates of pods with an ordinal index lower than the partition's value. Let's see how it works.

First, we'll need to add a 'partition' parameter to the spec.updateStrategy field:

kubectl patch statefulset apache-http -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}'
statefulset.apps "apache-http" patched

We set the partition to 1, which means that only pods with an ordinal index equal to or greater than 1 will be updated. Now, let's update the container images for the pods in our StatefulSet:

kubectl patch statefulset apache-http --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"httpd:2.4.33"}]'
statefulset "apache-http" patched

Let's check whether container images were updated:

for p in 0 1 2; do kubectl get po apache-http-$p --template '{{range $i, $c := .spec.containers}}{{$c.image}}{{end}}'; echo; done

The response should be:

httpd:2-alpine
httpd:2.4.33
httpd:2.4.33

As you see, the rolling update did not upgrade the first pod with a new container image. That is because its ordinal index (0) is lower than the value of the partition (1). The container image for this pod will not be upgraded even if we delete this pod:

kubectl delete pod apache-http-0

Let's verify that we have the same result:

for p in 0 1 2; do kubectl get po apache-http-$p --template '{{range $i, $c := .spec.containers}}{{$c.image}}{{end}}'; echo; done
httpd:2.4
httpd:2.4.33
httpd:2.4.33

If you at some point decide to decrease the partition, the StatefulSet will automatically update the pods that match the new partition value. Let's try to illustrate this by patching the partition in our StatefulSet to 0:

kubectl patch statefulset apache-http -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'

Now, if we check container images again, we'll see something like this:

for p in 0 1 2; do kubectl get po apache-http-$p --template '{{range $i, $c := .spec.containers}}{{$c.image}}{{end}}'; echo; done
httpd:2.4.33
httpd:2.4.33
httpd:2.4.33

As you see, the StatefulSet controller automatically updated the apache-http-0 pod because of the changed partition, even though we did not change the .spec.template.spec.containers[0].image field again.

On Delete

The OnDelete strategy implements the legacy (Kubernetes 1.6 and prior) behavior: when a change is made to the StatefulSet's .spec.template field, pods are not updated automatically, and you have to delete them manually for the controller to create new pods reflecting the change. This strategy can be enabled by setting .spec.updateStrategy.type to OnDelete.
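If you wanted to switch our example StatefulSet to this strategy, a patch along the following lines should work (a sketch, not a step from this tutorial; the rollingUpdate block is cleared because it only applies to the RollingUpdate strategy):

kubectl patch statefulset apache-http -p '{"spec":{"updateStrategy":{"type":"OnDelete","rollingUpdate":null}}}'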

Cleaning up

To clean up after completing these examples, we'll need to do a cascading delete like this:

kubectl delete statefulset apache-http
statefulset "apache-http" deleted

This command will terminate the pods in your StatefulSet in reverse order {N-1..0}. Note that this operation will just delete the StatefulSet and its pods but not the headless service associated with your StatefulSet. To clean up, we'll also need to delete our httpd-service Service manually.

kubectl delete service httpd-service
service "httpd-service" deleted

Finally, let's delete the StorageClass and PVCs used in this tutorial:

kubectl delete storageclass fast
kubectl delete pv -l app=httpd
kubectl delete pvc -l app=httpd

Conclusion

That's it! Hopefully, now you have a better understanding of available options for managing your stateful apps with StatefulSets. This abstraction offers users an opportunity to scale and update apps in a controlled and predictable way. Although StatefulSets have certain limitations, including the need to delete bound PVCs manually, they are otherwise extremely powerful for the vast array of tasks involved in managing your stateful applications in Kubernetes.

Keep reading