Supergiant Blog

Product releases, new features, announcements, and tutorials.

Persistent Storage with Persistent Volumes in Kubernetes

Posted by Kirill Goltsman on June 14, 2018

As you might remember from the previous tutorials, Kubernetes supports a wide variety of volume plugins both for stateless and stateful apps. Non-persistent volumes such as emptyDir or configMap are firmly linked to the pod's lifecycle: they are detached and deleted after the pod is terminated.

However, with stateful applications like databases, we want to have volumes that persist data beyond the pod's lifecycle. Kubernetes solves this problem by introducing the PersistentVolume and PersistentVolumeClaim resources that enable native and external persistent storage in your Kubernetes clusters.

In this tutorial, we'll explain how these two API resources can be used together to link various storage architectures (both native and CSP-based) to applications running in your cluster. To consolidate the theory, we'll also walk you through using hostPath as a PersistentVolume in a stateful deployment. Let's start!

Why Do We Need Persistent Volumes?

The rationale for using a PersistentVolume resource in Kubernetes is quite simple. On the one hand, we have different storage infrastructures such as Amazon EBS (Elastic Block Storage), GCE Persistent Disk, or GlusterFS, each with its specific storage type (e.g., block storage, NFS, object storage, data center storage), architecture, and API. If we were to attach these diverse storage types manually, we would have to develop custom plugins to interact with each provider's API: mounting the disk, requesting capacity, managing the disk's life cycle, and so on. We would also need to configure the cloud storage environment ourselves, all of which would result in unnecessary overhead.

Fortunately, the Kubernetes platform simplifies storage management for you. Its PersistentVolume subsystem is designed to solve the above-described problem by providing APIs that abstract details of the underlying storage infrastructure (e.g., AWS EBS, Azure Disk etc.), allowing users and administrators to focus on storage capacity and storage types their applications will consume rather than the subtle details of each storage provider's API.

This sounds similar to the pod's resource model, doesn't it? As we remember from the previous tutorial, containers in a pod request resources in raw amounts of CPU and RAM, so users don't have to worry about the server flavors and memory types CSPs use under the hood. PVs do the same for your storage, providing the right amount of storage resources for your pods regardless of which storage provider you opt to use.

Linking a persistent volume to your application involves several steps: provisioning a PV, requesting storage, and using the PV in your pod or deployment. Let's discuss these steps.

Provisioning a PersistentVolume (PV)

Kubernetes users need to define a PersistentVolume resource object that specifies the storage type and storage capacity, volume access modes, mount options, and other relevant storage details (see the discussion below). Once the PV is created, we have a working API abstraction that tells Kubernetes how to interact with the underlying storage provider using its Kubernetes volume plugin (see our article about Kubernetes Volumes to learn more).

Kubernetes supports static and dynamic provisioning of PVs. With static provisioning, PVs are created by the cluster administrator and represent actual storage available in the cluster. This means that in order to use static provisioning, you need to have the storage capacity (e.g., an Amazon EBS volume) provisioned beforehand (see the tutorial below).

On the other hand, dynamic provisioning of volumes can be triggered when the volume type claimed by the user does not match any PV available in the cluster. For dynamic provisioning to happen, the cluster administrator needs to enable the DefaultStorageClass admission plugin and define default storage classes for applications. (Note: this is a big topic in its own right, so it will be discussed in the next tutorials.)
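
To give a sense of what this looks like, below is a sketch of a StorageClass that could serve as the cluster default. The provisioner and its parameters are assumptions for an AWS-based cluster; yours will depend on your storage provider.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
  annotations:
    # Marks this class as the cluster default used by the
    # DefaultStorageClass admission plugin for claims with no class set.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2  # general-purpose SSD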

Requesting Storage

To make PVs available to pods in the Kubernetes cluster, you should explicitly claim them using a PersistentVolumeClaim (PVC) resource. A PVC is bound to a PV that matches the storage type, capacity, and other requirements specified in the claim. Binding is handled by a control loop that watches for new PVCs and finds a matching PV for each. If a matching volume does not exist, the claim remains unbound: for example, if a PVC requests 200Gi and the cluster only has 100Gi PVs available, the claim won't be bound to any PV until a 200Gi PV is added to the cluster. Likewise, a PVC is bound only to a PV whose capacity and volume mode both satisfy the claim, not to one that meets only some of the requirements.
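
You can observe this behavior with kubectl: a claim with no matching PV stays in the Pending state until a suitable volume appears. The claim name below is hypothetical, and the output is illustrative:

kubectl get pvc big-claim
NAME        STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
big-claim   Pending                                                      1m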

Using PVs and PVCs in Pods

Pods use PVs by referencing a PVC that matches their resource and volume requirements. This works as follows: when a PVC is specified, the cluster finds the claim in the pod's namespace, uses it to obtain the PV backing the claim, and mounts the corresponding volume into the pod's container(s).

Note that PVCs should be referenced as Volumes in a Pod's spec. For example, below we see a pod spec that uses a persistentVolumeClaim named "test-claim" referring to some PV:

kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: test-claim

Persistent Volume Types

We discussed available volume types in the previous tutorial. However, not all of them are persistent. Below is a list of the persistent storage volume plugins that can be used by PVs and PVCs, along with each plugin's storage type.

  • gcePersistentDisk (Block Storage) -- A Google Compute Engine (GCE) Persistent Disk that provides SSD and HDD storage attached to nodes and pods in a K8s cluster.
  • awsElasticBlockStore (Block Storage) -- An Amazon EBS volume is a persistent block storage volume offering consistent and low-latency performance.
  • azureFile (Network File Shares) -- Microsoft Azure file volumes are fully managed file shares in Microsoft Azure accessible via the industry-standard Server Message Block (SMB) protocol.
  • azureDisk (Block Storage) -- A Microsoft Azure data disk provides block storage with SSD and HDD options.
  • fc (Data Center Storage and Storage Area Networks) -- Fibre Channel is a high-speed networking technology for the lossless delivery of raw block data. FC is primarily used in Storage Area Networks (SAN) and commercial data centers.
  • FlexVolume (Custom Volume Plugins) -- FlexVolume enables users to develop Kubernetes volume plugins for vendor-provided storage.
  • flocker (Container Data Storage and Management) -- Flocker is an open-source container data volume manager for Dockerized applications. The platform supports container portability across diverse storage types and cloud environments.
  • nfs (Network File System) -- NFS is a distributed file system protocol that allows users to access files over a computer network.
  • iscsi (Networked Block Storage) -- iSCSI (Internet Small Computer Systems Interface) is an IP-based storage networking protocol for connecting data storage facilities. It is used to facilitate data transfer over intranets and to manage storage over long distances by enabling location-independent data storage.
  • rbd (Ceph Block Storage) -- A Ceph RADOS Block Device (RBD) is a building block of Ceph Block Storage that leverages RADOS capabilities such as snapshotting, consistency, and replication.
  • cephfs (Object Storage with Block and File Interfaces) -- Ceph is a storage platform that implements object storage on a distributed computer cluster.
  • cinder (Block Storage) -- Cinder is a block storage service for OpenStack designed to provide storage resources to end users that can be consumed by the OpenStack Compute Project (Nova).
  • glusterfs (Networked File System) -- Gluster is a distributed networked file system that aggregates storage from multiple servers into a single storage namespace.
  • vsphereVolume (VMDK) -- A virtual machine disk (VMDK) provided by VMware vSphere.
  • quobyte (Data Center File System) -- The quobyte volume plugin mounts the Quobyte data center file system.
  • hostPath (Local Node File System) -- A hostPath volume mounts a file or directory from the host node's filesystem into a pod.
  • portworxVolume (Block Storage) -- A portworxVolume is Portworx's elastic block storage layer that runs hyperconverged with Kubernetes. Portworx's storage system is designed to aggregate capacity across multiple servers, similarly to Gluster.
  • scaleIO (Shared Block Networked Storage) -- ScaleIO is a software-defined storage product from Dell EMC that creates a server-based Storage Area Network (SAN) from local server storage. It is designed to convert direct-attached storage into shared block storage.
  • storageos (Block Storage) -- StorageOS aggregates storage across a cluster of servers and exposes it as high-throughput, low-latency block storage.

Defining a PersistentVolume API Resource

Now, let's discuss how PVs actually work. First, PVs are defined as Kubernetes API objects with a spec and a list of parameters. Below is an example of a typical PersistentVolume definition in Kubernetes.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.2
  nfs:
    path: /tmp
    server: 172.15.0.6

As you see, here we defined a PV for the NFS volume type. Key parameters defined in this spec are the following:

  • spec.capacity -- The storage capacity of the PV. In this example, our PV has a capacity of 10Gi (gibibytes). The capacity property uses the same units as the Kubernetes resource model: storage can be expressed as a plain integer or as a fixed-point number with one of the SI suffixes (E, P, T, G, M, K) or their binary equivalents (Ei, Pi, Ti, Gi, Mi, Ki). Currently, Kubernetes users can only request storage size; future attributes may include throughput, IOPS, etc.

  • spec.volumeMode (available since Kubernetes v1.9) -- The volume mode property supports raw block devices and filesystems. Block mode exposes a raw, unformatted block device that avoids filesystem overhead and, hence, ensures lower latency and higher throughput for mission-critical applications such as databases and object stores. Valid values for this field are "Filesystem" (default) and "Block".

  • spec.accessModes -- Defines how the volume can be accessed. (Note: valid values vary across persistent storage providers.) In general, the supported field values are:
    • ReadWriteOnce -- the volume can be mounted as a read/write volume only by a single node.
    • ReadOnlyMany -- many nodes can mount the volume as read-only.
    • ReadWriteMany -- many nodes can mount the volume as read-write.

Note: a volume can only be mounted using one access mode at a time, even if it supports many.

  • spec.storageClassName -- A storage class of the volume defined by the StorageClass resource. A PV of a given class can only be bound to PVCs requesting that class. A PV with no storageClassName defined can only be bound to PVCs that request no particular class.

  • spec.persistentVolumeReclaimPolicy -- A reclaim policy for the volume. At the present moment, Kubernetes supports the following reclaim policies (a policy can also be changed on an existing PV; see the sketch after this list):

    • Retain: If this policy is enabled, the PV will continue to exist even after the PVC is deleted. However, the volume is considered "released" rather than available: because the previous claimant's data remains on it, the PV cannot serve another claim until an administrator manually cleans up the data and deletes the PV. Where Retain really shines is that users can reuse that data if they want -- for example, if they wanted to use the data without a PV (e.g., switch to a traditional database model), or if they wanted to use that data on a different, new PV (migrating to another cluster, testing, etc.).
    • Recycle (deprecated): This reclaim policy performs a basic scrub operation (rm -rf /thevolume/*) on the volume and makes it available again for a new claim.
    • Delete: This reclaim policy deletes the PersistentVolume object from the Kubernetes API as well as the associated storage asset in the external infrastructure (e.g., AWS EBS, Google Persistent Disk, etc.). AWS EBS, GCE PD, Azure Disk, and Cinder volumes support this reclaim policy.
  • spec.mountOptions -- A K8s administrator can use this field to specify additional mount options supported by the storage provider. In the spec above, we mount an NFS share and specify that NFS version 4.2 should be used.

    Note: Not all providers support mount options. For more information, see the official documentation.

  • spec.nfs -- The list of NFS-specific options. Here, we say that the NFS volume is exported at the /tmp path on the server with the IP 172.15.0.6.
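
As mentioned above, the reclaim policy of an existing PV can be changed without recreating it. A quick sketch using kubectl patch, assuming the pv-nfs volume from our example:

kubectl patch pv pv-nfs -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume "pv-nfs" patched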

Defining a PersistentVolumeClaim (PVC)

As we've already said, PersistentVolumeClaims must be defined to claim resources of a given PV. Similarly to PVs, PVCs are defined as Kubernetes API resource objects. Let's see an example:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: "stable"

This PVC:

  • targets volumes that have ReadWriteMany access mode (spec.accessModes).
  • requests the storage only from volumes that have "Block" volume mode enabled (spec.volumeMode).
  • claims 10Gi of storage from any matching PV (spec.resources.requests.storage).
  • filters volumes that match the "slow" storage class (spec.storageClassName). Note: if a default StorageClass is set by the administrator, a PVC with no storageClassName can be bound only to PVs of that default class.
  • specifies a label selector to further filter the set of volumes. Only volumes with the label release: "stable" can be bound to this claim.

This PVC matches the StorageClass of the PV defined above. However, it does not match the volumeMode and accessMode in that PV. Therefore, our PVC cannot be used to claim resources from the pv-nfs PV.
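
For comparison, here is a sketch of a claim that would bind to pv-nfs: it requests the same "slow" class, the default Filesystem volume mode, and the ReadWriteOnce access mode that volume supports. The claim name nfs-claim is hypothetical.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs-claim
spec:
  accessModes:
    - ReadWriteOnce       # matches the access mode of pv-nfs
  volumeMode: Filesystem  # matches the volume mode of pv-nfs
  resources:
    requests:
      storage: 10Gi       # pv-nfs offers exactly 10Gi
  storageClassName: slow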

Using Persistent Volumes in a Pod

Once the PV and PVC are created, using persistent storage in a pod becomes straightforward:

kind: Pod
apiVersion: v1
metadata:
  name: persistent-pod
spec:
  containers:
    - name: httpd
      image: httpd
      volumeMounts:
      - mountPath: "/usr/local/apache2/htdocs"
        name: test-pv
  volumes:
    - name: test-pv
      persistentVolumeClaim:
        claimName: volume-claim

In this pod spec, we:

  • pull Apache HTTP Server from the Docker Hub repository (spec.containers.image)
  • define a Volume "test-pv" and use the PVC "volume-claim" to claim some PV that matches this claim.
  • mount the Volume at the path /usr/local/apache2/htdocs in the httpd container.

That's it! Hopefully, you now understand the theory behind PVs and PVCs and the key options and parameters available to these API resources. Let's consolidate this knowledge with a tutorial.

Tutorial: Using hostPath Persistent Volumes in Kubernetes

In this tutorial, we'll create a PersistentVolume using the hostPath volume plugin and claim it for use in a Deployment running Apache HTTP servers. hostPath volumes use a file or directory on the node and are suitable for development and testing purposes.

Note: hostPath volumes have certain limitations to watch out for, and they are not recommended for use in production. Also, in order for this hostPath example to work, we need to run a single-node cluster. See the official documentation for more info.

To complete this example, we used the following prerequisites:

  • A Kubernetes cluster deployed with Minikube. Kubernetes version used was 1.10.0.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1 Create a Directory on Your Node

First, let's create a directory on your node that will be used by the hostPath volume. This directory will be the webroot of the Apache HTTP server.

To do this, open a shell to the node in the cluster. Since we are using Minikube, open a shell by running minikube ssh.

In your shell, create a new directory. Use any directory that does not need root permissions (e.g., user's home folder if you are on Linux):

mkdir /home/<user-name>/data

Then create an index.html file in this directory containing a custom greeting from the server (note: use the path of the directory you created):

echo 'Hello from the hostPath PersistentVolume!' > /home/<user-name>/data/index.html

Step #2 Create a Persistent Volume

The next thing we need to do is to create a hostPath PersistentVolume that will be using this directory.

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-local
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/<user-name>/data"

This spec:

  • defines a PV named "pv-local" with a 10Gi capacity (spec.capacity.storage).
  • sets the PV's access mode to ReadWriteOnce, which allows the volume to be mounted as read-write by a single node (spec.accessModes).
  • assigns a storageClassName "local" to the PersistentVolume.
  • configures the hostPath volume plugin to mount the local directory /home/<user-name>/data (spec.hostPath.path).

Save this spec in a file (e.g., hostpath-pv.yaml) and create the PV by running the following command:

kubectl create -f hostpath-pv.yaml
persistentvolume "pv-local" created

Let's check whether our PersistentVolume was created:

kubectl get pv pv-local

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
pv-local   10Gi       RWO            Retain           Available             local                    2m

This response indicates that the volume is available but does not yet have any claim bound to it, so let's create one.

Step #3 Create a PersistentVolumeClaim (PVC) for your PersistentVolume

The next thing we need to do is to claim our PV using a PersistentVolumeClaim. Using this claim, we can request resources from the volume and make them available to our future pods.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hostpath-pvc
spec:
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      type: local

Our PVC does the following:

  • filters volumes by the type: local label so that our specific hostPath volume (and other hostPath volumes that might be created later) can be bound (spec.selector.matchLabels).
  • targets hostPath volumes that have ReadWriteOnce access mode (spec.accessModes).
  • requests a volume of at least 5Gi (spec.resources.requests.storage).

First, save this resource definition in hostpath-pvc.yaml, and then create the claim the same way we did with the PV:

kubectl create -f hostpath-pvc.yaml 
persistentvolumeclaim "hostpath-pvc" created

Let's check the claim running the following command:

kubectl get pvc hostpath-pvc

The response should be something like this:

NAME           STATUS    VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hostpath-pvc   Bound     pv-local   10Gi       RWO            local          29s

As you see, our PVC was bound to the volume of the matching type. Notice that the claim reports a capacity of 10Gi, the full size of the bound PV, even though we requested only 5Gi: a claim always binds to an entire PV. Let's verify that the PV we created was actually selected by the claim:

kubectl get pv pv-local

The response should be something like this:

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                  STORAGECLASS   REASON    AGE
pv-local   10Gi       RWO            Retain           Bound     default/hostpath-pvc   local                    16m

Did you notice the difference from the previous status of our PV? You'll see that it is now bound by the claim hostpath-pvc we just created (the claim lives in the default Kubernetes namespace). That's exactly what we wanted to achieve!

Step #4 Use the PersistentVolumeClaim as a Volume in your Deployment

Now everything is ready to use your hostPath PV in any pod or deployment of your choice. To do this, we need to create a deployment with a PVC referring to the hostPath volume. Since we created a PV that mounts a directory containing an index.html file, let's deploy the Apache HTTP server from the Docker Hub repository.

apiVersion: apps/v1 #  use apps/v1beta2 for versions before 1.9.0
kind: Deployment
metadata:
  name: httpd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - image: httpd:latest
        name: httpd
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: web
          mountPath: /usr/local/apache2/htdocs
      volumes:
      - name: web
        persistentVolumeClaim:
          claimName: hostpath-pvc

As you see, along with standard deployment parameters like the container image and container port, we have also defined a volume named "web" that uses our PersistentVolumeClaim. This volume will be mounted with our custom index.html at /usr/local/apache2/htdocs, which is the default webroot directory of this Apache HTTP Docker Hub image. Through the claim, the deployment has access to the storage of the bound hostPath volume.

Save this spec in httpd-deployment.yaml and create the deployment using the following command:

kubectl create -f httpd-deployment.yaml 
deployment.apps "httpd" created

Let's check the deployment's details:

kubectl describe deployment httpd

...
Mounts:
      /usr/local/apache2/htdocs from web (rw)
  Volumes:
   web:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hostpath-pvc
    ReadOnly:   false

Along with other details, the output shows that the directory /usr/local/apache2/htdocs was mounted from the web (rw) volume and that our PersistentVolumeClaim was used to provision the storage.

Now, let's verify that our Apache HTTP pods actually serve the index.html file we created in the first step. To do this, let's first find the name of one of the pods and get a shell to the Apache server container running in that pod.

kubectl get pods -l app=httpd
NAME                     READY     STATUS    RESTARTS   AGE
httpd-5958bdc7f5-fg4l5   1/1       Running   0          15m
httpd-5958bdc7f5-jjf9r   1/1       Running   0          15m

We'll enter the httpd-5958bdc7f5-jjf9r pod using the following command:

kubectl exec -it httpd-5958bdc7f5-jjf9r -- /bin/bash

Now we are inside the Apache container's filesystem. You can verify this using the Linux ls command.

root@httpd-5958bdc7f5-jjf9r:/usr/local/apache2# ls
bin  build  cgi-bin  conf  error  htdocs  icons  include  logs  modules

The image is Linux-based, so we can easily install curl to access our server:

root@httpd-5958bdc7f5-jjf9r:/usr/local/apache2# apt-get update
root@httpd-5958bdc7f5-jjf9r:/usr/local/apache2# apt-get install curl

Once curl is installed, let's send a GET request to our server listening on localhost:80 (remember that containers within a pod communicate via localhost):

curl localhost:80 
Hello from the hostPath PersistentVolume!

That's it! Now you know how to define a PV and a PVC for hostPath volumes and use this storage type in your Kubernetes deployments. This tutorial demonstrated how PVs and PVCs abstract away the underlying storage infrastructure and filesystem, so users can focus on just how much storage they need for their deployments.

Step #5 Clean Up

This tutorial is over, so let's clean up after ourselves.

Delete the Deployment:

kubectl delete deployment httpd 
deployment "httpd" deleted

Delete the PV:

kubectl delete pv pv-local
persistentvolume "pv-local" deleted

Delete the PVC:

kubectl delete pvc hostpath-pvc
persistentvolumeclaim "hostpath-pvc" deleted

Finally, don't forget to delete the /home/<user-name>/data directory on the node as well as the PV and PVC resource definition files we created.

Conclusion

We hope that you now have a better understanding of how to create stateful applications in Kubernetes using PersistentVolume and PersistentVolumeClaim. As we've learned, persistent volumes are powerful abstractions that enable user access to the diverse storage types supported by the Kubernetes platform. Using PVs, you can attach and mount almost any type of persistent storage, such as object-, file-, or network-level storage, to your pods and deployments. In addition, Kubernetes exposes a variety of storage options such as capacity, reclaim policy, volume modes, and access modes, making it easy for you to adjust different storage types to your particular application's requirements. Kubernetes makes sure that your PVC is always bound to the right volume type available in your cluster, enabling efficient usage of resources, high availability of applications, and integrity of your data across pod restarts and node failures.

Keep reading

Kubernetes Storage: Introduction

Posted by Kirill Goltsman on June 4, 2018

Kubernetes storage options include native storage, CSP disks, network file systems, object storage, and many more. In this article, we introduce you to Kubernetes volumes and demonstrate how they can be used to share resources between containers in a stateless app. Let's start!

Keep reading

Kubernetes Networking Explained: Introduction

Posted by Kirill Goltsman on May 30, 2018

Kubernetes is a powerful platform for managing containerized applications. It supports their deployment, scheduling, replication, updating, monitoring, and much more. Kubernetes has become a complex system due to the addition of new abstractions, resource types, cloud integrations, and add-ons. Further, Kubernetes cluster networking is perhaps one of the most complex components of the Kubernetes infrastructure because it involves so many layers and parts (e.g., container-to-container networking, Pod networking, services, ingress, load balancers), and many users are struggling to make sense of it all.

The goal of Kubernetes networking is to turn containers and Pods into bona fide "virtual hosts" that can communicate with each other across nodes while combining the benefits of VMs with a microservices architecture and containerization. Kubernetes networking is based on several layers, all serving this ultimate purpose:

  • Container-to-container communication using localhost and the Pod's network namespace. This networking level provides the network interfaces for tightly coupled containers that can communicate with each other on specified ports, much like conventional applications communicate via localhost.
  • Pod-to-pod communication that enables communication between Pods across Nodes. (If you want to learn more about Pods, see our recent article.)
  • Services. A Service abstraction defines a policy (microservice) for accessing Pods by other applications.
  • Ingress, load balancing, and DNS.

Sounds like a lot of stuff, doesn't it? It is. That's why we decided to create a series of articles explaining Kubernetes networking from the bottom (container-to-container communication) to the top (pod networking, services, DNS, and load balancing). In the first part of the series, we discuss container-to-container and pod-to-pod networking. We demonstrate how Kubernetes networking differs from the "normal" Docker approach, what requirements it imposes on networking implementations, and how it achieves a homogeneous networking system that allows pods to communicate across nodes. We think that by the end of this article you'll have a better understanding of Kubernetes networking that will prepare you for deploying full-fledged microservices applications using Kubernetes services, DNS, and load balancing.

Fundamentals of Kubernetes Networking

The Kubernetes platform aims to simplify cluster networking by creating a flat network structure that frees users from setting up dynamic port allocation to coordinate ports, designing custom routing rules and subnets, and using Network Address Translation (NAT) to move packets across different network segments. To achieve this, Kubernetes prohibits networking implementations involving any intentional network segmentation policy. In other words, Kubernetes aims to keep the networking architecture as simple as possible for the end user. The Kubernetes platform sets the following networking rules:

  • All containers should communicate with each other without NAT.
  • All nodes should communicate with all containers without NAT.
  • The IP that a container sees for itself is the same IP that other containers see for it (in other words, Kubernetes bars any IP masquerading).
  • Pods can communicate regardless of what Node they sit on.

To understand how Kubernetes implements these rules, let's first discuss the Docker model that serves as a point of reference for Kubernetes networking.

Overview of the Docker Networking Model

As you might know, Docker supports numerous network architectures like overlay networks and Macvlan networks, but its default networking solution is based on host-private networking implemented by the bridge networking driver. To clarify the terms, as with any other private network, Docker's host-private networking model is based on a private IP address space that can be freely used by anybody without the approval of the Internet registry but that has to be translated using NAT or a proxy server if the network needs to connect to the Internet. A host-private network is a private network that lives on one host as opposed to a multi-host private network that covers multiple hosts.

Governed by this model, Docker's bridge driver implements the following:

  • First, Docker creates a virtual bridge (docker0) and allocates a subnet from one of the private address blocks for that bridge. A network bridge is a device that creates a single merged network from multiple networks or network segments. By the same token, a virtual bridge is the analog of a physical network bridge used in virtual networking. Virtual network bridges like docker0 allow connecting virtual machines (VMs) or containers into a single virtual network. This is precisely what Docker's bridge driver is designed for.
  • To connect containers to the virtual network, Docker allocates a virtual ethernet device called veth that is attached to the bridge. Similarly to a virtual bridge, a veth device is a virtual analog of the Ethernet technology used to connect hosts to a LAN or the Internet and to package and pass data using a wide variety of protocols. Each veth is mapped to an eth0 network interface, Linux's Ethernet interface that manages the Ethernet device and the connection between the host and the network. In Docker, each in-container eth0 is provided with an IP address from the bridge's address range. In this way, each container gets its own IP address from that range.
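
You can see the subnet Docker allocated to the default bridge on any Docker host; the subnet shown below is illustrative and may differ on your machine:

docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
172.17.0.0/16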

The above-described architecture is schematically represented in the image below.


In this image, we see that both Container 1 and Container 2 are part of the virtual private network created by the docker0 bridge. Each container has a veth interface connected to the docker0 bridge. Since both containers and their veth interfaces are on the same logical network, they can easily communicate if they manage to discover each other's IP addresses. However, because each container is allocated its own veth, there is no shared network interface between them, which hinders coordinated communication, container isolation, and the ability to encapsulate them in a single abstraction like a pod. Docker lets you work around this problem by allocating ports, which can then be forwarded or proxied to other containers. This has the limitation that containers must coordinate port usage very carefully or allocate ports dynamically.

Kubernetes Solution

Kubernetes bypasses the above-mentioned limitation by providing a shared network interface for containers. Using the analogy from the Docker model, Kubernetes allows containers to share a single veth interface like in the image below.


As a result, the Kubernetes model augments the default host-private networking approach in the following way:

  • It makes both containers addressable on a shared veth0 interface (e.g., 172.17.0.2 in the image above).
  • It allows containers to access each other via allocated ports on localhost. Practically speaking, this is the same as running applications on a single host, with the added benefits of container isolation and the ability to design tightly coupled container architectures.

To implement this model, Kubernetes creates a special infrastructure container in each pod. This container is started with a "pause" command and holds the pod's network namespace, providing the virtual network interface that all of the pod's other containers share, allowing them to communicate with each other.
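
If your nodes use Docker as the container runtime (as Minikube's do), you can spot these infrastructure containers by listing the images of running containers on a node. The exact pause image name and tag vary with the Kubernetes version; the output below is illustrative:

docker ps --format '{{.Image}}' | grep pause
k8s.gcr.io/pause-amd64:3.1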

By now, you have a better understanding of how container-to-container networking works in Kubernetes. As we have seen, it is largely based on the augmented version of the bridge driver but with an added benefit of a shared network interface that provides better isolation and communication for containerized applications.

Tutorial

Now, let's illustrate a possible scenario of communication between two containers running in a single pod. One of the most common examples of multi-container communication via localhost is when one container, like Apache HTTP Server or NGINX, is configured as a reverse proxy that proxies requests to a web application running in another container.

Elaborating upon this case, we are going to discuss a situation where an NGINX container is configured to proxy requests from its default port (:80) to the Ghost publishing platform listening on port 2368.

To complete this example, we'll need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant GitHub wiki for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1: Define a ConfigMap

ConfigMaps are Kubernetes objects that decouple an app's configuration from the Pod's spec, enabling better modularity of your settings. In the example below, we define a ConfigMap for the NGINX server that includes a basic reverse proxy configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf
data:
  nginx.conf: |-
    user  nginx;
    worker_processes  2;
    error_log  /var/log/nginx/error.log warn;
    pid        /var/run/nginx.pid;
    events {
      worker_connections  1024;
    }
    http {
      sendfile        on;
      keepalive_timeout  65;
      include /etc/nginx/conf.d/*.conf;
      server {
        listen 80 default_server;
        location /ghost {
          proxy_pass http://127.0.0.1:2368;
        }
      }
    }

In brief, this ConfigMap tells NGINX to proxy requests from its default port localhost:80 to localhost:2368, on which Ghost is listening for requests.

This ConfigMap should first be passed to Kubernetes before we can deploy a Pod. Save the ConfigMap in a file (e.g., nginx-config.yaml), and then run the following command:

kubectl create -f nginx-config.yaml

Step #2: Create a Deployment

The next thing we need to do is to create a Deployment for our two-container pod (see our recent article for a review of Pod deployment options in Kubernetes).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tut-deployment
  labels:
    app: tut
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tut
  template:
    metadata:
      labels:
        app: tut
    spec:
      containers:
      - name: ghost
        image: ghost:latest
        ports:
        - containerPort: 2368
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: proxy-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      volumes:
      - name: proxy-config
        configMap:
          name: nginx-conf

This deployment spec:

  • Defines a deployment named 'tut-deployment' (metadata.name) and assigns the label 'tut' to all pods of this deployment (metadata.labels.app).
  • Sets the desired state of the deployment to 2 replicas (spec.replicas).
  • Defines two containers: 'ghost', which uses the ghost Docker image, and 'nginx', which uses the nginx image from the Docker repository.
  • Opens container port 80 for the 'nginx' container and port 2368 for the 'ghost' container (spec.containers.ports).
  • Creates a volume 'proxy-config' of the configMap type that gives containers access to the 'nginx-conf' ConfigMap resource created in the previous step.
  • Mounts the 'proxy-config' volume at the path /etc/nginx/nginx.conf (using subPath so only that file is replaced), enabling the container's access to the NGINX configuration.

To create this deployment, save the above manifest in the tut-deployment.yaml file and run the following command:

kubectl create -f tut-deployment.yaml

If everything is OK, you will be able to see the running deployment using kubectl get deployment tut-deployment:

NAME             DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
tut-deployment   2         2         2            0           13s
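
You can also verify that each replica pod runs both containers (READY 2/2) with kubectl get pods; the pod name suffixes below are illustrative:

kubectl get pods -l app=tut
NAME                              READY     STATUS    RESTARTS   AGE
tut-deployment-5c7f8d7d9b-9xkq2   2/2       Running   0          45s
tut-deployment-5c7f8d7d9b-qv6s8   2/2       Running   0          45s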

Step #3: Exposing a NodePort

Now that our Pods are running, we should expose NGINX's port 80 to the public Internet to see whether the reverse proxy works. This can be done by exposing the Deployment as a Service (in the next tutorial, we are going to cover Kubernetes services in more detail):

kubectl expose deployment tut-deployment --type=NodePort --port=80
service "tut-deployment" exposed

After our deployment is exposed, we need to find the NodePort dynamically assigned to it:

kubectl describe service tut-deployment

This command will produce an output similar to this:

Name:           tut-deployment
Namespace:      default
Labels:         app=tut
Selector:       app=tut
Type:           NodePort
IP:         10.3.208.190
Port:           <unset> 80/TCP
NodePort:       <unset> 30234/TCP
Endpoints:      10.2.6.6:80,10.2.6.7:80
Session Affinity:   None

We need a NodePort value, which is 30234 in our case. Now you can access the ghost publishing platform through NGINX using http://YOURHOST:30234.
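
If your cluster runs on Minikube, you can also let Minikube assemble this URL for you (the IP below is illustrative):

minikube service tut-deployment --url
http://192.168.99.100:30234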

That's it! Now you see how containers can easily communicate via localhost using the Pod's built-in virtual network. Container-to-container networking is the building block of the next layer: pod-to-pod networking, discussed in the next section.

From Container-to-Container to Pod-to-Pod Communication

One of the most exciting features of Kubernetes is that pods and containers within pods can communicate with each other even if they land on different nodes. This is something that is not implemented in Docker by default (note: Docker supports multi-host connectivity as a custom solution available via the overlay driver). Before delving deeper into how Kubernetes implements pod-to-pod networking, let's first discuss how networking works at the pod level.

As we remember from the previous tutorial, pods are abstractions that encapsulate containers to provide Kubernetes services like shared storage, networking interfaces, deployment, and updates to them. When Kubernetes creates a pod, it allocates an IP address to it. This IP is shared by all containers in that pod and allows them to communicate with each other using localhost (as we saw in the example above). This is known as the "IP-per-Pod" model. It is an extremely convenient model in which pods can be treated much like physical hosts or VMs from the standpoint of port allocation, service discovery, load balancing, migration, and more.
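
You can inspect the IP allocated to each pod with the -o wide flag; the pod names, IPs, and node name below are illustrative:

kubectl get pods -o wide
NAME                              READY     STATUS    RESTARTS   AGE       IP         NODE
tut-deployment-5c7f8d7d9b-9xkq2   2/2       Running   0          5m        10.2.6.6   node-1
tut-deployment-5c7f8d7d9b-qv6s8   2/2       Running   0          5m        10.2.6.7   node-1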

So far, so good! But what if we want our pods to be able to communicate across nodes? This becomes a little more complicated.

Referring to the example above, let's assume that we now have two nodes hosting two containers each. All these containers are connected using docker0 bridges and have shared veth0 network interfaces. However, on both nodes the Docker bridge (docker0) and the virtual ethernet interface (veth0) are now likely to have the same IP address because they were both created by the same default Docker routine. Even if the veth IPs are different, neither node is aware of the private network address space created on the other node, which makes it difficult to reach pods on it.

How Does Kubernetes Solve this Problem?

Let's see how Kubernetes elegantly solves this problem. As we see in the image below, veth0, a custom bridge, eth0, and a gateway that connects the two nodes are now parts of a shared private network namespace centered around the gateway (10.100.0.1). This configuration implies that Kubernetes has somehow managed to create a separate network that covers the two nodes. You may also notice that bridge addresses are now assigned depending on what node a bridge lives on. So, for example, we now have a 10.0.1.x address space shared by the custom bridge and veth0 on Node 1 and a 10.0.2.x address space shared by the same components on Node 2. At the same time, eth0 on both nodes shares the address space of the common gateway (10.100.0.x), which allows the two nodes to communicate.


The design of this network is similar to an overlay network. (In a nutshell, an overlay network is a network built on top of another, lower-level network; the internet, for example, was originally built as an overlay over the telephone network.) A pod network in Kubernetes is an example of an overlay network: it takes the individual private networks within each node and transforms them into a new software-defined network (SDN) with a shared namespace, which allows pods to communicate across nodes. That's how the Kubernetes magic works!

Kubernetes ships with this model by default, but there are several networking solutions that achieve the same result. Remember that any network implementation that violates Kubernetes networking principles (mentioned in the Intro) will not work with Kubernetes. Some of the most popular networking implementations supported by Kubernetes are the following:

  • Cisco Application Centric Infrastructure -- an integrated overlay and underlay SDN solution with the support for containers, virtual machines, and bare metal servers.
  • Cilium -- open source software for container applications with a strong security model.
  • Flannel -- a simple overlay network that satisfies all Kubernetes requirements while being one of the easiest to install and run.

For more available networking solutions, see the official Kubernetes documentation.

Conclusion

In this article, we covered two basic components of the Kubernetes networking architecture: container-to-container networking and pod-to-pod networking. We have seen that Kubernetes uses overlay networking to create a flat network structure where containers and pods can communicate with each other across nodes. All routing rules and IP namespaces are managed by Kubernetes by default, so there is no need to bother with creating subnets and using dynamic port allocation. In fact, there are several out-of-the-box overlay network implementations to get you started. Kubernetes networking enables an easy migration of applications from VMs to pods, which can be treated as "virtual hosts" with the functionality of VMs but with the added benefits of container isolation and a microservices architecture. In our following tutorial, we discuss the next layer of Kubernetes networking: services, which are abstractions that implement microservices and service discovery for pods, enabling highly available applications accessible from outside a Kubernetes cluster.

Keep reading

Kubernetes Networking: Services

Posted by Kirill Goltsman on May 30, 2018

In this part of our Kubernetes networking series, we are moving to the discussion of Kubernetes services, which are one of the best features of the platform. We discuss how services work under the hood and how they can be created using Kubernetes native tools. By the end of this article, you'll have a better understanding of how to turn your pods into fully operational microservices capable of working at any scale.

Keep reading

Deploying and Autoscaling a Kubernetes Cluster on Packet.net with Supergiant

Posted by Kirill Goltsman on May 29, 2018

By the end of this tutorial, you'll know how to link a Packet.net cloud account and leverage Supergiant's autoscaling packing algorithm to deploy a cluster and minimize cloud costs. Sounds promising? It actually is!

Keep reading

Assigning Computing Resources to Containers and Pods in Kubernetes

Posted by Kirill Goltsman on May 21, 2018

In this tutorial, we shall describe the inner workings of the Kubernetes resource model and walk you through assigning compute resources (CPU and RAM) to containers using Kubernetes native tools and API. We shall also discuss how resources can be assigned using the Supergiant platform that provides a Kubernetes-as-a-Service solution. 

Keep reading