ALL THINGS KUBERNETES

Kubernetes Storage: Introduction

Kubernetes offers a number of resources, such as volumes, StatefulSets, and StorageClasses, that provide diverse storage options for applications running in your clusters. The platform’s storage options include native volumes, cloud provider disks, network file systems, object storage, and many more. In this article, we introduce you to Kubernetes volumes and demonstrate how they can be used to share resources between containers in a stateless app. Let’s start!

What Are Kubernetes Volumes?

Volumes are Kubernetes abstractions that allow containers to use various storage and file system types, share storage, and keep state. By themselves, containers do not maintain their state if terminated or restarted. To avoid losing information, containers must be defined with volumes mounted at specific paths in the container image’s file system.

Even though volumes outlive the containers to which they are attached, when a Pod dies, its volumes may die with it. This does not necessarily mean that the volume data is lost forever: its fate depends on the type of volume used (persistent vs. non-persistent). Therefore, it is useful to distinguish between two types of applications that use volumes: stateless and stateful apps. Here is the difference.

Stateless apps do not store the client data generated during sessions: when a session ends, all data generated by the user is lost. A typical example of a volume suitable for stateless applications is emptyDir . A volume of this type exists as long as the Pod to which it is attached runs on a node. When that Pod is removed from the node for any reason, all data in the emptyDir  is deleted forever.

In contrast, stateful apps like databases need some way to store application and user data. Kubernetes has support for stateful apps implemented in such resources as PersistentVolume  (PV). The latter is a volume plugin with a lifecycle independent of the Pod that uses it. PersistentVolumes  allow using external storage (e.g., AWS EBS) without knowledge of the underlying cloud environment.

Volumes are very powerful indeed because they abstract container storage from the underlying storage infrastructure (e.g., devices, file systems) and storage providers much like Kubernetes resource requests and limits abstract CPU and memory from the VMs and bare-metal servers (see the image below). In addition, Kubernetes users can extend the platform with their own storage types. New volume plugins can be created for any conceivable type of storage using Container Storage Interface (CSI) and FlexVolume interfaces that expose volume drivers to container environments.

 

Kubernetes Volumes

Currently, Kubernetes supports over 25 volume plugins. Describing each of them is beyond the scope of this article, but you can find more information in the Kubernetes documentation. Just to get some idea of various storage options supported by Kubernetes, it will be useful to break down the available volume plugins by category:

  • Volumes of cloud service providers (CSPs). One example of this type is awsElasticBlockStore , which allows mounting an AWS EBS volume into a Pod. The contents of this volume type are preserved when a Pod is removed. Other available CSP volume types include Microsoft’s azureDisk  and GCE’s gcePersistentDisk , among others. CSP-based volumes are normally ‘claimed’ by PersistentVolumes  that “rent” access to the underlying storage infrastructure.
  • Object storage systems. For example, Kubernetes supports CephFS ( cephfs ); the underlying Ceph system provides interfaces for object-level, block-level, and file-level storage.
  • Native Kubernetes volume types. emptyDir  and configMap  are two examples of volumes supported by Kubernetes natively. For example, a configMap  volume can be used to inject configuration defined in the ConfigMap  object for the use of containers in your Pod.
  • Volumes for remote repositories. Such volume plugins as gitRepo  can be used for cloning git repositories into empty directories for your Pods to access.
  • Network filesystems for accessing files across the network. Kubernetes supports NFS (Network File System) ( nfs ), iscsi  (IP-based storage networking protocol for linking data storage facilities), Gluster ( glusterfs ), and some more.
  • Persistent storage for stateful applications. For example, PersistentVolumes  allow users to “claim” persistent storage options like GCE PersistentDisk  without worrying about the details of a particular cloud environment. Other examples of persistent storage include StorageOS ( storageos ) and Portworx ( portworxVolume ).
  • Data center filesystems. Kubernetes ships with Quobyte ( quobyte ) volume plugin that mounts Quobyte Data Center File System.
  • Secrets volumes. Kubernetes offers a secret  volume for exposing sensitive information stored in the Kubernetes API as files mounted into Pods. Secret volumes are backed by tmpfs  (a RAM-backed filesystem), so their contents are never written to disk.

Defining a Volume

In Kubernetes, provisioning volumes for Pods is quite simple. You use the spec.volumes  field to specify what volumes to provide and the spec.containers[].volumeMounts  field to indicate where to mount these volumes into containers. Note: mount paths must be specified for each container in a Pod individually.

Below is a simple example of defining and mounting a volume for some arbitrary Pod (Pod meta details are omitted for brevity).
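A minimal sketch of such a Pod spec might look like this (the Pod name httpd-pod , the image httpd:2.4 , and the ConfigMap name httpd-config  are illustrative assumptions; the volume name and mount path follow the description below):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: httpd-pod          # illustrative name
spec:
  containers:
  - name: httpd
    image: httpd:2.4       # illustrative image
    volumeMounts:
    - name: httpd-config
      mountPath: /etc/apache2/
  volumes:
  - name: httpd-config
    configMap:
      name: httpd-config   # assumes a ConfigMap with this name exists
```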

This Pod spec:

  • Creates a volume named httpd-config  and tells the Pod to use a configMap  volume to inject the Apache HTTP Server configuration. See the ConfigMap resource documentation to learn more.
  • Mounts the volume at the path /etc/apache2/ , the directory that contains httpd  application files (e.g., configuration). Mounting volumes needs a bit more explanation, though. In a nutshell, a container’s filesystem is composed of the Docker image and volumes (containerization creates a partial view of the filesystem used by the application). The Docker image is at the root of this filesystem, and all volumes are mounted at specified paths within this image. In our example, the container’s filesystem might contain /var , /etc , /bin , and other directories used by the Apache HTTP Server. By mounting the volume at /etc/apache2/ , we make the volume’s contents available to the container at that path.

In the example above, the volume is populated with Apache HTTP Server data. If we wanted a completely empty volume instead, we could use the emptyDir  volume type.
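For illustration, a Pod with an emptyDir  volume could be sketched as follows (the Pod name, image, and mount path are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: cache-volume
      mountPath: /cache     # starts out completely empty
  volumes:
  - name: cache-volume
    emptyDir: {}
```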

This way we would have a completely empty directory, which is useful for a disk-based merge sort, caching, and more.

Using SubPath

It is good practice for containers to have their own individual directories (folders) within a shared volume. This design is especially useful for stacked applications with several tightly coupled containers. The subPath  field allows mounting a single volume multiple times with different sub-paths.

In the example below, we define a Pod where NGINX data is mapped to the html  sub-path and the MySQL database is stored in the mysql  folder of a shared persistent “site-data” volume.
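A sketch of such a Pod might look like this (the image tags, the MySQL root password, and the claim name site-data-pvc  are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: site
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "rootpasswd"        # illustrative only; use a Secret in practice
    volumeMounts:
    - name: site-data
      mountPath: /var/lib/mysql
      subPath: mysql             # MySQL data lives in the "mysql" folder
  - name: nginx
    image: nginx
    volumeMounts:
    - name: site-data
      mountPath: /usr/share/nginx/html
      subPath: html              # NGINX serves files from the "html" folder
  volumes:
  - name: site-data
    persistentVolumeClaim:
      claimName: site-data-pvc   # assumes this PVC exists
```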

Now, both NGINX and MySQL have their individual folders inside a shared Volume.

Note: It’s worth mentioning that subPath  currently has a few vulnerabilities to watch out for. See this article to learn more.

So far so good! By now we have a general understanding of how volumes work in Kubernetes and what volume types the platform offers out of the box. However, what are some use cases for volumes? In what follows, we guide you through the tutorial showing how to use Kubernetes volumes to share data between two containers in a Pod. Let’s go!

Tutorial: Communication between Containers Using Shared Storage

In this tutorial, we demonstrate a typical use case for shared storage: one container writes logs to a log file while another container (referred to as a sidecar logger) streams these logs to its own stdout . Since the kubelet captures stdout , the application’s logs can then be accessed using kubectl logs POD_NAME -c CONTAINER_NAME  (see the image below).

Kubernetes Sidecar Logging

To complete this example, we need the following prerequisites:

  • A working Kubernetes cluster. See our guide for more information about deploying a Kubernetes cluster with Supergiant. As another option, you can install a single-node Kubernetes cluster on a local system with Minikube.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

We create a Pod that pulls an NGINX container (the application container) that writes logs, and a sidecar container with a busybox  Docker image, which provides several Unix tools in a single executable file.
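A sketch of such a Pod spec, assuming an emptyDir  volume named “varlog” and an app: sidecar-logging  label (both assumptions, chosen to be consistent with the rest of this tutorial):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-logging
  labels:
    app: sidecar-logging        # assumed label, used to select the Pod later
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: varlog
      mountPath: /var/log/nginx
  - name: busybox
    image: busybox
    command: ["sh", "-c", "tail -n+1 -f /var/log/nginx/access.log"]
    volumeMounts:
    - name: varlog
      mountPath: /var/log/nginx
  volumes:
  - name: varlog
    emptyDir: {}
```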

In this Pod definition:

  • Creates the Pod named ‘sidecar-logging’ with the NGINX and busybox containers.
  • Defines a volume “varlog” used by both containers. It is mounted at the /var/log/nginx  directory of each container’s filesystem so that the sidecar container (busybox) has access to the NGINX log files ( error.log  and access.log ).
  • Opens containerPort: 80  for the NGINX container.
  • Using the Unix shell from the busybox distribution, asks busybox to tail logs from /var/log/nginx/access.log . tail  is a native Unix program that tracks the end of a text file or stream and displays new lines as they are appended, which is very useful for dynamic tracking of NGINX log files. The tracked log file stores entries generated by HTTP client requests to the NGINX server, such as visiting a page in a browser or sending curl  requests to the server.

As you see, our sidecar logging container is quite simple, but that is enough to illustrate how shared storage can be used for inter-container communication.

The next thing we need to do is to save the above Pod spec as sidecar-logging.yaml  and deploy our Pod by running the following command:
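Assuming the spec was saved as sidecar-logging.yaml  in the current directory, the Pod can be created with:

```shell
kubectl create -f sidecar-logging.yaml
```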

Our Pod is now running but is not exposed to the external world. In order to access the NGINX server from outside the cluster, we need to create a Service of the NodePort  type like this:
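A NodePort Service for this Pod could be sketched like this (it assumes the Pod carries the label app: sidecar-logging ):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sidecar-logging
spec:
  type: NodePort
  selector:
    app: sidecar-logging     # must match the Pod's labels
  ports:
  - port: 80
    targetPort: 80
```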

Now, we can access NGINX at yourhost:NodePort , which redirects to the NGINX container listening on port 80 . However, first we need to find out which port Kubernetes assigned to the NodePort  service.
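The assigned port can be found by listing the Service:

```shell
kubectl get service sidecar-logging
```

The PORT(S) column of the output will show something like 80:31399/TCP , where the second number is the NodePort.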

As you see, port 31399  has been assigned. Now, we can make the server write logs to the /var/log/nginx/access.log  file by accessing yourhost:31399  from your browser or by sending arbitrary curl  requests.

Let’s check what logs our sidecar logging container displays by running kubectl logs sidecar-logging -c busybox . This returns access logs specifying the IP of the machine that sent each request, the HTTP resources requested, the request types, the user agents (browsers), and the dates. You should see output similar to this:
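An illustrative access-log line in NGINX’s default combined format (the actual IPs, timestamps, and user agents will of course differ):

```
10.0.2.15 - - [18/Jul/2018:10:31:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.58.0"
```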

That’s it! Now you understand how a sidecar container can use shared storage to collect logs from the main application and display them. The container can also ship logs to some logging backend, but the example above is enough for you to get some basic idea of using shared storage in Kubernetes.

Conclusion

In this article, we examined Kubernetes volumes: powerful abstractions that enable diverse storage options for Pods and containers. Thanks to more than 25 supported volume types and the ability to create new volume plugins, Kubernetes can host applications with virtually any storage requirements, both stateless and stateful. In this article, however, we largely focused on stateless applications that use volumes with a finite lifecycle. In our next articles, we will discuss persistent storage using PersistentVolumes  and other storage resources for deploying full-fledged stateful applications in Kubernetes. Stay tuned for new content to find out more!
