
Persistent Storage with Persistent Volumes in Kubernetes

As you might remember from the previous tutorials, Kubernetes supports a wide variety of volume plugins both for stateless and stateful apps. Non-persistent volumes such as emptyDir  or configMap  are firmly linked to the pod’s lifecycle: they are detached and deleted after the pod is terminated.

However, with stateful applications like databases, we want to have volumes that persist data beyond the pod’s lifecycle. Kubernetes solves this problem by introducing the PersistentVolume  and PersistentVolumeClaim  resources that enable native and external persistent storage in your Kubernetes clusters.

In this tutorial, we’ll explain how these two API resources can be used together to link various storage architectures (both native and CSP-based) to applications running in your cluster. To consolidate the theory, we are also going to walk you through a tutorial about using hostPath  as a PersistentVolume  in your stateful deployment. Let’s start!

Why Do We Need Persistent Volumes?

The rationale for using a PersistentVolume  resource in Kubernetes is quite simple. On the one hand, we have different storage infrastructures such as Amazon EBS (Elastic Block Store), GCE Persistent Disk, or GlusterFS, each with its specific storage type (e.g., block storage, NFS, object storage, data center storage), architecture, and API. If we were to attach these diverse storage types manually, we would have to develop custom plugins to interact with each external storage provider's API, covering tasks such as mounting the disk, requesting capacity, and managing the disk's life cycle. We would also need to configure a cloud storage environment, all of which would result in unnecessary overhead.

Fortunately, the Kubernetes platform simplifies storage management for you. Its PersistentVolume  subsystem is designed to solve the above-described problem by providing APIs that abstract details of the underlying storage infrastructure (e.g., AWS EBS, Azure Disk etc.), allowing users and administrators to focus on storage capacity and storage types their applications will consume rather than the subtle details of each storage provider’s API.

This sounds similar to the pod’s resource model, doesn’t it? As we remember from the previous tutorial, containers in a pod request resources in raw amounts of CPU and RAM, so users do not need to worry about server flavors and memory types used by CSPs under the hood. PVs do the same for your storage, providing the right amount of resources to your pods regardless of which storage provider you opt to use.

Linking a persistent volume to your application involves several steps: provisioning a PV, requesting storage, and using the PV in your pod or deployment. Let’s discuss these steps.

Provisioning a PersistentVolume (PV)

Kubernetes users need to define a PersistentVolume  resource object that specifies the storage type and storage capacity, volume access modes, mount options, and other relevant storage details (see the discussion below). Once the PV is created, we have a working API abstraction that tells Kubernetes how to interact with the underlying storage provider using its Kubernetes volume plugin (see our article about Kubernetes Volumes to learn more).

Kubernetes supports static and dynamic provisioning of PVs. With static provisioning, PVs are created by the cluster administrator and are backed by actual storage available in the cluster. This means that in order to use static provisioning, you need to have storage (e.g., an Amazon EBS volume) provisioned beforehand (see the tutorial below).

On the other hand, dynamic provisioning of volumes is triggered when the volume type claimed by the user does not match any PV available in the cluster. For dynamic provisioning to happen, the cluster administrator needs to enable the DefaultStorageClass  admission plugin, which assigns a default storage class to claims that do not request a specific one. (Note: this is a big topic in its own right, so it will be discussed in the next tutorials.)

Requesting Storage

To make PVs available to pods in the Kubernetes cluster, you need to explicitly claim them using a PersistentVolumeClaim  (PVC) resource. A PVC is bound to a PV that matches the storage type, capacity, and other requirements specified in the claim. Binding the PVC to the PV is handled by a control loop that watches for new PVCs and finds a matching PV. If a matching volume does not exist, the claim remains unbound. For example, if a PVC requests 200 Gi and the cluster only has 100 Gi PVs available, the claim won’t be bound to any PV until a PV with 200 Gi is added to the cluster. In the example below, the PVC will be bound to Persistent Volume #1 and not Persistent Volume #2 because the PVC’s resource request and volume mode match the first volume only.

(Figure: Kubernetes Persistent Storage)

Using PVs and PVCs in Pods

Pods use PVs by referencing a PVC that matches their resource and volume requirements. This works as follows: when a PVC is specified in a pod spec, the cluster finds the claim in the pod’s namespace, uses it to locate the PV backing the claim, and mounts the corresponding volume into the container(s) of the pod.

Note that PVCs should be referenced as Volumes in a Pod’s spec. For example, below we see a pod spec that uses a persistentVolumeClaim  named “test-claim” referring to some PV:
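
For example, a minimal pod spec of this kind could look as follows (the pod name, image, and mount path are illustrative; only the claim name “test-claim” is taken from the text above):

apiVersion: v1
kind: Pod
metadata:
  name: app-pod                            # illustrative pod name
spec:
  containers:
    - name: app
      image: nginx                         # any container image will do here
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html # illustrative mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-claim              # the PVC referenced as a volume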

Persistent Volume Types

We discussed available volume types in the previous tutorial. However, not all of them are persistent. Below is a table of the persistent storage volume plugins that can be used by PVs and PVCs.

Volume Name | Storage Type | Description
gcePersistentDisk | Block Storage | A Google Compute Engine (GCE) Persistent Disk that provides SSD and HDD storage attached to nodes and pods in a K8s cluster.
awsElasticBlockStore | Block Storage | An Amazon EBS volume is a persistent block storage volume offering consistent and low-latency performance.
azureFile | Network File Shares | Microsoft Azure file volumes are fully managed file shares in Microsoft Azure accessible via the industry-standard Server Message Block (SMB) protocol.
azureDisk | Block Storage | A Microsoft Azure data disk provides block storage with SSD and HDD options.
fc | Data Center Storage and Storage Area Networks (SAN) | Fibre Channel is a high-speed networking technology for the lossless delivery of raw block data. FC is primarily used in Storage Area Networks (SAN) and commercial data centers.
FlexVolume | Allows Creating Volume Plugins | FlexVolume enables users to develop Kubernetes volume plugins for vendor-provided storage.
flocker | Container Data Storage and Management | Flocker is an open-source container data volume manager for Dockerized applications. The platform supports container portability across diverse storage types and cloud environments.
nfs | Network File System | NFS refers to a distributed file system protocol that allows users to access files over a computer network.
iscsi | Networked Block Storage | iSCSI (Internet Small Computer Systems Interface) is an IP-based storage networking protocol for connecting data storage facilities. It is used to facilitate data transfer over intranets and to manage storage over long distances by enabling location-independent data storage.
rbd | Ceph Block Storage | Ceph RADOS Block Device (RBD) is a building block of Ceph Block Storage that leverages RADOS capabilities such as snapshotting, consistency, and replication.
cephfs | Object Storage and Interfaces for Block and File Storage | Ceph is a storage platform that implements object storage on a distributed computer cluster.
cinder | Block Storage | Cinder is a block storage service for OpenStack designed to provide storage resources to end users that can be consumed by the OpenStack Compute Project (Nova).
glusterfs | Networked File System | Gluster is a distributed networked file system that aggregates storage from multiple servers into a single storage namespace.
vsphereVolume | VMDK | A virtual machine disk (VMDK) volume provided by VMware vSphere.
quobyte | Data Center File System | The Quobyte volume plugin mounts the Quobyte data center file system.
hostPath | Local Cluster File System | A hostPath  volume mounts a file or directory from the host node’s filesystem into a pod.
portworxVolume | Block Storage | A portworxVolume  is Portworx’s elastic block storage layer that runs hyperconverged with Kubernetes. Portworx’s storage system is designed to aggregate capacity across multiple servers, similarly to Gluster.
scaleIO | Shared Block Networked Storage | ScaleIO is a software-defined storage product from Dell EMC that creates a server-based Storage Area Network (SAN) from local server storage. It is designed to convert direct-attached storage into shared block storage.
storageos | Block Storage | StorageOS aggregates storage across a cluster of servers and exposes it as high-throughput, low-latency block storage.

Defining a PersistentVolume API Resource

Now, let’s discuss how PVs actually work. First, PVs are defined as Kubernetes API objects with a spec and a list of parameters. Below is an example of a typical PersistentVolume  definition in Kubernetes.
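
A manifest along these lines is consistent with the parameters discussed below (the access mode and reclaim policy values are plausible examples rather than mandatory choices):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs
spec:
  capacity:
    storage: 10Gi                         # storage capacity of the PV
  volumeMode: Filesystem                  # default volume mode
  accessModes:
    - ReadWriteOnce                       # example access mode
  persistentVolumeReclaimPolicy: Retain   # example reclaim policy
  storageClassName: slow
  mountOptions:
    - hard                                # NFS "hard" mount option
    - nfsvers=4.2                         # request NFS version 4.2
  nfs:
    path: /tmp                            # path exported by the NFS server
    server: 172.15.0.6                    # NFS server address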

As you see, here we defined a PV for the NFS volume type. Key parameters defined in this spec are the following:

  • spec.capacity  — Storage capacity of the PV. In this example, our PV has a capacity of 10 Gi (gibibytes). The capacity property uses the same units as defined in the Kubernetes resource model: storage can be represented as a plain integer or as a fixed-point integer with one of the SI suffixes (E, P, T, G, M, K, m) or their binary equivalents (Ei, Pi, Ti, Gi, Mi, Ki). Currently, Kubernetes users can only request storage size; future attributes may include throughput, IOPS, etc.
  • spec.volumeMode  (available since Kubernetes v1.9) — The volume mode property supports raw block devices and filesystems. Block mode exposes a raw, unformatted block device that avoids filesystem overhead and, hence, ensures lower latency and higher throughput for mission-critical applications such as databases and object stores. Valid values for this field are “Filesystem” (the default) and “Block”.
  • spec.accessModes  — Defines how the volume can be accessed. (Note: valid values vary across persistent storage providers.) In general, the supported field values are:
    • ReadWriteOnce — the volume can be mounted as read-write by a single node only.
    • ReadOnlyMany — many nodes can mount the volume as read-only.
    • ReadWriteMany — many nodes can mount the volume as read-write.

Note: a volume can only be mounted with one access mode at a time, even if it supports many.

  • spec.storageClassName  — A storage class of the volume defined by the StorageClass resource. A PV of a given class can only be bound to PVCs requesting that class. A PV with no storageClassName  defined can only be bound to PVCs that request no particular class.
  • spec.persistentVolumeReclaimPolicy  — A reclaim policy for the Volume. At the present moment, Kubernetes supports the following reclaim policies:
    • Retain: If this policy is enabled, the PV will continue to exist even after the PVC is deleted. However, it won’t be available to another claim because the previous claimant’s data remains on the volume; the administrator has to clean up the data and delete the PV manually. Where Retain  really shines is that users can reuse that data if they want — for example, if they want to use the data without a PV (e.g., switch to a traditional database model), or if they want to use that data on a different, new PV (migrating to another cluster, testing, etc.).
    • Recycle (deprecated): This reclaim policy performs a basic scrub operation ( rm -rf /thevolume/* ) on the volume and makes it available again for a new claim.
    • Delete: This reclaim policy deletes the PersistentVolume  object from the Kubernetes API along with the associated storage asset in the external infrastructure (e.g., AWS EBS, Google Persistent Disk, etc.). AWS EBS, GCE PD, Azure Disk, and Cinder volumes support this reclaim policy.
  • spec.mountOptions  — A K8s administrator can use this field to specify additional mount options supported by the storage provider. In the spec above, we pass the hard  mount option and specify that NFS version 4.2 should be used. Note: Not all providers support mount options. For more information, see the official documentation.
  • spec.nfs  — The list of NFS-specific options. Here, we say that the nfs  volume should be mounted from the /tmp  export of the server with the IP 172.15.0.6 .

Defining a PersistentVolumeClaim (PVC)

As we’ve already said, PersistentVolumeClaims  must be defined to claim resources of a given PV. Similarly to PVs, PVCs are defined as Kubernetes API resource objects. Let’s see an example:
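
A claim along these lines matches the description that follows (the claim name volume-claim is chosen to match the pod example below, and the label key release is an assumption):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim             # referenced by the pod example below
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Block
  resources:
    requests:
      storage: 10Gi
  storageClassName: slow
  selector:
    matchLabels:
      release: stable            # only volumes with a “stable” label can be bound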

This PVC:

  • targets volumes that have ReadWriteMany  access mode ( spec.accessModes ).
  • requests the storage only from volumes that have “Block” volume mode enabled ( spec.volumeMode ).
  • claims 10Gi of storage from any matching PV ( spec.resources.requests.storage ).
  • filters volumes that match a “slow” storage class ( spec.storageClassName ). Note: If the default StorageClass  is set by the administrator, the PVC with no storageClassName  can be bound only to PVs of that default.
  • specifies a label selector to further filter a set of volumes. Only volumes with a label “stable” can be bound to this claim.

This PVC matches the StorageClass  of the PV defined above. However, it does not match the volumeMode  and accessMode  in that PV. Therefore, our PVC cannot be used to claim resources from the pv-nfs  PV.

Using Persistent Volumes in a Pod

Once the PV and PVC are created, using persistent storage in a pod becomes straightforward:
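
A minimal pod spec along these lines matches the description below (the pod name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: httpd-pod                          # illustrative pod name
spec:
  containers:
    - name: httpd
      image: httpd                         # Apache HTTP Server image from Docker Hub
      ports:
        - containerPort: 80
      volumeMounts:
        - name: test-pv                    # the volume defined below
          mountPath: /usr/local/apache2/htdocs
  volumes:
    - name: test-pv
      persistentVolumeClaim:
        claimName: volume-claim            # the PVC that claims a matching PV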

In this pod spec, we:

  • pull Apache HTTP Server from the Docker Hub repository ( spec.containers.image )
  • define a Volume “test-pv” and use the PVC “volume-claim” to claim some PV that matches this claim.
  • mount the Volume at the path /usr/local/apache2/htdocs  in the httpd  container.

That’s it! Hopefully, you now understand the theory behind PVs and PVCs and the key options and parameters available for these API resources. Let’s consolidate this knowledge with the tutorial.

Tutorial: Using hostPath Persistent Volumes in Kubernetes

In this tutorial, we’ll create a PersistentVolume  using the hostPath  volume plugin and claim it for use in a Deployment running Apache HTTP servers. hostPath  volumes use a file or directory on the node and are suitable for development and testing purposes.

Note: hostPath  volumes have certain limitations to watch out for, and they are not recommended for production use. Also, in order for hostPath  to work, we will need to run a single-node cluster. See the official documentation for more info.

To complete this example, we used the following prerequisites:

  • A Kubernetes cluster deployed with Minikube. Kubernetes version used was 1.10.0.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step #1 Create a Directory on Your Node

First, let’s create a directory on your node that will be used by the hostPath  volume. This directory will be the webroot of the Apache HTTP server.

First, you need to open a shell to the Node in the cluster. Since you are using Minikube, open a shell by running minikube ssh .

In your shell, create a new directory. Use any directory that does not need root permissions (e.g., user’s home folder if you are on Linux):
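
mkdir /home/<user-name>/data    # substitute your own user name and path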

Then create the index.html  file in this directory containing a custom greeting from the server (note: use the directory you created):
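
echo 'Hello from the hostPath persistent volume!' > /home/<user-name>/data/index.html    # the greeting text is just an example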

Step #2 Create a Persistent Volume

The next thing we need to do is to create a hostPath  PersistentVolume  that will be using this directory.
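
A spec along these lines matches the description in the list below (the label key type is an assumption; the PVC in Step #3 will select volumes by this label):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local
  labels:
    type: local                    # assumed label; matched by the PVC selector in Step #3
spec:
  storageClassName: local
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /home/<user-name>/data   # the directory created in Step #1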

This spec:

  • defines a PV named “pv-local” with a 10Gi capacity ( spec.capacity.storage ).
  • sets the PV’s access mode to ReadWriteOnce, which allows the volume to be mounted as read-write by a single node ( spec.accessModes ).
  • assigns a storageClassName  “local” to the PersistentVolume .
    • configures the hostPath  volume plugin to mount the local directory at /home/<user-name>/data ( spec.hostPath.path ).

Save this spec in a file (e.g., hostpath-pv.yaml ) and create the PV by running the following command:
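
kubectl create -f hostpath-pv.yaml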

Let’s check whether our PersistentVolume was created:
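
kubectl get pv pv-local

The output should look roughly like this (values such as AGE will differ):

NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pv-local   10Gi       RWO            Retain           Available           local                   15s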

This response indicates that the volume is available but does not yet have any claim bound to it, so let’s create one.

Step #3 Create a PersistentVolumeClaim (PVC) for your PersistentVolume

The next thing we need to do is to claim our PV using a PersistentVolumeClaim . Using this claim, we can request resources from the volume and make them available to our future pods.
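
A claim along these lines matches the description in the list below; it also requests the “local” storage class assigned to the PV, and the type: local selector matches the label assumed on the PV above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hostpath-pvc
spec:
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi               # request at least 5Gi of storage
  selector:
    matchLabels:
      type: local                # bind only to volumes carrying this label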

Our PVC does the following:

  • filters volumes labeled “local” so that it binds to our specific hostPath  volume, or to other similarly labeled hostPath  volumes that might be created later ( spec.selector.matchLabels ).
  • targets hostPath  volumes that have ReadWriteOnce  access mode ( spec.accessModes ).
  • requests a volume of at least 5Gi ( spec.resources.requests.storage ).

First, save this resource definition in hostpath-pvc.yaml , and then create it in the same way as we did with the PV:
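
kubectl create -f hostpath-pvc.yaml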

Let’s check the claim running the following command:
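
kubectl get pvc hostpath-pvc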

The response should be something like this:
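
NAME           STATUS   VOLUME     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hostpath-pvc   Bound    pv-local   10Gi       RWO            local          20s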

As you see, our PVC was already bound to the volume of the matching type. Let’s verify that the PV we created was actually selected by the claim:
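
kubectl get pv pv-local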

The response should be something like this:
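
NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
pv-local   10Gi       RWO            Retain           Bound    default/hostpath-pvc   local                   6m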

Did you notice the difference from the previous status of our PV? You’ll see that it is now bound by the claim hostpath-pvc  we just created (the claim is living in the default Kubernetes namespace). That’s exactly what we wanted to achieve!

Step #4 Use the PersistentVolumeClaim as a Volume in your Deployment

Now everything is ready for the use of your hostPath  PV in any pod or deployment of your choice. To do this, we need to create a deployment with a PVC referring to the hostPath  volume. Since we created a PV that mounts a directory with the index.html  file, let’s deploy Apache HTTP server from the Docker Hub repository.
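
A deployment spec along these lines matches the description below (the replica count and the app: httpd labels are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
spec:
  replicas: 2                              # illustrative replica count
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
        - name: httpd
          image: httpd                     # Apache HTTP Server image from Docker Hub
          ports:
            - containerPort: 80
          volumeMounts:
            - name: web
              mountPath: /usr/local/apache2/htdocs   # default Apache webroot in this image
      volumes:
        - name: web
          persistentVolumeClaim:
            claimName: hostpath-pvc        # the claim created in Step #3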

As you see, along with the standard deployment parameters like the container image and container port, we have also defined a volume named “web” that uses our PersistentVolumeClaim . This volume will be mounted with our custom index.html  at /usr/local/apache2/htdocs , which is the default webroot directory of Apache HTTP for this Docker Hub image. Also, the deployment will have access to 5Gi of storage in the hostPath  volume.

Save this spec in httpd-deployment.yaml  and create the deployment using the following command:
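
kubectl create -f httpd-deployment.yaml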

Let’s check the deployment’s details:
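
kubectl describe deployment httpd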

Along with other details, the output shows that the directory /usr/local/apache2/htdocs  was mounted from the web (rw) volume and that our PersistentVolumeClaim  was used to provision the storage.

Now, let’s verify that our Apache HTTP pods actually serve the index.html  file we created in the first step. To do this, let’s first find the name of one of the pods and get a shell to the Apache server container running in this pod.
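
First, list the pods created by the deployment (the random suffix in the pod names will differ in your cluster):

kubectl get pods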

We’ll enter the httpd-5958bdc7f5-jjf9r  pod using the following command:
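
kubectl exec -it httpd-5958bdc7f5-jjf9r -- /bin/bash    # substitute the pod name from your own cluster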

Now, we are inside the Apache2 container’s filesystem. You may verify this by using the Linux ls  command.

The image uses a Linux environment, so we can easily install curl to access our server:
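
apt-get update && apt-get install -y curl    # the default httpd image is Debian-based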

Once curl is installed, let’s send a GET request to our server listening on localhost:80  (remember that containers within a pod communicate via localhost ):
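
curl localhost:80

If everything is wired up correctly, the response is the greeting we put into index.html  in Step #1, for example:

Hello from the hostPath persistent volume!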

That’s it! Now you know how to define a PV and a PVC for hostPath volumes and use this storage type in your Kubernetes deployments. This tutorial demonstrated how both PV and PVC take care of the underlying storage infrastructure and filesystem, so users can focus on just how much storage they need for their deployment.

Step #5 Clean Up

This tutorial is over, so let’s clean up after ourselves.

Delete the Deployment:
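
kubectl delete deployment httpd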

Delete the PV:
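
kubectl delete pv pv-local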

Delete the PVC:
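
kubectl delete pvc hostpath-pvc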

Finally, don’t forget to delete the files and folders we created, such as the /home/<user-name>/data  directory and the PV and PVC resource definition files.

Conclusion

We hope that you now have a better understanding of how to create stateful applications in Kubernetes using PersistentVolume  and PersistentVolumeClaim . As we’ve learned, persistent volumes are powerful abstractions that give users access to the diverse storage types supported by the Kubernetes platform. Using PVs, you can attach and mount almost any type of persistent storage (object-, file-, or network-level) to your pods and deployments. In addition, Kubernetes exposes a variety of storage options such as capacity, reclaim policy, volume modes, and access modes, making it easy for you to adjust different storage types to your particular application’s requirements. Kubernetes makes sure that your PVC is always bound to a suitable volume available in your cluster, enabling efficient usage of resources, high availability of applications, and integrity of your data across pod restarts and node failures.
