ALL THINGS KUBERNETES

Software-Defined Storage Solutions for Kubernetes

According to Gartner, 50% of global enterprises will be running containers in production by the year 2020. By that time, over 20% of enterprise storage capacity will be allocated to container workloads, compared to only 1% today.

Moving to containers, however, presents new challenges for storage provisioning. In particular, using traditional storage solutions for distributed containerized workloads turns out to be too complex and costly. Key issues associated with traditional storage solutions include the following:

  • The number of block volumes that can be attached to a server is limited in most operating systems. For example, AWS Linux-based instances have a limit of 40 block volumes. Attaching more can provoke boot failures and is only supported on a best-effort basis (i.e., it is not guaranteed). Container technology, however, allows running hundreds of containerized applications on a single host, and these containers may require more volumes than the OS can provide. Therefore, container users need a more flexible approach that supports storage virtualization and pooling.
  • The dynamic nature of containers demands dynamic storage. Containers are constantly created, destroyed, and moved across hosts. Accordingly, stateful applications running in these containers require storage solutions that can automatically attach and detach storage, instantly provision new storage in the availability zone where a container lands, and regularly create backups and replicas of data to ensure High Availability (HA). Manual storage provisioning can't address these requirements efficiently at scale, and even if it could, it would require additional operations and administration staff.
  • Storage diversity and heterogeneity in distributed compute environments require abstracting storage hardware from software. Containerized workloads may use a variety of storage solutions, including HDD, SSD, block or object storage, etc. The storage and volume types containers use may also differ across cloud providers. As a result, running cloud-native applications at scale requires an additional layer that can abstract the diverse storage types and expose them to applications as a single pool of resources.

It goes without saying that addressing the challenges listed above requires a cloud-native storage solution optimized for containers. It should be flexible, hardware-agnostic, scalable, and tightly integrated with popular container orchestration frameworks such as Kubernetes, Docker Swarm, or Mesos. This leads us to the concept of Software-Defined Storage (SDS).

What Is Software-Defined Storage?

In a nutshell, software-defined storage (SDS) is a storage architecture that abstracts storage software from the underlying hardware and presents hardware capacity to users as a unified storage pool. A mature SDS solution removes the storage software's dependence on proprietary hardware and can run on any industry-standard x86 system, which implies support for the major storage and hardware solutions on the market.

Why is SDS so useful? Imagine you have multiple different x86 server flavors, each with a different storage capacity type (for instance, SSD or HDD), storage software, and file system type (e.g., NTFS or ext4). Each of these diverse storage types would require specific expertise and configuration to work properly. Fortunately, SDS allows you to abstract all the capacity provided by this hardware and merge it into a single storage pool that is highly flexible and scalable. Consumers of this storage pool do not need to worry about the specific storage software required to manage different volume types, because everything is handled by the SDS system.

In addition to abstracting different hardware types and pooling storage, SDS systems can also provide standard APIs for the management and maintenance of storage devices, storage access services (e.g., via NFS and SMB), backups and replication, encryption, and hyperconvergence of storage and applications.

In general, SDS offers the following benefits for containerized applications:

  • Hardware flexibility. You can choose any storage hardware to support your containerized workloads. Mature SDS systems support the most popular hardware solutions available on x86 systems, such as HDDs, SSDs, and external drives, and a number of storage technologies such as block and object storage. As a result, you can manage heterogeneous storage types as a unified pool of storage resources.
  • Scalability. Most SDS systems support on-demand storage provisioning that meets your current business requirements. Scaling storage becomes easier, too, because you no longer need to worry about the underlying storage hardware and software.
  • Distributed storage in different environments. Modern SDS technologies can pool storage originating from different environments. Whether you use cloud-based or on-premises storage solutions, SDS ensures that remote storage services are interconnected and accessible via the SDS API, which allows instant provisioning and scaling of resources, no matter where they are located.
  • High Availability. SDS solutions are designed for High Availability of your data in the storage cluster. This HA is achieved by cross-AZ replication and backups, data migration from unhealthy nodes, and colocation of data closer to applications to ensure lower latency and faster access.
  • Integration with container orchestration services. Many modern SDS solutions can be integrated into the container orchestration framework of your choice. An SDS integrated with the orchestration platform can benefit from its native features, such as scheduling and disaster recovery, to better manage container storage services.

SDS Volumes in Kubernetes

In one of our earlier blogs, we discussed the design of the Kubernetes storage system. As you may remember, Kubernetes ships with numerous volume plugins, which are abstractions that allow containers to use different storage types, file systems, and architectures, such as cloud provider volumes, network file systems, and object and block storage. Along with these volume types, Kubernetes supports a number of volume plugins for popular SDS providers, including GlusterFS, Quobyte, Portworx, and ScaleIO.

The scope of SDS solutions you can use in Kubernetes is not limited to these, however. You can create a volume plugin for any available SDS solution using the Container Storage Interface (CSI) or the Flexvolume plugin interface. Using these interfaces, you can create custom storage plugins on top of Kubernetes and expose the storage they manage to your container workloads.
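
For illustration, here is a minimal sketch of a PersistentVolume backed by a CSI driver. The driver name csi.example.com and the volume handle are hypothetical placeholders; an actual SDS vendor's CSI driver defines its own name and parameters:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: csi-pv
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteOnce
      csi:
        driver: csi.example.com          # hypothetical CSI driver name
        volumeHandle: existing-volume-1  # placeholder: volume ID known to the driver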

In what follows, we'll review the major SDS solutions currently supported by Kubernetes and demonstrate some examples of how to use them in your Pods and containers.

GlusterFS

GlusterFS is a software-defined distributed file system that can aggregate disk storage from multiple nodes into a single global namespace. It is POSIX-compatible, can scale to several petabytes, handle thousands of clients, and use any on-disk file system with support for extended attributes. GlusterFS also provides network connectivity using industry-standard protocols like SMB and NFS and supports replication, snapshots, bitrot detection, and more.

GlusterFS architecture (Source: GlusterFS documentation)

Kubernetes supports the glusterfs volume plugin, which allows GlusterFS volumes to be mounted into your Pods. GlusterFS volumes are persistent, which means that data is preserved if the volume is detached. This also makes it possible to pre-populate volumes with data.

In order to use GlusterFS in Kubernetes, users should have:

  • a working GlusterFS server cluster.
  • a GlusterFS volume.
  • GlusterFS endpoints defined in Kubernetes. These endpoints should be populated with the addresses of the nodes in the GlusterFS cluster.

If these prerequisites are met, you can mount a GlusterFS volume to your Pod using the specs below:
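
Here is a minimal sketch of such specs. The node IP, the Endpoints name, and the GlusterFS volume name kube_vol are placeholders for your own values:

    # Endpoints pointing at the nodes of the GlusterFS cluster
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: glusterfs-cluster
    subsets:
      - addresses:
          - ip: 10.240.106.152   # replace with a real GlusterFS node IP
        ports:
          - port: 1              # the port value is ignored but must be legal
    ---
    # Pod mounting the pre-created GlusterFS volume
    apiVersion: v1
    kind: Pod
    metadata:
      name: glusterfs-pod
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: glusterfsvol
              mountPath: /mnt/glusterfs
      volumes:
        - name: glusterfsvol
          glusterfs:
            endpoints: glusterfs-cluster   # the Endpoints object defined above
            path: kube_vol                 # name of the GlusterFS volume
            readOnly: false

Note that the Endpoints object must exist before the Pod is created, because the glusterfs plugin looks it up to find the cluster nodes.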

For more information about GlusterFS volumes in Kubernetes, consult this article.

ScaleIO

ScaleIO is an SDS system designed by Dell EMC. It creates a storage area network (SAN) from local server direct-attached storage (DAS) using existing customer hardware. The system supports physical, virtual, or cloud servers using any storage type, including disk drives (HDD), flash drives (SSD), and cloud volumes. It can scale fast from 3 storage nodes to over 1,000 nodes and drive up to 240 million IOPS.

ScaleIO interacts with the local storage by installing its software tools on each application host. These hosts, in turn, contribute their DAS to the ScaleIO cluster. After the storage capacity is contributed to the cluster, hosts can consume software-defined volumes via the ScaleIO API. Storage consumption is managed by the ScaleIO Data Client (SDC), a compact device driver located on each host that needs access to the ScaleIO cluster. The SDCs have a small in-memory map that can track petabytes of data with just a few megabytes of RAM. Besides storage pooling, ScaleIO supports data recovery, data protection, replication, backups, and thin provisioning.

In Kubernetes, the scaleIO volume plugin allows Pods to access existing ScaleIO volumes. The plugin also supports dynamic volume provisioning via a ScaleIO StorageClass and corresponding Persistent Volume Claims (PVCs). To use ScaleIO in Kubernetes, you need a ScaleIO cluster deployed and connected to Kubernetes, as well as pre-provisioned ScaleIO volumes if you don't use dynamic storage provisioning.

Example:
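
Below is a sketch of a Pod that mounts a pre-provisioned ScaleIO volume. The gateway URL, system name, protection domain, storage pool, Secret, and volume name are placeholders for the values of your own ScaleIO deployment:

    apiVersion: v1
    kind: Pod
    metadata:
      name: scaleio-pod
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: siovol
              mountPath: /mnt/sio
      volumes:
        - name: siovol
          scaleIO:
            gateway: https://localhost:443/api   # address of the ScaleIO gateway
            system: scaleio                      # name of the ScaleIO system
            protectionDomain: pd0
            storagePool: sp1
            volumeName: vol-0                    # pre-provisioned ScaleIO volume
            secretRef:
              name: sio-secret                   # Secret holding ScaleIO credentials
            fsType: xfs

For dynamic provisioning, you would instead define a StorageClass that uses the kubernetes.io/scaleio provisioner and request volumes through a PVC.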

For more information about using scaleIO volumes in Kubernetes, read this article.

Quobyte

Quobyte is a software-defined storage solution and distributed file system optimized for data centers. It works with HDDs, SSDs, and NVMe devices, and it supports block storage (Cinder), object storage (S3), and Hadoop, among others.

Quobyte architecture (Source: Quobyte website)

Quobyte has the following features:

  • Linear scalability. Doubling the node count doubles the storage cluster performance.
  • Unified storage. Quobyte allows multiple clients that use different file systems and access protocols to work on the same file simultaneously. For example, a Windows user can be editing a video while a Mac user is watching the same file; there is no need to copy the video file to another system. Unified storage benefits environments where data needs to be transferred between different operating systems like Linux, macOS, or Windows.
  • Self-monitoring and self-healing capabilities. Quobyte monitors the state of the storage cluster and intervenes if something goes wrong.
  • Data backup and recovery. Quobyte supports volume mirroring for automatic backups of volumes in the cluster.
  • Thin provisioning. Quobyte supports thin provisioning, a storage management paradigm that flexibly allocates storage resources among multiple users based on each user's current need for storage space.
  • Support for various storage access methods. Quobyte supports POSIX, NFS, S3, SMB, and Hadoop file access methods.
  • Wide support of hardware devices. Quobyte works well with HDDs, SSDs, and NVMe devices.
  • Efficient hardware management. Quobyte supports automatic detection and repair of corrupted data and disks. Corrupted-data detection is managed by a smart monitoring layer and hardware watchdogs.

Kubernetes has a built-in quobyte volume type that can be mounted into a Pod like this:
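
A minimal sketch, assuming a Quobyte registry reachable at registry:7861 and a pre-created Quobyte volume named testVolume (both placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: quobyte-pod
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: quobytevol
              mountPath: /mnt/quobyte
      volumes:
        - name: quobytevol
          quobyte:
            registry: registry:7861   # host:port of the Quobyte registry
            volume: testVolume        # pre-created Quobyte volume
            readOnly: false
            user: root
            group: root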

For more information about the Quobyte plugin in Kubernetes, you can read this article.

Portworx

Portworx is an SDS solution that aggregates the available storage attached to worker nodes and creates a unified persistent storage layer specifically optimized for containerized databases and other stateful apps. It supports both VMs and bare-metal servers and scales up to 1,000 nodes per cluster.

What sets Portworx apart from other SDS systems is its deep integration with Kubernetes-native scheduling. Portworx ships with a built-in storage orchestrator for Kubernetes, STORK (STorage Orchestrator Runtime for Kubernetes). Released in early 2018, STORK supports storage-aware scheduling via Kubernetes to ensure the optimal placement of volumes in the cluster. In essence, STORK extends the native Kubernetes scheduler to provide container-data hyperconvergence, storage health monitoring, snapshot lifecycle management, and failure-domain awareness for stateful applications running in Kubernetes.

One of the best features of Portworx in Kubernetes is hyperconvergence. Stateful apps like Elasticsearch and Cassandra perform best when run in close proximity to their data. However, the Kubernetes volume plugin system does not support primitives that can be used to optimize the location of Pods relative to their data. You can use labels and node affinity to get around this, but that introduces overhead when scaling to large clusters. STORK overcomes the limitation by implementing a Kubernetes scheduler extender, which can influence Pod scheduling based on the location of the volumes a Pod requires.
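
For example, once STORK is deployed (and assuming it runs under the default scheduler name stork), you can, as a sketch, delegate the scheduling of a Pod to it simply by setting the Pod's schedulerName; the volume ID below is a placeholder for a pre-provisioned Portworx volume (the portworxVolume plugin is described next):

    apiVersion: v1
    kind: Pod
    metadata:
      name: cassandra-0
    spec:
      schedulerName: stork   # let the STORK extender place the Pod near its data
      containers:
        - name: cassandra
          image: cassandra:3.11
          volumeMounts:
            - name: data
              mountPath: /var/lib/cassandra
      volumes:
        - name: data
          portworxVolume:
            volumeID: cassandra-data   # placeholder: pre-provisioned Portworx volume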

In Kubernetes, Portworx volumes can be mounted using the portworxVolume plugin. A portworxVolume can be dynamically created using a StorageClass, or it can be pre-provisioned and referenced inside a Kubernetes Pod. Here is an example Pod using a portworxVolume:
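
A minimal sketch, assuming a pre-provisioned Portworx volume with the ID pxvol (a placeholder):

    apiVersion: v1
    kind: Pod
    metadata:
      name: portworx-pod
    spec:
      containers:
        - name: web
          image: nginx
          volumeMounts:
            - name: pxvol
              mountPath: /mnt/px
      volumes:
        - name: pxvol
          portworxVolume:
            volumeID: pxvol   # must exist before the Pod starts
            fsType: ext4

For dynamic provisioning, you would instead define a StorageClass that uses the kubernetes.io/portworx-volume provisioner and reference it from a PVC.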

Conclusion

In this article, we have discussed the architecture of Software-Defined Storage and reviewed key SDS solutions for Kubernetes. SDS is a very efficient solution for distributed compute environments that depend on diverse storage types and file systems. It's also a good option for containerized applications that require dynamic storage provisioning, instant storage scaling, and HA across different availability zones and server types. In the next blog, we'll discuss how to use these and other features of SDS in Kubernetes using Portworx as an example. Stay tuned to the Supergiant blog to learn more!