Advanced Storage Management for Your Kubernetes Stateful Apps with Portworx

In the previous tutorial, we learned how to deploy Portworx to your Kubernetes cluster, create Portworx volumes with a specific number of replicas, filesystem, and access parameters, and use them in your Kubernetes stateful applications.

In this blog, we’ll demonstrate how Portworx automatic replication ensures High Availability and data integrity using a failover scenario with a MySQL deployment. We’ll also show you a simple way to implement hyperconvergence of a Kubernetes application with its data and fast restoration of volumes using Portworx manual and periodic snapshots via STORK. By the end of this article, you’ll have a better understanding of how Portworx provides HA, data integrity, and disaster recovery for your stateful applications in Kubernetes. Let’s get started!

Drawbacks of Data Failover using Conventional Methods

It goes without saying that stateful applications require persistent storage to ensure the integrity of data across application restarts, migrations, and node failures. However, a node’s local storage does not provide a secure option for data persistence. Because local storage is attached to an individual node, if the node dies we could lose all the data. The alternative is to use some kind of remote block storage like Amazon EBS. Fortunately, cloud volumes like AWSElasticBlockStore, AzureDisk, or GCEPersistentDisk are natively supported in Kubernetes. However, cloud-based storage solutions have certain limitations. The problem with cloud-based persistent storage in Kubernetes is that volumes such as Amazon EBS must be detached from the old node and re-attached to the new one whenever the Pod using them is rescheduled. This process takes a significant amount of time and may be error-prone. When a node dies, Kubernetes needs to do the following:

  • re-schedule stateful containers to a healthy node.
  • detach the EBS or Azure volumes from the old node.
  • attach these volumes to a new node.
  • mount the volume(s) to a rescheduled container on a new node.

However, this process may fail at each of the above stages. For example:

  • the API call to AWS may fail.
  • the EBS volume may not be detached from the old node for some reason.
  • the volume may not be attached to the new node because the latter already has too many EBS volumes attached. This is because the number of block volumes that can be attached to a server is limited in most operating systems. For example, Linux-based AWS instances set this limit to 40 block volumes.

Even when the volume re-attachment succeeds, the above-described failover mechanism is not that fast because too many API calls and intermediate steps are involved. Finally, using raw cloud-based block storage is not cost-efficient because there is no thin provisioning: you pay for all the storage you bought, no matter how much of it is actually in use.

How Does Portworx Solve This Problem?

Portworx pools all the underlying cloud drives into a single data layer and serves the storage available on them as virtual slices on demand, using thin provisioning. Portworx needs to create, attach, mount, and format the cloud-based drives only once, after which they join the storage pool available on demand. Because Portworx decouples the underlying storage from the container volumes, thousands of containers can be launched using the same number of cloud drives. Thus, there is no longer a one-to-one relationship between volumes and containers. Moreover, we also avoid the block device limit problem mentioned above. Since the storage is virtualized, we can have hundreds of containers per host, each with its own volume.

In addition, Portworx automatic data replication and hyperconvergence make the failover scenario much easier and faster. With Portworx SDS, Kubernetes would need to do the following if a node dies:

  • reschedule the stateful container to a healthy node.
  • start the container on the new node using a Portworx volume replica that already exists on that host. This feature is known as hyperconvergence. It ensures that a container is scheduled to a node that has a copy of the data it requires. Because your application is co-located with its data, you get lower latency and better HA for your stateful app. Also, since Portworx has a replica of the volume, there is no need to detach and re-attach persistent volumes anymore. In Portworx, hyperconvergence is implemented with the STorage Orchestrator for Kubernetes (STORK), discussed later in this article.

However, how does Portworx ensure that the volume replicas are up-to-date? Portworx uses synchronous replication of data where each write is synchronously replicated to a quorum set of nodes. Every time an instance of your application writes data to a volume, this write is replicated across all volume replicas.

Thus, if the drive fails you always have the latest acknowledged writes on your volume replicas. In addition, Portworx can be set up to take periodic snapshots of your data or you can take manual snapshots and use them to restore the database if it is corrupted.

In what follows, we’ll illustrate some failover scenarios with your stateful apps in Kubernetes and show how to use Portworx backups, hyperconvergence, and snapshots to ensure that your data is always intact and available.

Tutorial

To complete the examples used in this tutorial, you’ll need the following prerequisites:

  • a Kubernetes 1.11.6 cluster deployed on AWS with Kops. We tested the Portworx deployment on a K8s cluster deployed with Kops on AWS, so to reproduce all steps of this tutorial you’ll need a running Kops cluster. Here is a detailed guide for deploying a K8s cluster on AWS with Kops.
  • AWS CLI tools for managing the AWS cluster. Read this guide to install the AWS CLI.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.
  • Portworx deployed to your Kubernetes cluster. Read our previous tutorial to learn how to do this.

Now that you have Portworx deployed to your Kubernetes cluster, we’ll first demonstrate how Portworx synchronous replication and hyperconvergence ensure an automatic and fast failover mechanism for your MySQL database.

The first thing we need to do is to create a StorageClass  for Portworx that will be used for dynamic volume provisioning.

The manifest for the StorageClass  looks as follows:
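
A minimal sketch of what this manifest could look like, using the kubernetes.io/portworx-volume provisioner (the portworx-sc name is just an example, and the exact parameter names may differ slightly between Portworx versions):

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: portworx-sc
    provisioner: kubernetes.io/portworx-volume
    parameters:
      repl: "2"                       # keep two replicas of every volume
      priority_io: "high"             # serve the volume from the high-IO-priority pool
      snap_schedule: "periodic=20,3"  # snapshot every 20 minutes, retain the 3 latest
      shared: "true"                  # allow multi-write (shared) volumes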

This StorageClass  will dynamically provision two-replica volumes with high IO priority. The volumes will also have a periodic snap schedule (every 20 minutes) and can be shared by multiple Pods (multi-write volumes).

Save this manifest to px-sc.yaml  and create the StorageClass:
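
For example:

    kubectl apply -f px-sc.yaml

You can then confirm the new class exists with kubectl get storageclass.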

The next step is to define the Persistent Volume Claim (PVC) that can request storage capacity from the Portworx StorageClass :
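
A sketch of such a PVC might look like the following (the px-mysql-pvc name is an example, and the storageClassName must match the StorageClass created above):

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: px-mysql-pvc
    spec:
      storageClassName: portworx-sc
      accessModes:
        - ReadWriteMany        # shareable, multi-write volume
      resources:
        requests:
          storage: 3Gi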

The PVC must have spec.accessModes  set to ReadWriteMany  to make the volume shareable across Pods. The PVC will request 3Gi of storage from our StorageClass .

Go ahead and create the PVC with the following command:
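
Assuming the manifest was saved as px-mysql-pvc.yaml (the filename is just an example):

    kubectl apply -f px-mysql-pvc.yaml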

You can verify that the persistentvolume-controller  has successfully provisioned the new volume named pvc-451ef100-2eaa-11e9-b411-06d762ec793e  using the Storage Class created above:
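
For example, by listing and describing the claim (the PVC name comes from the sketch above):

    kubectl get pvc
    kubectl describe pvc px-mysql-pvc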

We’re going to use the Portworx volume in our MySQL deployment. We’ll create the MySQL Service and Deployment manifests below:
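
A sketch of what these manifests could look like is shown below. The names, labels, image tag, and credentials are examples only; in a real deployment, you would pull the passwords from a Secret rather than hard-coding them:

    apiVersion: v1
    kind: Service
    metadata:
      name: mysql
    spec:
      selector:
        app: mysql
      ports:
        - port: 3306
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: mysql
      strategy:
        type: Recreate                   # avoid two Pods writing to the same volume during updates
      template:
        metadata:
          labels:
            app: mysql
        spec:
          schedulerName: stork           # let STORK do storage-aware scheduling
          containers:
          - name: mysql
            image: mysql:5.7
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: "password"          # placeholder value
            - name: MYSQL_USER
              value: "testuser"          # placeholder value
            - name: MYSQL_PASSWORD
              value: "password"          # placeholder value
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql  # MySQL data path
          volumes:
          - name: mysql-persistent-storage
            persistentVolumeClaim:
              claimName: px-mysql-pvc    # the Portworx-backed PVC created above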

As you see, we defined a volume mysql-persistent-storage  using the PVC created above and mounted it to the MySQL container at its data path. Also, the Deployment will use STORK for storage-aware scheduling. To use STORK, you should ensure that it was deployed along with Portworx and set schedulerName  to stork  in your Deployment’s PodSpec . If you installed Portworx by following our tutorial, you already have STORK running. Once STORK is set as the scheduler for your Deployment, it will ensure that the Pods managed by the Deployment are always scheduled onto nodes with available Portworx replicas. This feature is known as hyperconvergence. You’ll see it in action later in the article.

Finally, take note of the MYSQL_USER  and MYSQL_ROOT_PASSWORD  that will be used to access the MySQL database inside the container.

Now that you understand what this manifest does, let’s create the Deployment and the Service:
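
Assuming the manifests above were saved to a file such as mysql-app.yaml (the filename is hypothetical):

    kubectl apply -f mysql-app.yaml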

And verify that Pods have been started:
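
For example, filtering by the app=mysql label used in the sketch above:

    kubectl get pods -l app=mysql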

Next, let’s check what happens when the MySQL Pod, with data already written to its database, is deleted from its node. First, let’s create a database named FAILOVER_TEST  and write some data to it.

Find and save the name of your MySQL Pod to the bash variable:
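
One way to do this, again assuming the app=mysql label:

    POD=$(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}')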

Get a shell to the running MySQL container:
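
For example:

    kubectl exec -it $POD -- /bin/bash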

Log in to MySQL:
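
Inside the container, log in as root with the MYSQL_ROOT_PASSWORD you set in the Deployment:

    mysql -u root -p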

Create a new database:
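
For example:

    CREATE DATABASE FAILOVER_TEST;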

Change the context to the new database:
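
For example:

    USE FAILOVER_TEST;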

Create a new table titled “tests”:
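
For example (the column definitions are arbitrary, just enough to store a test row):

    CREATE TABLE tests (id INT AUTO_INCREMENT PRIMARY KEY, note VARCHAR(255));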

Insert a row into this table:
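
For example:

    INSERT INTO tests (note) VALUES ('if you can read this, the failover worked');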

Finally, verify that the data was saved:
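
For example:

    SELECT * FROM tests;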

Great! We have data saved in the MySQL database. We can now exit the shell and proceed to the next step. Now we’ll make the node where the MySQL Pod is running unschedulable and delete the Pod from it. First, let’s find the node where our MySQL Pod is running:
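
One way to see the node is to list the Pod with wide output, which includes a NODE column:

    kubectl get pods -l app=mysql -o wide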

Next, cordon the node to make it unschedulable:
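
Substituting the node name you found in the previous step:

    kubectl cordon <node-name>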

Cordoning the node does not affect the Pods already running on it, so we have to manually delete the MySQL Pod:
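
Using the Pod name saved earlier:

    kubectl delete pod $POD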

Next, verify that the Pod was re-scheduled onto another node:
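
For example:

    kubectl get pods -l app=mysql -o wide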

As you see, the Pod was successfully rescheduled to a new node. Let’s check if the data written to the MySQL database is intact.
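
Since the re-scheduled Pod has a new name, grab it again and query MySQL (labels and credentials as in the sketch above):

    POD=$(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -it $POD -- mysql -u root -p -e "SHOW DATABASES; SELECT * FROM FAILOVER_TEST.tests;"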

Awesome, you can see the FAILOVER_TEST  database created earlier. Thus, Portworx automatic replication and hyperconvergence work. The Portworx volume mounted to the re-scheduled Pod has the same data as the original volume on the cordoned node. Also, Portworx ensured that the Pod was re-scheduled to a node with a local copy of the data, so our MySQL container accesses its data volume much faster. And since Portworx did not have to detach and re-attach block devices between nodes, the total failover time was much shorter.

If you want to uncordon the node again, you can run:
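
Substituting the name of the cordoned node:

    kubectl uncordon <node-name>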

Recovering from Database Failure or Corruption Using Snapshots

In this example, we will take a snapshot, destroy our MySQL database, and recover the data from the snapshot. We are going to use STORK local snapshots, which are snapshots stored locally in the Portworx cluster’s storage pool. Alternatively, we can use the periodic snapshots created by our StorageClass  snapshot schedule.

We can create a snapshot of our MySQL volume by using the VolumeSnapshot  spec:
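
A sketch of the VolumeSnapshot object is shown below. It uses the alpha external-storage snapshot API that STORK supported at the time; the snapshot name, rule names, and the stork.rule/* annotation keys are examples and should be checked against the STORK documentation for your version:

    apiVersion: volumesnapshot.external-storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: mysql-snapshot
      annotations:
        stork.rule/pre-snapshot: px-mysql-presnap-rule     # rule run before the snapshot is taken
        stork.rule/post-snapshot: px-mysql-postsnap-rule   # rule run after the snapshot is taken
    spec:
      persistentVolumeClaimName: px-mysql-pvc              # the PVC to snapshot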

This volume snapshot references the pre- and post-snapshot rules (defined below) that ensure the snapshot is consistent. These rules are referenced in the annotations.

The pre-snapshot rule forces MySQL to flush all pending writes to disk and locks all MySQL tables to prevent additional writes until the snapshot is done. We define this rule in a separate manifest:
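
A sketch of such a rule, based on STORK’s Rule custom resource, is shown below. The rule name, Pod selector, and credentials match the examples above; the top-level rules field is spelled spec in some older STORK releases, so double-check against your version. The ${WAIT_CMD} placeholder is what keeps the lock held until the snapshot completes:

    apiVersion: stork.libopenstorage.org/v1alpha1
    kind: Rule
    metadata:
      name: px-mysql-presnap-rule
    rules:
      - podSelector:
          app: mysql                  # apply to the MySQL Pod(s)
        actions:
        - type: command
          background: true            # keep running (and keep the lock) while the snapshot is taken
          value: mysql --user=root --password=$MYSQL_ROOT_PASSWORD -Bse 'flush tables with read lock;system ${WAIT_CMD};'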

Create the pre-snap rule:
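
Assuming the rule was saved to a file named px-presnap-rule.yaml (hypothetical name):

    kubectl apply -f px-presnap-rule.yaml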

The post-snapshot rule releases this lock with the UNLOCK TABLES  MySQL command. This rule will run after the snapshot is made:
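
A matching sketch for the post-snapshot rule, with the same caveats as above:

    apiVersion: stork.libopenstorage.org/v1alpha1
    kind: Rule
    metadata:
      name: px-mysql-postsnap-rule
    rules:
      - podSelector:
          app: mysql
        actions:
        - type: command
          value: mysql --user=root --password=$MYSQL_ROOT_PASSWORD -Bse 'flush logs; unlock tables;'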

Create the post-snap rule:
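
Assuming it was saved to px-postsnap-rule.yaml (hypothetical name):

    kubectl apply -f px-postsnap-rule.yaml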

Now that the rules are enabled, we can go ahead and create the VolumeSnapshot  object:
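
Assuming the snapshot spec shown earlier was saved to mysql-snapshot.yaml (hypothetical name):

    kubectl apply -f mysql-snapshot.yaml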

Once you create the above object, you can check the status of the snapshot using kubectl:
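
For example:

    kubectl get volumesnapshot
    kubectl describe volumesnapshot mysql-snapshot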

Also, check the creation of the volumesnapshotdatas  object:
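
For example:

    kubectl get volumesnapshotdatas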

It indicates that the snapshot has been created. If you describe the volumesnapshotdatas  object, you can find the Portworx volume snapshot ID and the PVC for which the snapshot was created:
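
Using the auto-generated name from the previous command:

    kubectl describe volumesnapshotdatas <volumesnapshotdata-name>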

Great, we have a MySQL volume snapshot at our disposal! Now, let’s do something as crazy as deleting our MySQL database:
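
For example, executing the DROP statement in the MySQL Pod (label and credentials as before):

    POD=$(kubectl get pods -l app=mysql -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -it $POD -- mysql -u root -p -e "DROP DATABASE FAILOVER_TEST;"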

We have deleted the database we created earlier, and all the data is gone. Hmm, let’s try to restore it!

Restoring the Database from the Snapshot

Snapshots are just like Portworx volumes, so we can use them to start a new MySQL instance. First, we need to create a new PVC from the snapshot using the stork-snapshot-sc  StorageClass that STORK automatically creates when it is deployed. We’ll need to add the snapshot.alpha.kubernetes.io/snapshot  annotation referring to the snapshot name in the PVC manifest and set the storageClassName  to the STORK StorageClass  stork-snapshot-sc , as in the example below:
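
A sketch of this PVC, with mysql-snap-clone as an example name and the mysql-snapshot name carried over from the snapshot sketch above:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-snap-clone
      annotations:
        snapshot.alpha.kubernetes.io/snapshot: mysql-snapshot   # the VolumeSnapshot to clone from
    spec:
      storageClassName: stork-snapshot-sc                       # StorageClass installed by STORK
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 3Gi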

Save this manifest to mysql-snap-pvc.yaml  and create the PVC:
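
For example:

    kubectl apply -f mysql-snap-pvc.yaml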

Next, let’s deploy a new MySQL instance with a new PVC. We can use a similar Deployment spec as above with slight modifications:
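
A sketch of the cloned Deployment is shown below; the only substantive changes from the original sketch are the name, the label, and the claimName, which now points to the PVC restored from the snapshot:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mysql-clone
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: mysql-clone
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: mysql-clone
        spec:
          schedulerName: stork
          containers:
          - name: mysql
            image: mysql:5.7
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: "password"             # must match the credentials of the original instance
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-persistent-storage
            persistentVolumeClaim:
              claimName: mysql-snap-clone   # PVC created from the snapshot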

Save this spec to mysql-clone.yaml  and create the new Deployment:
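
For example:

    kubectl apply -f mysql-clone.yaml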

Finally, verify that the data is available:
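
For example, querying the clone (using the app=mysql-clone label from the sketch above):

    CLONE_POD=$(kubectl get pods -l app=mysql-clone -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -it $CLONE_POD -- mysql -u root -p -e "SHOW DATABASES; SELECT * FROM FAILOVER_TEST.tests;"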

Great, your data is there again! As you see, STORK makes recovering databases very easy.

Using Periodic Snapshot for Recovery

As you remember, our Portworx StorageClass  uses a periodic snapshot schedule. Portworx automatically creates a new snapshot every 20 minutes and keeps the 3 most recent snapshot versions in the storage cluster. We can use these snapshots instead of VolumeSnapshots  for recovery.

To check what periodic snapshots are available, run pxctl  from inside one of the Portworx containers. All scheduled snapshots can be listed using the --snapshot-schedule  option.
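
One possible way to do this, assuming the default pxctl path inside the Portworx Pods and the name=portworx label used by the Portworx DaemonSet (check pxctl volume list --help if your version uses different flags):

    PX_POD=$(kubectl get pods -n kube-system -l name=portworx -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -n kube-system $PX_POD -- /opt/pwx/bin/pxctl volume list --snapshot-schedule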

Inspect the snapshot:
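
Substituting the ID of the snapshot you want to restore from:

    kubectl exec -n kube-system $PX_POD -- /opt/pwx/bin/pxctl volume inspect <snapshot-id>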

Once the snapshot is ready, you can use it to restore the Portworx volume in the same manner we did with the VolumeSnapshots  in the example above.

Conclusion

In this tutorial, you learned how Portworx and STORK ensure faster failover and data recovery for stateful applications in your cluster. Automatic synchronous replication and data pooling allow stateful containers rescheduled from a failed node to instantly access the most recent replica of their data, and Portworx users no longer need to worry about volumes getting stuck during the attachment stage. Moreover, Portworx’s hyperconvergence feature ensures that a stateful Pod always lands on a node that has a local copy of its data. This guarantees low latency and faster access to your stateful applications for your users. Stay tuned to the Supergiant blog to learn about other advanced uses of Portworx and other software-defined storage (SDS) solutions in Kubernetes.