ALL THINGS KUBERNETES

How to Use Portworx Software-Defined Storage in Your Kubernetes Cluster

In a previous tutorial, we discussed the architecture and key features of software-defined storage (SDS) systems and reviewed key SDS solutions for Kubernetes. We’ll now show you how to add SDS functionality to your Kubernetes cluster using Portworx SDS.

This post is organized as follows. In the first part, we discuss basic features and benefits of Portworx SDS. Next, we walk you through the process of deploying Portworx to your K8s cluster, creating Portworx volumes, and using them with stateful applications running in your cluster. Let’s get started!

What Is Portworx?

Portworx is an SDS system optimized for container environments and container orchestrators such as Kubernetes and Mesos. It offers all the benefits of traditional SDS, such as storage virtualization and pooling.

What sets Portworx apart from other SDS systems is its deep integration with container environments and its awareness of the orchestrator’s native scheduling functionality. This makes Portworx an excellent storage solution for applications running in your K8s cluster.

Below are some other reasons why Portworx fits well into your Kubernetes cluster:

  • Storage pooling and tiering. Portworx creates a storage pool that can be tiered by class of service, availability zone, and IOPS. Portworx can build storage tiers for your stateful applications based on the required performance, IOPS, storage size, file system type, availability zone, and other parameters. Storage tiering also brings cost savings by placing workloads on the most cost-efficient storage.
  • Container-granular replication. Portworx replicates storage at the granularity of individual containers with mounted Portworx volumes, ensuring that the storage they use is replicated across nodes and availability zones in your cluster.
  • Support for storage-aware orchestration that extends Kubernetes’ native scheduling. Portworx introduced the STORK (STorage Orchestrator Runtime for Kubernetes) add-on in early 2018. STORK implements storage-aware scheduling that extends the Kubernetes scheduler to ensure the optimal placement of volumes in the cluster. The component offers container-data hyperconvergence, storage health monitoring, snapshot-lifecycle features, and failure-domain awareness for applications in your cluster. Thanks to STORK, Portworx volumes can be placed on the most secure, healthy, and performant nodes and can be co-located with the applications that use them.
  • Data security. Portworx provides secure, key-managed encryption for container volumes. Its encryption component integrates with popular key management systems such as HashiCorp Vault and AWS KMS. With Portworx, you can also implement access control policies for volumes and data in your stateful applications.
  • Storage health monitoring. Portworx scans hard drives for media errors and tries to repair broken drives and volumes. If a volume can’t be repaired, Portworx can automatically attach a new one. This solves the problem of containers that keep running unaware of disk errors: even though the application in the container can no longer write to the volume, users and admins still regard the container as healthy. Portworx’s media error detection addresses this problem.

All these features make Portworx a good SDS solution for your Kubernetes cluster.

We’ll now show you how to deploy Portworx to Kubernetes and use Dynamic Volume Provisioning to mount Portworx volumes to applications in your K8s cluster.

Tutorial

To complete the examples in this tutorial, you’ll need the following prerequisites:

  • A Kubernetes 1.11.6 cluster deployed on AWS with Kops. We tested Portworx deployment in the K8s cluster deployed with Kops on AWS. To reproduce all steps of this tutorial, you’ll need a running Kops cluster. Here is a detailed guide for deploying a K8s cluster on AWS with Kops.
  • AWS CLI tools for managing the AWS cluster. Read this guide to install the AWS CLI.
  • A kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Portworx Requirements

To successfully run Portworx, a worker node should have at minimum:

  • 4 CPU cores
  • 4GB memory
  • 128GB of raw unformatted storage
  • 10Gbps network speed

Note: Most Portworx SDS services are paid, but you can deploy Portworx on Kubernetes using a 31-day trial.

Step #1: Granting AWS Permissions to Portworx

First, we need to create an AWS IAM user for Portworx with a custom policy allowing Portworx to manage EBS volumes in your AWS cluster. We need to create the following IAM user policy:
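
The policy below is a sketch based on the EC2 volume-management permissions listed in the Portworx documentation; review and adjust the action list to match your own security requirements:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:AttachVolume",
            "ec2:ModifyVolume",
            "ec2:DetachVolume",
            "ec2:CreateTags",
            "ec2:CreateVolume",
            "ec2:DeleteTags",
            "ec2:DeleteVolume",
            "ec2:DescribeTags",
            "ec2:DescribeVolumeAttribute",
            "ec2:DescribeVolumesModifications",
            "ec2:DescribeVolumeStatus",
            "ec2:DescribeVolumes",
            "ec2:DescribeInstances"
          ],
          "Resource": ["*"]
        }
      ]
    }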

You can do this with the AWS CLI or directly from your AWS console. Let’s do it the easiest way: using the AWS console.

  1. Sign in to the AWS Management Console and open the IAM console.
  2. Choose “Users” and then “Add user” in the navigation menu.
  3. Select a name for the Portworx user. For example, “Portworx”.
  4. Select the access type for your Portworx user. Choose “Programmatic access” because Portworx will use the AWS API to manage volumes.

Add Portworx user

5. Next, we need to create a policy for the Portworx user. If you have an existing policy with the permissions specified above, you can reuse it. Here we’re going to create a new policy. Click “Create policy” to open a new browser tab with the editable JSON file. Paste the policy definition we provided above into this window.

Portworx EBS policy

6. Give the new policy a name and finalize the process (see the image below).

Portworx policy created

7. Now, we can attach the new policy to our Portworx user. Click “Attach existing policies directly” and select the policy (ours is named porworx-ebs-policy).

Portworx EBS policy attached

8. Finally, we can activate the Portworx user. Don’t forget to download and save the Access Key and Secret Key generated for your Portworx user. You’ll provide these credentials to Portworx later so it can manage your EBS volumes.

For more information about creating a new IAM user, see the official AWS docs.

Step #2: Preparing Portworx DaemonSet for Kubernetes

The Portworx website features a Kubernetes spec generator that helps you tailor the Portworx DaemonSet to your Kubernetes environment (click “Generating the Portworx specs” on this page to access the generator). This is quite useful because storage and networking options differ across cloud providers and environments, and users don’t necessarily know all the details.

Using Portworx’s Kubernetes spec generator, we can configure Portworx etcd, storage type and size, network, and several other settings. Let’s get started!

Portworx Spec Generator


First, we need to provide the K8s version used by our cluster. You can find it by running:
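
A quick way to check (the output below is illustrative for the cluster we use in this tutorial):

    kubectl version --short
    # Client Version: v1.11.6
    # Server Version: v1.11.6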

Configuring ETCD

Portworx requires an etcd cluster to maintain its metadata and cluster state. You can choose among the following options for etcd:

  • Using your own etcd cluster. In this scenario, you point Portworx to your existing etcd cluster endpoint (e.g., etcd-1.com.net:2379).
  • A Portworx-hosted etcd. With this option selected, Portworx will use its own hosted etcd cluster. However, this option is not recommended for production.
  • Built-in cluster. In this case, Portworx will create and manage an internal key-value store (kvdb) cluster. Users can restrict the built-in etcd to certain nodes by attaching the label px/metadata-node=true to them (as shown after this list). Only the nodes with this label will be able to participate in the kvdb cluster.
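
For reference, the label can be attached with a one-liner like this (the node name is a placeholder):

    kubectl label nodes <node-name> px/metadata-node=true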

We don’t have our own etcd cluster and don’t want Portworx to host one, so we chose the built-in cluster option. Click “Next” to go to the “Storage” configuration page.

Portworx spec generator:storage


Here, we have to select the environment in which our Kubernetes cluster is running. Because we’ve deployed our K8s cluster on AWS with Kops, first select “Cloud” and then “AWS.” This will open the AWS configuration dialogue (see the image below).

Portworx spec generator: AWS config


Here, we can configure AWS storage devices. The wizard recommends using the Portworx Auto-Scaling Group (ASG) feature, which allows Portworx to manage the entire lifecycle of EBS volume storage. This feature is available if your EC2 instances are part of an Auto-Scaling Group. Under the hood, Kops uses AWS Auto-Scaling Groups, so we can go ahead with this feature and select “Create Using a Spec.”

Note: an AWS Auto-Scaling Group is a collection of EC2 instances treated as a logical grouping for scaling and management purposes. AWS EC2 Auto-Scaling is needed for Portworx to dynamically provision EBS volumes, create snapshots, etc.

With the “Create Using a Spec” option, Portworx will create its own EBS volumes. We selected a GP2 (General Purpose) volume type with 30GB of storage. You can add as many volumes as you want. Optionally, you can specify “Max Storage nodes per availability zone”; if set, Portworx will ensure that no more than that many storage nodes exist in each AZ.

Because we specified the built-in KVDB option in the previous section, it is recommended (for production clusters) to allocate a separate device for storing internal KVDB metadata. This separates metadata I/O from storage I/O (see the image below).

Portworx spec generator: metadata device


Please note that the minimum size of the metadata device is 64GB. Once all edits are made, click “Next” to go to the “Network” configuration.

Portworx spec generator: network


We don’t want to change anything here. Just leave the “auto” parameter for the Data Network Interface and Management Network Interface for Portworx to use its networking defaults.

Final Step: Customize

We need to add several finishing touches to prepare our Portworx installation: specify environment variables, registry and image settings, and some advanced settings.

Portworx spec generator: customize


We should pay particular attention to the “Environment Variables” tab inside this dialogue.


Portworx creds

As you remember, Portworx needs access to the AWS API to manage EBS volumes. We have created a Portworx user with an Access Key and Secret Key and a policy for managing EBS volumes. Now we have to provide these credentials to the Portworx DaemonSet as environment variables. The AWS Access Key will be stored in the AWS_ACCESS_KEY_ID variable, and the AWS Secret Key in the AWS_SECRET_ACCESS_KEY variable.
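
In the generated DaemonSet, these settings end up as container environment variables, roughly like the fragment below (the values are placeholders; for production, consider referencing a Kubernetes Secret instead of literal values):

    env:
    - name: AWS_ACCESS_KEY_ID
      value: "<your-access-key-id>"       # placeholder
    - name: AWS_SECRET_ACCESS_KEY
      value: "<your-secret-access-key>"   # placeholder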

If you plan to use a custom container registry with Portworx, you can specify it under “Registry and Image Settings” (see the image below).

Portworx spec generator: customize container registry


Finally, in the “Advanced” settings, we can choose to enable/disable Stork, GUI, and monitoring for your Portworx cluster. We recommend using all the suggested options here. For example, you’ll need Stork for storage-aware placement of Pods, health monitoring, and data-application hyperconvergence.

Portworx spec generator: advanced settings


That’s it! Now, click “Finish,” and the wizard will generate the Portworx DaemonSet spec for Kubernetes.


Portworx: spec generated

The spec is lengthy, so just copy the URL displayed in the window. You can see that the AWS credentials required by Portworx were added to the URL as parameters.

Step #3: Deploy Portworx to your Kubernetes Cluster

Now, let’s use the spec generated by the wizard to deploy Portworx to your Kubernetes cluster. Run kubectl apply with the spec’s URL as the -f value:
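
The exact URL comes from the wizard; the query string below is elided, so paste your own:

    kubectl apply -f 'https://install.portworx.com/?...'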

It will take some time for Portworx to create the KVDB and data volumes we specified in the previous step. Portworx will create a data and a metadata volume for each node in your cluster. Access your AWS console to verify this:
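
If you prefer the CLI to the console, a quick listing like this lets you spot the volumes Portworx created (the query simply trims the output to a few columns):

    aws ec2 describe-volumes \
      --query 'Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType,State:State}' \
      --output table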

Portworx volumes: AWS console

Also, let’s verify that Portworx has successfully launched its Pods. Because we’ve deployed Portworx as a DaemonSet, it launches one Pod per node in your cluster:
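
The Portworx Pods run in the kube-system namespace and carry the name=portworx label, so a check along these lines should show one Pod per worker node:

    kubectl get pods -n kube-system -l name=portworx -o wide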

Step #4: Using Portworx

Now that you have successfully deployed Portworx to the cluster, let’s learn how to use it. To manage Portworx, we can use the pxctl tool available on every node where Portworx is running. You can access the Portworx CLI inside the container at /opt/pwx/bin/pxctl or directly on the host.

First, let’s use the CLI to check the Portworx cluster status. Below, we first save the Portworx Pod name to the PX_POD shell variable for later reuse and then get a shell to the Portworx container to run the pxctl status command:
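
Following the pattern from the Portworx documentation:

    PX_POD=$(kubectl get pods -l name=portworx -n kube-system \
      -o jsonpath='{.items[0].metadata.name}')
    kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status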

The status details above tell us that the PX cluster is operational. The total capacity of the storage cluster is 60GiB and 3.5 GiB have been used so far.

To use the storage capacity allocated to Portworx, you should create a Portworx volume and expose it to your Pod. This can be done through manual pre-provisioning or through Kubernetes dynamic volume provisioning.

Pre-provisioning a Portworx Volume

You can pre-provision a Portworx volume using the pxctl tool. To access the tool, either get a shell to one of the PX Pods as we did in the example above or ssh to one of the nodes in your K8s cluster and run pxctl directly at /opt/pwx/bin/pxctl on the instance. Below is an example of the second option:
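
A sketch of that session (the node address is a placeholder, and the admin user reflects the default on Kops-provisioned Debian instances):

    ssh admin@<node-public-ip>
    sudo /opt/pwx/bin/pxctl volume create --size 5 --fs ext4 --repl 2 test-disk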

Here, we used the pxctl volume create command, which has the following format: pxctl volume create [command options] volume-name. This command creates a 5GB volume named “test-disk” with the ext4 file system and two copies across the cluster. We specified the number of copies using the --repl argument. Please check the official pxctl CLI reference for more information about this command.

Dynamic Volume Provisioning

With dynamic volume provisioning (DVP), you don’t need to pre-provision Portworx volumes before using them in your applications. Cluster administrators can create Storage Classes that define the different classes of Portworx volumes offered in the cluster. Thereafter, applications can request dynamic provisioning of these volumes.

Below is an example of a Portworx StorageClass:
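
A spec along these lines matches the parameters described below (the class name and the repl and fs values are our choices):

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: portworx-sc
    provisioner: kubernetes.io/portworx-volume
    parameters:
      repl: "2"
      fs: "ext4"
      shared: "true"
      sticky: "true"
      snap_schedule: "periodic=60,10"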

Please note that we need to specify kubernetes.io/portworx-volume as the storage provisioner to link this StorageClass to Portworx.

Also, in the parameters field, we specified the Portworx volume parameters used for provisioning volumes. Let’s briefly describe them:

  • repl — the number of Portworx volume replicas.
  • fs — the filesystem type used by the volume.
  • shared — a Boolean flag to create a globally shared volume that can be used by multiple Pods. It is useful when you want multiple Pods to access the same volume at the same time, even if the Pods are running on different hosts.
  • sticky — “sticky” volumes cannot be deleted until the “sticky” flag is disabled.
  • snap_schedule — the snapshot schedule for PX volumes (PX 1.3 and higher). The following formats are accepted: periodic=mins,snaps-to-keep, daily=hh:mm,snaps-to-keep, weekly=weekday@hh:mm,snaps-to-keep, and monthly=day@hh:mm,snaps-to-keep. We used a periodic snap schedule with a period of 60 minutes and 10 snaps to keep.

You can find an exhaustive list of all available parameters for the Portworx volume in this article.

Step #5: Deploying PostgreSQL with Portworx

In what follows, we’ll demonstrate how to dynamically provision Portworx volumes for a PostgreSQL database in Kubernetes.

As in the example above, let’s first create a StorageClass for Portworx volumes. We’ll use a simple spec with just a few necessary parameters. This StorageClass allows Portworx volumes to be shared between Pods:
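
A minimal sketch (the class name px-shared-sc and the repl value are our choices):

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: px-shared-sc
    provisioner: kubernetes.io/portworx-volume
    parameters:
      repl: "2"
      shared: "true"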

Save this spec to postgresql.yml and run:
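
    kubectl apply -f postgresql.yml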

Verify that the StorageClass was created:
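
    kubectl get storageclass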

Next, we need to create a Persistent Volume Claim (PVC) that asks the Portworx provisioner to dynamically provision a volume of the type specified in the StorageClass. The PVC spec may look something like this:
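
Something like this (the PVC name is our choice, and the 3Gi request matches the volume we inspect later):

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: px-postgres-pvc
    spec:
      storageClassName: px-shared-sc
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 3Gi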

Please note that we need to set the spec.accessModes of this PVC to ReadWriteMany to allow mounting it to multiple Pods.

Save this spec to postgres-pvc.yml and create the PVC:
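
    kubectl apply -f postgres-pvc.yml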

Let’s verify that the PVC has successfully provisioned a volume using the kubernetes.io/portworx-volume provisioner:
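
    kubectl describe pvc px-postgres-pvc
    # check the Events section for a successful provisioning event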

As the events section suggests, the Portworx Volume storage provisioner has successfully provisioned the volume and bound our PVC to it.

Now, we are ready to use the Portworx “shared” volume in the PostgreSQL deployment. But before doing this, let’s inspect the Portworx volume we’ve just created:
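
One way is through pxctl on one of the PX Pods, reusing the PX_POD variable from earlier and reading the provisioned volume name from the PVC:

    VOL=$(kubectl get pvc px-postgres-pvc -o jsonpath='{.spec.volumeName}')
    kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume inspect $VOL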

As we requested, the PX volume has 3GiB of storage and two replicas spread across two nodes of our cluster. However, the volume is currently detached, as the “State” section indicates. Let’s change this by deploying PostgreSQL. Portworx documentation recommends doing this using STORK. As you remember, we included STORK during Portworx spec generation, so it was automatically deployed to our K8s cluster.

To deploy PostgreSQL, you’ll need to define the following environment variables for security credentials:

  • POSTGRES_USER — the PostgreSQL username.
  • POSTGRES_PASSWORD — the PostgreSQL password.
  • PGDATA — the data directory for the PostgreSQL database.

The Deployment spec looks something like this:
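
The sketch below uses placeholder credentials (use a Kubernetes Secret in production) and an arbitrary postgres:10 image tag; schedulerName: stork asks STORK to place the Pod next to its data, and PGDATA points at a subdirectory of the mount so PostgreSQL doesn’t trip over the volume’s lost+found directory:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postgres
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          schedulerName: stork              # storage-aware scheduling via STORK
          containers:
          - name: postgres
            image: postgres:10
            ports:
            - containerPort: 5432
            env:
            - name: POSTGRES_USER
              value: "pguser"               # placeholder
            - name: POSTGRES_PASSWORD
              value: "pgpassword"           # placeholder; use a Secret in production
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
            volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
          volumes:
          - name: postgres-data
            persistentVolumeClaim:
              claimName: px-postgres-pvc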

As you see, we defined a volume with the PVC created above. This PVC will mount the requested Portworx volume to the PostgreSQL container at /var/lib/postgresql/data .

Now, save this spec to postgres-deploy.yml and create the Deployment:
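
    kubectl apply -f postgres-deploy.yml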

It can take some time for the Deployment controller to start the PostgreSQL Pod and attach the Portworx volume to it. Wait a bit and then inspect the PX volume again:
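
    # reusing PX_POD and VOL from the earlier steps
    kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume inspect $VOL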

As you see, the volume is now attached and used by the PostgreSQL consumer. As the “Replication Status” section suggests, two volume replicas have been created and are up. So far, everything worked out as we expected!

Remember that we created a shared Portworx Volume, right? Let’s verify that two different applications can use the same Portworx volume by attaching it to another Pod:
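
A minimal second consumer might look like this (the busybox image and the /data mount path are arbitrary):

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod2
    spec:
      schedulerName: stork
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: postgres-data
          mountPath: /data
      volumes:
      - name: postgres-data
        persistentVolumeClaim:
          claimName: px-postgres-pvc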

As you see, this Pod uses the same PVC as the PostgreSQL deployment. Save this manifest to pod2.yml and run:
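
    kubectl apply -f pod2.yml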

Now, inspect the Portworx volume again:
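
    kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume inspect $VOL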

As you see, the volume is now shared by two consumers: the new “pod2” consumer and the PostgreSQL database. As simple as that!

Conclusion

In this tutorial, you learned how to deploy Portworx to Kubernetes and use it to dynamically provision “shared” volumes for applications running in your Kubernetes cluster. With Portworx, you can create multiple volume replicas, set snapshot policies, and make volumes shareable between applications. In this article, we just scratched the surface of Portworx features in Kubernetes. Portworx SDS can be used to perform a number of storage management tasks such as volume migration, snapshotting, monitoring, disaster recovery, and co-location with the application data (hyperconvergence). We’ll show you how to use these Portworx features in Kubernetes in subsequent tutorials. Stay tuned to the Supergiant blog to find out more!