Transitioning to Kubernetes: Major Mistakes and Solutions

Kubernetes is currently the first choice among companies seeking to adopt container orchestration, and with good reason. Kubernetes offers automation; support for multiple workload types, container runtimes, and deployment patterns; portability; and deep integration with cloud-native and container standards (e.g., the Container Storage Interface).

However, many companies seeking to adopt Kubernetes fail to recognize important prerequisites and the challenges inherent in the transition. As a result, they make avoidable mistakes and end up with a cluster setup that falls short of their expectations.

One of the most widespread mistakes is a lack of planning and research. It’s wrong to believe that you can adopt Kubernetes without developing strong expertise in the cloud-native ecosystem, making the necessary organizational changes, and preparing for the challenges along the way.

Fortunately, the Kubernetes community has accumulated a lot of knowledge and experience to help companies avoid these mistakes and make a smooth transition. In this article, we’ll discuss some of the biggest mistakes made by companies adopting Kubernetes and suggest best practices and solutions for a smoother transition. You’ll be able to save time and other resources by applying the advice put forward in this article.

Major Mistakes

It’s hard to cover all the possible mistakes made by companies adopting Kubernetes. We decided to focus on the major mistakes that can compromise a Kubernetes migration and lead to inefficient K8s clusters and applications. In general, all these mistakes fall into the following categories: failure to plan, failure to change, and failure to follow best practices. Here’s the full list of mistakes discussed in this article.

  • not having the right level of expertise
  • inefficient migration of a monolith to microservices
  • buying too much server power
  • failing to plan for stateful apps
  • limited High Availability
  • lack of monitoring
  • weak cluster security
  • failing to transform the code development process
  • inefficient resource management
  • inefficient database management

Not Having the Right Level of Expertise

Transitioning to Kubernetes is more than just spinning up a cluster and deploying your application on it. That may be enough for development, but it will not suffice for production.

Kubernetes is a system of different components, orchestration services, and network layers that must fit together to create a production-grade cluster.

Kube-apiserver, kube-controller-manager, etcd, kube-proxy, and other components should be configured in a way that meets your specific operational and business needs. This involves many cluster design decisions that require strong expertise in Kubernetes and in the entire ecosystem of cloud-native tools.

A DevOps team that migrates your workloads to Kubernetes should make the following decisions aligned with your company’s budget and business needs:

  • Where to run K8s: on-premises or in the cloud?
  • What infrastructure to use: bare metal or virtual servers?
  • How many redundant masters, etcd servers, and master components do you need? How many failures can your system tolerate?
  • How should networking be configured?
  • What type of storage (network storage, block storage, software-defined storage) do you need?
  • and more…

Without profound K8s expertise, you can get lost on the way to Kubernetes success. To avoid rookie mistakes, you may need to develop K8s expertise in-house or, if you are short on time, reach out to third-party Kubernetes support services.

(If you are interested in building a stellar team of Kubernetes experts, check out this talk given by two members of our Kubernetes support team, Clarke Vennerbeck and Aaron Teague, at KubeCon + CloudNativeCon Barcelona in May 2019.)

Developing powerful K8s expertise is not just about learning the platform; it also means reshaping your organization’s IT processes. To reap the benefits of K8s, you should also plan for improvements to your CI/CD pipelines, automated testing, app development process, and more. Your entire IT department should become closely aligned with the new cloud-native standards you seek to embrace.

Inefficient Migration of a Monolith to Microservices

Many companies want to migrate their large monolithic applications to microservices using Kubernetes. Indeed, migration to microservices offers many benefits, such as faster release cycles, granular feature updates, smaller and more efficient teams, and more.

However, many companies erroneously assume that migrating to microservices is a straightforward process of splitting a monolith into a microservice for each feature, packaging them into containers, and moving over to Kubernetes. The reality is more complex. In fact, treating the monolith-to-microservices transition this way may cause serious problems for your application.

Migrating a monolith to microservices is a complex process that requires serious planning, research, and fact-driven design decisions. Here are the main steps:

Understand the monolith. Before migrating to microservices, you should have a clear understanding of the monolith’s architecture, structure, and codebase. You should have a bird’s-eye view of the monolith’s major API endpoints and of how each request to each endpoint is processed by the system. You also need comprehensive knowledge of all the features, processes, and services of your monolithic app.

Identify dependencies. Knowing the dependencies is important for keeping your application operational after migration. It may be necessary to break the monolith’s code base into smaller services and detect any dependencies on external services, tools, databases, message queues, etc.

This will help you understand what additional software (e.g., databases, message queues) needs to be packaged into containers or attached externally. If your monolith’s code is modular, this task becomes easier. In any case, identifying dependencies may take serious effort on the part of your team, so it should be planned for.

Analyze endpoints. It’s good practice to focus first on the top endpoints in terms of CPU consumption and throughput. These should be extracted from the monolith and optimized first. This approach is more efficient than taking random guesses about which parts of the monolith to convert into microservices.

Containerize. The next step is to containerize the selected endpoints and services along with all their dependencies.

Migrate feature by feature. It’s advisable to migrate your monolith feature by feature so you can gradually assess how each piece works and prevent major disruptions.

Validate the migration. The last step is to validate that the migrated version works as well as or better than the monolith. You can run various performance tests to this end. Also, keep in mind that containers are ephemeral, so the migrated app may appear to work even after data has been lost or corrupted. That’s where monitoring becomes important.

Buying Too Much Server Power

Some companies make excessive upfront investments in infrastructure (e.g., AWS reserved instances, physical servers), anticipating that Kubernetes will require a lot of resources.

While this may be the case, you will be able to scale up your cluster easily after migration, so a large upfront investment isn’t necessary. To avoid over-provisioning your infrastructure, you can:

  • Benchmark the resource usage of your application(s) and size the cluster accordingly (e.g., provisioning only enough nodes to handle the peak-load scenario).
  • Define requests and limits for containers so that they don’t seize resources from other containers.
  • Use the Horizontal Pod Autoscaler (HPA) to autoscale your Pods based on CPU consumption or custom metrics, as sketched below.
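To make this concrete, here is a minimal HPA sketch that keeps a hypothetical Deployment named web between 2 and 10 replicas, targeting roughly 70% average CPU utilization (the name and thresholds are placeholders you’d tune to your benchmarks):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web                    # hypothetical Deployment to scale
      minReplicas: 2                 # baseline capacity
      maxReplicas: 10                # hard ceiling to bound infrastructure costs
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70   # add replicas when average CPU utilization exceeds 70%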

Failure to Plan for Stateful Apps

Kubernetes has great support for stateful apps and persistent storage, and these features should be configured before running your application in production.

Before migrating to Kubernetes, you should have a clear plan for stateful apps and storage. In particular, you should have answers to the following questions:

  • What storage type do you need? Network storage? Block storage? Something else?
  • Is this storage type supported by K8s in-tree volume plugins? If not, are there CSI or FlexVolume plugins for it? If there are no K8s plugins for your storage, consider using another storage solution or developing a plugin.
  • Do you require storage features such as thin provisioning, storage backups, hyperconvergence, etc.? If you need fine-grained control over your storage infrastructure, consider using a Software-Defined Storage (SDS) solution supported in Kubernetes.
  • Do you need storage orchestration? In particular, you may need hyperconvergence to schedule volumes closer to your workloads or to spread them across the entire cluster.

Also, you’ll need to enable the following features for your storage solution:

  • Configure storage security: for example, volume access policies, multi-user access, etc.
  • Configure storage High Availability: for example, distribute storage replicas evenly across availability zones or across the region.
  • Enable persistent storage for temporary data.

You may need to analyze all possible options and prepare all needed features before you migrate to Kubernetes. Kubernetes provides great support for stateful apps, but your storage architecture should be carefully planned.
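To make the planning concrete, here is a minimal sketch of dynamic provisioning: a StorageClass backed by the in-tree AWS EBS plugin (used purely as an example; substitute the provisioner and parameters for your storage) and a PersistentVolumeClaim that requests a volume from it:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd                      # hypothetical class name
    provisioner: kubernetes.io/aws-ebs    # example in-tree plugin; swap in your CSI driver if needed
    parameters:
      type: gp2                           # EBS-specific parameter
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: db-data                       # hypothetical claim used by a stateful app
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 20Gi                   # placeholder size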

Limited High Availability

High Availability is often confused with a multi-master setup. Companies adopting Kubernetes sometimes believe that having several masters will make their K8s cluster highly available — and that is only partly true.

In reality, you should enable High Availability for multiple cluster components, including masters, etcd servers, worker nodes, availability zones, load balancers, Control Plane components, and applications.

Unless you have a comprehensive HA setup in your cluster, you may end up with critical HA issues.

For example, consider a cluster with three masters (a multi-master setup) and a single cloud load balancer in front of them. If that load balancer fails, external traffic won’t be able to flow into the cluster. Having multiple load balancer replicas is therefore as important for your cluster’s HA as having several masters.

Also, it’s crucial to decide how fault-tolerant your HA setup should be. In other words, how many failures can your components tolerate? Etcd, for example, needs a majority of its members to keep operating, so a three-replica etcd cluster can lose only one instance, while a five-replica cluster can lose two. If you need that extra fault tolerance, five etcd replicas may be warranted, although this introduces additional costs. Your IT department should therefore weigh the trade-offs between strong HA and infrastructure costs when designing an HA cluster.

Lack of Monitoring

Failure to integrate monitoring into your cluster and applications at the time of adoption may lead to serious issues. One of them is resource exhaustion caused by a lack of visibility into resource utilization in your cluster.

Many companies incorrectly believe that they can use traditional monitoring solutions in Kubernetes.

Traditional monitoring approaches do not work in Kubernetes for several reasons:

  • Monitoring targets are complex. Applications in Kubernetes are intricately connected with different abstractions such as containers, Pods, and Deployments. You should be able to retrieve container metrics to understand how your applications work. You’ll also need to monitor numerous other objects, including nodes, namespaces, and clusters.
  • Microservices add network complexity. If you are running microservices, you should be able to monitor cross-service network transactions as well.
  • Kubernetes applications and infrastructure are fluid. You need monitoring tools that can dynamically catch container events and integrate with Kubernetes controllers and schedulers.

Therefore, companies adopting Kubernetes should prioritize the creation of a Kubernetes-aware monitoring system for their clusters.

Although Kubernetes offers native monitoring tools, their main purpose is to provide metrics to the K8s scheduler. What you need is a full metrics pipeline, for example one built on the K8s Custom Metrics API.

There are a few established solutions you can use, including Prometheus, Jaeger, Weave Scope, Sysdig Monitor, and Dynatrace, among others.

Installing a third-party monitoring pipeline is only one piece of the puzzle. To enable a production-grade monitoring system, you’ll also need to:

  • Configure monitoring for different workloads, which may require knowledge of a domain-specific configuration language.
  • Enable and configure metrics endpoints in your applications. These are API endpoints that ship metrics to your monitoring pipeline (see the sketch after this list).
  • Aggregate and store metrics.
  • Visualize and analyze metrics, and produce insights from them.
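As one illustration, if your Prometheus scrape configuration honors the conventional prometheus.io/* Pod annotations (a common convention in community setups, not a built-in Kubernetes feature), exposing an application’s metrics endpoint is a matter of annotating the Pod template; all names below are placeholders:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                            # hypothetical application
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
          annotations:
            prometheus.io/scrape: "true"   # opt this Pod into scraping
            prometheus.io/port: "9090"     # port where the app serves /metrics
        spec:
          containers:
          - name: web
            image: example.com/web:1.0     # hypothetical image that exposes a /metrics endpoint
            ports:
            - containerPort: 9090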

Weak Cluster Security

Some companies believe that containers and Kubernetes provide security out of the box. However, containers do not provide the same level of isolation as Virtual Machines (VMs). Containers and K8s clusters alike should be configured according to security best practices to avoid critical scenarios.

Awareness of security issues is currently very low among tech companies: according to Threat Stack, 73% of companies have at least one critical security issue when working with containers. Thus, it’s critical for companies adopting Kubernetes to understand the potential security risks.

So, what are the key security issues and prerequisites to address when building a secure Kubernetes cluster?

Configuration Errors and Exposing Sensitive Information

Containers running databases or other critical software often need to consume secrets, passwords, and API keys. You don’t want this information hardcoded into K8s manifests, and Kubernetes provides features for exposing it safely. However, if you rely solely on K8s mechanisms like Secrets or ConfigMaps, misconfigurations can lead to serious security issues. It is preferable to encrypt your sensitive information with third-party tools such as HashiCorp Vault and then expose it to containers using the built-in K8s mechanisms.
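For reference, here is a minimal sketch of the built-in mechanism: a Secret consumed by a Pod as an environment variable. Keep in mind that, by default, Secrets are only base64-encoded rather than encrypted, which is why pairing them with a tool like Vault (or enabling encryption at rest) is recommended; all names and values are placeholders:

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials
    type: Opaque
    stringData:
      DB_PASSWORD: change-me               # placeholder; in practice, inject this from your secrets manager
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: db-client                      # hypothetical Pod
    spec:
      containers:
      - name: app
        image: example.com/app:1.0         # hypothetical image
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: DB_PASSWORD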

Minimizing an Attack’s Blast Radius

The term “blast radius” is often used to describe the effect of a security breach on a software system. Specifically, it defines the scope of critical data and applications that can be compromised by an attacker.

For example, if an attacker gets access to a container running in privileged mode, they can escalate these privileges to take control of other containers or the entire API server. In this case, the blast radius is large.

The goal of a K8s cluster designer is to minimize the blast radius. Several K8s-native methods and tools for doing so are discussed below. You can also leverage Kubernetes isolation mechanisms such as network segmentation and Namespaces.

Implementing RBAC

By default, each container has access to the Kubernetes API. If the token mounted into a container has cluster-admin rights, a hacker can easily escalate privileges across the entire cluster. To prevent this, you can use Role-Based Access Control (RBAC) policies.

RBAC allows you to define specific cluster roles and assign them to specific users, groups, or service accounts. Roles specify privileges (e.g., read or write), the API resources and endpoints that can be accessed, and other important security parameters, as sketched below.
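A minimal sketch: a namespaced Role granting read-only access to Pods, bound to the service account an application runs under (the namespace and names are placeholders):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: production                # hypothetical namespace
      name: pod-reader
    rules:
    - apiGroups: [""]                      # "" refers to the core API group
      resources: ["pods"]
      verbs: ["get", "list", "watch"]      # read-only access
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods
      namespace: production
    subjects:
    - kind: ServiceAccount
      name: web                            # hypothetical service account used by the app
      namespace: production
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io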

Use Trusted Container Registries

You can run into serious security vulnerabilities if you run containers from untrusted registries. Use only container images you created yourself or images obtained from trusted providers like Docker. Also, K8s administrators should regularly check containers for vulnerable dependencies, even if the containers are packaged by your own dev team.

Don’t Run Containers in a Privileged Mode

As we’ve already mentioned, running Docker containers with the --privileged flag grants the container full root rights, including access to all host devices. If an attacker gets access to such a container, they can take control of the entire cluster.

Therefore, the best practice is to avoid running containers in privileged mode. If you need to grant certain root privileges to containers or Pods, use Linux capabilities, which provide fine-grained configuration of root privileges without granting the full set of root rights.
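A minimal sketch of this approach: drop all capabilities, then add back only what the application actually needs (the Pod, image, and capability here are placeholders for illustration):

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                                # hypothetical Pod
    spec:
      containers:
      - name: app
        image: example.com/web:1.0             # hypothetical image
        securityContext:
          privileged: false                    # never run privileged unless unavoidable
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]                      # start from zero privileges
            add: ["NET_BIND_SERVICE"]          # e.g., allow binding to ports below 1024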

Image Size

Large container images can lead to an excessive storage footprint in your cluster. It’s therefore crucial to avoid the uncontrolled growth of your containers. To keep containers in check, you should watch container log files, clean caches, collect garbage left behind by untidy commands, and follow best practices for creating and running containers.

Failure to Transform the Code Development Process

If you plan to develop apps for containers and Kubernetes, your IT workflow should incorporate new development and DevOps practices and approaches. You may need to reimagine the entire DevOps workflow to deliver containers to Kubernetes. Doing so can save time on testing, debugging, release management, and more.

The first thing to do is select a development mode for your K8s apps. There are four options: offline, proxied, live, and online.

In the offline mode, you develop locally using a combination of tools such as Minikube, Docker, or Minishift. The benefit of this mode is that you don’t have to pay for live infrastructure. The downside is that keeping the dev and production environments in sync is more difficult.

In the proxied mode, you sync local development with the live cluster by proxying and forwarding traffic into the cluster. Correspondingly, in the live mode, you build and deploy against a remote cluster. The main benefits of this mode are the large compute and storage capacity of a live cluster and support for collaborative workflows, because other developers can easily access the development environment. Finally, in the pure online mode, both the development environment and your cluster are remote.

Other important prerequisites for your new development process include CI/CD automation, debugging, and remote development. Let’s discuss them in more detail.

CI/CD automation. It may be helpful to automate the generation of Dockerfiles and Helm charts for your K8s apps, and you can use such tools as Draft. This tool identifies the programming language in which your app is written and generates a Dockerfile along with a Helm chart. You can then use Draft to run the build and deploy the resultant image to the target cluster via the Helm chart. With this tool, you can also easily set up port forwarding to localhost.

Debugging. Since you will be using containers, it’s necessary to introduce container-aware debugging into your development process. To this end, you can use tools such as Squash, which includes a debug server and lets you set breakpoints in your containerized applications. Using this tool, you can attach the debugger directly to Pods and containers running in your K8s cluster.

Remote development. When you are developing against a live K8s cluster, it’s critical to keep the development environment in sync, and you can use tools such as Telepresence to do so. Telepresence allows you to run a Docker container locally while proxying it to your Kubernetes cluster.

Inefficient Resource Management

Poorly configured resource constraints on containers and Pods may lead to resource shortages in the cluster and to the noisy-neighbor problem, in which one Pod steals resources from other Pods. Poor resource management may also cause Pod evictions, failures to schedule new workloads, latency issues, service disruptions, etc.

To avoid these issues, you can use Kubernetes resource management features for containers, namespaces, and the entire cluster.

For example, with resource requests and limits you can set the lower and upper boundaries of the resources a given Pod can use (see the sketch below). To identify containers with missing requests and limits, you can use tools such as Supergiant Analyze, which searches all containers in the cluster and flags those lacking resource requests and limits.
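A minimal sketch of per-container requests and limits (the names and values are placeholders; sensible numbers come from benchmarking your workloads):

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                     # hypothetical Pod
    spec:
      containers:
      - name: app
        image: example.com/web:1.0  # hypothetical image
        resources:
          requests:                 # guaranteed minimum; used by the scheduler for placement
            cpu: 250m
            memory: 256Mi
          limits:                   # hard ceiling enforced at runtime
            cpu: 500m
            memory: 512Mi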

You can also control how resources are used at the Namespace level by setting resource quotas, default request and limit values, and other resource constraints, as in the example below. Kubernetes offers all the features necessary to avoid the problems mentioned above.
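For instance, a ResourceQuota like the hypothetical one below caps the aggregate resources that all Pods in a namespace can request or consume:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-quota
      namespace: team-a             # hypothetical namespace
    spec:
      hard:
        requests.cpu: "4"           # total CPU all Pods in the namespace may request
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
        pods: "20"                  # cap on the number of Pods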

Inefficient Database Management

Databases deployed in Kubernetes require special management procedures. Unlike stateless workloads, you can’t just spin them up and down on a moment’s notice, because this can lead to data inconsistency.

For example, there may be pending operations that would be compromised if you spun down the database. Therefore, companies should find a way to handle various pre- and post-deployment hooks to ensure data consistency.
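As one illustration of such a hook, Kubernetes can run a command before a container is stopped. The sketch below assumes a hypothetical graceful-shutdown script baked into the database image; the actual command depends on your database:

    apiVersion: v1
    kind: Pod
    metadata:
      name: db                                 # hypothetical database Pod
    spec:
      terminationGracePeriodSeconds: 60        # give the shutdown hook time to finish
      containers:
      - name: db
        image: example.com/db:1.0              # hypothetical database image
        lifecycle:
          preStop:
            exec:
              # Hypothetical script: flush pending operations, then shut down cleanly.
              command: ["/bin/sh", "-c", "/scripts/flush-and-shutdown.sh"]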

Companies transitioning to Kubernetes must either build scripts in order to properly operate their databases on Kubernetes or utilize a Kubernetes operator designed for the task.

Conclusion

To sum it up, companies adopting Kubernetes may encounter a lot of challenges and problems if they fail to follow best practices and plan their transition.

Before moving to Kubernetes, start transforming your IT culture, app development process, and CI/CD pipelines. Also be aware of existing security and configuration issues, and address them when creating your clusters. We hope this article helps companies make their transition to Kubernetes smoother.

Supergiant.io offers Kubernetes Enterprise support subscriptions to help your company successfully adopt Kubernetes. Visit this page to learn more.
