K8s clusters are complex and dynamic environments where nodes and applications exist in a constant state of flux. Supergiant Capacity and Supergiant Analyze are two components of the Supergiant 2.0.0 toolkit that help K8s administrators manage this complexity.
While Supergiant Capacity enables intelligent auto-scaling of nodes to reduce infrastructure costs, this is just one part of the puzzle.
Administrators who manage multi-user clusters with multiple namespaces and applications installed need a fine-grained and real-time view of the utilized resources, application metrics and performance, and — even more important — actionable insights that help fix issues as they arise. This is exactly what Supergiant Analyze does.
In this post, we’ll discuss the architecture of Supergiant Analyze and show how to use the Resource Requests and Limits plugin, which is designed to improve resource optimization in your Kubernetes clusters.
Supergiant Analyze may be compared to Artificial Intelligence or a smart advisor that helps Kubernetes administrators identify problems in their cluster in real time and fix them based on Analyze recommendations.
The tool collects metrics, checks configuration, and suggests actions that improve the health, performance, and optimization of cluster(s) and application(s). These tasks are performed by a “virtual” team of “workers”: Analyze plugins that are responsible for different types of issues and scenarios. The Analyze Control Plane periodically invokes each plugin and stores the results of each plugin’s checks and analysis of the K8s cluster and its hosted apps in the etcd key-value store.
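To make the invoke-and-store cycle concrete, here is a minimal sketch in Python. It is not the Analyze implementation: the `CheckResult` type, the `run_checks` function, the plugin names, and the key layout are all hypothetical, and a plain dictionary stands in for the etcd key-value store.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CheckResult:
    """Hypothetical record of one plugin check (not the real Analyze schema)."""
    plugin: str
    summary: str


def run_checks(plugins: Dict[str, Callable[[], str]],
               store: Dict[str, CheckResult]) -> None:
    """Invoke every registered plugin once and record its latest result."""
    for name, check in plugins.items():
        # The key layout is an assumption made for this sketch.
        store[f"/checks/{name}"] = CheckResult(plugin=name, summary=check())


# Two stand-in plugins; a real plugin would be a separate service.
plugins = {
    "requests-limits": lambda: "2 containers are missing CPU requests",
    "underutilized-nodes": lambda: "no underutilized nodes found",
}

store: Dict[str, CheckResult] = {}  # stand-in for the etcd key-value store
run_checks(plugins, store)
```

A control plane built along these lines would call `run_checks` on a timer, so that each key always holds the most recent result for its plugin.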
Analyze may be defined as a Service that interacts with each plugin using a well-defined API based on gRPC protobufs. In general, the Analyze Service:
In turn, based on its analysis of metrics, each plugin informs the user about the state of a given metric/object/component. Depending on the individual plugin’s algorithm, each state can be:
Each plugin determines if the check falls within any of these categories based on its specific requirements.
After the check is done, the plugin suggests a set of actions for the user to improve the state of the cluster or application(s). Users can either execute the action or dismiss it. If the action is approved, the plugin will interact with the Kubernetes API or use other means to modify the cluster state. This can involve removing a node, rescheduling applications to another node, or changing Pod configuration.
Each plugin is an autonomous application that can be integrated with different external environments. For example, in the near future, Analyze will allow plugin integrations with:
Let’s get a feel for how Analyze and its plugins work by looking at the UI. Let’s get started!
To complete examples in this tutorial, you’ll need:
At the moment, the Supergiant Analyze UI has two main pages: Home and Plugins.
On the Home page, you can see the list of plugin checks stamped with the date when they were run (see the image below).
On the Plugins page, you can see the list of installed plugins. By default, Supergiant Analyze ships with two plugins developed by our team: “Underutilized nodes sunsetting” plugin and “Resources (CPU/RAM) requests and limits” (or simply “Requests/Limits plugin”). We’ll discuss one of them in a moment.
Let’s go back to the Home Page and discuss a plugin notification. In the image below, you can see a single notification by the Requests/Limits plugin.
Each notification has the following elements:
Let’s discuss these features using a real example of the Requests and Limits plugin.
As you may remember from our previous tutorial titled “Assigning Computing Resources to Containers and Pod in Kubernetes,” Kubernetes resource requests and limits are important tools for optimizing resource utilization in your Kubernetes cluster. In short, requests ensure that containers get the minimum amount of resources they need and are scheduled on nodes that have these resources, while limits cap the amount of resources (RAM and CPU) a container can use. Setting a limit above the request allows your applications to burst when the traffic grows, for instance.
The Requests/Limits plugin checks whether resource requests and limits are properly configured in containers deployed in the cluster managed by Supergiant. Based on these findings, the plugin suggests actions to take. The plugin can improve resource utilization in your cluster by aligning the requests/limits configuration with the best practices of the Kubernetes resource model.
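As a quick illustration of how requests and limits relate, the sketch below writes a container’s `resources` section as a Python dictionary that mirrors the field names of a Kubernetes manifest. The container name, image, and quantities are hypothetical, and `cpu_cores` is a small helper written for this example that only understands plain core counts and the millicore suffix `m`.

```python
# A container spec fragment with the same field names a Pod manifest uses.
# The name, image, and all quantities are made up for illustration.
container = {
    "name": "web",
    "image": "nginx:1.25",
    "resources": {
        "requests": {"cpu": "250m", "memory": "64Mi"},   # guaranteed minimum
        "limits": {"cpu": "500m", "memory": "128Mi"},    # hard ceiling
    },
}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "250m" or "1" to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


requested = cpu_cores(container["resources"]["requests"]["cpu"])
limit = cpu_cores(container["resources"]["limits"]["cpu"])

# The gap between request and limit is the room the container has to burst.
burst_headroom = limit - requested
```

With these values the container is guaranteed a quarter of a core and may burst by another quarter of a core before it is throttled at its limit.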
Here’s how the Requests/Limits plugin works. For each node in the cluster and for each Pod running on these nodes, the plugin checks container requests and limits and compiles a detailed table describing the status of the requests/limits configuration for each container (see the image below). You can see this table by expanding the “Details” section inside the plugin notification.
As you see in the table above, the plugin analyzed containers in two Pods residing on the node in our cluster: alertmanager-prometheus-operator-alertmanager-0 and prometheus-operator-grafana-7654f69d89-mhhkg. For each container in the Pod, the plugin displays the container name, container image, and requests/limits configuration for both RAM and CPU.
For example, let’s take a look at the requests and limits check for the alertmanager container running in the first Pod. As you see, the container has a properly configured RAM request, but the CPU request is not set. The plugin treats this case as a major error, indicated by the red highlighting of the text “is not set.” In contrast, if limits are not set, the plugin assigns the yellow status to the container. That’s because setting requests is considered more important than setting limits: without a request, the scheduler cannot account for the container’s needs, so the Pod may be placed on a node that lacks the resources to run it.
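The per-container rule described above (a missing request is red, a missing limit is yellow) can be sketched in a few lines of Python. This is an illustration of the rule only, not the plugin’s code: `check_container` is a hypothetical function, the `"ok"` label for a properly set value is an assumption, and the sample container simply mimics the alertmanager case with a made-up memory quantity and with limits assumed to be unset.

```python
from typing import Dict

RED, YELLOW, OK = "red", "yellow", "ok"  # "ok" is an assumed label


def check_container(resources: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Grade each request and limit of one container."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    report = {}
    for resource in ("cpu", "memory"):
        # A missing request is the more serious finding.
        report[f"{resource} request"] = OK if resource in requests else RED
        # A missing limit is only a warning.
        report[f"{resource} limit"] = OK if resource in limits else YELLOW
    return report


# Memory request set, CPU request missing, no limits (values are hypothetical).
alertmanager = {"requests": {"memory": "200Mi"}}
report = check_container(alertmanager)
```

Here `report` marks the CPU request as red, the memory request as ok, and both limits as yellow, which matches the highlighting described for the table.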
As you see in the image above, the general status of the plugin check is red. How does the Requests/Limits plugin decide on what general status to assign to the full check? In general, the following rules apply.
Users can choose to Dismiss or Approve the actions suggested by the plugin. If you want to dismiss the notification, select the “Dismiss Notification” tab and click “Run.” Analyze will delete this plugin notification from the notifications list.
In contrast, if you want to approve the actions suggested by the plugin, select the “Set missing requests/limits” tab and click “Run.” The plugin will set the missing requests/limits automatically or apply the custom requests/limits that the user specifies for each container.
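One way to picture the “Set missing requests/limits” action is as a merge that fills in only the values a container lacks. The sketch below is an assumption about how such a merge could behave, not a description of the plugin’s internals: `fill_missing`, the `DEFAULTS` table, and every quantity in it are hypothetical.

```python
import copy
from typing import Dict, Optional

Resources = Dict[str, Dict[str, str]]

# Hypothetical fallback values; the real plugin's defaults are not documented here.
DEFAULTS: Resources = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "256Mi"},
}


def fill_missing(resources: Resources,
                 custom: Optional[Resources] = None) -> Resources:
    """Return a copy with every missing request/limit filled in.

    Values that are already set are left untouched. A user-supplied
    custom value takes precedence over the default for a missing field.
    """
    custom = custom or {}
    result = copy.deepcopy(resources)
    for section, defaults in DEFAULTS.items():
        target = result.setdefault(section, {})
        for resource, default in defaults.items():
            if resource not in target:
                target[resource] = custom.get(section, {}).get(resource, default)
    return result


before = {"requests": {"memory": "200Mi"}}
after = fill_missing(before, custom={"requests": {"cpu": "250m"}})
```

In this example the existing memory request is preserved, the missing CPU request takes the user’s custom value, and both limits fall back to the assumed defaults. The input dictionary is not modified.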
In this article, we introduced you to Supergiant Analyze, a tool in the Supergiant 2.0.0 toolkit that enables smart cluster checks and recommendations to improve the efficiency of resource utilization in your cluster and its components. At the moment, Analyze ships with two built-in plugins, but the Supergiant team is working on making Analyze pluggable and extendable. In the near future, users will be able to create their own plugins following a set of Supergiant plugin interface standards. Supergiant users will be able to develop plugins that cover the following use cases:
Learn more about the Supergiant toolkit using the following resources: