This is a dense subject, so I wanted to take the time to dive deep into the Supergiant packing method and how it translates into savings on your infrastructure.
The Supergiant packing method is based on a Kubernetes concept of minimum versus maximum compute resource settings. This is a deep and very valuable feature of Kubernetes that a lot of users may not be aware exists.
The Kubernetes Compute Resource Model
Kubernetes has a concept of minimum and maximum resource allocation. In a pod spec, the minimum is called a resource request and the maximum a resource limit. This concept can be applied to many objects in a Kubernetes cluster, such as pods (collections of containers), namespaces (collections of pods, services, and replication controllers), and nodes (the physical servers in your Kubernetes cluster). This min/max value was included in Kubernetes so that multiple users in an environment can allocate a resource ratio to their applications without causing noisy-neighbor impacts on other users.
Example of Resource Values in Kubernetes
We will keep our example to CPU for now, but RAM ratios apply in a similar way. Let’s say my application has a minimum CPU value of 4 CPUs and a maximum CPU value of 8 CPUs. How would this application behave on a Kubernetes node with 8 physical processors?
How many copies of my app could I fit on the node?
The answer is 2. An important thing to remember about Kubernetes is that it really does treat your resource pool as one giant pile of CPU and RAM. Add another 8-processor node, and your cluster will now be able to support 4 instances of your application. Kubernetes will allow scheduling of an application if its minimum resource request is within the capabilities of the cluster. If the cluster cannot support the minimum number of CPUs you have requested, you will get an error that scheduling failed due to insufficient CPU.
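The arithmetic above can be sketched in a few lines. This is only an illustration, not Kubernetes scheduler code: the function names are made up for this post, and the count is taken per node on the assumption that a single copy of the app has to fit on one node.

```python
def instances_per_node(node_cpus: float, min_cpu: float) -> int:
    """Copies that fit on one node, judged only by the minimum (the request)."""
    if min_cpu <= 0:
        raise ValueError("min_cpu must be positive")
    return int(node_cpus // min_cpu)


def cluster_capacity(nodes: list, min_cpu: float) -> int:
    """Total copies across the cluster; each copy must fit on a single node."""
    return sum(instances_per_node(cpus, min_cpu) for cpus in nodes)


print(cluster_capacity([8], 4))     # one 8-CPU node  -> 2
print(cluster_capacity([8, 8], 4))  # two 8-CPU nodes -> 4
```

Note that the maximum of 8 CPUs plays no part in this count; only the minimum decides whether another copy can be scheduled.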
So now you're thinking: my max CPU is set to 8 CPUs. What happens if my application starts trying to use MORE processor capacity than my node has? Let's consider this situation…
If your application starts to exceed its maximum CPU allocation, it is either throttled at that limit or, where possible, moved to another node.
This can happen incredibly quickly. For a stateless app running in a well-written container, it can take as little as 1 second. For a stateful app it can take more like 30 seconds, because the persistent storage needs time to detach from node 1 and re-attach to node 2.
Supergiant Resource Model
So how does Supergiant augment this default Kubernetes resource behavior?
Well... Let's first look at autoscaling. The term "autoscaling" gets thrown around a lot, but there are multiple types of autoscaling. By default, Kubernetes supports “Horizontal Autoscaling.” This is the ability to scale your application based on factors like resource usage, network latency, etc.
Today this is an expected part of any container, cluster, or cloud compute system. But what about cost-efficiency autoscaling? It is often overlooked, and for obvious reasons it may not be in a provider's interest to make features like this easy for you to use. This is where Supergiant shines.
Let's refer to our example from above. This is a great setup, right? My app starts to use a lot of resources, and Kubernetes moves things around to make sure the demands of the app are being met within its resource min/max values. But what about Node 2? You may feel like this particular situation is okay because now “App instance 2” has some headroom and can continue to run with heavy load.
But the node “App instance 2” is on now has a lot of unused CPU. You paid for this CPU, and now it is getting wasted! This translates directly into a squeeze on your profit margin that you would rather avoid.
The Supergiant method augments Kubernetes resource management by picking hardware settings for your nodes that most efficiently match your overall CPU and RAM needs. This is how that situation would look in Supergiant.
Node 2 would be automatically created/sized to best match the CPU/RAM requirements of your migrating app.
Our “cost” autoscaling method will slowly work to ensure your applications are “packed” onto your hardware in the most efficient way possible.
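To make the idea of sizing a node to match the app more concrete, here is a minimal sketch of picking the cheapest node type that still fits a given CPU/RAM requirement. It is not Supergiant's actual algorithm: the catalog, its names, sizes, and prices, and the cheapest_fit helper are all hypothetical and exist only for illustration.

```python
from typing import NamedTuple, Optional


class NodeType(NamedTuple):
    name: str
    cpus: float
    ram_gb: float
    hourly_cost: float


# Hypothetical catalog: names, sizes, and prices are made up for illustration.
CATALOG = [
    NodeType("small", 2, 4, 0.05),
    NodeType("medium", 4, 8, 0.10),
    NodeType("large", 8, 16, 0.20),
]


def cheapest_fit(cpu_needed: float, ram_needed_gb: float,
                 catalog=CATALOG) -> Optional[NodeType]:
    """Return the cheapest node type that covers both requirements, or None."""
    candidates = [n for n in catalog
                  if n.cpus >= cpu_needed and n.ram_gb >= ram_needed_gb]
    return min(candidates, key=lambda n: n.hourly_cost, default=None)


choice = cheapest_fit(cpu_needed=3, ram_needed_gb=6)
print(choice.name if choice else "nothing in the catalog is big enough")
```

With this made-up catalog, an app that needs 3 CPUs and 6 GB of RAM lands on the "medium" type rather than leaving half of a "large" node idle.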
Now let's add that Kubernetes minimum resource value back into the mix. Here is our ratio:
The minimum resource value translates to a maximum number of components that can fit on a node.
The maximum resource value translates to the point at which a component should be either throttled or moved to a node with more resources.
Let's continue with our examples above. What would it look like if we had “App instance 1” (set with a min 2/max 4 CPU ratio) and several other smaller apps (set with a min 1/max 2 CPU ratio)?
Whoa! Now we really get an idea of the cost savings here…
These apps are now able to occupy a server that only really has the processor resources for half of them, AND we do not run into noisy-neighbor issues.
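Here is a rough sketch of the numbers behind that claim. The 8-CPU node and the count of six smaller apps are assumptions chosen to make the sums work out, and the Component type and packing_summary helper are names invented for this example.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    min_cpu: float
    max_cpu: float


def packing_summary(node_cpus: float, components: list) -> dict:
    """Compare the summed minimums and maximums against one node's CPU count."""
    total_min = sum(c.min_cpu for c in components)
    total_max = sum(c.max_cpu for c in components)
    return {
        "fits": total_min <= node_cpus,       # scheduling looks at minimums only
        "total_min": total_min,
        "total_max": total_max,
        "overcommit": total_max / node_cpus,  # 2.0 means maximums are twice the node
    }


# Assumed setup: one 8-CPU node, "App instance 1" plus six smaller apps.
components = [Component("app-instance-1", 2, 4)]
components += [Component(f"small-app-{i}", 1, 2) for i in range(1, 7)]

summary = packing_summary(8, components)
print(summary)
```

In this setup the minimums add up to exactly 8 CPUs, so everything schedules, while the maximums add up to 16 CPUs: the node only has the processor resources for half of what the apps could ask for at full burst.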
If any of these components starts to really work hard, and that work looks like it may start to impact the resources needed by the other components, the app will quickly move over to a new or underutilized node.
You may be thinking “But my apps would move all the time -- right?”
The reality of shared computing is that most of the time compute resources are not really being used by most of the apps in the cluster. Your app will only move if its resource needs come into conflict with all the other apps on the node. Even after packing components onto a cluster with a minimum resource value of ½ max, most environments will see an average total CPU usage of around 30-40% on the node, which means there is a fair bit of burstable headroom for any one or two components that temporarily go nuts.
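One way to picture "coming into conflict" is a check like the one below. This is a sketch under assumptions, not how Supergiant or Kubernetes actually decides: the rule of moving whichever component is furthest above its own minimum, the function name, and the usage figures are all invented for illustration.

```python
from typing import Optional


def pick_component_to_move(node_cpus: float, usage: dict,
                           minimums: dict) -> Optional[str]:
    """Return the component to move, or None if the node is not contended.

    usage and minimums map component name -> CPUs. The selection rule
    (largest usage above its own minimum) is an assumption for this sketch.
    """
    if sum(usage.values()) <= node_cpus:
        return None  # total demand still fits on the node, so nobody moves
    return max(usage, key=lambda name: usage[name] - minimums[name])


minimums = {"app-instance-1": 2, "small-a": 1, "small-b": 1, "small-c": 1,
            "small-d": 1, "small-e": 1, "small-f": 1}

# A quiet period: total usage is well under the 8 CPUs on the node.
quiet = {"app-instance-1": 1, "small-a": 0.25, "small-b": 0.25, "small-c": 0.25,
         "small-d": 0.5, "small-e": 0.5, "small-f": 0.25}

# A busy period: every small app sits at its minimum and the big app bursts.
busy = dict.fromkeys(minimums, 1)
busy["app-instance-1"] = 3

print(pick_component_to_move(8, quiet, minimums))
print(pick_component_to_move(8, busy, minimums))
```

In the quiet case nothing moves, because a burst from one component is absorbed by the headroom the others are not using. Only in the busy case, where total demand exceeds the node, does the bursting app get picked.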
This is where the minimum resource value becomes a kind of savings vs. statefulness ratio. You get stability and economy! The higher your minimum resource allocation as a percentage of max allocation, the more your application will tend to stay put on a node. If you were to lower your min value to ⅓ of max, ⅛ of max, or even 0, you would see a correspondingly higher likelihood that the app moves from node to node.
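The trade-off can be put into numbers with a small sketch. The 8-CPU node and the 4-CPU maximum are assumed values, and components_per_node is a helper invented for this example; it only counts how many identical components fit by their minimums.

```python
from fractions import Fraction
from typing import Optional


def components_per_node(node_cpus, max_cpu, min_ratio) -> Optional[int]:
    """Identical components that fit when each minimum is min_ratio * max_cpu."""
    min_cpu = Fraction(max_cpu) * Fraction(min_ratio)
    if min_cpu == 0:
        return None  # with no minimum, the minimum no longer caps the count
    return int(Fraction(node_cpus) // min_cpu)


# Assumed: an 8-CPU node and components with a 4-CPU maximum.
for ratio in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 8), Fraction(0)):
    print(ratio, components_per_node(8, 4, ratio))
```

Under these assumptions the node holds 4 components at ½ of max, 6 at ⅓, and 16 at ⅛, while a minimum of 0 places no cap at all. Denser packing means more savings, but also more components competing for the same CPUs and therefore more moves.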
To illustrate, I will use a real-world example that we encounter at Qbox, where our business is managing clusters of Elasticsearch components.
Kibana, the open source visualization package from the same core team that built Elasticsearch, is one of the most used integrations, so we make it easy to install. We provide Kibana as a supporting package in your ES component. We would like to minimize the movement of ES pods from node to node, so we typically would not set their minimum resource needs to less than ½ of the maximum value. Kibana, however, is much lighter from a resource perspective. We could set its resource ratio to a minimum of ¼ of max. If it moves around, it is unlikely anyone would notice or care.
This is a real view from a simple “Packing Visualizer” we use to monitor our packing efficiency. Each colored box is a component, being packed into a physical node.
Ultimately, we decided that this hand-rolled solution was way too cool to keep to ourselves.
We (qbox.io) struggled as a service provider to provide the best possible performance to our customers and still maintain a survivable profit margin. Other container management platforms were cool, but they didn’t really focus on our need to more efficiently manage our hardware spending. We thought that if we released our platform to the wild, others may find it useful in keeping their hardware costs down. We also added a pretty sexy UI to boot.
Supergiant is currently in a pre-1.0 release state.
We would love for you to try it out, and we welcome your feedback and contributions.
We also would like to hear about other infrastructure capacity issues you have experienced with other container/cluster management tools, so we can look at possibly providing solutions in future versions of Supergiant.