Supergiant Blog

Product releases, new features, announcements, and tutorials.

Why Is the Supergiant Packing Algorithm Unique? How Does It Save Me Money?

Posted by Mike Johnston on May 19, 2016

This is a really dense subject, and I wanted to take the time to do a deep dive into the Supergiant packing method and how it translates to savings for your infrastructure.

The Supergiant packing method is based on the Kubernetes concept of minimum versus maximum compute resource settings. This is a deep and very valuable feature of Kubernetes that many users may not know exists.

The Kubernetes Compute Resource Model

Kubernetes has a concept of minimum and maximum resource allocation. This concept can be applied to many objects in a Kubernetes cluster, such as pods (collections of containers), namespaces (collections of pods, services, and replication controllers), and nodes (the physical servers in your Kubernetes cluster). This min/max value was included in Kubernetes so that multiple users in a shared environment can allocate resources to their applications without noisy-neighbor impacts on other users.

Example of Resource Values in Kubernetes

We will keep our example to CPU for now, but RAM ratios apply in a similar way. Let's say my application has a minimum CPU value of 4 CPUs and a maximum CPU value of 8 CPUs. How would this application behave on a Kubernetes node with 8 physical processors?

How many copies of my app could I fit on the node?

The answer is 2. An important thing to remember about Kubernetes is that it really does treat your resource pool as one giant pile of CPU and RAM. Add another 8-processor node, and your cluster will now be able to support 4 instances of your application. Kubernetes will allow scheduling of an application if its minimum resource request is within the capabilities of the cluster. If the cluster cannot support the minimum amount of CPU you have requested, you will get an error that scheduling failed due to insufficient CPU.
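To make the arithmetic concrete, here is that scheduling math as a few lines of Python. This is just a sanity check of the example above, not how the Kubernetes scheduler is implemented:

```python
# Scheduling math for the example above. Kubernetes schedules against the
# MINIMUM (the resource request), treating the cluster as one pool of CPU.

APP_MIN_CPU = 4  # the app's minimum CPU value
APP_MAX_CPU = 8  # the app's maximum CPU value (burst ceiling)

def copies_that_fit(total_cluster_cpus):
    """How many instances can be scheduled against the cluster's CPU pool."""
    return total_cluster_cpus // APP_MIN_CPU

print(copies_that_fit(8))   # one 8-processor node   -> 2
print(copies_that_fit(16))  # add another 8-CPU node -> 4
print(copies_that_fit(3))   # 0 -> scheduling fails with "insufficient CPU"
```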

So now you're thinking… my max CPU is set to 8 CPUs. What happens if my application tries to use MORE processor capacity than my node has? Let's consider this situation…

If your application starts to exceed its maximum CPU allocation, Kubernetes will attempt to move it to another node if possible.

This can happen incredibly quickly. For a stateless app running in a well-written container, this can take as little as 1 second. For a stateful app, it can take more like 30 seconds, because the persistent storage needs time to detach from Node 1 and re-attach to Node 2.

Supergiant Resource Model

So how does Supergiant augment this default Kubernetes resource behavior?

Well... let's first look at autoscaling. The term "autoscaling" gets thrown around a lot, but there are multiple types of autoscaling. By default, Kubernetes supports "horizontal autoscaling": the ability to scale your application based on factors like resource usage, network latency, etc.

Today this is an expected part of any container, cluster, or cloud compute system. But what about cost-efficiency autoscaling? It is either overlooked, or it may not be in a provider's interest to expose features like this to you, for obvious reasons. This is where Supergiant shines.

[Image: Supergiant resource model]

Let's refer to our example from above. This is a great setup, right? My app starts to use a lot of resources, and Kubernetes moves things around to make sure the demands of the app are met within its resource min/max values. But what about Node 2? You may feel like this particular situation is okay because now "App instance 2" has some headroom and can continue to run under heavy load.

But the node "App instance 2" is on now has a lot of unused CPU. You paid for this CPU, and now it is being wasted! That translates directly into a squeezed profit margin you would rather avoid.

The Supergiant method augments Kubernetes resource management by picking hardware settings for your nodes that most efficiently match your overall CPU and RAM needs. This is how that situation would look in Supergiant.

[Image: containers and nodes]

Node 2 would be automatically created/sized to best match the CPU/RAM requirements of your migrating app.

Our "cost" autoscaling method will slowly work to ensure your applications are "packed" onto your hardware in the most efficient way possible.
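To give a feel for what "picking hardware settings that match your needs" means, here is a minimal sketch in Python. The instance menu and prices are made up for illustration; this shows the idea, not Supergiant's actual algorithm:

```python
# A toy version of cost-driven node sizing. The instance types and prices
# below are hypothetical; this illustrates the idea, not Supergiant's code.

INSTANCE_TYPES = [
    # (name, cpus, ram_gb, dollars_per_hour)
    ("small",   2,  4, 0.10),
    ("medium",  4,  8, 0.20),
    ("large",   8, 16, 0.40),
    ("xlarge", 16, 32, 0.80),
]

def cheapest_node_for(min_cpus, min_ram_gb):
    """Pick the cheapest instance type that covers the app's minimums."""
    candidates = [t for t in INSTANCE_TYPES
                  if t[1] >= min_cpus and t[2] >= min_ram_gb]
    return min(candidates, key=lambda t: t[3]) if candidates else None

# The migrating app from the earlier example: minimum 4 CPUs, 8 GB RAM
print(cheapest_node_for(4, 8))  # -> ('medium', 4, 8, 0.2)
```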

Now let's add that Kubernetes minimum resource value back into the mix. Here is our ratio (sketched in code after the list):

  • The minimum resource value translates to the maximum number of components that can fit on a node.

  • The maximum resource value answers the question: when should this component be throttled or moved to a node with more resources?
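Expressed as code, the two halves of the ratio look like this (a sketch, not Supergiant's implementation; the usage number would come from real monitoring):

```python
# The two rules above, written out. A sketch, not Supergiant's implementation.

def max_components_per_node(node_cpus, component_min_cpu):
    # Rule 1: the minimum value caps how many components fit on a node.
    return int(node_cpus // component_min_cpu)

def should_throttle_or_move(observed_cpu_use, component_max_cpu):
    # Rule 2: the maximum value decides when a component should be
    # throttled or rescheduled onto a node with more resources.
    return observed_cpu_use > component_max_cpu

print(max_components_per_node(8, 4))    # 2, as in the earlier example
print(should_throttle_or_move(8.5, 8))  # True -> time to throttle or move
```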

Let's continue with our examples from above. What would it look like if we had "App instance 1" (set with a min 2 / max 4 CPU ratio) and several other smaller apps (set with a min 1 / max 2 CPU ratio)?

[Image: Supergiant packing algorithm]

Whoa! Now we really get an idea of the cost savings here…

These apps are now able to occupy a server that only really has the processor resources for half of them… AND… we do not run into noisy-neighbor issues.
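Concretely, suppose the node has 8 CPUs and "several" means six small apps (six is my choice for the illustration):

```python
# Packing the example onto one 8-CPU node. Minimums are guaranteed;
# maximums are allowed to overcommit the node. "Six small apps" is an
# assumption for the illustration; the post just says "several."

NODE_CPUS = 8
apps = [(2, 4)] + [(1, 2)] * 6  # (min, max) CPU: App instance 1 + six small apps

sum_of_mins = sum(lo for lo, hi in apps)   # 2 + 6*1 = 8
sum_of_maxes = sum(hi for lo, hi in apps)  # 4 + 6*2 = 16

print(sum_of_mins <= NODE_CPUS)  # True: every app's minimum is guaranteed
print(sum_of_maxes / NODE_CPUS)  # 2.0: worst-case demand is twice the hardware
```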

If any of these components start to really work hard, and if that work looks like it may start to impact the resources needed by the other components…

[Image: apps and nodes]

the app will quickly move over to a new/underutilized node.

You may be thinking “But my apps would move all the time -- right?” 

Not really. 

The reality of shared computing is that, most of the time, compute resources are not really being used by most of the apps in the cluster. Your app will only move if its resource needs come into conflict with all the other apps on the node. Even after packing components onto a cluster with a minimum resource value of ½ max, most environments will see an average total CPU usage of around 30-40% on the node, which means there is a fair bit of burstable headroom for any one or two components that temporarily go nuts.

This is where the minimum resource value becomes a savings-versus-stability ratio. You get stability and economy! The higher your minimum resource allocation as a percentage of max allocation, the more your application will tend to stay put on a node. If you were to lower your min value to ⅓ of max, ⅛ of max, or even 0, you would see a correspondingly higher likelihood that the app moves from node to node.
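To put rough numbers on that headroom, using the 30-40% average utilization mentioned above (35% is simply the midpoint I picked):

```python
# Back-of-the-envelope burst headroom on a packed node, assuming the
# 30-40% average CPU utilization mentioned above (35% = midpoint).

NODE_CPUS = 8
average_utilization = 0.35

headroom = NODE_CPUS * (1 - average_utilization)
print(headroom)  # ~5.2 CPUs of burstable headroom for components that go nuts
```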

To illustrate, I will use a real-world example that we encounter at Qbox, where our business is managing clusters of Elasticsearch components. 

Kibana, the open-source visualization package from the same core team that built Elasticsearch, is one of the most-used integrations, so we make it easy to install. We provide Kibana as a supporting package in your ES component. We would like to minimize the movement of ES pods from node to node, so we typically would not set our minimum resource needs to less than ½ of our maximum value. Kibana, however, is much lighter weight from a resource perspective. We could set its resource ratio to a minimum of ¼ of max. If it moves around, it is unlikely anyone would notice or care.

This is a real view from a simple "Packing Visualizer" we use to monitor our packing efficiency. Each colored box is a component being packed into a physical node.

[Image: Supergiant packing visualizer]

Ultimately, we decided that this hand-rolled solution was way too cool to keep to ourselves. 

We (qbox.io) struggled as a service provider to deliver the best possible performance to our customers while still maintaining a survivable profit margin. Other container management platforms were cool, but they didn't really address our need to manage our hardware spending more efficiently. We thought that if we released our platform into the wild, others might find it useful in keeping their hardware costs down. We also added a pretty sexy UI to boot.

[Image: Supergiant dashboard]

Supergiant is currently in a pre-1.0 release state. 

We would love to get your feedback, contributions, and to have you try it out. 

We also would like to hear about other infrastructure capacity issues you have experienced with other container/cluster management tools, so we can look at possibly providing solutions in future versions of Supergiant.

For more information, check out our documentation or our GitHub, join our Slack channel, or read the blog.


How to Install Ghost Node JS Blog with Docker on Supergiant

Posted by Mark Brandon on May 10, 2016


Hello, this is Mark Brandon, CEO and Co-Founder of Qbox, the creators of Supergiant.  

Today we’re going to show you some of the cool things you can do with Supergiant, namely deploying a blog application straight from Docker Hub into your private Kubernetes environment.

We’re going to start from the Supergiant dashboard, but before we do that, I’m going to find a suitable container for the popular blogging platform, Ghost.

There are several public containers, but fortunately, there is an officially supported container complete with well-documented instructions.

From our dashboard, the first thing I’m going to do is create an App. For simplicity, I’ll just name it “Blog.”

Now we have an empty app. Apps aren't useful, of course, until you add components. I'll name my component "ghost server."


Components are the key to understanding Supergiant. Within components, you have the data volumes, containers, and container networking. We need to add containers and data volumes.  

I’m going to first add a data volume. We’ll name it ghost-data and give it 40GB of SSD storage.


Now I have a volume, so the next thing I need to do is add my container.  

To specify a Docker image, I need to provide a path. If you're using Docker Hub, you can use the path specified after the pull command. In this case, it is just "ghost." If I wanted to use another public container, I would use its full path.


Next I will specify a CPU value. Supergiant will automatically scale nodes within this range. Five hundred cores is a lot, but remember we won’t use that much until it’s needed. In actuality, half of one CPU is sufficient to run a medium-sized blog.

Next I’m going to select RAM. By putting a zero in both fields, I’m telling Supergiant to use as much RAM as is available within our giant resource pool and that is required by Ghost at any given moment.

If you have custom commands or environment variables, this is the place to put them, but it’s not necessary for this example.  


We do want to mount the volume we just created. The volume name is ghost-data, and we want to specify the path, according to the instructions.

We need to expose a port and make it public. For this image, 2368 is the appropriate port, and the protocol is HTTP. It's important to know that Supergiant currently requires the public port to be between 30000 and 40000, so I'm going to pick a number at random, say, 30211.

If you designate Public, then it needs to be attached to an entry point. For this install, the entry point is just supergiant.
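To recap everything we just clicked through, here is the whole component in one place as a plain Python summary. This mirrors the settings above; the dict layout is illustrative only and is not Supergiant's actual API schema:

```python
# A recap of the component configured in the UI above. The dict layout is
# illustrative only -- it is NOT Supergiant's actual API schema.

ghost_component = {
    "name": "ghost server",
    "volumes": [
        {"name": "ghost-data", "type": "SSD", "size_gb": 40},
    ],
    "container": {
        "image": "ghost",               # official image on Docker Hub
        "cpu": {"min": 0, "max": 500},  # Supergiant scales nodes in this range
        "ram": {"min": 0, "max": 0},    # 0/0 = draw freely from the pool
        "mounts": [
            {"volume": "ghost-data", "path": "..."},  # path per the image docs
        ],
        "ports": [{
            "protocol": "HTTP",
            "container_port": 2368,     # Ghost's port inside the container
            "public_port": 30211,       # must fall between 30000 and 40000
            "public": True,
            "entrypoint": "supergiant",
        }],
    },
}
```

Note the CPU minimum here is a guess; the walkthrough only mentions the 500-core ceiling.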


You’ll notice that the component is still not running. We have to deploy it. To do that, go to Releases, Create a New Release, and then deploy it. After a few seconds, you’ll see this component is now running.  


All we have to do now is put in the address with the port we specified, and voilà, we now have a running Ghost instance. Users of Ghost will recognize this screen.


More importantly, this instance is set to scale up and down according to its needs within the confines of our resource allocations, giving us both performance and cost efficiency.

For more information, check out our documentation, join our Slack channel, or read the blog. This is Mark Brandon. Thanks for watching.


Introducing Supergiant: Datacenter Total Control System

Posted by Ben Hundley on April 14, 2016

At long last, we present Supergiant


Supergiant is an application platform for 2016. It's sexy and it's powerful. It's the excitement we had for application platforms back in 2008 -- but without all the crushing disappointment and stifling constraint. 

It's production-grade Docker containers on which you can actually run stateful, clustered datastores. It's portability and it's immutability, and it's made by hillbillies. It's sweet, sweet medicine for the large majority of your ailments (* disclaimer to follow).

Who are we? We are the team behind Qbox.io. Supergiant was built from our blood, sweat, and, primarily, our tears, while we were trying to orchestrate thousands of Elasticsearch nodes in the most performant, stable, and low-cost way possible.


Software deployed on virtual machines is not portable, and that's a big problem. 

It's a problem because servers and VMs fail unexpectedly, and replacing them is not a quick process. They are fragile giants, with bespoke configurations and interwoven processes, yet we gamble entire businesses on the ability to modify them while live. Deployments are an anxiety-ridden affair for many.

Docker is a hot topic because it's a solution to this problem ... partially. It combines code and its host configuration, which are often physically separate entities despite their inherent logical coupling. In other words, it allows developers to declare server configuration right in the relevant code repo, which dramatically reduces unexpected disparity between development and production environments.

But Dockerizing your applications doesn't solve all your production problems. Docker is software and (at the risk of stating the obvious) requires a server and operating system just like your application.

The critical difference is the indirection it provides between your application and servers. Instead of code relying directly on an underlying VM, it relies on a consistent pre-built image with all the necessary dependencies and configuration. A container is produced from the image, which can then move freely among any number of VMs running the Docker engine.

The Docker approach improves deployment and restart time, and it restricts disastrous server modifications.  Containers can be quickly replaced, so deployments do not hinge on modifying live configurations. Also, multiple containers can reside on one VM, which can greatly lower infrastructure costs.

Still, there exists the problem of deploying the containers themselves to the host machines (AKA container orchestration). That's where Kubernetes comes in.

When Kubernetes "clicked" for us, it was like a warm fuzzy punch in the face. Imagine living under a rock for 3 years and building disaster recovery systems for large-scale, mission-critical databases. The rock you're imagining here is the one Qbox emerged from not long ago...


Growing Pains

Qbox started in April 2013 with a simple idea. We wanted to host Elasticsearch for application developers and focus solely on the ops aspect. The original implementation was naive, at best. At worst, it was an absolute nightmare.


We stood up servers with external SSDs running Elasticsearch on Rackspace, and on each of those ran our code that handled API tokens, rate limiting, logging, etc. In other words, they were multi-tenanted clusters with rigidly whitelisted routes for security.

At first, performance was great, and margins were terrible. After a few months, we had stuffed maybe 20 active users on the biggest cluster, which was costing us $1600 and generating about $400 each month. We eventually gained a customer who selected the largest usage tier we offered and then proceeded to beat the living crap out of the aforementioned cluster.

Amid the blinding spew of meaningless logs, we discovered the infamous "noisy neighbor" effect, although it was actually less about "noise." It wasn't that requests were slow; it was that requests were dropped entirely for hours at a time in the middle of the night due to timeouts, while we frantically answered emails and prayed to anything listening. No amount of added scale could help the situation, and with such painful margins, we were literally too poor to add it anyway.

The 2nd iteration of Qbox was an entirely new codebase with an entirely new approach. We wanted to support multiple clouds, namely AWS, and offer certain hand-selected instance types. Users were no longer granted access to a shared cluster with an API token; instead, they had a form to configure single-tenant, multi-node clusters running on isolated virtual machines in any region.


Qbox v2 was, somewhat surprisingly, a huge success overnight (relative to a very low starting bar, of course).

The request load increased by one order of magnitude, maybe two. But we could finally sleep at night because there was no single point of failure.

Bottlenecks

Fast-forward 2 years, and we were again unable to sleep at night, because we had 4 engineers replacing dead nodes and answering support tickets at all hours of the day, every day.

At that point, we concluded it was just the nature of the cloud-hosting beast and that there was no escape. We wrote every possible recovery system we could think of, but issues still occurred.

What made matters worse was the volume of resources allocated compared to the usage. We had thousands of servers with a collective CPU utilization under 30%. We were spending a significant chunk of cheese on processors that were sitting there doing absolutely nothing.

Enter Docker. Our team avoided Docker for a while, probably on the vague assumption that the network and disk performance we had with VMs wouldn't be possible with containers. That assumption turned out to be entirely wrong.

To run performance tests, we had to find a system that could manage networked containers and volumes. That's when we discovered Kubernetes. It was alien to us at first, but by the time we had familiarized ourselves and built a performance testing tool, we were sold. Not only was performance as good as our previous VM model, we found it was possible to achieve even better performance.

The performance improvement we observed was due to how many containers we could “pack” on a single machine. Ironically, we began the Docker experiment wanting to avoid “noisy neighbor,” which we assumed was inevitable when several containers shared the same VM. (After all, isolating users to their own infrastructure had been the catalyst for the success of Qbox v2.)

However, that isolation also acted as a bottleneck, both in terms of performance and cost. A fundamental constraint of VMs is that each one is a finite resource. For example, if a machine has 2 cores and you need 3 cores, there's a problem. A typical solution is to buy 4 cores (since it's rare to come across 3) and not utilize them fully.

(Remember that, literally, the only good thing about Qbox v1 (with its big shared clusters) before it came crashing down had been that users had "wiggle room." That is, users were placed on host machines that had more resources than they were requesting. On good days, that meant they had spare capacity: users who were underutilizing had resources the overutilizer could… utilize. It's therefore probably obvious what happened on bad days: if enough users were overutilizing, the cluster died, causing sweeping downtime.)

This is where Kubernetes really starts to shine. It has the concept of requests and limits, which provides granular control over resource sharing. Multiple containers can share an underlying host VM without fear of "noisy neighbors." They can request exclusive control over an amount of RAM, for example, and they can define a limit in anticipation of overflow. It's practical, performant, and cost-effective multi-tenancy.
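For readers who want to see where requests and limits live in Kubernetes itself, here is the relevant fragment of a pod spec, written out as a Python dict so it stays readable. The container name, image, and numbers are placeholders:

```python
# The request/limit concept as it appears in a Kubernetes pod spec
# (spec.containers[].resources). Names and numbers are placeholders.

pod_spec_fragment = {
    "containers": [{
        "name": "es-node",          # hypothetical container name
        "image": "elasticsearch",
        "resources": {
            "requests": {"cpu": "2", "memory": "4Gi"},  # guaranteed minimum
            "limits":   {"cpu": "4", "memory": "8Gi"},  # burst ceiling
        },
    }],
}
```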


Multi-tenancy is at the heart of Supergiant. It's a word that carries a negative connotation, to be sure, but in the context of a single user or organization, it means affordable scale and quick failover.

Supergiant takes that core concept and runs with it.

Over the past several months, we’ve fused Kubernetes with the last 3 years’ worth of our cloud experience, and we produced an open source solution with:

  • Automated server management / capacity control
  • Sharable load balancers
  • Volume management, resizing, backups
  • Extensible deployments
  • Resource monitoring
  • Seriously cool and spacey UI


Supergiant is an active work in progress, so that list will be growing quickly over the next few months. But it’s already being used in production -- with a major impact.

In early February, Qbox discontinued its VM-based offering on AWS and started offering clusters exclusively on Supergiant. Our support engineers are sleeping again--for real. Our volume of users has continued to increase, and all the while the stream of support tickets has slowed to a trickle. Our users are getting twice the stability and performance at half the price.


Supergiant was Qbox’s "Hail Mary." We were fed up with the cloud’s bullshit. We were done managing servers and focusing on disaster recovery. We wanted to build things again. We wanted enterprise scale and stability without all the dirty work. We’re software engineers. (We’re lazy.)

Supergiant isn't your new cloud control center… It's your new cloud recliner… your celestial chariot.

It's free, and it's open source. Come and get it.
