ALL THINGS KUBERNETES

Monitoring your Kubernetes Deployments with Prometheus

In the first part of the Kubernetes monitoring series, we discussed how the Kubernetes monitoring architecture is divided into the core metrics pipeline for system components and the monitoring pipeline based on the Custom Metrics API. Full monitoring pipelines based on the Custom Metrics API can process diverse types of metrics (both core and non-core), which makes them a good fit for monitoring both cluster components and user applications running in your cluster(s).

Plenty of solutions exist for monitoring your Kubernetes clusters. Some of the most popular are Heapster, Prometheus, and a number of proprietary Application Performance Management (APM) vendors like Sysdig, Datadog, or Dynatrace.

In this article, we discuss Prometheus because it is open source software with native support for Kubernetes. Monitoring Kubernetes clusters with Prometheus is a natural choice because many Kubernetes components ship Prometheus-format metrics by default and, therefore, they can be easily discovered by Prometheus.

In this post, we’ll give an overview of the Prometheus architecture and walk you through configuring and deploying it to monitor an example application shipping Prometheus-format metrics. Let’s get started!

What Is Prometheus?

Prometheus is an open source monitoring and alerting toolkit originally developed at SoundCloud in 2012, and the platform has since attracted a vibrant developer and user community. Prometheus is now closely integrated into the cloud-native ecosystem and has native support for containers and Kubernetes.

When you deploy Prometheus in production, you get the following features and benefits:

A multi-dimensional data model. Prometheus stores all data as time series identified by a metric name and key/value pairs (labels). The data format looks like this:
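
    <metric name>{<label name>=<label value>, ...}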

For example, using this format we can represent the total number of HTTP POST requests to the /messages endpoint like this:
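
    api_http_requests_total{method="POST", handler="/messages"}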

This approach resembles the way Kubernetes organizes data with labels. The Prometheus data model facilitates flexible and accurate time series and is a great fit if your data is highly dimensional.

A Flexible Query Language. Prometheus ships with PromQL, a functional query language that leverages the high dimensionality of the data. It allows users to select, query, and aggregate metrics collected by Prometheus, preparing them for subsequent analysis and visualization. PromQL is powerful in dealing with time series thanks to its native support for complex data types such as instant vectors and range vectors, as well as simple scalar and string data types.

Efficient Pull Model for Metrics Collection. Prometheus collects metrics via a pull model over HTTP. This approach makes shipping application metrics to Prometheus very simple. In particular, you don’t need to push metrics to Prometheus explicitly. All you need to do is expose a web port in your application and serve a REST endpoint that exposes the Prometheus-format metrics. If your application does not produce Prometheus-format metrics, there are several metrics exporters that will help you convert them to the native Prometheus format. Once the /metrics endpoint is created, Prometheus will use its powerful auto-discovery plugins to collect, filter, and aggregate the metrics. Prometheus has good support for a number of metrics providers including Kubernetes, OpenStack, GCE, AWS EC2, ZooKeeper Serverset, and more.

Developed Ecosystem. Prometheus has a developed ecosystem of components and tools, including various client libraries for instrumenting application code, special-purpose exporters that convert data into the Prometheus format, the Alertmanager, a web UI, and more.

Efficient auto-discovery and excellent support for containers and Kubernetes make Prometheus a perfect choice for monitoring Kubernetes applications and cluster components. For this tutorial, we will monitor a simple web application exporting Prometheus-format metrics. We used an example application from the Go client library that exports fictional RPC latencies of some service. To deploy the application in the Kubernetes cluster, we containerized it using Docker and pushed it to a Docker Hub repository.

To complete the examples used below, you’ll need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • The kubectl command line tool installed and configured to communicate with the cluster. See how to install kubectl here.

Step 1: Enable RBAC for Prometheus

We need to grant Prometheus permissions to access pods, endpoints, and services running in your cluster, and we can do this via the ClusterRole resource that defines an RBAC policy. In the ClusterRole manifest, we list the permissions Prometheus needs to read various cluster resources. Let’s look at the manifest below:
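
Something along these lines (the name prometheus is illustrative and is reused by the ServiceAccount and ClusterRoleBinding that follow):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus              # illustrative name; referenced later by the ClusterRoleBinding
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources:
      - configmaps
      verbs: ["get"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]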

The above manifest grants Prometheus the following cluster-wide permissions:

  • read and watch access to pods, nodes, services, and endpoints;
  • read access to ConfigMaps;
  • read access to non-resource URLs such as the /metrics URLs shipping the Prometheus-format metrics.

In addition to the ClusterRole, we need to create a ServiceAccount for Prometheus to represent its identity in the cluster.
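
A sketch of such a ServiceAccount (again, the name prometheus and the default namespace are our assumptions):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus          # must match subjects.name in the ClusterRoleBinding below
      namespace: default        # assumes Prometheus runs in the default namespace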

Finally, we need to bind the ServiceAccount and the ClusterRole using the ClusterRoleBinding resource. A ClusterRoleBinding associates a list of users, groups, or service accounts with a specific role.
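
A sketch of such a binding, consistent with the illustrative names used above:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus          # must match the ClusterRole name
    subjects:
    - kind: ServiceAccount
      name: prometheus          # must match the ServiceAccount name
      namespace: default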

Note that roleRef.name should match the name of the ClusterRole created in the first step, and subjects.name should match the name of the ServiceAccount created in the second step.

We are going to create these resources in bulk, so put the above manifests into one file (e.g., rbac.yml), separating each manifest with a --- delimiter. Then run:
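
    kubectl create -f rbac.yml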

Step 2: Deploy Prometheus

The next step is configuring Prometheus. The configuration will contain a list of scrape targets and Kubernetes auto-discovery settings that will allow Prometheus to automatically detect applications that ship metrics.
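
A minimal sketch of such a prometheus.yml (the external label, job names, and relabeling rules are illustrative and may differ from your setup) might look like this:

    global:
      scrape_interval: 15s                  # scrape targets every 15 seconds
      external_labels:
        monitor: 'my-prometheus'            # illustrative external label
    scrape_configs:
      # scrape Prometheus itself
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      # auto-discover all service endpoints in the cluster
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          # copy Kubernetes service labels (e.g., app=rpc-app) onto the scraped metrics
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          # replace lengthy auto-discovery labels with shorter custom ones
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name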

As you see, the configuration contains two main sections: global configuration and scrape configuration. The global section includes parameters that are valid in all configuration contexts. In this section, we define a 15-second scrape interval and external labels.

In turn, the scrape_configs section defines the jobs/targets for Prometheus to watch. Here, you can override global values such as the scrape interval. In each job section, you can also provide a target endpoint for Prometheus to scrape. As you know, Kubernetes services and deployments are dynamic, so we can’t know their URLs before running them. Fortunately, Prometheus auto-discovery can address this problem. Prometheus ships with a Kubernetes auto-discovery plugin, kubernetes_sd_configs, which we use in the second job definition. We set kubernetes_sd_configs to watch only service endpoints shipping Prometheus-format metrics. We have also included some relabeling rules for replacing lengthy Kubernetes names and labels with custom values to simplify monitoring. For this tutorial, we targeted only service endpoints, but you can configure kubernetes_sd_configs to watch nodes, pods, and any other resource in your Kubernetes cluster.

So far, we’ve mentioned just a few configuration parameters supported by Prometheus. You may be also interested in some others such as:

  • scrape_timeout — how long until a scrape request times out.
  • basic_auth for setting the “Authorization” header on each scrape request.
  • service-specific auto-discovery configurations for Consul, Amazon EC2, GCE, etc.

For a full list of available configuration options, see the official Prometheus documentation.

Let’s save the configuration above in the prometheus.yml file and create the ConfigMap with the following command:
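
    # the ConfigMap name prometheus-config is our choice; the deployment below references it
    kubectl create configmap prometheus-config --from-file=prometheus.yml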

Next, we will deploy Prometheus using the container image from the Docker Hub repository. Our deployment manifest looks like this:
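
In outline (the ConfigMap name matches the one created above, and prom/prometheus is the official image on Docker Hub; both are our assumptions):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          serviceAccountName: prometheus            # the ServiceAccount created in Step 1
          containers:
          - name: prometheus
            image: prom/prometheus                  # official image from Docker Hub
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            ports:
            - containerPort: 9090
            volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus            # the key prometheus.yml lands at /etc/prometheus/prometheus.yml
          volumes:
          - name: config-volume
            configMap:
              name: prometheus-config               # the ConfigMap created above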

To summarize what this manifest does:

  • Launches two Prometheus replicas listening on port 9090.
  • Mounts the ConfigMap created previously at the default Prometheus config path of /etc/prometheus/prometheus.yml.
  • Associates the Prometheus ServiceAccount with the deployment to grant the needed permissions.

Let’s save the manifest as prometheus-deployment.yml and create the deployment:
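
    kubectl create -f prometheus-deployment.yml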

To access the Prometheus web interface, we also need to expose the deployment as a service. We used the NodePort service type:
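
A sketch of such a Service (the name and selector mirror the deployment above):

    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      type: NodePort
      selector:
        app: prometheus           # matches the pods created by the deployment
      ports:
      - port: 9090
        targetPort: 9090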

Let’s create the service by saving the manifest as prometheus-service.yaml and running the command below:
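
    kubectl create -f prometheus-service.yaml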

Alternatively, you can expose the deployment from your terminal. By doing so, you don’t need to define the Service manifest:
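
    # assumes the deployment is named prometheus, as in the sketch above
    kubectl expose deployment prometheus --type=NodePort --port=9090 --target-port=9090 --name=prometheus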

Once the deployment is exposed, you can access the Prometheus web interface. If you are using Minikube, you can find the Prometheus UI URL by running minikube service with the --url flag:
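
    # prometheus is the service name used in the manifest above
    minikube service prometheus --url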

Take note of the URL; we’ll use it to access the Prometheus UI a little bit later, once our test metrics app is deployed.

Step 3: Deploy an Example App Shipping RPC Latency Metrics

Prometheus is now deployed, so we are ready to make it consume some metrics. Let’s deploy our example app serving metrics at the /metrics REST endpoint. Below is the deployment manifest we used:
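
In outline (the image name is a placeholder for the image you pushed to Docker Hub, and port 8081 is an assumed listen port of the example app):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: rpc-app-deployment
      labels:
        app: rpc-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: rpc-app
      template:
        metadata:
          labels:
            app: rpc-app                                    # the label Prometheus will show for this target
        spec:
          containers:
          - name: rpc-app
            image: <your-dockerhub-user>/rpc-app:latest     # placeholder; use your own image
            ports:
            - containerPort: 8081                           # assumed port the example app listens on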

This deployment manifest is quite self-explanatory. Once deployed, the app will ship random RPC latency data to the /metrics endpoint. Please make sure that all the labels and label selectors match each other if you prefer to use your own names.

Go ahead and create the deployment:
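
    # assuming the manifest above was saved as rpc-app-deployment.yml
    kubectl create -f rpc-app-deployment.yml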

As you remember, we configured Prometheus to watch service endpoints. That’s why we need to expose our app’s deployment as a service.
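
A sketch of the Service manifest for the app (the names mirror the deployment above):

    apiVersion: v1
    kind: Service
    metadata:
      name: rpc-app-service
      labels:
        app: rpc-app
    spec:
      type: NodePort
      selector:
        app: rpc-app
      ports:
      - port: 8081
        targetPort: 8081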

For clarity, we set the value of spec.ports[].targetPort to be the same as spec.ports[].port, although Kubernetes does this automatically if no value is provided for targetPort.

As with the Prometheus service, you can either create the service from the manifest or expose it inline in your terminal. If you opt for the manifest, run:
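
    # assuming the manifest above was saved as rpc-app-service.yaml
    kubectl create -f rpc-app-service.yaml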

If you prefer the quick inline way, run:
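
    # assumes the deployment name rpc-app-deployment from the sketch above
    kubectl expose deployment rpc-app-deployment --type=NodePort --name=rpc-app-service --port=8081 --target-port=8081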

Let’s verify that the service was successfully created:
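
    # shows the assigned NodePort and the endpoints backing the service (name assumed from the manifest above)
    kubectl describe service rpc-app-service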

As you see, a NodePort was assigned and the deployment’s endpoints were successfully added to the service. We can now access the app’s metrics endpoint on the specified IP and port. If you are using Minikube, you’ll first need to get the service’s URL (the node IP and node port) with the following command:
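
    # rpc-app-service is the service name used above
    minikube service rpc-app-service --url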

Now, let’s use curl to GET some metrics from that endpoint:
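
    # substitute the IP and NodePort obtained in the previous step
    curl http://<node-ip>:<node-port>/metrics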

As you see, the request returned a number of Prometheus-formatted RPC latency metrics. Each metric is formatted as <metric name>{<label name>=<label value>, ...} and is followed by its current value.

Thanks to the Prometheus Kubernetes auto-discovery feature, we can expect that Prometheus has automatically discovered the app and has begun pulling these metrics. Let’s access the Prometheus web interface to verify this. Use your Prometheus service IP and the NodePort obtained in Step 2 to access the Prometheus UI.

If you go to the /targets endpoint, you’ll see the list of the current Prometheus targets. There might be a lot of targets because we’ve configured Prometheus to watch all service endpoints. Among them, you’ll find a target labeled app="rpc-app". That’s our app. You can also find other labels and see the time of the last scrape.

Prometheus Targets

In addition, you can see the current Prometheus configuration under the Status -> Configuration tab:

Prometheus Config

Finally, we can visualize RPC time series generated by our example app. To do this, go to the Graph tab where you can select the metrics to visualize.

Prometheus Visualize

In the image above, we visualized the rpc_durations_histogram_seconds_bucket metric. You can play around with other RPC metrics and native Prometheus metrics as well. The web interface also supports the Prometheus query language, PromQL, for selecting and aggregating the metrics you need. PromQL has rich functional semantics that allow working with time series (instant and range vectors), scalars, and strings. To learn more about PromQL, check out the official documentation.
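
For instance, a query along these lines computes the 90th-percentile RPC duration over the last five minutes from the histogram buckets exposed by the example app:

    histogram_quantile(0.9, rate(rpc_durations_histogram_seconds_bucket[5m]))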

Conclusion

That’s it! We’ve learned how to configure Prometheus to monitor applications serving Prometheus-format metrics.

Prometheus has a complex configuration language and many settings, so we’ve just scratched the surface. Although Prometheus is a powerful tool, it might be challenging to configure and run it without good knowledge of its domain-specific language and configuration format. To fill this gap, in the next tutorial we’ll look into configuring and managing your Prometheus instances with the Prometheus Operator — a useful software management tool designed to simplify monitoring of your apps with Prometheus. Stay tuned to our blog to find out more soon!