Exporting Kubernetes Logs to Elasticsearch Using Fluent Bit

In a previous tutorial, we discussed how to create a cluster-level logging pipeline using the Fluentd log aggregator. As you learned, Fluentd is a powerful log aggregator that collects logs from multiple sources and sends them to multiple outputs.

In this article, we’ll continue the overview of available logging solutions for Kubernetes, focusing on Fluent Bit. This is another component of the Fluentd project ecosystem made and sponsored by Treasure Data. As we’ll show in this article, Fluent Bit is an excellent alternative to Fluentd if your environment has limited CPU and RAM capacity, because Fluent Bit is a very lightweight and performant log shipper and forwarder.

However, these benefits come at the cost of fewer supported input and output plugins. Therefore, you should still consider using Fluentd as a full log aggregator while using Fluent Bit as a log forwarder. This approach is common in other systems, such as the ELK (Elasticsearch-Logstash-Kibana) stack. In the Elasticsearch ecosystem, Logstash is used as a general-purpose log aggregator, while various components of the Beats family (e.g., Metricbeat or Filebeat) are used as lightweight log forwarders.

Fluentd and Fluent Bit

As we have mentioned, both Fluentd and Fluent Bit focus on collecting, processing, and delivering logs. However, there are some major differences between the two projects that make them suitable for different tasks.

  • Fluentd combines log collection and processing with log aggregation. Fluentd was designed to aggregate logs from multiple inputs, process them, and route them to different outputs. Its engine has very performant queue processing threads that enable fast consumption and routing of big batches of logs. In addition, Fluentd has a rich ecosystem of input and output plugins (over 650), which makes it an excellent solution for log aggregation.
  • Fluent Bit is great for log collection, processing, and forwarding, but not for log aggregation. The shipper was designed to run in highly distributed compute environments where limited capacity and overhead (memory and CPU) are a major concern. That is why it is very lightweight (~450 KB) and performant (see the image below). The trade-off is that Fluent Bit supports just 35 input and output plugins.

 

Figure: Fluent Bit vs. Fluentd (source: Fluent Bit documentation)

This does not mean, however, that we cannot use Fluent Bit to directly ship logs to output destinations. Fluent Bit has great support for many common inputs such as syslog, TCP, systemd, disk, CPU and can also send logs to a number of popular outputs such as Elasticsearch, Kafka REST Proxy, and InfluxDB directly.

However, the best choice you can make is to use Fluentd as a log aggregator and Fluent Bit as a log forwarder. For example, a typical logging pipeline design for Fluentd and Fluent Bit in Kubernetes could be as follows. We could deploy Fluent Bit on each node using a DaemonSet. Each node-level Fluent Bit agent would collect logs and forward them to a single Fluentd instance deployed per cluster, which would work as a log aggregator that processes logs and routes them to a variety of output destinations.

In this tutorial, we’ll discuss the simplest possible setup — Fluent Bit working as a direct logging pipeline sending logs to Elasticsearch. This is enough to demonstrate how Fluent Bit works.

But before we begin, let’s describe the Fluent Bit workflow to give you a basic intuition for its architecture.

 

Figure: Fluent Bit Workflow (source: Fluent Bit documentation)

The first step of the workflow is taking logs from some input source (e.g., stdout, file, web server). By default, the ingested log data will reside in the Fluent Bit memory until it is routed to some output destination.

Before the logs are routed, they can also optionally be processed using parsers and filters. For example, you can use built-in parsers to convert unstructured data retrieved from the input source into a structured log message. Additionally, you can use various filter plugins to alter the data ingested by the input plugins.

The routing process begins after parsing and filtering are completed. The routing engine matches the tags of log records against the match rules of the configured outputs. Sending the logs to the output destinations (e.g., Elasticsearch) is in turn handled by various output plugins. This is very similar to how Logstash or Filebeat work.

Now that you have a basic understanding of the Fluent Bit architecture, we’ll walk you through the process of deploying and configuring Fluent Bit to ship Kubernetes logs to Elasticsearch. We are going to send logs generated by various applications running on Kubernetes to an Elasticsearch cluster deployed on Kubernetes.
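The tag-based routing described above can be illustrated with a minimal Python sketch. The output names and patterns are hypothetical, and fnmatch is used only as a stand-in for the wildcard matching: it also accepts ? and [] patterns, which is broader than the * wildcard this tutorial relies on.

```python
from fnmatch import fnmatchcase

# Hypothetical output table: each output declares a Match pattern.
OUTPUTS = [
    {"name": "es", "match": "kube.*"},
    {"name": "stdout", "match": "*"},
    {"name": "metrics", "match": "cpu.*"},
]


def route(tag, outputs=OUTPUTS):
    """Return the names of all outputs whose Match pattern accepts the tag."""
    return [o["name"] for o in outputs if fnmatchcase(tag, o["match"])]


print(route("kube.var.log.containers.web.log"))  # ['es', 'stdout']
print(route("cpu.local"))                        # ['stdout', 'metrics']
```

A record can be delivered to more than one output: every output whose pattern matches the tag receives a copy.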

Tutorial

To complete the tutorial, you’ll need the following prerequisites:

  • A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
  • The kubectl command-line tool installed and configured to communicate with the cluster. See how to install kubectl here.
  • A running Elasticsearch deployed in your Kubernetes cluster. For the simplest way to deploy Elasticsearch in Kubernetes, you can consult this article.

Step 1: Create RBAC for the Fluent Bit

First, let’s isolate our future Fluent Bit deployment from the rest of the cluster by creating a new namespace named fluentbit-test (for example, with kubectl create namespace fluentbit-test).

To collect logs from Kubernetes applications and cluster components, we need to give Fluent Bit an identity and grant it some permissions. For the former, we will create a new ServiceAccount in the fluentbit-test namespace where Fluent Bit will be deployed.

Fluent Bit needs permissions to get, list, and watch namespaces and Pods in your Kubernetes cluster. These can be granted using a ClusterRole manifest.

Finally, we need to bind the Fluent Bit ServiceAccount to the ClusterRole using a ClusterRoleBinding resource.

Note: don’t forget to specify the correct namespace for the ServiceAccount subject in the ClusterRoleBinding.

Let’s save these manifests in rbac.yml, separating them with the --- delimiter, and create all resources in bulk with kubectl create -f rbac.yml.
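As an illustration of the three RBAC objects described above, here is a minimal Python sketch that builds them as dictionaries and prints them as a single JSON List, which kubectl accepts in the same way as YAML. The names fluent-bit and fluent-bit-read are assumptions made for this example, not values taken from the tutorial.

```python
import json

NAMESPACE = "fluentbit-test"   # namespace used throughout this tutorial
ACCOUNT = "fluent-bit"         # assumed ServiceAccount name
ROLE = "fluent-bit-read"       # assumed ClusterRole / ClusterRoleBinding name
RBAC_GROUP = "rbac.authorization.k8s.io"


def rbac_manifests(namespace=NAMESPACE, account=ACCOUNT, role=ROLE):
    """Build the ServiceAccount, ClusterRole and ClusterRoleBinding."""
    return [
        {"apiVersion": "v1", "kind": "ServiceAccount",
         "metadata": {"name": account, "namespace": namespace}},
        {"apiVersion": RBAC_GROUP + "/v1", "kind": "ClusterRole",
         "metadata": {"name": role},
         "rules": [{"apiGroups": [""],
                    "resources": ["namespaces", "pods"],
                    "verbs": ["get", "list", "watch"]}]},
        {"apiVersion": RBAC_GROUP + "/v1", "kind": "ClusterRoleBinding",
         "metadata": {"name": role},
         "roleRef": {"apiGroup": RBAC_GROUP, "kind": "ClusterRole", "name": role},
         # The subject namespace must match the ServiceAccount's namespace.
         "subjects": [{"kind": "ServiceAccount", "name": account,
                       "namespace": namespace}]},
    ]


def as_list(items):
    """Wrap the manifests in a v1 List so that one file creates everything."""
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(as_list(rbac_manifests()), indent=2))
```

If you prefer this form to hand-written YAML, redirect the output to a file such as rbac.json (a hypothetical name) and pass it to kubectl create -f. The namespace itself must already exist.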

Step 2: Create a ConfigMap

We need to configure Fluent Bit before deploying it as a DaemonSet. Fluent Bit has its own configuration syntax, which differs from that of Fluentd. For more information about configuring Fluent Bit, please consult the official documentation.

The Fluent Bit configuration file consists of sections and key-value entries inside those sections. Each section is defined by a name or a title placed inside brackets.

For example, a SERVICE section may hold two entries: HTTP_Listen and HTTP_Port.

Each entry is defined by a line of text that contains a key and a value. In such a [SERVICE] section, one entry has the key HTTP_Listen with the value 0.0.0.0, and the other has the key HTTP_Port with the value 2020. Entries are indented by four spaces, which is the recommended indentation for Fluent Bit configuration.

Fluent Bit supports four types of sections: Service, Input, Filter, and Output.
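To make the section and entry syntax concrete, the following Python sketch parses a small [SERVICE] section with the two entries discussed above. It is a simplified illustration of the file format, not Fluent Bit’s own parser, and it ignores directives such as @INCLUDE.

```python
SAMPLE = """\
[SERVICE]
    HTTP_Listen    0.0.0.0
    HTTP_Port      2020
"""


def parse_config(text):
    """Parse sections and key/value entries into a list of (name, entries)."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))  # a new section begins
            continue
        if not sections:
            raise ValueError(f"entry outside of a section: {line!r}")
        parts = line.split(None, 1)  # key, then the rest of the line as value
        sections[-1][1][parts[0]] = parts[1] if len(parts) > 1 else ""
    return sections


print(parse_config(SAMPLE))
# [('SERVICE', {'HTTP_Listen': '0.0.0.0', 'HTTP_Port': '2020'})]
```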

We’ll explain these types using the example ConfigMap for our Fluent Bit DaemonSet. Take a look at the example configuration:

The Service section above defines the global properties of the Fluent Bit service. The configuration for this section is stored in the main fluent-bit.conf file. We use the @INCLUDE directive to include the configuration of inputs, filters, and outputs in this main file.

There are several parameters of the Service section worth your attention:
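The sketch below is a reconstruction under assumptions, not the exact ConfigMap used by the author. It holds one configuration file per key, as a ConfigMap would, and resolves the @INCLUDE directives to show how the main file pulls in the other sections. The file names, the DB path, and the concrete values (5MB, 10) are illustrative. The parsers.conf file referenced by Parsers_File is left out for brevity.

```python
# Hypothetical ConfigMap data: file name -> file content.
FILES = {
    "fluent-bit.conf": """\
[SERVICE]
    Flush         1
    Daemon        Off
    Log_Level     info
    Parsers_File  parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020

@INCLUDE input-kubernetes.conf
@INCLUDE filter-kubernetes.conf
@INCLUDE output-elasticsearch.conf
""",
    "input-kubernetes.conf": """\
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     5MB
    Refresh_Interval  10
""",
    "filter-kubernetes.conf": """\
[FILTER]
    Name      kubernetes
    Match     kube.*
    Kube_URL  https://kubernetes.default.svc.cluster.local:443
""",
    "output-elasticsearch.conf": """\
[OUTPUT]
    Name             es
    Match            *
    Host             ${FLUENT_ELASTICSEARCH_HOST}
    Port             ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format  On
""",
}


def resolve_includes(name, files, seen=()):
    """Return the lines of `name` with every @INCLUDE replaced by that file."""
    if name in seen:
        raise ValueError(f"circular @INCLUDE of {name}")
    lines = []
    for line in files[name].splitlines():
        if line.strip().startswith("@INCLUDE"):
            target = line.split(None, 1)[1].strip()
            lines.extend(resolve_includes(target, files, seen + (name,)))
        else:
            lines.append(line)
    return lines


merged = resolve_includes("fluent-bit.conf", FILES)
print([line for line in merged if line.startswith("[")])
# ['[SERVICE]', '[INPUT]', '[FILTER]', '[OUTPUT]']
```

The ${...} placeholders in the output section are filled from environment variables of the Fluent Bit container. The variable names used here are assumptions.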

  • Flush — specifies how often (in seconds) the Fluent Bit engine flushes log records to the output plugin.
  • Daemon — a Boolean value that allows running the Fluent Bit instance as a background process (daemon).
  • Log_Level — sets the logging verbosity level. The allowed values are error, info, debug, and trace.
  • HTTP_Server — tells Fluent Bit to use a built-in HTTP Server.
  • HTTP_Listen — sets the listening interface for the HTTP Server if it’s enabled (default is 0.0.0.0).
  • HTTP_Port — sets the TCP port for the HTTP Server.

After configuring global settings, we need to include some inputs — sources where Fluent Bit watches for logs. These are defined in the INPUT section.

In the configuration above, we use the tail input plugin, which monitors one or several text files. The plugin’s functionality is similar to the tail -f shell command. It reads every file matched by the path pattern and generates a new record for every new line found.

Fluent Bit has built-in support for other input sources such as:

  • cpu — measures total CPU usage of the system.
  • disk — measures Disk I/Os.
  • exec — executes external programs and collects event logs.
  • forward — Fluentd forward protocol. This plugin can be used to forward logs collected by Fluent Bit to Fluentd for aggregation.
  • head — reads first part of files.
  • proc — checks health of processes.
  • syslog — reads syslog messages from a Unix socket.

For a full list of supported plugins, consult the official documentation here.

Each input section has general and plugin-specific configuration options. Let’s discuss those specified in the Input configuration above:

  • Tag — you can associate all records coming from the input with a specific tag. For example, we associated all records with the kube.*  pattern.
  • Path — the tail plugin requires a path or path pattern for one or more log files. Our Fluent Bit instance will watch all container log files that match /var/log/containers/*.log on the Kubernetes node.
  • Parser — specifies the name of a parser to interpret the entry as a structured message (e.g., Docker).
  • DB — the database file to keep track of the Fluent Bit position in the monitored files.
  • Mem_Buf_Limit — defines a memory limit the tail plugin can use before the records are flushed to the output. If the limit is reached, the tail plugin will pause collecting the records until they are flushed.
  • Refresh_Interval — the interval for refreshing a list of watched files. The default value is 60 seconds.
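The interplay of Path and DB described above can be sketched in a few lines of Python: each call returns one record per newly appended line and remembers the byte offset so that nothing is read twice. A plain dictionary stands in for the DB file here. The real plugin persists its state on disk and also handles file rotation, which this sketch does not attempt.

```python
import os
import tempfile


def read_new_lines(path, db):
    """Return records for complete lines appended to `path` since the last call.

    `db` is a plain dict standing in for the DB file: path -> byte offset.
    """
    records = []
    with open(path, "rb") as f:
        f.seek(db.get(path, 0))
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file or an incomplete line: wait for more data
            db[path] = f.tell()
            records.append({"log": line.decode("utf-8", "replace").rstrip("\n")})
    return records


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "app.log")
    db = {}
    with open(path, "a") as f:
        f.write("first\nsecond\n")
    print(read_new_lines(path, db))  # [{'log': 'first'}, {'log': 'second'}]
    with open(path, "a") as f:
        f.write("third\n")
    print(read_new_lines(path, db))  # [{'log': 'third'}]
```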

In the Filter section, we define the filter plugin(s) to process the collected logs. Since we are working with Kubernetes apps, we are using the Kubernetes filter plugin.

Kubernetes filter performs the following operations:

  • Analyzes the data and extracts the metadata such as Pod name, namespace, container name, and container ID (this is quite similar to what Fluentd does).
  • Queries Kubernetes API server to get extra metadata for the given Pod including the Pod ID, labels, and annotations. This metadata is then appended to each record (log message).

This data is cached locally in memory and is appended to each log record. The following parameters represent a minimum configuration for this filter used in the ConfigMap above:

  • Name — the name of the filter plugin.
  • Kube_URL — the API server endpoint, e.g., https://kubernetes.default.svc.cluster.local/
  • Match — a tag to match filtering against.
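The first operation of the filter, extracting metadata from the log file name, can be approximated as follows. The sketch assumes the usual naming scheme of files under /var/log/containers, that is <pod>_<namespace>_<container>-<container id>.log, and uses a simplified regular expression rather than the one shipped with Fluent Bit. The api_metadata argument stands in for the labels and annotations that the real filter fetches from the API server.

```python
import re

# <pod>_<namespace>_<container>-<64-character container id>.log
LOG_NAME = re.compile(
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9.]*[a-z0-9])?)_"
    r"(?P<namespace_name>[^_]+)_"
    r"(?P<container_name>.+)-"
    r"(?P<docker_id>[a-f0-9]{64})\.log$"
)


def metadata_from_path(path):
    """Extract Pod, namespace, container name and container ID from a log path."""
    m = LOG_NAME.search(path)
    return m.groupdict() if m else None


def enrich(record, path, api_metadata=None):
    """Attach a `kubernetes` map to the record, merging optional API metadata."""
    meta = metadata_from_path(path) or {}
    meta.update(api_metadata or {})
    return {**record, "kubernetes": meta}


path = "/var/log/containers/web-5d4f8_default_nginx-" + "a" * 64 + ".log"
rec = enrich({"log": "GET /"}, path, {"labels": {"app": "web"}})
print(rec["kubernetes"]["pod_name"], rec["kubernetes"]["namespace_name"])
# web-5d4f8 default
```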

The next crucial part of the Fluent Bit configuration is specifying the output plugin. The output plugin defines the destination to which Fluent Bit should flush the logs it collects from the input. Each output plugin has its specific configuration options. The Elasticsearch output we use has the following options:

  • Host — Elasticsearch host.
  • Port — Elasticsearch port.
  • HTTP_User — Elasticsearch username if your cluster has authentication.
  • HTTP_Passwd — the password for the user defined in HTTP_User.
  • Logstash_Format — enables Logstash format compatibility. This option takes a Boolean value: True/False or On/Off.
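To show what Logstash_Format changes in practice, the sketch below derives the target index from the record timestamp and renders records in the newline-delimited format of the Elasticsearch bulk API. It assumes the default logstash prefix, UTC dates, and a fallback index named fluentbit. Treat all three as assumptions to verify against your Fluent Bit version.

```python
import json
from datetime import datetime, timezone


def index_name(ts, logstash_format=True, prefix="logstash", index="fluentbit"):
    """Pick the target index for a record timestamp (seconds since the epoch)."""
    if not logstash_format:
        return index
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{prefix}-{day:%Y.%m.%d}"


def bulk_body(records):
    """Render (timestamp, record) pairs as newline-delimited bulk API lines."""
    lines = []
    for ts, record in records:
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        # Action line first, then the document itself.
        lines.append(json.dumps({"index": {"_index": index_name(ts)}}))
        lines.append(json.dumps(
            {"@timestamp": when.strftime("%Y-%m-%dT%H:%M:%S.000Z"), **record}))
    return "\n".join(lines) + "\n"


print(index_name(1546905600))  # logstash-2019.01.08
print(bulk_body([(1546905600, {"log": "hello"})]), end="")
```

With daily indices, old logs can be removed by deleting whole indices instead of individual documents.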

Along with inputs, filters, and outputs, we define seven parsers for common application logs and input types. As we’ve mentioned, parsers transform unstructured log data into a structured form, making it easier to process and filter.

Fluent Bit parsers can process log entries based on two types of formats: JSON maps and regular expressions. All parsers must be defined in a parsers.conf file. By default, Fluent Bit ships with pre-configured parsers for:

  • Apache
  • Nginx
  • Docker
  • Syslog rfc5424
  • Syslog rfc3164

Let’s look at one of the parsers defined above to understand available configuration options:

This configuration defines an Nginx parser. The most important configuration entries are the following:

  • Format — the format of the parser. We use the regex format for the Nginx parser.
  • Regex — the Ruby regular expression used to parse and compose the structured message. Its named capture groups, such as <remote>, <host>, <time>, and <method>, become the keys of the structured record.
  • Time_Key — if a log entry includes a field with a timestamp, you can use this option to specify the name of this field.
  • Time_Format — the format of the time field, so that it can be properly recognized and analyzed. Fluent Bit uses strptime(3) to parse time, so you can refer to the strptime documentation for available modifiers.
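The way Regex, Time_Key, and Time_Format work together can be tried out in Python. The pattern below is an adaptation of a typical Nginx access-log expression, written with Python’s (?P<name>...) syntax instead of Ruby’s (?<name>...). It is meant as an approximation, and the sample log line is made up.

```python
import re
from datetime import datetime

NGINX = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)
TIME_KEY = "time"                         # field that carries the timestamp
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"      # strptime-style format of that field


def parse_line(line):
    """Return (timestamp, record) for a matching line, or None otherwise."""
    m = NGINX.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # The time field is consumed to build the record timestamp.
    timestamp = datetime.strptime(record.pop(TIME_KEY), TIME_FORMAT)
    return timestamp, record


line = ('10.0.0.1 - - [08/Jan/2019:10:15:32 +0000] '
        '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.58.0"')
ts, rec = parse_line(line)
print(ts.isoformat())                            # 2019-01-08T10:15:32+00:00
print(rec["method"], rec["path"], rec["code"])   # GET /index.html 200
```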

Great! Now that you understand the key configuration options, let’s create a ConfigMap.

Step 3: Deploy Fluent Bit on Minikube

Now, we are ready to create a Fluent Bit DaemonSet using this ConfigMap. Take a look at the DaemonSet manifest we use:

Before deploying this DaemonSet, please don’t forget to specify the environment variable values for your Elasticsearch host, port, and any credentials if needed. In the example above, we use the DNS record of the Elasticsearch service as the value for the Elasticsearch host and the default Elasticsearch port 9200. These values will be mapped to the placeholders we used in the ConfigMap.

Let’s save this manifest in fluentbit-deploy.yml and create the DaemonSet with kubectl create -f fluentbit-deploy.yml.
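What follows is a minimal sketch of such a manifest, built as a Python dictionary and printed as JSON that kubectl can consume. It is not the author’s original manifest. The label, the ConfigMap name fluent-bit-config, the ServiceAccount name fluent-bit, the environment variable names, and the Docker log path are all assumptions.

```python
import json


def daemonset(namespace="fluentbit-test", es_host="elasticsearch", es_port="9200"):
    """Build a DaemonSet manifest that runs one Fluent Bit Pod per node."""
    labels = {"k8s-app": "fluent-bit-logging"}  # assumed label
    container = {
        "name": "fluent-bit",
        "image": "fluent/fluent-bit",  # pin a specific tag in practice
        "env": [
            {"name": "FLUENT_ELASTICSEARCH_HOST", "value": es_host},
            {"name": "FLUENT_ELASTICSEARCH_PORT", "value": es_port},
        ],
        "volumeMounts": [
            {"name": "varlog", "mountPath": "/var/log"},
            # Assumes the Docker runtime, whose log files live in this directory.
            {"name": "varlibdockercontainers",
             "mountPath": "/var/lib/docker/containers", "readOnly": True},
            {"name": "config", "mountPath": "/fluent-bit/etc/"},
        ],
    }
    volumes = [
        {"name": "varlog", "hostPath": {"path": "/var/log"}},
        {"name": "varlibdockercontainers",
         "hostPath": {"path": "/var/lib/docker/containers"}},
        {"name": "config", "configMap": {"name": "fluent-bit-config"}},
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "fluent-bit", "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "serviceAccountName": "fluent-bit",
                    "containers": [container],
                    "volumes": volumes,
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(daemonset(), indent=2))
```

The host paths are mounted so that the tail input can follow the container log files on each node, and the ConfigMap is mounted into the directory where the Fluent Bit image looks for its configuration.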
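The mapping of environment variables to configuration placeholders can be pictured with Python’s string.Template, which uses the same ${NAME} notation. This is only an illustration of the substitution step: the variable names and the host value elasticsearch are assumptions, and Fluent Bit performs this expansion itself when it loads the configuration.

```python
from string import Template

OUTPUT_CONF = """\
[OUTPUT]
    Name             es
    Match            *
    Host             ${FLUENT_ELASTICSEARCH_HOST}
    Port             ${FLUENT_ELASTICSEARCH_PORT}
    Logstash_Format  On
"""


def render(conf, env):
    """Replace ${NAME} placeholders with values from `env`.

    Raises KeyError if a placeholder has no value, which surfaces a
    forgotten environment variable instead of silently leaving it empty.
    """
    return Template(conf).substitute(env)


env = {
    "FLUENT_ELASTICSEARCH_HOST": "elasticsearch",  # assumed Service DNS name
    "FLUENT_ELASTICSEARCH_PORT": "9200",
}
print(render(OUTPUT_CONF, env))
```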

Let’s now check the Fluent Bit logs to verify that everything has worked out correctly.

First, find the Fluent Bit Pod in the fluentbit-test namespace (for example, with kubectl get pods --namespace=fluentbit-test).

Then, run kubectl logs with the name of the Fluent Bit Pod and the same namespace.

The logs should indicate that Fluent Bit started successfully and that the tail plugin began adding the specified paths to its queue.

Let’s check whether the logs were actually shipped to Elasticsearch as we expect. Assuming that you have Elasticsearch exposed as a Service, run kubectl get svc (in the namespace where Elasticsearch is deployed) to retrieve the IP and port assigned to Elasticsearch.

Then, you can use the IP and port to cURL various Elasticsearch endpoints. For example, to check if Elasticsearch is running:

Now, let’s check if Fluent Bit has sent any logs to Elasticsearch. First, find the available indices:

As you see, Fluent Bit has added the .monitoring-es-6-2019.01.08 index that contains Elasticsearch logs from the Elasticsearch instance we deployed. Let’s get some documents from this index:
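The same checks can be scripted instead of typed as cURL commands. The sketch below uses only the Python standard library against the standard Elasticsearch REST endpoints. The host and port are placeholders that you would replace with the values of your own Service, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def es_url(host, port, path):
    """Build an HTTP URL for an Elasticsearch endpoint."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def list_indices(host, port):
    """Return the names of all indices (GET /_cat/indices?format=json)."""
    with urlopen(es_url(host, port, "_cat/indices?format=json")) as resp:
        return [row["index"] for row in json.load(resp)]


def sample_documents(host, port, index, size=3):
    """Return the _source of a few documents from `index` (GET /<index>/_search)."""
    url = es_url(host, port, f"{quote(index)}/_search?size={size}")
    with urlopen(url) as resp:
        return [hit["_source"] for hit in json.load(resp)["hits"]["hits"]]


if __name__ == "__main__":
    host, port = "10.0.0.5", 9200  # placeholders: use your Service IP and port
    print(es_url(host, port, "_cat/indices?v"))
```

Calling list_indices and sample_documents requires network access to the cluster, so they are defined here but not invoked.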

Awesome! As you see, Fluent Bit has been successful in tailing Elasticsearch log files and sending generated logs to the Elasticsearch index.

Conclusion

That’s it! You’ve learned how to deploy Fluent Bit to a Kubernetes cluster and how to ship logs from Kubernetes applications and components to Elasticsearch. Fluent Bit is a lightweight and performant log shipper with functionality similar to Fluentd. However, because it supports fewer input and output plugins than Fluentd and is less powerful at accumulating logs from multiple locations, it’s not as good for log aggregation as Fluentd.

In upcoming tutorials, we’ll discuss how to combine both Fluentd and Fluent Bit to create a centralized logging pipeline for your Kubernetes cluster. Stay tuned to the Supergiant blog to learn more!