Kubernetes Logging With Fluentd and Fluent Bit
Introduction
Logging is a core part of observability: it is heavily used to debug and understand what is happening in a system.
The end goal is to ship logs to an Elasticsearch cluster running inside Kubernetes, with fluent-bit on each node forwarding to a fluentd aggregator.
Fluent-bit
Fluent-bit is going to be used to grab the logs from pods, check if they are JSON and parse them if so, and then forward them to fluentd. The reason for using both fluent-bit and fluentd is that we can aggregate the logs in fluentd. Aggregation lets us buffer the logs and do more filtering. If we used only fluent-bit, many different instances of fluent-bit would each send logs directly to Elasticsearch, and it is harder to control the flow of logs to Elasticsearch when we have many nodes. Elasticsearch can be difficult to configure to get ingestion working just right, and with more clients all sending logs the ingest can be overloaded. Fluent-bit also does not have as many knobs to turn as fluentd.
A key feature is going to be having an index per namespace. The reasons for this are that we can control logs at a lower level (per namespace), the index sizes stay smaller, and Elasticsearch has a default limit of 1000 indexed fields per index. If all logs go to the same index we will hit the 1000-field limit quickly; with an index per namespace we are much less likely to reach it.
Fluent-bit is a very capable program and can be used without aggregation to fluentd.
Below is almost the default configuration from the fluent-bit helm chart. There are a few changes though: the kubernetes filter gets a 1M buffer, and the output is just a forward to our fluentd instance. Fluent-bit is deployed as a daemonset.
This config grabs the kubernetes metadata that is useful when searching logs.
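A minimal sketch of that configuration, based on the fluent-bit helm chart defaults; the fluentd service address is an assumption for illustration.

```ini
[SERVICE]
    Flush         1
    Log_Level     info
    Parsers_File  parsers.conf
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    multiline.parser  docker, cri
    Tag               kube.*
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   On

[FILTER]
    Name                kubernetes
    Match               kube.*
    # parse the log field as JSON when possible
    Merge_Log           On
    Keep_Log            Off
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On
    # the 1M buffer mentioned above
    Buffer_Size         1MB

[OUTPUT]
    Name   forward
    Match  *
    # assumed address of the fluentd aggregator service
    Host   fluentd.logging.svc
    Port   24224
```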
Side note: below is an output config section for pushing logs directly to Elasticsearch. The only issue with this is that the Logstash_Prefix_Key is based on the kubernetes namespace. Setting up index templates to work with this is very difficult unless your namespaces already share a prefix. In Elasticsearch you glob indices with a prefix, so if every namespace does not have the same prefix you either need to create a lot of patterns or lose out on automatically assigning the right settings to each new index.
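A sketch of what that direct output section can look like; the host and prefix values are placeholders:

```ini
[OUTPUT]
    Name                 es
    Match                kube.*
    # assumed in-cluster Elasticsearch service
    Host                 elasticsearch-master.logging.svc
    Port                 9200
    Logstash_Format      On
    # fallback prefix when the key below is missing from a record
    Logstash_Prefix      logstash
    # take the index prefix from the record's kubernetes namespace
    Logstash_Prefix_Key  $kubernetes['namespace_name']
    Suppress_Type_Name   On
```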
Fluentd
Fluentd is used to aggregate the logs and perform a few more modifications to them. Some of the filters applied in fluentd could be applied in fluent-bit instead; the reason for applying them in fluentd is that we keep fluent-bit as a simple log forwarder.
Fluentd is deployed as a statefulset so that it can have a persistent disk for buffering. If fluentd goes down then the buffer is still left on disk for when it comes back up.
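A sketch of the overall fluentd configuration: a forward input receiving from fluent-bit, the two filters, and the datastream output to Elasticsearch. Host names, paths, and tuning values are illustrative assumptions rather than drop-in settings.

```
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# remove dots from kubernetes metadata keys
# (assumes fluent-plugin-dedot_filter is installed)
<filter kube.**>
  @type dedot
  de_dot true
  de_dot_separator _
  de_dot_nested true
</filter>

# rename labels that do not fit the common pattern
# (assumes fluent-plugin-rename-key; the rule is an example)
<filter kube.**>
  @type rename_key
  rename_rule1 ^app$ app_name
</filter>

<match kube.**>
  @type elasticsearch_data_stream
  host elasticsearch-master.logging.svc
  port 9200
  data_stream_name fluentd-k8s-${$.kubernetes.namespace_name}
  data_stream_template_name fluentd-k8s
  request_timeout 30s
  <buffer tag, $.kubernetes.namespace_name>
    @type file
    # persistent volume mount on the statefulset
    path /buffers/logs
    chunk_limit_size 64MB
    flush_thread_count 4
    flush_interval 10s
  </buffer>
</match>
```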
Breaking the configuration down into smaller parts, the first thing is the filters. We have two: the first is de_dot and the second is rename_key.
De_dot is used to remove ‘.’ from the kubernetes metadata keys. It takes input like ‘kubernetes.labels.app.kubernetes.io/made.up.label’ and turns it into ‘kubernetes.labels.app.kubernetes.io/made_up_label’. This allows proper ingestion into Elasticsearch, because Elasticsearch parses the dots as keys in a JSON dictionary. Without this, Elasticsearch would nest ‘label’ under ‘up’ under ‘made’, which works until we get another key such as ‘kubernetes.labels.app.kubernetes.io/realappname’ whose final segment has no dots, so it does not nest the same way and the two mappings conflict.
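A sketch of that filter, assuming fluent-plugin-dedot_filter:

```
<filter kube.**>
  @type dedot
  de_dot true
  # replace each '.' in key names with '_'
  de_dot_separator _
  # also walk nested maps such as kubernetes.labels
  de_dot_nested true
</filter>
```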
Rename_key is used to fix kubernetes labels that do not fit the pattern of the rest of the labels. This allows those labels to be ingested instead of being rejected because of mismatched field mappings again.
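A sketch of that filter, assuming fluent-plugin-rename-key (depending on the plugin version this may need to be a match block rather than a filter); the rule shown is a made-up example:

```
<filter kube.**>
  @type rename_key
  # example: rename a bare 'app' label to 'app_name' so its
  # mapping cannot clash with the other label fields
  rename_rule1 ^app$ app_name
</filter>
```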
The output to Elasticsearch uses a datastream instead of a normal index. With indices we would need to bootstrap the index manually before writing logs to it, and after bootstrapping we would also need to write to the alias index so that the ILM rollover can happen behind the scenes. With a datastream we write to the same name and there is no need to bootstrap anything beforehand, because we are using index templates. This lets us automatically add new indices based on namespace, and a datastream also rolls indices over automatically behind the scenes.
The data_stream_name is set to ‘fluentd-k8s-’ so that we have a prefix identifying where the index comes from. The placeholder syntax on the end is fluentd’s buffer placeholder, which lets us use any key that the buffer section is keyed on (here the kubernetes namespace).
The ‘data_stream_template_name’ is the index template that we have defined already in Elasticsearch.
We set the request_timeout to 30s to reduce the number of timeouts. This should be adjusted based on your environment and what your Elasticsearch cluster can handle.
Inside the buffer section we set the chunk_limit_size to 64MB because Elasticsearch has a default ‘http.max_content_length’ of 100MB. Increasing ‘http.max_content_length’ is not recommended: it means your cluster has to handle larger payloads, which depending on your cluster may degrade performance or may not be manageable at all.
The ‘flush_thread_count’ and ‘flush_interval’ should also be tuned to what your cluster can handle. The ‘flush_thread_count’ determines how many threads are used to flush, in other words how many threads send HTTP requests to Elasticsearch. The ‘flush_interval’ is how often the flush to Elasticsearch happens. You can increase these numbers to increase throughput, but the Elasticsearch cluster needs to be able to keep up, otherwise there will be a lot of failed flushes.
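A sketch of the output and buffer sections with the settings discussed above; host, buffer path, and flush values are assumptions to tune for your environment:

```
<match kube.**>
  @type elasticsearch_data_stream
  host elasticsearch-master.logging.svc
  port 9200
  # prefix plus the namespace chunk key from the buffer below
  data_stream_name fluentd-k8s-${$.kubernetes.namespace_name}
  # the index template already defined in Elasticsearch
  data_stream_template_name fluentd-k8s
  request_timeout 30s
  <buffer tag, $.kubernetes.namespace_name>
    @type file
    path /buffers/logs
    # keep chunks well under Elasticsearch's 100MB http.max_content_length
    chunk_limit_size 64MB
    # tune both of these to what the cluster can handle
    flush_thread_count 4
    flush_interval 10s
  </buffer>
</match>
```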
Elasticsearch
Preparing the Elasticsearch cluster doesn’t take much effort. The two things we need to create are the index template and the ILM policy; both should be tuned to your cluster. There are other optional steps, like creating index patterns in Kibana if you want to search these indices from there.
For the index template, index_patterns is set to match the prefix in our fluentd config. The ‘data_stream’ object is empty, but its presence means a datastream will be created whenever a new index matching our pattern is written to. The other important setting is ‘lifecycle’, set to ‘fluentd-k8s’, which is the name of our ILM policy.
Index Template
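A sketch of the template as a Kibana Dev Tools request; shard and replica counts are placeholders:

```json
PUT _index_template/fluentd-k8s
{
  "index_patterns": ["fluentd-k8s-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "fluentd-k8s",
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "priority": 200
}
```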
This ILM Policy is pretty standard and basic. This is something you should customize for your environment and requirements.
ILM Policy
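A basic policy along these lines; the rollover and retention values are examples to adjust:

```json
PUT _ilm/policy/fluentd-k8s
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```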
Conclusion
The most difficult part is making sure your Elasticsearch cluster can handle the load. Even a small kubernetes cluster can produce a large amount of logs.
For now this solution works for reducing the number of fields to be indexed, at the cost of creating many more indices. These indices still need to be managed, but with index templates the work gets easier.