Using Fluentd for Open Source Unified Data Logging

Fluentd is an open source software under the umbrella of the Cloud Native Computing Foundation (CNCF) Graduated Hosted Projects. It unifies data collection across multiple sources and aggregates logs into structured JSON data for easy consumption. It’s plug-in capable architecture makes it compatible with a wide-range of applications and workflows, allowing it to parse and reformat a variety of data. Plug-ins use JSON-formatted data concepts, allowing programmers can adapt specific applications as inputs or outputs by modifying existing plug-ins and their configurations.

What is a Unified Logging Layer?

Fluentd takes diverse data input sources found in various application log types, parses this input, and renders a chosen uniform output stream. This data is then used by other applications and/or for uniform log archiving and further analysis. Fluentd uses directives that modify the flow to match expressions, control flow, and route flows.

The Fluentd uniform output stream can be sent to many different application destinations. These include inputs to NoSQL and SQL databases, archival applications, and monitoring console apps. Fluentd unifies input logs and messages then outputs them in a configured, stratified stream specified by Fluentd and its plug-in configuration.

Data outputs from Fluentd are handled similarly through administratively defined or standardized streams. These are set by program configuration, or a combination of Fluentd configuration and chosen plug-in options.

Fluentd is capable of handling many diverse data inputs and output destinations concurrently through the use of plug-ins. Input and output data stream at different speeds and event cycles.

Several instances of Fluentd can run in parallelizing schemes on different hosts for fault tolerance and continuity. Data sources external to the hosted Fluentd versions require network pathway considerations, including firewall, routing, pathway congestion, and encryption. Fluentd conversation configurations can support SSL/TLS encryption.

Fluentd Plug-Ins

Input and output plug-ins are required to parse data flows through Fluentd. They are categorized by their role, listed below:

Input
Parser
Filter
Output
Formatter
Service Discovery
Buffer
Metrics

Plug-ins use a naming convention associated with their role as an input or output plug-in. As an example, in_syslog is an input plug-in, using the in_ prefix.

The output plug-ins, prefixed with out_, have three different flushing and buffering modes:

Non-Buffered: The plug-in does not buffer data. It writes or outputs results immediately after processing.
Synchronous Buffered: The plug-in outputs data in chunks specified by the data value set in its configuration. When a datum is set, the plug-in sends chunks of data at a specified speed. This technique is used to prevent destination congestion.
Asynchronous Buffered: The plug-in stores data for later transmission.

Before You Begin

If you have not already done so, create a Linode account and Compute Instance. See our Getting Started with Linode and Creating a Compute Instance guides. This guide focuses on Ubuntu and Debian Linux as hosts for Fluentd, although adaptations of Fluentd can be found for Windows and macOS as well.
Follow our Setting Up and Securing a Compute Instance guide to update your system and create a limited user account.
Fluentd input and output are synchronized to a time source, and Fluentd recommends setting up a Network Time Protocol daemon prior to software installation. In cloud environments with many separated data sources, a single source of NTP synchronization is recommended. The NTP time becomes the basis for data stamping through the parsing stages that Fluentd performs.

Note

This guide is written for a non-root user. Commands that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, see the Users and Groups guide.

The commands, file contents, and other instructions provided throughout this guide may include placeholders. These are typically domain names, IP addresses, usernames, passwords, and other values that are unique to you. The table below identifies these placeholders and explains what to replace them with:

Placeholder:	Replace With:
`EXAMPLE_USER`	The username of the current user on your local machine.

Required Resources

Check the maximum number of file descriptors:
ulimit -n
```
1024
```
If the answer is the default of 1024, an adjustment must be made to the /etc/security/limits.conf file.
Open the /etc/security/limits.conf file using a text editor with root permissions:
sudo nano /etc/security/limits.conf
Add the following lines to the end of the file but replace EXAMPLE_USER with your actual username.
File: /etc/security/limits.conf
1 2
EXAMPLE_USER soft nofile 65536 EXAMPLE_USER hard nofile 65536
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
To admit the larger values, reload the kernel by rebooting:
sudo reboot
When the system reboots, recheck the maximum number of file descriptors:
ulimit -n
```
65536
```

Installing Fluentd

Fluentd is deployed as a server application. There are two versions available: Fluentd and td-agent. Both versions behave identically, but there are differences. Fluentd is available as a Ruby gem or source code, while td-agent offers typical packages for Linux, macOS, and Windows. These examples use the td-agent installation.

First, launch the appropriate cURL command for your operating system and version. The command installs the app and dependencies for the chosen version.

curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-jammy-td-agent4.sh | sh

curl -fsSL https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh

curl -fsSL https://toolbelt.treasuredata.com/sh/install-debian-bullseye-td-agent4.sh | sh

curl -fsSL https://toolbelt.treasuredata.com/sh/install-debian-buster-td-agent4.sh | sh

Installation completed. Happy Logging!

Once the version-appropriate shell script is successfully executed, check to see if the service is acitve (running):

sudo systemctl status td-agent.service

If active (running), the output should look like this:

● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
     Loaded: loaded (/lib/systemd/system/td-agent.service; enabled; vendor pres>
     Active: active (running) since Mon 2023-08-21 16:48:13 UTC; 57s ago
       Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Dat>
   Main PID: 2102 (fluentd)
      Tasks: 9 (limit: 4557)
     Memory: 96.5M
        CPU: 2.669s
     CGroup: /system.slice/td-agent.service
             ├─2102 /opt/td-agent/bin/ruby /opt/td-agent/bin/fluentd --log /var>
             └─2105 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agen>

Aug 21 16:48:11 localhost systemd[1]: Starting td-agent: Fluentd based data col>
Aug 21 16:48:13 localhost systemd[1]: Started td-agent: Fluentd based data coll

If not, launch the daemon:

sudo systemctl start td-agent.service

In order to automatically start up when the system is rebooted, run the following command:
sudo systemctl enable td-agent.service

Testing Fluentd

Open the /etc/td-agent/td-agent.conf file in a text editor with root permissions:
sudo nano /etc/td-agent/td-agent.conf
Append the following configuration to the bottom of the file:
File: /etc/td-agent/td-agent.conf
1 2 3
<match our.test> @type stdout </match>
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
Restart td-agent for the appendage to take effect:
sudo systemctl restart td-agent

Once the daemon starts, test it using cURL and the REST API:

curl -X POST -d 'json={"json":"I’m Alive!"}' http://localhost:8888/our.test

Use the following command to view the result of the test:
tail -n 1 /var/log/td-agent/td-agent.log
It should answer with a time stamp and the “I’m Alive!” message:
```
2023-08-18 17:02:57.005253503 +0000 our.test: {"json":"I’m Alive!"}
```

Syslog Application Example

Ubuntu 20.04 LTS and 22.04 LTS Compute Instances have the remote syslog known as rsyslog pre-installed and it is used in this example. In this example, rsyslog.conf is modified to send log entries to the same port as the Fluentd tg-agent is set to listen.

Log in to the system once it boots up.
Open rsyslog.conf in a text editor with root permissions:
sudo nano /etc/rsyslog.conf
Append the following line to the bottom of the file:
File: /etc/rsyslog.conf
1
*.* @127.0.0.1:5440
The above configuration line tells rsyslog to send syslog data to port 5440 of the local host.
When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
After the file is saved, restart the rsyslog service:
sudo systemctl restart syslog
Fluentd typically listens for messages through its plug-ins, however, in this example, the raw syslog messages are monitored, unfiltered, and unmodified. The td-agent file must be modified to make Fluentd listen for syslog-formatted data. Continue the above example of an input source as syslog at port 5440.
Open td-agent.conf in a text editor with root permissions:
sudo nano /etc/td-agent/td-agent.conf

Append the following lines to the bottom of the file:

File: /etc/td-agent/td-agent.conf

1
2
3
4
5
6
7
8
9
<source>
   @type syslog
   port 5440
   tag system
</source>

<match system.**>
  @type stdout
</match>

When done, press CTRL+X, followed by Y then Enter to save the file and exit nano.
Restart td-agent for the appendage to take effect:
sudo systemctl restart td-agent

Rsyslog now outputs to the port where td-agent listens. Use the following command to view proof of the chain:

tail -n 1 /var/log/td-agent/td-agent.log

Entries from syslog are found in the td-agent.log:

2023-08-21 17:26:09.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4304","message":"Connection closed by 37.129.207.106 port 42964 [preauth]"}
2023-08-21 17:26:13.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4310","message":"Connection closed by 5.218.67.72 port 45500 [preauth]"}
2023-08-21 17:26:13.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4308","message":"Connection closed by 83.121.149.248 port 36697 [preauth]"}
2023-08-21 17:26:19.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4315","message":"Connection closed by 80.191.23.250 port 39788 [preauth]"}
2023-08-21 17:26:20.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4313","message":"Connection closed by 87.248.129.189 port 51192 [preauth]"}
2023-08-21 17:26:24.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4318","message":"Connection closed by 91.251.66.145 port 38470 [preauth]"}
2023-08-21 17:26:25.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4320","message":"Connection closed by 37.129.101.243 port 39424 [preauth]"}
2023-08-21 17:26:26.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4322","message":"Connection closed by 151.246.203.48 port 11351 [preauth]"}
2023-08-21 17:26:29.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4325","message":"Connection closed by 204.18.110.253 port 43478 [preauth]"}
2023-08-21 17:26:31.000000000 +0000 system.auth.info: {"host":"localhost","ident":"sshd","pid":"4327","message":"Connection closed by 5.214.204.211 port 39830 [preauth]"}

Rsyslog has an input through the unified layer of Fluentd to the log of the td-agent. This is an unfiltered output that can be sent by an output plug-in to a desired archiving program, SIEM input, or other destination.

Common log sources such as syslog can have highly tailored processing with Fluentd controls applied.

Fluentd Directives

In the example of an rsyslog input shown above, there is no filtration of the information. Fluentd uses a configuration file directive to manipulate data inputs. The Fluentd directives are:

Source: determines input sources
Match: parses for regular expression matches
Filter: determines the event directive pipeline
System: sets system-wide configuration
Label: groups the output and filters for internal routing of data
Worker: directives limit to the specific workers as an object
@Include: sources other files for inclusion

Behavior is controlled by the type of plug-in(s), how records are matched (accepted, rejected based upon regular expression match), filtered, tagged, and used by workers, system directives, and other behavior specified by @include files.

Conclusion

Fluentd is highly customizable via its configuration as well as the configuration of the input and output plug-ins used. The unified logging layer represented by Fluentd processing becomes the input for many application destinations. These destinations are often archives, databases, SIEM, management consoles, and other log-processing apps. Fluentd is a unified logging layer application whose scope is modified by the customization of chosen plug-ins. Multiple instances of Fluentd can be configured for fault tolerance.

You should now have a basic understanding of Fluentd, along with some simple hands-on experience from the examples.

More Information

You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.

Fluentd

Compute

Storage

Databases

Networking

Developer Tools

Delivery

Security

Services

Industries

Pricing

Community

Engage With Us

Search Results

No Results

Filters