Overview

Fluentd is open source software that collects events from many kinds of sources, transforms them, and ships them to various destinations, all in a configurable manner. Once installed on a server, it runs in the background to collect, parse, transform, and ship various types of data. td-agent is a stable distribution package of Fluentd, QAed by Treasure Data, and using it is recommended. If you are setting up your environment from scratch, we recommend td-agent v4, which bundles Fluentd v1.

Learn how to install Fluentd in your environment.
To see a comparison between the versions, go here.

Capture syslog

Events that conform to the RFC 3164 or RFC 5424 standard can be captured using the syslog input plugin.

BSD vs. IETF

RFC3164 (BSD)

<priority>timestamp hostname application: message

In Fluentd, the application field is referred to as ident.

RFC5424 (IETF)

<priority>VERSION ISOTIMESTAMP HOSTNAME APPLICATION PID MESSAGEID STRUCTURED-DATA MSG
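
The number inside <priority> combines the syslog facility and severity as facility * 8 + severity, which both RFCs define. The following Python sketch decodes that value and assembles one line in each layout. The function names and sample field values are made up for illustration, and the short facility and severity keywords are the conventional ones rather than anything specific to Fluentd.

```python
# Conventional short keywords for the numeric codes defined in RFC 5424.
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
    "local0", "local1", "local2", "local3", "local4", "local5", "local6", "local7",
]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(pri):
    """Split a PRI value into (facility, severity) keywords."""
    facility, severity = divmod(pri, 8)
    return FACILITIES[facility], SEVERITIES[severity]


def rfc3164_line(pri, timestamp, hostname, application, message):
    """Assemble a BSD-style line: <priority>timestamp hostname application: message."""
    return f"<{pri}>{timestamp} {hostname} {application}: {message}"


def rfc5424_line(pri, iso_timestamp, hostname, application, pid, msgid, message,
                 structured_data="-"):
    """Assemble an IETF-style line; VERSION is 1 and "-" marks an empty field."""
    return (f"<{pri}>1 {iso_timestamp} {hostname} {application} "
            f"{pid} {msgid} {structured_data} {message}")


print(decode_pri(38))  # ('auth', 'info'), because 38 == 4 * 8 + 6
print(rfc3164_line(38, "Jun 25 22:45:34", "localhost", "sshd", "session opened"))
print(rfc5424_line(38, "2021-06-25T22:45:34Z", "localhost", "sshd", "1234", "-",
                   "session opened"))
```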

Transport layer

When you configure a syslog source, you choose a transport protocol, either TCP or UDP. TCP is the recommended protocol, as it delivers data in order and retransmits lost packets, so log messages are not silently dropped in transit.

Consider UDP only in extreme cases, where an extremely high volume of log messages is combined with network or CPU utilization issues that need to be worked around.
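
A quick way to exercise either transport is to send a hand-written test line to the listener. The sketch below uses only Python's standard socket module. send_syslog is a hypothetical helper, and it assumes newline-delimited framing over TCP and one message per datagram over UDP.

```python
import socket


def send_syslog(line, host, port, transport="tcp"):
    """Send one syslog line to host:port over TCP or UDP."""
    payload = (line + "\n").encode("utf-8")
    if transport == "tcp":
        # TCP: connection-oriented, the message is delimited by the trailing newline.
        with socket.create_connection((host, port), timeout=5) as conn:
            conn.sendall(payload)
    elif transport == "udp":
        # UDP: fire-and-forget, there is no confirmation that the datagram arrived.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    else:
        raise ValueError(f"unsupported transport: {transport!r}")
```

For example, send_syslog("<38>Jun 25 22:45:34 localhost test: hello", "127.0.0.1", 5140) would target a TCP syslog source listening on port 5140 of the same host.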

support_colonless_ident

If your messages do not contain the ident field, tune the syslog parser by setting the support_colonless_ident flag to false. This avoids a situation where the first word of the message is hijacked by the ident field.
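
To see why the flag matters, consider the part of an RFC 3164 line that follows the hostname. The Python sketch below is a deliberately simplified illustration of the two behaviors and is not Fluentd's actual parser. Both regular expressions and the split_ident helper are assumptions made up for this example.

```python
import re

# Lenient: the first word counts as ident even when no colon follows it.
LENIENT = re.compile(r"^(?P<ident>[^ :\[]+)(?:\[(?P<pid>\d+)\])?:? +(?P<message>.*)$")
# Strict: ident is recognized only when it is followed by a colon.
STRICT = re.compile(r"^(?:(?P<ident>[^ :\[]+)(?:\[(?P<pid>\d+)\])?: +)?(?P<message>.*)$")


def split_ident(rest, support_colonless_ident):
    """Return (ident, message) for the text that follows the hostname."""
    pattern = LENIENT if support_colonless_ident else STRICT
    match = pattern.match(rest)
    if match is None:
        return None, rest
    return match.group("ident"), match.group("message")


no_ident = "Connection from 10.0.0.1 closed"
print(split_ident(no_ident, True))   # ('Connection', 'from 10.0.0.1 closed')
print(split_ident(no_ident, False))  # (None, 'Connection from 10.0.0.1 closed')
```

With the lenient pattern, the word "Connection" is taken out of the message, which is the hijacking effect described above. A line that carries a real ident, such as "sshd[1234]: Accepted password", splits the same way under both patterns.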

Example

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  <transport tcp>
  </transport>
  <parse>
   @type syslog
   parser_type string
   support_colonless_ident false   
  </parse>
  tag MY-DATA-TYPE
</source>

Forward to S3

Output's format

Choose the output format carefully so that the result is seamless: with a correct configuration, the lines of each S3 object look exactly as the product originally generated them.

If the syslog parser plugin is in use for the data type, pick the single_value formatter. The syslog parser stores the message text in the message field, and the single_value formatter outputs solely the value of this field by default, as desired.

If the json parser plugin is in use, pick the json formatter to reconstruct the original structure.
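
The difference between the two formatters can be sketched in Python. This is only an approximation of what the formatters emit, not their implementation. The sample record is invented, while message_key and add_newline mirror the single_value formatter's parameters and their defaults.

```python
import json


def format_single_value(record, message_key="message", add_newline=True):
    """Emit only the value of one field, like the single_value formatter."""
    line = str(record[message_key])
    return line + "\n" if add_newline else line


def format_json(record):
    """Emit the whole record as one JSON document per line, like the json formatter."""
    return json.dumps(record, separators=(",", ":")) + "\n"


record = {"host": "localhost", "ident": "sshd", "pid": "1234", "message": "session opened"}
print(format_single_value(record), end="")  # session opened
print(format_json(record), end="")
```

Note that the json output preserves the structure of the record, but key order and whitespace may differ from what the product originally sent.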

Authentication

The S3 output plugin provides several credential methods for authentication and authorization, including IAM user (that is, access key and secret), IAM role, and instance profile. All methods and their corresponding parameters are documented here.

Buffering

Many considerations come into play when you fine-tune buffering for your environment.

The full parameter list is documented here.

Compression

Store compressed files to reduce network traffic and save storage costs. Hunters detects and handles gzip files without requiring any special action.

The built-in gzip compressor has one main drawback: it blocks other jobs while compression takes place (due to Ruby's GIL). The S3 output plugin can instead compress outside of the Fluentd process, using the external gzip command. This frees up the Ruby interpreter so that Fluentd can process other tasks. To enable this, set store_as to gzip_command.
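
As a rough illustration of why compression pays off for log data, the Python sketch below gzips a batch of synthetic, repetitive syslog lines and compares sizes. The sample lines and helper names are invented for this example, and real-world ratios depend on the data.

```python
import gzip


def compress_lines(lines):
    """Join log lines and gzip them, returning (raw_bytes, compressed_bytes)."""
    raw = "".join(line + "\n" for line in lines).encode("utf-8")
    return raw, gzip.compress(raw)


lines = [
    f"<38>Jun 25 22:45:{i % 60:02d} localhost prg00000[1234]: seq: {i:010d}"
    for i in range(1000)
]
raw, compressed = compress_lines(lines)
print(f"raw={len(raw)} bytes, gzipped={len(compressed)} bytes")
# Decompressing restores the original lines byte for byte.
assert gzip.decompress(compressed) == raw
```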

Example

<match MY-DATA-TYPE.**>
  @type s3
  store_as gzip_command
  path DATA-TYPE/%Y/%m/%d/
  aws_key_id xxxxx
  aws_sec_key xxxxx
  s3_bucket MY-BUCKET-NAME
  s3_region us-west-2
  slow_flush_log_threshold 40s
  s3_object_key_format %{path}%{time_slice}_%{chunk_id}.%{file_extension}
  <buffer time,host>
    @type file
    path /data/fluent/buffer/MY-DATA-TYPE    
    timekey 5m
    timekey_wait 0s
    chunk_limit_size 64MB
    total_limit_size 1024MB

    retry_timeout 1h    
    retry_type exponential_backoff
    retry_max_interval 30

    flush_mode interval
    flush_interval 60s
    flush_thread_count 2
    overflow_action drop_oldest_chunk
  </buffer>
  <format>
    @type single_value
  </format>
</match>
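
The retry settings in the buffer section above interact as follows: with exponential backoff, the wait before each retry grows until it reaches retry_max_interval, and retries stop once retry_timeout has elapsed. The Python sketch below approximates that schedule. It assumes an initial wait of 1 second and a backoff base of 2 (believed to be Fluentd's defaults, since the example does not set them) and ignores the random jitter that Fluentd applies, so treat the numbers as indicative only.

```python
def retry_waits(retry_wait=1.0, base=2.0, max_interval=30.0, timeout=3600.0):
    """Return the approximate wait (in seconds) before each retry attempt."""
    waits = []
    elapsed = 0.0
    attempt = 0
    while True:
        # Exponential growth, capped at retry_max_interval.
        wait = min(retry_wait * base ** attempt, max_interval)
        if elapsed + wait > timeout:
            break  # retry_timeout would be exceeded
        waits.append(wait)
        elapsed += wait
        attempt += 1
    return waits


schedule = retry_waits()
print(schedule[:7])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

After the first few attempts the schedule settles at one retry every 30 seconds until the one-hour timeout is reached.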

Local Fluentd Server Installation

Overview

This article is designed to help you deploy a local Fluentd server.

Note: The instructions provided are a recommendation and deemed a best practice. Hunters will not be able to provide support for any issues with your Fluentd server.

It is recommended to consult a Fluentd expert (see here for a list of recommended enterprise consulting firms supporting Fluentd).

There are many ways that you might choose to build and deploy this server. Some possible configurations include:

  • Single Fluentd server listening on a variety of network ports.

  • Single / Multiple Fluentd servers forwarding to other Fluentd servers.

  • Fluentd reading from local files and tailing them over time.

  • Fluentd integrated with syslog-ng or other syslog server.

  • And combinations of the above.

A capture solution built on Fluentd can be created in a number of ways. This guide covers the first option above: a single Fluentd server listening on a variety of ports. This article assumes a few more points:

  1. An Ubuntu 20.04 server will be used to host this box. Older versions of Ubuntu will work with very small changes to these steps. Other Linux distributions could also be used, but the necessary modifications are not covered in this guide.

  2. Fluentd is CPU bound before anything else, which means that the server’s performance is dictated by the number and type of CPUs only.

  • RAM is only marginally used, as this guide always buffers to disk and NOT to RAM.

  • The IOPS to the disk do not seem to influence performance.

  • All the data that is written to disk is gzipped by the configurations in this guide, which lowers I/O requirements and further increases CPU usage.

VM Configuration Examples

  • When building Fluentd servers in AWS, it is generally advised to use an m5.xlarge as a base machine.

  • This machine has 4 CPUs and 16 GB of RAM, and is capable of running comfortably at 25K events per second (EPS).

  • An m5.2xlarge (8 CPUs / 32 GB) is capable of running even beyond 50K EPS.

  • Keep these numbers in mind as you size your server.
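
The figures above can be turned into a back-of-the-envelope sizing estimate. The Python sketch below is only a rough aid. The capacity numbers are the ones quoted in this section, while the servers_needed helper and its 20% headroom default are assumptions introduced for illustration.

```python
import math

# Rough capacity figures quoted in this guide, in events per second (EPS).
CAPACITY_EPS = {"m5.xlarge": 25_000, "m5.2xlarge": 50_000}


def servers_needed(expected_eps, instance_type, headroom=0.2):
    """Estimate how many servers of one type are needed for a given EPS."""
    capacity = CAPACITY_EPS[instance_type]
    # Pad the expected load by the headroom factor, then round up.
    return max(1, math.ceil(expected_eps * (1 + headroom) / capacity))


print(servers_needed(60_000, "m5.xlarge"))   # 3
print(servers_needed(60_000, "m5.2xlarge"))  # 2
```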

Installation Instructions:

  1. Start from a base install of Ubuntu Server 20.04 LTS. Connect to your new server with a privileged account. You will need to run commands as root, so it’s mandatory that you have that level of access.

  2. Upgrade the software on the server to make sure that the box is completely updated.

    sudo apt update
    sudo apt upgrade

  3. If you would like to be able to install other Fluentd plugins (Prometheus, Azure Blob, GCP, etc.), install some prerequisites on the server at this time.

    sudo apt install build-essential ruby ruby-dev

  4. It is also advised to install some debug tools, such as netstat.

    sudo apt install net-tools

  5. Next, increase a number of system limits in the underlying configuration of your Ubuntu server. First, edit the file /etc/security/limits.conf and add the following lines to the bottom of the file.

    root soft nofile 65536
    root hard nofile 65536
    * soft nofile 65536
    * hard nofile 65536

  6. Next, edit the file /etc/sysctl.conf and make the following changes.

    net.core.somaxconn = 1024
    net.core.netdev_max_backlog = 5000
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_wmem = 4096 12582912 16777216
    net.ipv4.tcp_rmem = 4096 12582912 16777216
    net.ipv4.tcp_max_syn_backlog = 8096
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.ip_local_port_range = 10240 65535

  7. Save both files after making these changes, and then reboot your server. Once it comes back up and you can log in, you may continue.

  8. The next step is to install the td-agent software. td-agent is a version of the Fluentd software that is built and maintained by a company called Treasure Data, and is the version of Fluentd that we will be using in this walkthrough. At the time of this writing, td-agent is at version 4.1.1, so that is the version that we will install. The td-agent installer comes as a shell script that adds the proper software repository to your system's apt configuration and then installs the package. This is preferable, as td-agent will now be kept up to date every time you update the server. To install td-agent on Ubuntu 20.04 (Focal Fossa):

    # td-agent 4
    curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-focal-td-agent4.sh | sh

  9. If you are using an older version of Ubuntu, use one of the following installers instead. For Ubuntu 18.04 (Bionic Beaver):

    # td-agent 4
    curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-bionic-td-agent4.sh | sh

  10. And for Ubuntu 16.04 (Xenial Xerus):

    # td-agent 4
    curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent4.sh | sh

  11. Now would be a great time to install any other needed Fluentd plugins. Below is how you would install the Prometheus plugin (to expose pipeline metrics) and the Azure plugin, if you need to write to Azure blob storage. The S3 plugin comes with td-agent, so we don’t need to install that. A complete list of plugins is available at https://www.fluentd.org/plugins/all

    # td-agent bundles its own Ruby, so install plugins with td-agent-gem
    sudo td-agent-gem install fluent-plugin-prometheus
    sudo td-agent-gem install fluent-plugin-azurestorage

  12. If you are going to use the configurations below, you need to create and chown a directory. This directory is used for chunk storage, that is, temporary storage of logs until it’s time to upload those files (based on the timekey setting).

    sudo mkdir /var/spool/td-agent/
    sudo chown td-agent:td-agent /var/spool/td-agent/

Congratulations! Your Fluentd installation is now complete.
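
After the reboot, you may want to confirm that the system limits from the installation steps took effect. The Python sketch below is one possible check and is not part of td-agent. The helper names are made up, the expected values are copied from the steps above, and the open-file limit it reports is the one of the shell that runs the script, which may differ from the limit of the td-agent service.

```python
import resource
from pathlib import Path

# Values copied from the limits.conf and sysctl.conf steps in this guide.
EXPECTED_NOFILE = 65536
EXPECTED_SYSCTL = {
    "net.core.somaxconn": "1024",
    "net.core.netdev_max_backlog": "5000",
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_wmem": "4096 12582912 16777216",
    "net.ipv4.tcp_rmem": "4096 12582912 16777216",
    "net.ipv4.tcp_max_syn_backlog": "8096",
    "net.ipv4.tcp_slow_start_after_idle": "0",
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.ip_local_port_range": "10240 65535",
}


def sysctl_path(key):
    """Map a sysctl key such as net.core.somaxconn to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key):
    """Return the current value with whitespace normalized, or None if unreadable."""
    try:
        return " ".join(sysctl_path(key).read_text().split())
    except OSError:
        return None


def check_system_limits():
    """Return a list of mismatches; an empty list means every check passed."""
    problems = []
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < EXPECTED_NOFILE:
        problems.append(f"nofile soft limit is {soft}, expected {EXPECTED_NOFILE}")
    for key, expected in EXPECTED_SYSCTL.items():
        actual = read_sysctl(key)
        if actual != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    for problem in check_system_limits():
        print(problem)
```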

Fluentd Configuration:

Before we dig into the configuration, keep the following paths in mind.

/etc/td-agent/td-agent.conf is the default configuration file.

/var/log/td-agent/ is the path where all log files are stored. The main log file is td-agent.log.

/var/spool/td-agent/ is the directory that you need to create for chunk storage (chown it to td-agent:td-agent).

You need to create a proper configuration in the td-agent.conf file so that td-agent knows how to run, on which ports to listen, and where to upload received data. This guide suggests building your td-agent.conf file using a building-block approach: think of each data source as a pipeline of source → filter(s) → output(s). You only need to build a few source stanzas (one for TCP, one for UDP, and one for any unique sources), a single filter if you need one at all, and likely a single output (with some modifications for each pipeline).

Source Configuration:

An example source stanza might look like this. To learn more about the Fluentd syslog input, it is advised to read Fluentd’s documentation.

<source>
  @type syslog
  port 5000
  bind 0.0.0.0
  @log_level trace
  <parse>
    @type syslog
    support_colonless_ident false
    message_format rfc5424
  </parse>
  <transport tcp>
  </transport>
  tag default_syslog
</source>

There are a few things to note in the above configuration.

Source

  • The “port 5000” and “bind 0.0.0.0” settings dictate the port on which Fluentd listens and the interface(s) it listens on.

  • In this example, 0.0.0.0 means all interfaces.

Remember: If you need to listen on ports below 1024 you will need to change your Fluentd configuration to run Fluentd as root. Instructions to run Fluentd as root are here.

Parse

  • The parse section of the example above tells Fluentd how to parse each line that is received.

  • This is the section that seems to require the most customization in this guide’s configuration, because of differences in the message format itself. Even for tools that claim to support a specific syslog standard (RFC 3164 or RFC 5424), the actual bytes on the wire might deviate from the specification. When working on a parsing statement, always refer to the documentation.

Transport

  • The transport section specifies whether TCP or UDP connectivity is used.

  • If you would like to use both, create two otherwise identical <source> sections in your configuration, one for each transport setting.

Tag

  • The last part, the tag, is one of the most important settings, so this guide covers it in more depth.

  • Tags are used to determine routing within Fluentd for all events.

  • Each event is tagged at ingestion time with the tag that is listed on its source.

  • In this case the tag is default_syslog. For syslog input, however, that is not the full tag that is applied.

  • Here is an example of an event that was captured by the source statement above:

2021-06-25T22:45:34+00:00       default_syslog.auth.info        {"host":"localhost","ident":"prg00000","pid":"1234","msgid":"-","extradata":"-","message":"seq: 0000000008, thread: 0000, runid: 1624661134, stamp: 2021-06-25T22:45:34 PADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPADDPAD"}
  • Looking at the tag, you can see that it is default_syslog.auth.info, not default_syslog as you might expect.

  • Fluentd actually builds the tag as tag_value.facility.severity, which requires a slight modification to the <filter> and <match> stanzas that we need to write.

  • Instead of writing your tag matches as default_syslog.*, write them as default_syslog.**. In a match pattern, * matches exactly one tag part, while ** matches zero or more tag parts, so only the second form matches default_syslog.auth.info.
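
The difference between the two wildcards can be sketched in a few lines of Python. This is a simplified model of Fluentd's tag matching written for illustration: it handles only literal parts, * and **, and the tag_matches helper is a made-up name rather than a Fluentd API.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a pattern using * and **."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # ** may absorb zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # * consumes exactly one tag part; a literal must match it exactly.
        return _match(rest, tag_parts[1:])
    return False


print(tag_matches("default_syslog.*", "default_syslog.auth.info"))   # False
print(tag_matches("default_syslog.**", "default_syslog.auth.info"))  # True
```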

Filter Configuration:

We will use filter stanzas to enable Prometheus metrics on the ingest pipelines that we create. This allows you to monitor and understand the performance of your Fluentd servers. If you have no need for metrics, you can skip this step.

An example filter stanza might look like this:

<filter default_syslog.**>
  @type prometheus
  <metric>
    name default_syslog_input_num_records_total
    type counter
    desc The total number of records sent to the default_syslog collector.
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of records sent to all inputs.
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>

In this example, we are creating two counters: default_syslog_input_num_records_total and fluentd_input_status_num_records_total. Both counters are incremented for each line that is received, which gives a count of events per pipeline and a total for all pipelines. A very good initial dashboard to view the metrics generated here is this dashboard for Grafana.
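
Counters like these only ever increase, so a throughput figure such as events per second comes from comparing two readings taken some time apart. The Python sketch below shows that calculation on two invented samples in the Prometheus text format. It assumes the metrics are exposed in that format without trailing timestamps, and the helper names and sample values are hypothetical.

```python
def counter_total(exposition_text, metric_name):
    """Sum one counter across all of its label combinations."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        if name_and_labels.split("{", 1)[0] == metric_name:
            total += float(value)
    return total


def events_per_second(first_total, second_total, seconds_between):
    """Average rate between two counter readings."""
    return (second_total - first_total) / seconds_between


METRIC = "fluentd_input_status_num_records_total"
SCRAPE_T0 = """# TYPE fluentd_input_status_num_records_total counter
fluentd_input_status_num_records_total{tag="default_syslog.auth.info",hostname="fluentd-1"} 1000
fluentd_input_status_num_records_total{tag="default_syslog.daemon.err",hostname="fluentd-1"} 500
"""
SCRAPE_T1 = """# TYPE fluentd_input_status_num_records_total counter
fluentd_input_status_num_records_total{tag="default_syslog.auth.info",hostname="fluentd-1"} 3000
fluentd_input_status_num_records_total{tag="default_syslog.daemon.err",hostname="fluentd-1"} 1500
"""

# Two scrapes taken 60 seconds apart: (4500 - 1500) / 60
print(events_per_second(counter_total(SCRAPE_T0, METRIC),
                        counter_total(SCRAPE_T1, METRIC), 60))  # 50.0
```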

Match Configuration:

A simple local output configuration might look like this:

<match default_syslog.**>
  @type file
  path /var/spool/td-agent/default_syslog
  compress gzip
  <buffer time>
    @type file
    timekey 5m
    timekey_use_utc true
    timekey_wait 60s
    flush_at_shutdown true
  </buffer>
</match>

This configuration writes all data sent to the default_syslog input to gzipped files that live in /var/spool/td-agent/default_syslog/. Those files are written every five minutes (the timekey value).
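
To make the timing concrete, the Python sketch below computes which five-minute window an event falls into and the earliest moment its chunk becomes eligible for flushing, which is the end of the window plus timekey_wait. It is an illustration of the arithmetic only. The timekey_window helper is a made-up name, and the defaults mirror the timekey 5m and timekey_wait 60s settings above.

```python
from datetime import datetime, timedelta, timezone


def timekey_window(event_time, timekey_seconds=300, timekey_wait_seconds=60):
    """Return (window_start, window_end, flush_at) for an event timestamp."""
    epoch = int(event_time.timestamp())
    # Round down to the nearest multiple of the timekey.
    start = epoch - (epoch % timekey_seconds)
    window_start = datetime.fromtimestamp(start, tz=timezone.utc)
    window_end = window_start + timedelta(seconds=timekey_seconds)
    flush_at = window_end + timedelta(seconds=timekey_wait_seconds)
    return window_start, window_end, flush_at


event = datetime(2021, 6, 25, 22, 45, 34, tzinfo=timezone.utc)
start, end, flush_at = timekey_window(event)
print(start.isoformat())     # 2021-06-25T22:45:00+00:00
print(end.isoformat())       # 2021-06-25T22:50:00+00:00
print(flush_at.isoformat())  # 2021-06-25T22:51:00+00:00
```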

A more complex configuration with S3 support might look like this:

<match default_syslog.**>
  @type s3
  aws_key_id YOUR_AWS_KEY_ID_HERE
  aws_sec_key YOUR_AWS_SECRET_KEY_HERE
  s3_bucket YOUR_S3_BUCKET_NAME_HERE
  s3_region YOUR_S3_REGION_HERE
  path syslog/
  <buffer time>
    @type file
    path /opt/s3
    timekey 5m
    timekey_use_utc true
    timekey_wait 60s
    chunk_limit_size 128m
    flush_at_shutdown true
  </buffer>
</match>

And a more complex configuration with S3 output and prometheus metrics might look like:

<match default_syslog.**>
  @type copy

  <store>
    @type s3
    aws_key_id YOUR_AWS_KEY_ID_HERE
    aws_sec_key YOUR_AWS_SECRET_KEY_HERE
    s3_bucket YOUR_S3_BUCKET_NAME_HERE
    s3_region YOUR_S3_REGION_HERE
    path syslog/
    <buffer time>
      @type file
      path /opt/s3
      timekey 5m
      timekey_use_utc true
      timekey_wait 60s
      chunk_limit_size 128m
      flush_at_shutdown true
    </buffer>
  </store>

  <store>
    @type prometheus
    <metric>
      name default_syslog_output_num_records_total
      type counter
      desc The total number of records sent to the default_syslog output destination.
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
    <metric>
      name fluentd_output_status_num_records_total
      type counter
      desc The total number of records sent to all outputs.
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </store>
</match>

Example Full Configuration:

# - Default Syslog Source Configurations - #
<source>
  @type syslog
  port 5000
  bind 0.0.0.0
  @log_level trace
  <parse>
    @type syslog
    support_colonless_ident false
    message_format rfc5424
    rfc5424_time_format %Y-%m-%dT%H:%M:%S
    with_priority true
  </parse>
  <transport tcp>
  </transport>
  tag default_syslog
</source>

# - This tracks the number of events into the default_syslog pipeline - #
<filter default_syslog.**>
  @type prometheus
  <metric>
    name default_syslog_input_num_records_total
    type counter
    desc The total number of records sent to the default_syslog collector.
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of records sent to all inputs.
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>

# in order to track both metrics and send data out an output, it's necessary to use a copy.
<match default_syslog.**>
  @type copy

  <store>
    @type file
    path /var/spool/td-agent/default_syslog
    compress gzip
    <buffer time>
      @type file
      timekey 5m
      timekey_use_utc true
      timekey_wait 60s
      flush_at_shutdown true
    </buffer>
  </store>

  <store>
    @type prometheus
    <metric>
      name default_syslog_output_num_records_total
      type counter
      desc The total number of records sent to the default_syslog output destination.
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
    <metric>
      name fluentd_output_status_num_records_total
      type counter
      desc The total number of records sent to all outputs.
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </store>
</match>
# End Of File