
Fluentd

Overview

Fluentd is open-source software that collects events from many kinds of sources, transforms them, and ships them to various destinations, all in a configurable manner. Once installed on a server, it runs in the background to collect, parse, transform, and ship various types of data. td-agent is a stable distribution package of Fluentd, QAed by Treasure Data, and we recommend using it. If you are setting up your environment from scratch, we recommend td-agent v4, which bundles Fluentd v1.

Learn how to install Fluentd in your environment.
To see a comparison between the versions, go here.

Capture syslog

Events that conform to the RFC 3164 or RFC 5424 standard can be captured using the syslog input plugin.

BSD vs. IETF

RFC3164 (BSD)

<priority>timestamp hostname application: message
In Fluentd, the application field is referred to as ident.

RFC5424 (IETF)

<priority>VERSION ISOTIMESTAMP HOSTNAME APPLICATION PID MESSAGEID STRUCTURED-DATA MSG
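Both formats begin with the same <priority> (PRI) value, which encodes facility × 8 + severity. An RFC 5424 message can be told apart by the VERSION field (currently 1) that follows the PRI. The Python sketch below is a simplified illustration, not Fluentd's parser. It decodes the PRI and splits the fixed RFC 5424 header fields, and it leaves STRUCTURED-DATA and MSG together, unparsed, in a remainder field.

```python
import re

PRI_RE = re.compile(r"^<(\d{1,3})>")

def decode_pri(pri: int) -> tuple:
    # PRI = facility * 8 + severity (same encoding in RFC 3164 and RFC 5424)
    return divmod(pri, 8)

def parse_syslog_header(line: str) -> dict:
    """Simplified sketch: identify the format and decode the PRI."""
    m = PRI_RE.match(line)
    if not m:
        raise ValueError("missing <PRI> prefix")
    facility, severity = decode_pri(int(m.group(1)))
    rest = line[m.end():]
    if rest.startswith("1 "):
        # RFC 5424: VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID, then SD + MSG.
        # Raises ValueError if a header field is missing.
        _version, timestamp, host, app, procid, msgid, remainder = rest.split(" ", 6)
        return {"format": "rfc5424", "facility": facility, "severity": severity,
                "time": timestamp, "host": host, "ident": app, "pid": procid,
                "msgid": msgid, "remainder": remainder}
    # RFC 3164: the PRI is followed by a "Mmm dd hh:mm:ss" timestamp instead
    return {"format": "rfc3164", "facility": facility, "severity": severity,
            "remainder": rest}
```

For example, `<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - ...` decodes to facility 4 and severity 2, with the APP-NAME field (su) in the position Fluentd calls ident.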

Transport layer

When you configure a syslog source, you choose a transport protocol: TCP or UDP. TCP is the recommended protocol because it delivers data reliably and in order, whereas UDP can silently drop log messages.

Consider UDP only in extreme cases, such as an extremely high volume of log messages combined with network or CPU utilization constraints that you need to work around.
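To check the transport end to end, you can send a test message to the source from another host. The Python sketch below is a hypothetical helper, not part of Fluentd. Over TCP it terminates each message with a newline, which is the traditional framing for syslog over TCP, and it also shows RFC 6587 octet-counting framing for comparison. Over UDP, each message is sent as a single datagram. The default port 5140 matches the example configuration in this section.

```python
import socket

def frame_tcp(message: str, octet_counting: bool = False) -> bytes:
    """Frame one syslog message for a TCP stream."""
    payload = message.encode("utf-8")
    if octet_counting:
        # RFC 6587 octet counting: "<length> <message>"
        return str(len(payload)).encode("ascii") + b" " + payload
    # Traditional (non-transparent) framing: newline-terminated
    return payload + b"\n"

def send_syslog(message: str, host: str = "127.0.0.1", port: int = 5140,
                transport: str = "tcp") -> None:
    """Hypothetical test helper: send one syslog message to a collector."""
    if transport == "udp":
        # UDP: one message per datagram, with no delivery or ordering guarantee
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(message.encode("utf-8"), (host, port))
    elif transport == "tcp":
        # TCP: reliable, ordered byte stream, so messages need explicit framing
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(frame_tcp(message))
    else:
        raise ValueError(f"unsupported transport: {transport}")
```

For example, `send_syslog("<134>Jan  1 00:00:00 myhost myapp: test message", host="FLUENTD-HOST")` should produce one event on the Fluentd side. Replace the FLUENTD-HOST placeholder with your server's address.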

support_colonless_ident

If your messages do not contain an ident field, set the syslog parser's support_colonless_ident parameter to false. Otherwise, the parser may treat the first word of the message as the ident even without a trailing colon, which removes that word from the message field.
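To see the effect, the Python sketch below uses two simplified regular expressions that approximate the parser's behavior with the flag on and off. They are illustrative only and are not Fluentd's actual patterns.

```python
import re

# Simplified approximations of RFC 3164 parsing; NOT Fluentd's exact regexes.
HEADER = r"^<(?P<pri>\d+)>(?P<time>\S+ +\S+ \S+) (?P<host>\S+) "
# Colonless ident allowed: the first word after the host is always taken as ident
COLONLESS = re.compile(
    HEADER + r"(?P<ident>[^ :\[]*)(?:\[(?P<pid>\d+)\])?(?:[^:]*:)? *(?P<message>.*)$")
# Colon required: ident is only recognized when followed by ":"
COLON_REQUIRED = re.compile(
    HEADER + r"(?:(?P<ident>[^ :\[]+)(?:\[(?P<pid>\d+)\])?: )?(?P<message>.*)$")

def parse_rfc3164(line: str, support_colonless_ident: bool = True):
    pattern = COLONLESS if support_colonless_ident else COLON_REQUIRED
    match = pattern.match(line)
    return match.groupdict() if match else None
```

With the line `<13>Jan  1 00:00:00 myhost hello world`, the colonless variant yields ident `hello` and message `world`. With the flag set to false, ident is empty and the message stays `hello world`. A line such as `... myhost sshd[42]: Accepted` parses the same way in both modes.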

Example

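<source>
  @type syslog
  # Listen for syslog messages on all interfaces, port 5140
  port 5140
  bind 0.0.0.0
  # Receive over TCP, the recommended transport
  <transport tcp>
  </transport>
  <parse>
    @type syslog
    # Use the string-based syslog parser implementation instead of regexp
    parser_type string
    # Do not treat the first word of a colonless message as the ident
    support_colonless_ident false
  </parse>
  # in_syslog appends facility and priority: MY-DATA-TYPE.<facility>.<priority>
  tag MY-DATA-TYPE
</source>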
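The syslog input appends the facility and priority to the configured tag (tag.facility.priority), so events from this source arrive with tags such as MY-DATA-TYPE.local0.info. This is why the S3 output further down matches MY-DATA-TYPE.** rather than the bare tag. The Python sketch below illustrates both steps. The facility and severity name tables are our assumption of Fluentd's naming, so verify them against the in_syslog documentation. The matcher supports only whole-part `*` and `**` wildcards, which is a simplification of Fluentd's match patterns.

```python
# Assumed name tables, indexed by numeric facility / severity.
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
              "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "at",
              "local0", "local1", "local2", "local3",
              "local4", "local5", "local6", "local7"]
SEVERITIES = ["emerg", "alert", "crit", "err", "warn", "notice", "info", "debug"]

def syslog_tag(base_tag: str, pri: int) -> str:
    # PRI = facility * 8 + severity
    facility, severity = divmod(pri, 8)
    return f"{base_tag}.{FACILITIES[facility]}.{SEVERITIES[severity]}"

def tag_matches(pattern: str, tag: str) -> bool:
    """Simplified matcher: '*' = exactly one tag part, '**' = zero or more parts."""
    p, t = pattern.split("."), tag.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(t)
        if p[i] == "**":
            # Try consuming zero, one, two, ... tag parts
            return any(match(i + 1, k) for k in range(j, len(t) + 1))
        if j == len(t):
            return False
        if p[i] == "*" or p[i] == t[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

For example, PRI 134 (local0, info) produces the tag MY-DATA-TYPE.local0.info. `MY-DATA-TYPE.**` matches that tag, while `MY-DATA-TYPE.*` does not, because `*` covers only one tag part.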

Forward to S3

Output format

Choose the output formatter carefully. With the correct configuration, each line in the S3 object looks exactly as the product originally generated it.

If the syslog parser is used for this data type, use the single_value formatter. The syslog parser stores the message text in the message field, and by default the single_value formatter outputs only the value of that field, which is the desired result.

If the json parser is used, use the json formatter to reconstruct the original structure.
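The difference between the two formatters can be sketched in Python. The functions below are simplified stand-ins for Fluentd's single_value and json formatters, not their implementations. Both append a newline by default, and single_value reads the message field unless told otherwise. The record used in the example is a hypothetical parsed syslog event.

```python
import json

def format_single_value(record: dict, message_key: str = "message",
                        add_newline: bool = True) -> str:
    # Output only the value of one field (message by default), like single_value
    line = str(record[message_key])
    return line + "\n" if add_newline else line

def format_json(record: dict, add_newline: bool = True) -> str:
    # Output the whole record as compact JSON, like the json formatter
    line = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return line + "\n" if add_newline else line
```

Given `{"host": "myhost", "ident": "sshd", "message": "Accepted publickey"}`, single_value writes just `Accepted publickey` to the S3 object. The json formatter writes the whole record as one JSON line, which is what you want when the source data was JSON to begin with.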

Authentication

The S3 output plugin supports several credential methods for authentication and authorization, including an IAM user (that is, an access key and secret key), an IAM role, and an instance profile. All methods and their corresponding parameters are documented here.

Buffering

Tuning the buffer involves several trade-offs, such as how often chunks are flushed to S3, how much data can be held on disk, and how failed uploads are retried.

The full list of parameters is documented here.
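As an example of one such trade-off, the retry settings in the example configuration (retry_type exponential_backoff, retry_max_interval 30, retry_timeout 1h) determine how long and how often a failed chunk is retried. The Python sketch below approximates that schedule. It assumes Fluentd's defaults of a 1s retry_wait and a backoff base of 2, doubles the wait after each attempt up to retry_max_interval, and ignores the random jitter that Fluentd adds by default (retry_randomize). It is an approximation, not Fluentd's retry state machine.

```python
def retry_schedule(retry_wait: float = 1.0, backoff_base: float = 2.0,
                   max_interval: float = 30.0, retry_timeout: float = 3600.0) -> list:
    """Approximate waits (seconds) between retries until retry_timeout is used up."""
    if retry_wait <= 0 or backoff_base < 1:
        raise ValueError("retry_wait must be > 0 and backoff_base >= 1")
    waits, elapsed = [], 0.0
    wait = min(retry_wait, max_interval)
    while elapsed + wait <= retry_timeout:
        waits.append(wait)
        elapsed += wait
        # Exponential growth, capped at retry_max_interval
        wait = min(wait * backoff_base, max_interval)
    return waits
```

With these values, the waits are 1, 2, 4, 8, and 16 seconds, followed by 30 seconds every time thereafter, for roughly 120 retries within the hour. Raising retry_max_interval reduces the number of attempts. Lowering retry_timeout gives up on an unreachable bucket sooner.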

Compression

Store compressed files to reduce network traffic and save storage costs. Hunters detects and handles gzip files without requiring any special action.

The built-in gzip compressor has one main drawback: while compression runs, it blocks other work in the Fluentd process (because of Ruby's GIL). The S3 and Treasure Data output plugins can instead compress outside the Fluentd process by running the gzip command. This frees up the Ruby interpreter so that Fluentd can keep processing other tasks. To enable it, set store_as to gzip_command.
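The Python analogy below illustrates where the compression work happens in each mode. It is not Fluentd's Ruby implementation, and it assumes a `gzip` binary is available on the PATH. Both functions produce standard gzip output, which is why Hunters can read the objects regardless of which mode created them.

```python
import gzip
import subprocess

def compress_in_process(data: bytes) -> bytes:
    # Analogous to store_as gzip: the compression runs inside this process
    return gzip.compress(data)

def compress_with_command(data: bytes) -> bytes:
    # Analogous to store_as gzip_command: a separate gzip process does the
    # CPU-heavy compression; this process only feeds it data and collects output
    result = subprocess.run(["gzip", "-c"], input=data,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```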

Example

<match MY-DATA-TYPE.**>
  @type s3
  store_as gzip_command
  path DATA-TYPE/%Y/%m/%d/
  aws_key_id xxxxx
  aws_sec_key xxxxx
  s3_bucket MY-BUCKET-NAME
  s3_region us-west-2
  slow_flush_log_threshold 40s
  s3_object_key_format %{path}%{time_slice}_%{chunk_id}.%{file_extension}
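  <buffer time,host>
    # Persist buffered chunks on disk
    @type file
    path /data/fluent/buffer/MY-DATA-TYPE
    # Split chunks by 5-minute time window (and by host, per the chunk keys)
    timekey 5m
    # No extra wait after a time window closes
    timekey_wait 0s
    chunk_limit_size 64MB
    total_limit_size 1024MB

    # Retry failed uploads with exponential backoff, at most 30s apart, for up to 1 hour
    retry_timeout 1h
    retry_type exponential_backoff
    retry_max_interval 30

    # Flush buffered chunks every 60s using two flush threads
    flush_mode interval
    flush_interval 60s
    flush_thread_count 2
    # When the buffer reaches total_limit_size, discard the oldest chunk
    overflow_action drop_oldest_chunk
  </buffer>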
  <format>
    @type single_value
  </format>
</match>
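In this example, `<buffer time,host>` with `timekey 5m` groups events into chunks per five-minute window and per value of the record's host field. Each flushed chunk becomes one S3 object named by s3_object_key_format. The Python sketch below illustrates both steps under the following assumptions:

- Timestamps are interpreted as UTC.
- %{time_slice} is rendered with %Y%m%d%H%M. This is our assumption for a 5-minute timekey.
- The chunk_id value is a hypothetical placeholder.
- The gz extension corresponds to store_as gzip_command.

The sketch ignores chunk_limit_size and flush_interval, either of which can produce more than one object per window.

```python
import time
from collections import defaultdict

def window_start(event_time: int, timekey: int = 300) -> int:
    # Start of the timekey window (epoch seconds) that the event falls into
    return event_time - event_time % timekey

def group_into_chunks(events, timekey: int = 300) -> dict:
    """Group (event_time, record) pairs by (time window, host), like <buffer time,host>."""
    chunks = defaultdict(list)
    for event_time, record in events:
        chunks[(window_start(event_time, timekey), record.get("host"))].append(record)
    return dict(chunks)

def s3_object_key(window: int, chunk_id: str, path_template: str = "DATA-TYPE/%Y/%m/%d/",
                  time_slice_format: str = "%Y%m%d%H%M", file_extension: str = "gz") -> str:
    # Mirrors s3_object_key_format %{path}%{time_slice}_%{chunk_id}.%{file_extension}
    t = time.gmtime(window)
    path = time.strftime(path_template, t)
    time_slice = time.strftime(time_slice_format, t)
    return f"{path}{time_slice}_{chunk_id}.{file_extension}"
```

Under these assumptions, the chunks from a host for the window starting at 1970-01-01 00:05 UTC would land under a key such as `DATA-TYPE/1970/01/01/197001010005_<chunk_id>.gz`.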