Every system administrator deals with collecting and analyzing logs as part of day-to-day work. The collected logs need to be stored: they serve a variety of purposes, such as debugging programs, investigating incidents, and assisting technical support. It is also necessary to be able to search the entire body of collected data.
Organizing log collection and analysis is not as simple as it might seem at first glance. To begin with, you have to aggregate logs from different systems that may have nothing in common with each other. It is also highly advisable to align the collected data on a single timeline in order to trace connections between events. Implementing log search is a separate and complex problem in itself.
Over the past few years, interesting software tools have appeared to address these problems. Services that store and process logs online, such as Splunk, Loggly, Papertrail, and Logentries, are becoming increasingly popular. Their undoubted advantages include a convenient interface and low cost of use (and their basic free tiers offer quite good capabilities). But when working with large volumes of logs, they often fail to cope with the load, and at that scale their use is frequently unattractive from a purely financial point of view.
A much more attractive option is to deploy a standalone solution. We thought about this question when we faced the need to collect and analyze the logs of our cloud storage.
We started looking for a suitable solution and settled on Fluentd, an interesting tool with fairly broad functionality about which there are almost no detailed publications in Russian. This article describes the features of Fluentd in detail.
Fluentd was developed in 2011 by Sadayuki Furuhashi, co-founder of Treasure Data (one of the project's sponsors). It is written in Ruby. Fluentd is actively developed and improved (see the repository on GitHub, where updates appear steadily every few days).
Among the users of Fluentd are such well-known companies as Nintendo, Amazon, Slideshare and others.
Fluentd collects logs from various sources and passes them to other applications for further processing. In a schematic form, the process of collecting and analyzing logs using Fluentd can be represented as follows:
The following are the main advantages of Fluentd:
Low system resource requirements. Fluentd needs only 30–40 MB of RAM for normal operation and can process around 13,000 events per second.
A unified logging format. Fluentd converts data from multiple sources into JSON. This helps solve the problem of collecting logs from heterogeneous systems and opens up wide opportunities for integration with other software solutions.
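As an illustration (the tag and field names here are hypothetical), every event Fluentd handles is reduced to the same three-part shape: a tag that identifies the source, a timestamp, and a JSON record with the actual data:

```
tag:    nginx.access
time:   2014-12-27 10:15:00 +0400
record: {"remote": "192.168.0.1", "method": "GET", "path": "/index.html", "code": 200}
```

Because every source is normalized to this shape, downstream processing does not need to know which system originally produced the log line.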
A convenient architecture. Fluentd's feature set can be extended with numerous plugins (more than 300 have been created to date). Plugins let you connect new data sources and output data in various formats.
Integration with various programming languages. Fluentd can accept logs from applications written in Python, Ruby, PHP, Perl, Node.js, Java, and Scala.
Fluentd is distributed free of charge under the Apache 2.0 license. The project is documented in sufficient detail; the official website and blog offer many useful training materials.
In this article, we describe the installation procedure for Ubuntu 14.04. Installation instructions for other operating systems can be found here.
Installation and initial setup of Fluentd is carried out using a special script. Run the commands:
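The commands themselves are missing from this copy of the article. As a sketch: the Treasure Data project distributes an install script for td-agent (the stable Fluentd distribution), and on Ubuntu 14.04 (trusty) installation was usually a one-liner of the following form. The exact URL is an assumption based on the documentation of that period and may have changed, so check the official installation instructions before running it:

```shell
# Download and run the td-agent install script for Ubuntu 14.04 (trusty).
# The URL below may differ in current versions -- verify it first.
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh

# Start the td-agent service and check that it is running
sudo /etc/init.d/td-agent start
sudo /etc/init.d/td-agent status
```

After installation, the daemon runs as td-agent and reads its settings from /etc/td-agent/td-agent.conf, described below.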
Fluentd works as follows: it collects data from various sources, checks it against specified criteria, and then forwards it to designated destinations for storage and further processing. This can be represented visually with the following scheme:

Fluentd's settings (what data to collect and from where, what criteria it should meet, and where to forward it) are written in the configuration file /etc/td-agent/td-agent.conf, which is built from the following blocks:
source - contains information about the data source;
match - contains information about where to transfer the received data;
include - allows other configuration files to be included;
system - contains system settings.
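To make this structure concrete, here is a minimal sketch of a td-agent.conf with one source and one match block (the tag pattern and output path are hypothetical; file is a standard output plugin that writes matched events to disk):

```
# source: accept events over HTTP on port 8888
<source>
  type http
  port 8888
</source>

# match: write all events whose tag starts with "app." to files on disk
<match app.**>
  type file
  path /var/log/td-agent/app
</match>
```

Events flow from every source block to whichever match block's tag pattern matches them first.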
Consider the structure and content of these blocks in more detail.
Source: where to get data
The source block contains information about where to get data from. Fluentd can receive data from a variety of sources: application logs in various programming languages (Python, PHP, Ruby, Scala, Go, Perl, Java), database logs, logs from various hardware devices, data from monitoring utilities, and more. A complete list of possible data sources can be found here. Specialized plugins are used to connect sources.
The standard plugins are http (used to receive HTTP messages) and forward (used to receive TCP packets). Both plugins can be used at the same time:
# accept events arriving on port 24224/tcp
<source>
  type forward
  port 24224
</source>

# accept HTTP messages on port 8888
<source>
  type http
  port 8888
</source>
As the example above shows, the plugin type is specified in the type directive and the port number in the port directive.
The number of data sources is unlimited. Each data source is described in a separate block.
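As a sketch of how an application might hand a log record to an http source like the one above (the tag, field names, and port are illustrative; this assumes a Fluentd http source listening on port 8888 of the local machine):

```python
import json
import urllib.request


def build_payload(record):
    """Encode a record the way Fluentd's http input expects:
    form data of the form json=<record serialized as JSON>."""
    return ("json=" + json.dumps(record)).encode("utf-8")


def send_log(tag, record, host="localhost", port=8888):
    """POST a single log record to Fluentd; the URL path becomes the tag."""
    req = urllib.request.Request(
        "http://%s:%d/%s" % (host, port, tag),
        data=build_payload(record),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Requires a running Fluentd with an http source on port 8888:
# send_log("app.access", {"user": "alice", "action": "login"})
```

For production use, the dedicated client libraries (fluent-logger for Python, Ruby, and the other languages listed earlier) are preferable, since they speak the more efficient forward protocol instead of HTTP.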