gcc December 23, 2010 at 08:21

Ustats module: backend request statistics

Greetings!

This article introduces a new nginx module that collects statistics on the server's requests to backends and presents them to the user. Below you will find details, usage examples, screenshots, links, and the story of how the module came about.

History

Not long ago, our company's server support department concluded that it was time for a change. More precisely, we had to solve the problem of distributing an increased load: our frontends were struggling to keep up.

Using JMeter, we load-tested nginx, HAProxy, Brocade ServerIron ADX 1000 and a number of other balancers on a test bench. The main selection criterion was the ability to terminate about 50 thousand simultaneous SSL sessions at peak periods. After lengthy testing, every option except nginx and its hardware competitor from Brocade was eliminated for one reason or another, and in the end only nginx remained. Other things being equal, the deciding factors in its favor were probably its flexible configuration and light weight.

Problem

Previously, we used HAProxy as a balancer on some frontends. After switching to nginx, it became clear that it lacked any informative statistics on its work with backends. HAProxy had such statistics, and we used them to track problems arising on the backends and respond to them quickly. With the new balancer we were left without these statistics and felt helpless. stub_status and similar modules did not suit us, because they show statistics for the server as a whole rather than for each individual upstream. For each upstream and backend, we wanted data such as the number of requests to each backend, the number of HTTP 499/500/503 errors and the number of TCP errors; later this list grew.

Solution

Since we found no ready-made solution to our problem, we tried writing a module that would present the necessary information in a clear visual form. The attempt, it seems to me, was a success, and the result is the ustats (upstream statistics) module.

What are the stats?

With ustats, you can keep statistics on backend metrics such as:

The number of requests.
The number of 499/500/503 errors.
The number of HTTP read and write timeouts.
The number of TCP connection errors.
Failure timer (fail_timeout). In nginx this value is set by the parameter of the same name on the server directive. It defines the period during which a certain number of failed attempts to reach the backend must occur (the exact number is set by max_fails) for the backend to be blacklisted, and it is also how long the backend then receives no requests. The administrator usually knows which timeouts are written in the server config, but it still seemed a good idea to have them in front of us.
The number of failed attempts to work with the backend (fails count). Inside nginx, a counter of failed attempts is maintained for each backend. This number shows how many times during fail_timeout nginx tried to reach the backend and failed (what counts as a failure is described under the proxy_next_upstream directive). The counter works quite simply. When nginx is about to forward a request, it first looks at which backend is next in line (in the case of round-robin balancing) and checks its status (blacklisted or not). If the backend is blacklisted, the server looks at the time of its last failure. If fail_timeout has already passed since then, the failure counter for the backend is reset and the request is sent. If the backend was not blacklisted, the request is sent immediately, and the counter may still be reset depending on how much time has passed.
The maximum number of failed attempts (max_fails). Defines the threshold of failed attempts to work with the backend; once it is reached, the backend is blacklisted for the period fail_timeout. This parameter is also set in the nginx config, and we display it in the statistics for clarity.
The time of the last failed attempt to reach the backend. Its purpose should be clear from the previous items :)
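
The interplay of fails count, max_fails and fail_timeout described above can be illustrated with a small model. This is only a simplified sketch of the behaviour as described in this article, not nginx's actual code; the Backend class and its method names are invented for the illustration, and time is passed in explicitly as plain seconds.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Simplified model of one backend's failure bookkeeping (hypothetical names)."""
    name: str
    max_fails: int = 1          # threshold of failed attempts
    fail_timeout: float = 10.0  # seconds
    fails: int = 0              # current failure counter
    last_failure: float = 0.0   # time of the last failed attempt

    def try_select(self, now: float) -> bool:
        """Return True if a request may be sent to this backend at time `now`."""
        # Once fail_timeout has passed since the last failure, the counter
        # is reset, whether or not the backend was blacklisted.
        if now - self.last_failure >= self.fail_timeout:
            self.fails = 0
        # The backend is blacklisted while the counter is at the threshold.
        return self.fails < self.max_fails

    def record_failure(self, now: float) -> None:
        """Register a failed attempt to work with the backend."""
        self.fails += 1
        self.last_failure = now


backend = Backend("app1", max_fails=2, fail_timeout=10.0)
backend.record_failure(now=0.0)
backend.record_failure(now=1.0)
print(backend.try_select(now=5.0))   # still inside fail_timeout
print(backend.try_select(now=11.0))  # fail_timeout has passed since t=1.0
```

In this model the backend is skipped while the counter sits at the threshold and becomes eligible again once fail_timeout has elapsed since the last failure.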

Additional functions

ustats can also show which backends are currently blacklisted. Note that a backend here means not the entry given by the server directive in the nginx config, but the actual address that the name in the directive resolves to. If DNS lists several addresses for one name, the module shows them as separate backends (and indicates which name each came from).
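
To see why one server entry can turn into several backends, it helps to look at what a name resolves to. The sketch below uses Python's standard resolver and is only an illustration of the idea; it is not how ustats or nginx perform resolution, and the function name resolve_backends is made up for the example.

```python
import socket


def resolve_backends(name: str, port: int = 80) -> list[str]:
    """Return the distinct addresses that `name` resolves to, in resolver order."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # first element of sockaddr is the IP address
        if address not in addresses:
            addresses.append(address)
    return addresses


# A literal IP address resolves to itself; a DNS name with several
# A/AAAA records would yield several entries, one per backend.
print(resolve_backends("127.0.0.1"))
```

Each address in the returned list corresponds to what the module would show as a separate backend row for the same server name.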

Besides highlighting blacklisted backends, ustats highlights servers that are switched off, i.e. declared in the nginx config as

...
server some.server.name down;
...

Finally, the module lets you include a server in the nginx topology or exclude it from it while nginx is running, through the web interface. These changes are not saved to the config; they are meant to simplify maintenance work that involves temporarily disconnecting backends. A word of warning: ustats provides no protection against unauthorized use of this action, so you have to make sure yourself that a random person cannot take half of your site's backends out of service :)

Use cases

There are two of them. First, the module presents all the statistics as a web page with a table, which can be attached to any location, just like stub_status:

location /ustats {
	ustats on;
	...
}

The page refreshes automatically, and the refresh interval (in milliseconds) can be set in the config:

...
ustats_refresh_interval 7000;
...

The second scenario uses the module as a data source for other monitoring tools. In this case the tool sends requests to the module, which responds with plain XML containing the requested information. You can request data for one upstream or for one backend. For example, the request

/ustats?u=offline

returns data for all backends in the upstream offline, while the request

/ustats?u=break&b=the_mold

returns data for the the_mold backend in the upstream break. If the upstream or backend is not found, the response XML says so.
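
A monitoring script could build and send these requests along the following lines. This is a minimal sketch under stated assumptions: the base URL http://localhost/ustats is hypothetical, the helper names are invented for the example, and since the layout of the response XML is not described here, the code only parses the document and does not rely on any particular element names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen


def ustats_url(base, upstream, backend=None):
    """Build a query URL using the u (upstream) and b (backend) parameters."""
    params = {"u": upstream}
    if backend is not None:
        params["b"] = backend
    return f"{base}?{urlencode(params)}"


def fetch_ustats(base, upstream, backend=None, timeout=5.0):
    """Request the statistics and return the parsed XML root element."""
    with urlopen(ustats_url(base, upstream, backend), timeout=timeout) as response:
        return ET.fromstring(response.read())


# The base URL below is an assumption; use the location where ustats is enabled.
print(ustats_url("http://localhost/ustats", "offline"))
print(ustats_url("http://localhost/ustats", "break", "the_mold"))
```

The returned root element can then be walked with the usual ElementTree methods once the actual structure of the response is known.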

Some pictures

To back this up, here are a few screenshots of the module's result page. In the first one, nginx is configured with two upstreams in which all servers are local and run on the same nginx, except for the www server, which resolves to Yandex addresses:

The gray rows in the picture are backends marked as down in the nginx config or disabled through the module page. There are no numbers yet.

In the second screenshot, three backends from the first upstream are highlighted in red: they were blacklisted, so no requests were sent to them.

Since three more backends were switched off, the only remaining one took the entire load.

Finally, the last screenshot was taken on a production frontend of our site:

The picture shows another feature that I have not mentioned yet. The upstreams at the bottom are slightly lighter, which indicates that they are defined implicitly in the config, i.e. not like this

upstream give_me_a_name {
  ...
}

but like this

location /whereami {
...
proxy_pass http://192.168.0.75:8080;
...
}

Summary

We have published the module's source code on its Google Code page. The repository contains a patch file for nginx, the source file and a configuration file. The module page has installation instructions and also describes the additional configuration directives. The current version of the module was tested with nginx 0.8.53 in Chrome, Firefox and Opera. Finally, I should say that ustats is just an attempt to add to nginx the most basic mechanism for displaying data on its work with backends. In the future, I would like to see useful modules of this kind in the main server branch, for example proactive health checks of backends.
