Don't forget to improve the client's chance of getting a response by retrying requests in L7 balancing

    When nginx balances HTTP traffic at the L7 level, it can pass a client request to the next application server if the target server does not return a successful response. Testing the passive health check of application servers revealed ambiguities in the documentation and some peculiarities in the algorithm that excludes a server from the pool of production servers.

    Summary of balancing HTTP traffic

    There are various ways to balance HTTP traffic. In terms of the OSI model, balancing can be done at the network, transport and application layers. Depending on the scale of the application, these approaches may be combined.

    Traffic balancing benefits both the application and its maintenance. Here are some of the benefits:

    1. Horizontal scaling of the application, in which the load is distributed among several nodes.
    2. Planned decommissioning of an application server by removing the flow of client requests from it.
    3. Implementation of an A/B testing strategy for changed application functionality.
    4. Improved application fault tolerance by sending requests only to well-functioning application servers.
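
    As an illustration of planned decommissioning, a server can be taken out of rotation with the down parameter of the server directive. This is only a minimal sketch; the upstream name backend and the hosts app01 and app02 are borrowed from the configuration example later in this article:

```nginx
upstream backend {
    server app01:80;
    # "down" marks the server as permanently unavailable,
    # so new client requests are no longer sent to it.
    server app02:80 down;
}
```

    After the configuration is reloaded (for example with nginx -s reload), app02 stops receiving new requests and can be serviced.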

    The last function is implemented in two modes. In passive mode, the balancer evaluates the responses of the target application server to client traffic and, under certain conditions, excludes it from the pool of production servers. In active mode, the balancer itself periodically sends requests to the application server at a given URI and, based on certain attributes of the response, decides whether to exclude it from the pool of production servers. Later, under certain conditions, the balancer returns the application server to the pool of production servers.
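
    The free edition of nginx offers only the passive mode. For comparison, here is a sketch of the active mode using the health_check directive, which is available in the commercial build of nginx. The /health URI and all numeric values are assumptions chosen for illustration:

```nginx
upstream backend {
    zone backend 64k;   # shared memory zone used by active health checks
    server app01:80;
    server app02:80;
}

server {
    location / {
        proxy_pass http://backend;
        # Probe /health every 5 s; exclude a server after 3 failed probes
        # and return it to the pool after 2 successful ones.
        health_check uri=/health interval=5s fails=3 passes=2;
    }
}
```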

    Passive verification of the application server and its exclusion from the pool of production servers

    Let's take a closer look at the passive application server check in the free edition of nginx/1.17.0. Application servers are selected in turn by the round-robin algorithm, and their weights are equal.

    The three-step diagram shows a slice of time that starts with a client request being sent to application server No. 2. Light markers denote requests and responses between the client and the balancer. Dark markers denote requests and responses between nginx and the application servers.

    The third step of the diagram shows how the balancer redirects the client’s request to the next application server, in case the target server gave an error response or did not answer at all.

    The HTTP and TCP errors on which nginx passes the request to the next server are listed in the proxy_next_upstream directive.

    By default, nginx passes only requests with idempotent HTTP methods to the next application server.
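
    The two points above can be sketched as follows. The set of error conditions and the /orders location are assumptions made for illustration only; non_idempotent is the proxy_next_upstream parameter that additionally allows requests with non-idempotent methods (POST, LOCK, PATCH) to be retried:

```nginx
location / {
    proxy_pass http://backend;
    # Pass the request to the next server on connection errors,
    # timeouts and the listed 5xx responses.
    proxy_next_upstream error timeout http_500 http_502 http_503;
}

location /orders {   # hypothetical location, not from the article
    proxy_pass http://backend;
    # Also retry non-idempotent requests. Safe only if the application
    # tolerates the same request being executed twice.
    proxy_next_upstream error timeout http_500 non_idempotent;
}
```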

    What does the client get? On the one hand, the ability to redirect a request to the next application server increases the chances of providing a satisfactory response to the client when the target server fails. On the other hand, it is obvious that a sequential call first to the target server, and then to the next increases the total response time to the client.

    In the end, the client receives the response of the application server on which the counter of allowed attempts, proxy_next_upstream_tries, ran out.

    When requests can be passed to the next working server, the timeouts on the balancer and on the application servers must additionally be aligned. The upper bound on the time a request may spend "travelling" between the balancer and the application servers is the client timeout, or the waiting time specified by the business. The timeout calculation should also leave a margin for network events (delays or packet loss). If the client gives up on timeout every time while the balancer is still obtaining a guaranteed answer, the good intention of making the application reliable comes to nothing.
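
    A rough sketch of such a timeout budget is shown below. It assumes a client timeout of 10 seconds and at most two attempts per request. All values are illustrative, and proxy_next_upstream_tries is assumed here to count the first attempt as well:

```nginx
location / {
    proxy_pass                   http://backend;
    proxy_next_upstream          timeout http_500;
    proxy_next_upstream_tries    2;    # assumed: first attempt + one retry

    proxy_connect_timeout        1s;   # time allowed to establish a connection
    proxy_read_timeout           3s;   # max pause between two reads from the server
    proxy_send_timeout           3s;   # max pause between two writes to the server

    # Stop passing the request to further servers after 8 s,
    # leaving about 2 s of the assumed 10 s client timeout as network margin.
    proxy_next_upstream_timeout  8s;
}
```

    Note that proxy_read_timeout limits the pause between two successive read operations, not the total response time, so a slowly streaming server can still exceed the estimate of 2 x (1 s + 3 s) = 8 s.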

    The passive health check of application servers is controlled by directives, for example with the following values:

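    upstream backend {
        # A server is considered unavailable after max_fails failed attempts within fail_timeout
        server app01:80 weight=1 max_fails=5 fail_timeout=100s;
        server app02:80 weight=1 max_fails=5 fail_timeout=100s;
    }

    server {
        location / {
            proxy_pass                  http://backend;
            # Failures on which the request is passed to the next server
            proxy_next_upstream         timeout http_500;
            # Limit on the number of tries for passing the request
            proxy_next_upstream_tries   1;
        }
    }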

    As of July 2, 2019, the documentation states that the max_fails parameter sets the number of unsuccessful attempts to communicate with the server that should occur within the time specified by the fail_timeout parameter.

    The fail_timeout parameter sets the time during which the specified number of unsuccessful attempts to work with the server must occur in order for the server to be considered unavailable; and the time during which the server will be considered unavailable.

    In the configuration fragment above, the balancer is set up to catch 5 failed calls within 100 seconds.

    Returning the application server to the production server pool

    According to the documentation, once fail_timeout has elapsed the balancer can no longer consider the server unavailable. Unfortunately, the documentation does not state explicitly how the health of the server is then evaluated.

    Without an experiment, one can only assume that the mechanism for checking the state is similar to the one described above.

    Expectations and Reality

    In the presented configuration, the following behavior is expected from the balancer:

    1. Until the balancer excludes application server No. 2 from the pool of production servers, client requests will be sent to it.
    2. Requests returned with a 500 error from application server No. 2 will be forwarded to the next application server, and the client will receive positive responses.
    3. As soon as the balancer receives 5 responses with code 500 within 100 seconds, it will exclude application server No. 2 from the pool of production servers. All requests that follow the 100-second window will be sent immediately to the remaining working application servers, at no extra time cost.
    4. After 100 seconds, the balancer must somehow evaluate the health of the application server and return it to the pool of production servers.

    Live tests and the balancer's logs showed that statement No. 3 does not hold. The balancer excludes a failed server as soon as the max_fails condition is met, without waiting for the 100 seconds to elapse. The fail_timeout parameter acts only as the upper limit on the time over which errors are accumulated.

    As for statement No. 4, it turns out that nginx checks the health of an application server previously excluded from service with just one request. If the server still responds with an error, the next check takes place only after fail_timeout.
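
    Observations of this kind can be reproduced by logging which upstream servers nginx tried for each request. The following is a minimal sketch, not the logging setup actually used in the tests; the format name upstream_trace and the log path are assumptions:

```nginx
http {
    # $upstream_addr, $upstream_status and $upstream_response_time list
    # every server tried for the request, separated by commas.
    log_format upstream_trace '$time_local "$request" status=$status '
                              'upstream_addr="$upstream_addr" '
                              'upstream_status="$upstream_status" '
                              'upstream_time="$upstream_response_time"';

    access_log /var/log/nginx/upstream_trace.log upstream_trace;
}
```

    A request that was passed to the next server shows two addresses in upstream_addr, and an excluded server simply stops appearing in the log until nginx tries it again.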

    What is missing?

    1. The algorithm implemented in nginx/1.17.0 may not be the fairest way of checking a server's health before returning it to the pool of production servers. At the very least, judging by the current documentation, one would expect not 1 request but the number specified in max_fails.
    2. The health check algorithm does not take the request rate into account. The higher the rate, the further the distribution of failed attempts shifts toward the start of the window, and the application server drops out of the pool of production servers too quickly. I suppose this can hurt applications that occasionally produce errors in short bursts, for example during garbage collection.
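
    One possible way to soften the second point within the existing parameters is to raise max_fails and shorten fail_timeout, so that a short burst of errors is less likely to exclude a server and an excluded server is retried sooner. The values below are assumptions for illustration, not tested recommendations:

```nginx
upstream backend {
    # Tolerate up to 20 failed attempts per 10-second window;
    # an excluded server is tried again after 10 seconds.
    server app01:80 max_fails=20 fail_timeout=10s;
    server app02:80 max_fails=20 fail_timeout=10s;
}
```

    This still counts failures rather than measuring their rate, so it only shifts the threshold and does not remove the dependence on request rate.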

    I would like to ask readers: is there any practical benefit in a server health check algorithm that measures the rate of failed attempts?
