A journey through the Haproxy documentation, or what to pay attention to when configuring it

    Hello again!

    Last time we talked about how we at Ostrovok.ru chose a tool to proxy a large number of requests to external services without bringing anything down. That article ended with the choice of Haproxy. Today I will share the nuances I had to deal with while using this solution.



    Haproxy configuration


    The first difficulty was that the maxconn option in Haproxy means different things depending on the context it is set in:

    • in the global section (performance tuning) - the limit per process;
    • in defaults / frontend sections - the limit per frontend;
    • in bind options - the limit per listening socket.


    Out of habit, I configured only the first one (the performance tuning section). Here is what the documentation says about this option:
    Sets the maximum per-process number of concurrent connections to <number>. It
    is equivalent to the command-line argument "-n". Proxies will stop accepting
    connections when this limit is reached.

    It would seem to be just what we need. However, when I ran into the fact that new connections to the proxy were not being accepted right away, I started reading the documentation more carefully and found a second parameter there (in the bind options):
    Limits the sockets to this number of concurrent connections. Extraneous
    connections will remain in the system's backlog until a connection is
    released. If unspecified, the limit will be the same as the frontend's maxconn.

    OK then, next we go look at the frontend's maxconn:
    Fix the maximum number of concurrent connections on a frontend
    ...
    By default, this value is set to 2000.

    Great, exactly what we need. Let's add it to the configuration:

    global
      daemon
      maxconn 524288
    ...
    defaults
      mode http
      maxconn 524288
    
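
    Along the way: the per-socket limit from the bind options quoted above can also be set explicitly. A minimal sketch, with a made-up frontend and numbers, just to show the syntax:

      frontend http
        # the per-socket limit; if omitted, it defaults to the frontend's maxconn
        bind *:80 maxconn 262144
        maxconn 524288
    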

    The next snag was the fact that Haproxy is single-threaded. I am very used to the model in Nginx, so this nuance always depressed me. But there is no need to despair - Willy (Willy Tarreau, the developer of Haproxy) knew what he was doing, so he added the nbproc option.

    However, the documentation says right up front:
    USING MULTIPLE PROCESSES
    IS HARDER TO DEBUG AND IS REALLY DISCOURAGED.
    This option can really cause headaches if you need to:

    • limit the number of requests / connections to servers (since you no longer have one process with one counter, but many processes, each with its own counter);
    • collect statistics from the Haproxy control socket;
    • enable / disable backends via the control socket;
    • ... maybe something else. ¯\_(ツ)_/¯

    Nevertheless, the gods gave us multi-core processors, so I wanted to use them to the fullest. In my case there were four cores on each of two physical CPUs. For Haproxy, I set aside the first four, and it looked like this:

      nbproc 4
      cpu-map 1 0
      cpu-map 2 1
      cpu-map 3 2
      cpu-map 4 3
    

    Using cpu-map, we pin each Haproxy process to its own core. The OS scheduler no longer has to think about where to schedule Haproxy, which keeps context switches to a minimum and the CPU cache warm.
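
    A side note: newer Haproxy versions (1.9 and later, if I am not mistaken - check the docs for your version) let you write the same mapping more compactly with a process range:

      nbproc 4
      # bind processes 1-4 to CPUs 0-3 in a single line
      cpu-map auto:1-4 0-3
    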

    You can never have too many buffers - but that is not our case.


    • tune.bufsize - in our case we did not need to touch it, but if you are seeing errors with code 400 (Bad Request), this may well be your case.
    • tune.http.cookielen - if you hand out large cookies to users, it may make sense to raise this buffer so they do not get corrupted in transit.
    • tune.http.maxhdr - another possible source of 400 response codes if you have a lot of headers.

    Now let's look at the lower-level stuff.


    tune.rcvbuf.client / tune.rcvbuf.server, tune.sndbuf.client / tune.sndbuf.server - the documentation says the following about them:
    It should normally never be set, and the default size (0) lets the
    kernel autotune this value depending on the amount of available memory.

    But for me explicit is better than implicit, so I pinned these options to fixed values to be confident about the future.

    And one more parameter, not related to buffers but quite important - tune.maxaccept:
    Sets the maximum number of consecutive connections a process may accept in a
    row before switching to other work. In single process mode, higher numbers
    give better performance at high connection rates. However in multi-process
    modes, keeping a bit of fairness between processes generally is better to
    increase performance.

    In our case quite a lot of requests to the proxy are generated, so I raised this value to accept more requests at once. However, as the documentation says, in multi-process mode it is worth testing that the load stays distributed between the processes as evenly as possible.

    All parameters together:

      tune.bufsize 16384
      tune.http.cookielen 63
      tune.http.maxhdr 101
      tune.maxaccept 256
      tune.rcvbuf.client 33554432
      tune.rcvbuf.server 33554432
      tune.sndbuf.client 33554432
      tune.sndbuf.server 33554432
    

    One thing you can never have too many of is timeouts. And where would we be without them?


    • timeout connect - the time to establish a connection to the backend. If the link to a backend is not doing well, it is better to cut the connection off on this timeout until the network returns to normal.
    • timeout client - the timeout for the first bytes of data to be transmitted. It helps a lot with disconnecting those who open connections "in reserve".

    A cool story about the HTTP client in Go
    Go has a standard HTTP client that can maintain a pool of connections to servers. And there was one interesting story involving the timeout described above and the connection pool of that HTTP client. Once, a developer complained that he was periodically getting 408 errors from the proxy. We looked at the client code and saw the following logic there:

    • try to take a free established connection from the pool;
    • if that fails, kick off establishing a new connection in a goroutine;
    • check the pool once more;
    • if a free connection has appeared in the pool, take it and put the new one into the pool; if not, use the new one.

    Have you spotted the catch yet?

    If the client establishes a new connection but does not use it, then after five seconds the server closes it, and that is that. The client only catches this when it takes the connection out of the pool and tries to use it. Keep this in mind.
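
    As a minimal sketch of one way to soften this on the client side (my illustration, not the fix from that story; the address is made up): keep the client's idle-connection lifetime below the server-side timeout, so the client throws a connection away before the proxy gets a chance to close it.

      package main

      import (
          "fmt"
          "net/http"
          "time"
      )

      func main() {
          // assumption: the server closes an unused connection after about five
          // seconds (as in the story above), so we drop idle connections on our
          // side a bit earlier
          client := &http.Client{
              Transport: &http.Transport{
                  MaxIdleConns:        100,
                  MaxIdleConnsPerHost: 10,
                  IdleConnTimeout:     4 * time.Second,
              },
              Timeout: 10 * time.Second,
          }

          resp, err := client.Get("http://proxy.example.local/") // hypothetical address
          if err != nil {
              fmt.Println("request failed:", err)
              return
          }
          defer resp.Body.Close()
          fmt.Println("status:", resp.Status)
      }
    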

    • timeout server - the maximum time to wait for a response from the server.
    • timeout client-fin / timeout server-fin - here we protect ourselves from half-closed connections so that they do not pile up in the operating system's tables.
    • timeout http-request - one of the most useful timeouts. It lets you cut off slow clients that cannot get an HTTP request out within the time allotted to them.
    • timeout http-keep-alive - in our case specifically, if a keep-alive connection sits without requests for more than 50 seconds, then most likely something went wrong and the connection can be closed, freeing memory for something new and light.

    All timeouts together:

    defaults
      mode http
      maxconn 524288
      timeout connect 5s
      timeout client 10s
      timeout server 120s
      timeout client-fin 1s
      timeout server-fin 1s
      timeout http-request 10s
      timeout http-keep-alive 50s
    

    Logging. Why is it so hard?


    As I wrote earlier, most of the time I use Nginx in my solutions, so I am spoiled by its syntax and the ease of modifying log formats. I especially loved the killer feature of formatting logs as JSON and then parsing them with any standard library.

    What do we have in Haproxy? The feature is there too, except that you can only write to syslog, and the configuration syntax is a bit more convoluted.
    Let me give a configuration example right away, with comments:

    # everything related to errors or events goes to a separate log (by analogy
    # with error.log in nginx)
    log 127.0.0.1:2514 len 8192 local1 notice emerg
    # and this is something like access.log
    log 127.0.0.1:2514 len 8192 local7 info
    
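
    On the receiving side, a minimal sketch of what could accept these messages, assuming rsyslog is the syslog daemon behind 127.0.0.1:2514 (the file paths are made up):

      # listen for haproxy logs over UDP on the port from the config above
      module(load="imudp")
      input(type="imudp" port="2514")
      # split the "error log" (local1) and the "access log" (local7) apart
      local1.*     /var/log/haproxy/error.log
      local7.info  /var/log/haproxy/access.log
    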

    Particular pain comes from things like:
    • short variable names, and especially combinations of them like %HU or %fp;
    • the format cannot be split across multiple lines, so the whole thing has to be written on a single line, which makes it hard to add new items or remove unneeded ones;
    • for some variables to work, they must first be explicitly declared via capture request header.

    As a result, to get anything interesting out of it, you end up with a wall of text like this:

    log-format '{"status":"%ST","bytes_read":"%B","bytes_uploaded":"%U","hostname":"%H","method":"%HM","request_uri":"%HU","handshake_time":"%Th","request_idle_time":"%Ti","request_time":"%TR","response_time":"%Tr","timestamp":"%Ts","client_ip":"%ci","client_port":"%cp","frontend_port":"%fp","http_request":"%r","ssl_ciphers":"%sslc","ssl_version":"%sslv","date_time":"%t","http_host":"%[capture.req.hdr(0)]","http_referer":"%[capture.req.hdr(1)]","http_user_agent":"%[capture.req.hdr(2)]"}'
    
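
    The payoff is that the result parses as plain JSON. A tiny sketch in Go (the log line here is shortened and invented for the example):

      package main

      import (
          "encoding/json"
          "fmt"
      )

      func main() {
          // every field in the log-format above is emitted as a string,
          // so a map[string]string is enough to parse a line
          line := `{"status":"200","bytes_read":"1024","method":"GET","request_uri":"/health"}`

          var entry map[string]string
          if err := json.Unmarshal([]byte(line), &entry); err != nil {
              fmt.Println("bad log line:", err)
              return
          }
          fmt.Println(entry["method"], entry["request_uri"], "->", entry["status"])
      }
    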

    Little things, it would seem, but pleasant


    Above, I described the log format, but not everything is that simple. To get certain items into it, such as:

    • http_host,
    • http_referer,
    • http_user_agent,

    you must first capture that data from the request (capture) and place it into the array of captured values.

    Here is an example:

    capture request header Host len 32
    capture request header Referer len 128
    capture request header User-Agent len 128
    

    As a result, we can now access the elements we need as %[capture.req.hdr(N)], where N is the ordinal number of the capture declaration.
    In the example above, the Host header gets number 0, and User-Agent gets number 2.

    Haproxy has one peculiarity: at startup it resolves the DNS addresses of the backends, and if it cannot resolve any one of them, it dies a hero's death.

    In our case this is not very convenient: there are many backends, we do not manage them, and it is better to get a 503 from Haproxy for a single supplier than to have the entire proxy refuse to start because of it. The init-addr option helps us here.

    A line taken almost verbatim from the documentation lets us walk through all the available address-resolution methods and, in case of failure, simply postpone the resolution until later and move on:

    default-server init-addr last,libc,none
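
    To make this concrete, a hedged sketch of how it pairs with a runtime resolvers section (the backend name and hostname are invented):

      backend supplier1
        # resolve at runtime via the "dns" resolvers section; if startup
        # resolution fails, start anyway with the address left unset (none)
        server s1 api.supplier1.example:443 resolvers dns init-addr last,libc,none
    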

    And finally, my favorite part: backend selection.
    The syntax of Haproxy's backend selection configuration is familiar to everyone:

    use_backend <backend1_name> if <condition1>
    use_backend <backend2_name> if <condition2>
    default_backend <backend3_name>
    

    But honestly, that is not great. All my backends were already described in an automated way (see the previous article), so I could have generated the use_backend lines here too - nothing tricky about it - but I did not feel like it. In the end, another way was found:

      capture request header Host len 32
      capture request header Referer len 128
      capture request header User-Agent len 128
      # set the host_present variable if the request came with a Host header
      acl host_present hdr(host) -m len gt 0
      # cut out of the header the prefix that is identical to the backend name
      use_backend %[req.hdr(host),lower,field(1,'.')] if host_present
      # and if the headers did not work out, return an error
      default_backend default
    backend default
      mode http
      server no_server 127.0.0.1:65535
    

    Thus, we standardized the backend names and the URLs used to reach them: a request for, say, supplier1.proxy.example goes to the backend named supplier1 (the hostname here is made up).

    Now let's assemble the examples above into one file:

    The full version of the configuration
      global
        daemon
        maxconn 524288
        nbproc 4
        cpu-map 1 0
        cpu-map 2 1
        cpu-map 3 2
        cpu-map 4 3
        tune.bufsize 16384
        tune.comp.maxlevel 1
        tune.http.cookielen 63
        tune.http.maxhdr 101
        tune.maxaccept 256
        tune.rcvbuf.client 33554432
        tune.rcvbuf.server 33554432
        tune.sndbuf.client 33554432
        tune.sndbuf.server 33554432
        stats socket /run/haproxy.sock mode 600 level admin
        log /dev/stdout local0 debug
      defaults
        mode http
        maxconn 524288
        timeout connect 5s
        timeout client 10s
        timeout server 120s
        timeout client-fin 1s
        timeout server-fin 1s
        timeout http-request 10s
        timeout http-keep-alive 50s
        default-server init-addr last,libc,none
        log 127.0.0.1:2514 len 8192 local1 notice emerg
        log 127.0.0.1:2514 len 8192 local7 info
        log-format '{"status":"%ST","bytes_read":"%B","bytes_uploaded":"%U","hostname":"%H","method":"%HM","request_uri":"%HU","handshake_time":"%Th","request_idle_time":"%Ti","request_time":"%TR","response_time":"%Tr","timestamp":"%Ts","client_ip":"%ci","client_port":"%cp","frontend_port":"%fp","http_request":"%r","ssl_ciphers":"%sslc","ssl_version":"%sslv","date_time":"%t","http_host":"%[capture.req.hdr(0)]","http_referer":"%[capture.req.hdr(1)]","http_user_agent":"%[capture.req.hdr(2)]"}'
      frontend http
        bind *:80
        http-request del-header X-Forwarded-For
        http-request del-header X-Forwarded-Port
        http-request del-header X-Forwarded-Proto
        capture request header Host len 32
        capture request header Referer len 128
        capture request header User-Agent len 128
        acl host_present hdr(host) -m len gt 0
        use_backend %[req.hdr(host),lower,field(1,'.')] if host_present
        default_backend default
      backend default
        mode http
        server no_server 127.0.0.1:65535
      resolvers dns
        hold valid 1s
        timeout retry 100ms
        nameserver dns1 127.0.0.1:53
      


    Thanks to everyone who read to the end. But that is not all: next time we will look at lower-level things - optimizing the system Haproxy runs on, so that both Haproxy and our operating system feel comfortable, and there is enough hardware for everyone.

    See you!
