Fast web applications: web trepanation

    Psychology is an interesting and sometimes useful science. Numerous studies show that a delay of more than 300 ms in displaying a web page distracts the user from the resource and makes them think: “what the hell?”. So by ACCELERATING a web project down to psychologically imperceptible delays, you can simply keep users around longer. And that is why business is willing to spend money on speed: $80M to shave off just 1 ms of latency.



    However, to accelerate a modern web project you will have to shed a little blood and dig deep into the topic - so basic knowledge of network protocols is welcome. Knowing the principles, you can speed up your web system by hundreds of milliseconds in just a few passes, almost effortlessly. Well, ready to save hundreds of millions? Pour yourself some coffee.

    Aftertaste


    This is a very hot topic - how to satisfy the user of a site - and usability experts will most likely force a Molotov cocktail down my throat, make me bite a grenade with the pin pulled, and let me shout before the explosion: “I spread heresy”. So I want to come at it from the other side. It is widely known that a page-display delay of more than 0.3 seconds makes the user notice it and “wake up” from the process of communicating with the site. A delay of more than a second makes them ponder: “what am I doing here at all? Why are they tormenting me and making me wait?”

    So let us leave “usability” to the usability experts and deal with a practical problem: how not to disturb the client's “dream” and keep their work with the site going for as long as possible, with no distractions from the “brakes”.

    Who is responsible for speed


    Who? You, of course - otherwise you would hardly have started reading this post. Seriously though, there is a problem: the question of speed is split into two parts that are loosely connected both technologically and socially - the front end and the back end. And the third key component, the network, is often forgotten.

    Just HTML


    To begin with, remember that the first sites in the early nineties were... a set of static pages. HTML was simple, straightforward and concise: first text, then text and resources. The bacchanalia began with dynamic generation of web pages and the spread of Java and Perl, and nowadays it is a whole galaxy of technologies, including PHP.
    To reduce the impact of this race on the viability of the network, HTTP/1.0 followed in 1996, and three years later, in 1999, HTTP/1.1. The latter finally settled that there is no need to shuttle TCP handshakes back and forth at ~2/3 of the speed of light (in optical fiber) for every single request by establishing a new connection each time; rather, open a TCP connection once and keep working through it.

    Backend


    Application

    Little has changed here over the past 40 years. Well, maybe a “parody” of relational theory was added under the name NoSQL - which brings both pros and cons. Although, as practice shows, business seems to profit from it (but sleepless nights spent answering the question “who stripped the data of integrity, and under what pretext?” have become more frequent). A typical request goes like this:
    1. The application and/or web server (PHP, Java, Perl, Python, Ruby, etc.) accepts the client's request
    2. The application queries the database and receives data
    3. The application generates HTML
    4. The application and/or web server sends the data back to the client

    Speed-wise, everything is clear here:
    • optimal application code, with no loops running for seconds
    • optimal data in the database: indexing, denormalization
    • caching of database query results


    We will not dwell on overclocking the “application” itself - plenty of books and articles have been written about it, and everything there is fairly linear and simple.
    The main thing is that the application be transparent, so that you can measure the speed of a request as it passes through the application's various components. If you cannot do that, there is no point reading further; it will not help.
    How to achieve this? The paths are well known:
    • standard request logging (nginx, apache, php-fpm) - see the nginx sketch after this list
    • logging of slow database queries (the slow query log option in MySQL)
    • tools for pinpointing bottlenecks along a request's path - for PHP these are xhprof and pinba
    • built-in instruments inside the web application, such as a separate trace module
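    For nginx, for example, request timings can be captured with a custom log format - a minimal sketch (the format name and log path are illustrative):

        # http {} block: log total request time and upstream (e.g. php-fpm) time
        log_format timed '$remote_addr "$request" $status '
                         '$request_time $upstream_response_time';
        access_log /var/log/nginx/access_timed.log timed;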

    If you have lots of logs and get lost in them, aggregate the data and look at percentiles and distributions. Simple and straightforward. Found a request taking more than 0.3 seconds? Start the debriefing, and so on until you are done.

    Web server

    Moving on: the web server. Little has changed here either, but you can prop things up with a crutch - by installing a caching reverse proxy in front of the web server (or FastCGI server); see the sketch after this list. Yes, it certainly helps:
    1. to hold significantly more open client connections (at the expense of?.. yes, a different architecture of the caching proxy - for nginx this means multiplexing sockets across a small number of processes and using little memory per connection)
    2. to serve static resources directly from disk more efficiently, without filtering them through application code
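    A minimal nginx sketch of such a front proxy (the backend address and paths are illustrative):

        # nginx in front of an application server on 127.0.0.1:8080
        server {
            listen 80;

            # static resources go straight from disk, bypassing the application
            location /static/ {
                root /var/www/mysite;
            }

            # everything else is handed to the backend
            location / {
                proxy_pass http://127.0.0.1:8080;
            }
        }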

    But one question remains: why was apache not built “correctly” from the start, so that we sometimes have to chain web servers one behind another?

    Persistent connections

    Establishing a TCP connection takes 1 RTT. Print out a handshake diagram and hang it in front of you. The key to understanding where the brakes come from is right here.

    This value correlates rather closely with how far your user is from the web server (yes, there is the speed of light, there is the speed of light in the medium, and there is routing), and it can amount to (especially with last-mile providers) tens or hundreds of milliseconds - which is, of course, a lot. And the real trouble is when this connection is established for every request, as was common in HTTP/1.0.



    This, by and large, is what HTTP/1.1 was conceived for, and HTTP/2.0 (currently represented by spdy) is developing in the same direction. The IETF and Google are doing their best right now to squeeze the maximum out of the current network architecture without breaking it. And that can be done... well, yes, by using TCP connections as efficiently as possible: packing their bandwidth as densely as possible through multiplexing, recovering from packet loss, and so on.



    Therefore, be sure to check that persistent connections are enabled on your web servers and in the application.
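    In nginx, for example, this means keepalive both towards clients and towards the backend - a minimal sketch (the upstream address is illustrative):

        # client side: keep connections open between requests
        keepalive_timeout 60;

        # backend side: a pool of warm connections to the application server
        upstream backend {
            server 127.0.0.1:8080;
            keepalive 16;
        }

        server {
            location / {
                proxy_pass http://backend;
                # HTTP/1.1 without "Connection: close" is required for keepalive
                proxy_http_version 1.1;
                proxy_set_header Connection "";
            }
        }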

    TLS

    There is no way in the modern world without TLS, which was originally born in the bowels of Netscape Communications as SSL. And although, they say, the latest “hole” in this protocol turned many people gray well ahead of schedule, there is practically no alternative.
    But for some reason not everyone remembers that TLS worsens the “aftertaste”: it adds 1-2 RTT on top of the 1 RTT of the TCP connection itself. And in nginx the TLS session cache is off by default - which adds yet another RTT.



    Therefore, make sure that TLS sessions are cached - this saves another 1 RTT (one RTT will still remain, unfortunately, as the price of security).
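    In nginx this is a couple of lines - a sketch (the certificate paths are illustrative):

        server {
            listen 443 ssl;
            ssl_certificate     /etc/nginx/ssl/mysite.crt;
            ssl_certificate_key /etc/nginx/ssl/mysite.key;

            # a shared 10 MB cache holds ~40000 sessions; resumed sessions
            # skip the full handshake and save 1 RTT for returning clients
            ssl_session_cache   shared:SSL:10m;
            ssl_session_timeout 10m;
        }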

    That is about it for the backend, probably. From here on it gets harder, but more interesting.

    Network. Distance and network bandwidth


    You can often hear: we have 50 Mbit/s, 100 Mbit/s, 4G will give even more... But you rarely see the understanding that bandwidth is not very important for a typical web application (unless you are downloading files) - latency matters far more, because a lot of small requests are made over different connections and the TCP window simply does not have time to ramp up.
    And of course, the farther the client is from the web server, the slower. Sometimes, though, this is impossible or hard to change. That is why people came up with:
    1. CDNs
    2. Dynamic proxying (a CDN in reverse): nginx, for example, is installed in a region, opens persistent connections to the web server and terminates SSL. See why? The connection between the client and the web proxy speeds up several times (handshakes start to fly), and from there the already warmed-up TCP connection is used.

    What else can be done... Increase TCP's initial congestion window - yes, this often helps, because then a web page can be sent in a single burst of packets without waiting for acknowledgments. Give it a try.
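    On Linux the congestion window can be raised per route with the ip utility - a sketch (the gateway and interface are placeholders; test the values against your own traffic):

        # inspect the current default route first
        ip route show

        # raise the number of packets sent before the first ACK arrives
        ip route change default via 10.0.0.1 dev eth0 initcwnd 10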



    Turn on the browser debugger, look at the page load waterfall, and think about latency and how to reduce it.

    Throughput

    Remember that a TCP connection's congestion window needs to be revved up first. If the web page loads in under a second, the window may simply not have time to grow. The average network bandwidth in the world is slightly above 3 Mbit/s. Conclusion: transmit as much as possible through one established connection, “warming it up”.



    Of course, multiplexing HTTP resources inside one TCP connection can help here: transferring several resources interleaved, in both the request and the response. This technique even made it into the standard, but it was underrated and never took off (Chrome removed its support quite recently). So for now you can try spdy, wait for HTTP/2.0, or use pipelining - not from the browser, but from the application directly.



    Domain Sharding

    But what about the very popular domain sharding technique, where the browser/application works around the limit of roughly 6 connections per domain by opening 6 or more extra connections to fictitious domains: img1.mysite.ru, img2.mysite.ru...? The fun part is that from the point of view of HTTP/1.1 this will likely speed you up, while from the point of view of HTTP/2.0 it is an antipattern, because multiplexing HTTP traffic over one TCP connection can provide better throughput.
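    A minimal client-side sketch of a shard picker (the imgN.mysite.ru domains come from the example above; the hash and element id are illustrative) - it is deterministic, so the same resource always maps to the same shard and browser caching keeps working:

        // spread resources over img1..img4.mysite.ru deterministically
        var SHARDS = 4;

        function shardUrl(path) {
            // cheap string hash so a given path always hits the same shard
            var h = 0;
            for (var i = 0; i < path.length; i++) {
                h = (h * 31 + path.charCodeAt(i)) % 1000000;
            }
            return '//img' + ((h % SHARDS) + 1) + '.mysite.ru' + path;
        }

        document.getElementById('logo').src = shardUrl('/images/logo.png');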
    So for now: shard domains, and once HTTP/2.0 arrives, stop doing it. And of course, it is best to measure your specific web application and make an informed choice.

    Frontend


    Writing about well-known things - page rendering speed, image and JavaScript sizes, resource loading order, etc. - is not interesting. The topic has been beaten to death. In short and imprecisely: cache resources on the web browser side, but... use your head. Cache a 10 MB js file and parse it in the browser on every web page - we all understand where that leads. Turn on the browser debugger, pour some coffee, and by the end of the day the trends are evident. Outline a plan and implement it. Simple and transparent.
    Much sharper pitfalls may be hiding behind the relatively new and booming browser capabilities. Those are what we will talk about:
    1. XMLHttpRequest
    2. Long polling
    3. Server-Sent Events
    4. Web sockets


    The browser as an operating system

    Initially the browser was perceived as a client application for displaying HTML markup. But year after year it has turned into a control center for a whole galaxy of technologies - so that the HTTP server and the web application behind it are now perceived merely as auxiliary components of the browser. An interesting technological shift of emphasis.
    Moreover, with the arrival of WebRTC's built-in “television studio” and the browser's tools for network communication with the outside world, the question of performance has smoothly moved from the server infrastructure into the browser. If this internal kitchen slows down on the client, nobody will remember the PHP on the web server or the JOIN in the database.

    Let us take this opaque monolith apart.

    XMLHttpRequest

    This is the well-known AJAX: the browser's ability to access external resources over HTTP. With the advent of CORS, utter “chaos” began: now, to find the cause of a slowdown, you have to crawl through all the resources and check the logs everywhere.
    Seriously, the technology undoubtedly exploded the browser's capabilities, turning it into a powerful platform for dynamically rendering information. There is no point writing much about it; everyone knows the topic. I will, however, describe the limitations:
    1. once again, the lack of multiplexing of several “channels” means the TCP connection's bandwidth is used inefficiently and incompletely
    2. there is no adequate streaming support (you cannot just open a connection and wait); all that remains is to poll the server and see what it answers




    Nevertheless, the technology is very popular, and making it transparent for speed monitoring is not hard.
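    For example, wrapping a request with a simple timer already gives you client-side numbers - a sketch (the /api/data URL is illustrative):

        // measure how long an AJAX request takes, end to end
        var start = Date.now();
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/api/data');
        xhr.onload = function () {
            var ms = Date.now() - start;
            console.log('GET /api/data: ' + xhr.status + ' in ' + ms + ' ms');
        };
        xhr.send();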

    Long polling

    How do you build a web chat? Yes, you need to somehow deliver information about changes from the server to the browser. Directly over HTTP - you cannot, no way: only request and response. So people solved it head-on: send a request and wait for the answer - a second, 30 seconds, a minute. If anything arrives, the server sends it back and closes the connection.
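    A minimal client-side sketch of this loop (the /poll URL, the timeout and the showMessage helper are illustrative):

        // ask the server for updates; it holds the request open for up to ~30 s
        function poll() {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', '/poll');
            xhr.onload = function () {
                if (xhr.status === 200) {
                    showMessage(xhr.responseText); // hypothetical UI helper
                }
                poll(); // immediately re-open the “hanging” request
            };
            xhr.onerror = function () {
                setTimeout(poll, 5000); // back off on network errors
            };
            xhr.send();
        }
        poll();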

    Yes, it is a pile of antipatterns and crutches - but the technique is very widespread and it always works. Still, since you are responsible for speed, know this: the load on the servers with this approach is very high and can be comparable to the load from the web project's main traffic. And if updates are pushed to the browsers frequently, it can exceed the main load many times over!
    What to do?

    Server-Sent Events

    Here a TCP connection to the web server is opened, not closed, and the server writes UTF-8 encoded information into it. True, binary data cannot be transmitted optimally without Base64 first (a +33% size increase), but as a one-way control channel this is an excellent solution. True, IE does not support it (see the section above on long polling, which works everywhere).
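    The client side is a few lines - a sketch (the /events URL is illustrative):

        // one long-lived connection; the browser reconnects automatically
        var es = new EventSource('/events');
        es.onmessage = function (e) {
            console.log('server said: ' + e.data);
        };
        es.onerror = function () {
            console.log('connection lost, the browser will retry');
        };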




    The advantages of the technology are that it:
    1. is very simple
    2. does not require re-opening the connection to the server after every message


    Web sockets

    For the system administrator this is not even a beast, but rather a nightly necromorph. In a “tricky” way, through an HTTP/1.1 Upgrade, the browser changes the “type” of the HTTP connection, and it stays open.




    Then BOTH (!) sides can start transmitting data through the connection, framed into messages (frames). Messages carry not only information but also control signals, including the “PING” and “PONG” types. First impression: the bicycle has been reinvented - TCP on top of TCP.
    From the developer's point of view this is of course convenient: a duplex channel appears between the browser and the web application on the server. Want streaming - you have it; want messages - fine. But (see the sketch after this list):
    1. HTML caching is not supported, since the protocol works through binary framing
    2. compression is not supported; you have to implement it yourself
    3. terrible glitches and delays when working without TLS, caused by outdated proxy servers
    4. there is no multiplexing, so the bandwidth of each connection is used inefficiently
    5. the server accumulates lots of hanging, direct TCP connections from browsers doing something “nasty to the database”
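    Despite the caveats, the client-side API itself is trivial - a sketch (the wss endpoint is illustrative):

        // a duplex channel; wss:// (TLS) avoids problems with old proxies
        var ws = new WebSocket('wss://mysite.ru/ws');
        ws.onopen = function () {
            ws.send('hello'); // a text frame to the server
        };
        ws.onmessage = function (e) {
            console.log('frame from server: ' + e.data);
        };
        ws.onclose = function () {
            console.log('closed; reconnect logic goes here');
        };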




    How do you track the performance of Web Sockets? A very good question, deliberately saved for dessert. On the client side: the WireShark packet sniffer. On the server side with TLS enabled, we solve the problem by patching modules for nginx - though there is apparently a simpler solution.

    The main thing is to understand how Web Sockets are arranged inside - and you now do - so speed control will follow.
    So which is better: XMLHttpRequest, Long Polling, Server-Sent Events or Web Sockets? Success lies in a competent combination of these technologies. For example, you can control an application through Web Sockets and load resources via AJAX, using the browser's built-in caching.

    So what's now?


    Learn to measure and respond to thresholds. Process web application logs and deal with the slow queries in them. Client-side speed has also become measurable thanks to the Navigation Timing API: collect performance data in browsers, send it via JavaScript to the cloud, aggregate it in pinba and react to deviations. A very useful API - be sure to use it.
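    A minimal collection sketch (the /perf endpoint is illustrative; in a real system it would feed your aggregator, e.g. pinba):

        // gather key Navigation Timing metrics once the page has loaded
        window.addEventListener('load', function () {
            var t = performance.timing;
            var data = {
                dns: t.domainLookupEnd - t.domainLookupStart,
                connect: t.connectEnd - t.connectStart,
                ttfb: t.responseStart - t.navigationStart,
                domReady: t.domContentLoadedEventEnd - t.navigationStart,
                load: t.loadEventStart - t.navigationStart
            };
            // ship the numbers to the server for aggregation
            var xhr = new XMLHttpRequest();
            xhr.open('POST', '/perf');
            xhr.setRequestHeader('Content-Type', 'application/json');
            xhr.send(JSON.stringify(data));
        });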



    As a result, you will find yourself surrounded by a monitoring system such as nagios, with a dozen or two automated tests showing that the speed of your web system is in order. And when an alert fires, the team gathers and a decision is made. The cases may be, for example, these:
    • A slow query in the database. Solution: query optimization, denormalization as a last resort.
    • Slow processing in the application code. Solution: algorithm optimization, caching.
    • Slow transmission of the page body over the network. Solution (in order of increasing cost): raise the TCP initial cwnd, put a dynamic proxy close to the client, move the servers closer.
    • Slow delivery of static resources to clients. Solution: a CDN.
    • Connections queueing in the browser while waiting for a free slot. Solution: domain sharding.
    • Long Polling loads the servers more than the visitors themselves do. Solution: Server-Sent Events or Web Sockets.
    • Web Sockets are slow and unstable. Solution: TLS for them (wss).

    etc.

    Summary


    We have walked through the main components of a modern web application. We learned about the HTTP/2.0 trends and about the control points you must understand and learn to measure in order to keep a web application's response time at or below 0.3 s. We looked at the essence of the modern network technologies used in browsers and identified their advantages and bottlenecks.

    We realized how important it is to understand the operation of the network - its speed, latency and bandwidth - and that bandwidth is far from always the thing that matters.

    It has become clear that it is no longer enough to “tune up” the web server and the database. You need to understand the whole bouquet of network technologies used by the browser, know them from the inside and measure them effectively. So a TCP traffic sniffer should now become your right hand, and monitoring of key performance indicators in the server logs your left foot.

    You can try to solve the problem of serving a client's request in “0.3 sec” in different ways. The main thing is to define the metrics, collect them automatically, and when they are exceeded, dig down to the root of each specific case. In our product we solved the problem of the lowest possible latency with a comprehensive caching technology that combines static and dynamic site techniques.

    In conclusion, we invite you to our technology conference, which will take place soon, on May 23. Good luck and success in the difficult task of keeping web projects fast!
