High performance Google Chrome

Original author: Ilya Grigorik
  • Translation

The history and cornerstones of Google Chrome.

Google Chrome was introduced in the second half of 2008 as a beta version for the Windows platform. The Google-sponsored browser code was made available under a liberal BSD license as the Chromium project. For many observers this turn of events came as a surprise: is the browser war coming back? Can Google really make a product better than the others?

“It was so good that it made me change my mind...” - Eric Schmidt, initially unwilling to accept the idea of Google Chrome.

Yes, it could! Today Google Chrome is one of the most popular browsers (about 35% market share, according to StatCounter) and is available on Windows, Linux, OS X, Chrome OS, Android, and iOS. Its capabilities and broad functionality clearly resonated with users, and many of its ideas have since found their way into other browsers.

The original 38-page comic book explaining the ideas and principles behind Google Chrome offers a wonderful window into the thinking and design process that produced the browser. However, that was only the beginning of the journey. The principles that motivated the earliest stages of Chrome's development have continued to guide its ongoing improvement:
  • Speed: make the fastest browser
  • Security: provide the user with the most secure work environment
  • Stability: provide a flexible and stable platform for web applications
  • Simplicity: sophisticated technology behind a simple interface

As the development team notes, many of the sites we use today are not so much web pages as web applications. Large, ambitious applications, in turn, require speed, security, and stability. Each of these qualities deserves its own chapter, but since performance is our topic today, we will focus mainly on speed.

The many facets of performance

Modern browsers are platforms that in many ways resemble an operating system, and Google Chrome is no exception. The browsers that preceded Chrome were designed as monolithic, single-process applications. All open pages shared the same address space and the same resources, so an error in handling any page, or in the browser machinery itself, threatened to bring down the entire application.
In contrast, Chrome is built on a multi-process architecture that gives each page its own process and memory, creating a kind of hard-walled sandbox for every tab. In a world of ever more multi-core processors, the ability to isolate processes, shielding each page from errors in the others, gave Chrome a significant performance advantage over its competitors. It is worth noting that most other browsers have since followed suit, implementing or beginning to implement a similar architecture.

With processes separated, executing a web application primarily involves three tasks: fetching all the necessary resources, building and rendering the page, and executing JavaScript. Page construction and JavaScript execution follow a single-threaded, interleaved scheme, since it is impossible to build and modify the same page tree (DOM) concurrently; this constraint follows in part from JavaScript itself being a single-threaded language. Optimizing how page construction and script execution interleave at run time is therefore a very important task, both for web application developers and for the developers of the browser itself.

To render pages, Chrome uses WebKit, a fast, open-source, standards-compliant engine. To run JavaScript, Chrome uses its own highly optimized V8 JavaScript engine, which, incidentally, is also an open-source project and has found its way into many other popular projects, such as node.js. However, optimizing V8 script execution, or WebKit parsing and rendering, matters little while the browser is blocked waiting for the resources needed to build the page.

The browser’s ability to optimize the order and priority of each required resource, and to manage the delays around it, is one of the most important factors in its performance. You may not even suspect it, but Chrome’s network stack is, figuratively speaking, getting smarter every day, trying to hide or minimize the cost of waiting for each resource: it learns DNS lookups, remembers the network topology, pre-fetches the pages you are most likely to visit, and more. Externally it is a simple mechanism for requesting and receiving resources, but its internals offer a fascinating case study in how to optimize web performance and deliver the best possible user experience.

What is a modern web application?

Before turning to the individual details of optimizing our interaction with the network, it helps to understand the nature of the problem we are investigating. In other words: what does a modern web page, or a modern web application, actually look like?

The HTTP Archive project records the evolution of the web, and it can help us answer this question. Instead of crawling the web for content, it periodically visits popular sites to record the resources used, content types, headers, and other metadata for each individual site. The statistics available as of January 2013 may surprise you. An average page, sampled from the top 300,000 sites on the Internet, has these characteristics:


  • "Weight": 1280 KB
  • Consists of 88 resources (images, CSS, JS)
  • Pulls data from more than 30 third-party hosts

Let's look at this in more detail. The average page weighs over 1 MB, consists of 88 resources, and is assembled from roughly 30 different first- and third-party hosts. Note that each of these figures has grown steadily over the past few years, and there is no reason to expect the growth to stop. We are building ever larger and more demanding web applications, with no end in sight.

A little arithmetic on the HTTP Archive metrics shows that the average resource weighs about 15 KB (1280 KB / 88 resources), which means most network connections in the browser are short-lived and bursty. That makes life harder for us, because the underlying transport (TCP) is optimized for large, streaming transfers. So let's get to the bottom of things and walk through a typical request for a typical resource.
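A quick back-of-the-envelope check of that arithmetic; the figures are the HTTP Archive averages quoted above:

```javascript
// Average resource size implied by the HTTP Archive figures above.
const pageWeightKB = 1280; // average page "weight"
const resourceCount = 88;  // average number of resources per page

const avgResourceKB = Math.round(pageWeightKB / resourceCount);
console.log(`~${avgResourceKB} KB per resource`); // ~15 KB

// Mostly small resources means mostly short, bursty connections --
// a poor match for TCP, which favors long-lived, streaming transfers.
```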

The life of a typical request

The W3C Navigation Timing specification provides a browser API for tracking the timing and performance of each request. Let's take a closer look at its components, since each represents an important piece of the overall user experience of browser performance.
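In the browser, these phases are exposed through `performance.timing`. As a sketch (not Chrome's own code), the helper below derives per-phase durations from such a timing object; the sample timestamps are invented purely for illustration:

```javascript
// Compute per-phase durations from Navigation Timing style timestamps.
// In a real page you would pass window.performance.timing; the sample
// object below uses invented millisecond values for illustration.
function requestPhases(t) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    connect: t.connectEnd - t.connectStart,    // TCP (and SSL) handshakes
    request: t.responseStart - t.requestStart, // request sent, first byte back
    response: t.responseEnd - t.responseStart, // body transfer
  };
}

const sample = {
  domainLookupStart: 0, domainLookupEnd: 50,
  connectStart: 50, connectEnd: 290,
  requestStart: 290, responseStart: 470,
  responseEnd: 520,
};
console.log(requestPhases(sample));
// { dns: 50, connect: 240, request: 180, response: 50 }
```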

Given the URL of a resource, the browser first checks its local cache. If the resource has been fetched before and the appropriate response headers were set (Expires, Cache-Control, ...), the data may be served entirely from the cache: the fastest request is a request not made. Otherwise, if the cached copy is stale, or we have never visited the site, an expensive network request is required.
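The freshness check amounts to comparing the cached response's age against its declared lifetime. A minimal sketch, assuming only `Cache-Control: max-age` (real browser caches also honor Expires, no-store, ETag revalidation, heuristic freshness, and more):

```javascript
// Decide whether a cached response may be reused without a network request.
// Simplified: only Cache-Control: max-age is considered; real caches also
// handle Expires, no-store, must-revalidate, heuristic freshness, ETags, etc.
function isFresh(cacheControl, fetchedAtMs, nowMs) {
  const m = /max-age=(\d+)/.exec(cacheControl || "");
  if (!m) return false;                // no explicit lifetime: revalidate
  const maxAgeMs = Number(m[1]) * 1000;
  return nowMs - fetchedAtMs < maxAgeMs;
}

console.log(isFresh("public, max-age=3600", 0, 1000 * 1000)); // true: ~17 min old
console.log(isFresh("public, max-age=3600", 0, 4000 * 1000)); // false: past 1 hour
console.log(isFresh("no-store", 0, 1000));                    // false: never cached
```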

With the host name and the path to the requested resource in hand, Chrome first checks whether there is an existing open connection to this site that can be reused: sockets are pooled by {scheme, host, port}. If you access the Internet through a proxy, or have configured a proxy auto-config (PAC) script, Chrome checks for a connection through the appropriate proxy; a PAC script can assign different proxies based on the URL or other rules, and each proxy can have its own pool of connections. Finally, if none of the above applies, it is time to resolve the host name to an IP address: a DNS lookup.

If we are lucky and the name is already in the cache, the answer will likely cost us one quick system call. If not, a DNS query has to go out first. The time it takes depends on your ISP, the popularity of the requested site and hence the likelihood that its name sits in an intermediate cache, plus the response time of the DNS servers handling the query. In other words, there is a lot of uncertainty here, but a delay of several hundred milliseconds for a DNS query would not be out of the ordinary.
With the IP address resolved, Chrome can open a new TCP connection to the remote server, which means performing the so-called three-way handshake: SYN > SYN-ACK > ACK. This exchange adds a full round trip of latency to every new TCP connection, no exceptions. Depending on the distance between client and server, and the routing path chosen, this can cost hundreds or even thousands of milliseconds. Note that all of this work happens before a single byte of application data has been transferred!

Once the TCP connection is established, if we are using a secure transfer protocol (HTTPS), we additionally need to complete the SSL handshake. This can take up to two extra full round trips between client and server; if the SSL session is cached, we can get by with a single additional round trip.

Finally, after all these preliminaries, Chrome can at last send the HTTP request (requestStart in Navigation Timing terms). Upon receipt, the server processes the request and sends the response back to the client. That requires at least one network round trip, plus the server's processing time. And then we are done, unless the response turns out to be an HTTP redirect, in which case the entire procedure above has to be repeated from the top. Have a few gratuitous redirects on your pages? It may be time to revisit that decision.

Keeping count of all these delays? To illustrate the problem, let's assume the worst case for a typical broadband connection: a local cache miss, followed by a relatively fast DNS lookup (50 ms), a TCP handshake, SSL negotiation, and a relatively quick (100 ms) server response, with a round-trip time of 80 ms (a typical round trip across the continental US):
  • 50 ms for the DNS lookup
  • 80 ms for the TCP handshake (one round trip)
  • 160 ms for the SSL handshake (two round trips)
  • 40 ms for the request to reach the server
  • 100 ms for the server to process the request
  • 40 ms for the response to travel back

That adds up to 470 ms for a single request, roughly 80% of which is spent negotiating with the server rather than waiting for it to process the request. In fact, even 470 milliseconds may be an optimistic estimate:
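Tallying the budget from the list above:

```javascript
// Worst-case delay budget for a single HTTPS request, using the text's figures.
const budget = {
  dnsLookup: 50,     // DNS resolution
  tcpHandshake: 80,  // one round trip
  sslHandshake: 160, // two round trips
  request: 40,       // request delivery
  serverTime: 100,   // server-side processing
  response: 40,      // response delivery
};

const totalMs = Object.values(budget).reduce((a, b) => a + b, 0);
const networkMs = totalMs - budget.serverTime;
console.log(totalMs);   // 470
console.log(networkMs); // 370 ms spent outside the server (~80% of the total)
```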

  • if the server response does not fit into the initial congestion window (4-15 KB), several more round trips will be required;
  • the SSL delay can be even worse if we need to fetch a missing certificate or check the certificate's status online; each of these may require a new, independent TCP connection, which can add hundreds of milliseconds or even seconds of delay.

What does "fast enough" mean?

The network costs of DNS, handshakes, and delivery dominate the total time in the scenario above; the server response accounts for only 20% of the total wait! But, in the grand scheme of things, do these delays matter? If you are reading this, you probably already know the answer: yes, very much so.

Recent user studies paint the following picture of what users expect from any interface, both online and offline applications:
Delay             User reaction
0-100 ms          feels instant
100-300 ms        slight but perceptible delay
300-1000 ms       the machine is working
more than 1 s     mental context switch
more than 10 s    "I'll come back later"

The table above also explains the unofficial performance rule of thumb for web applications: render your pages, or at least provide visual feedback to user actions, within 250 ms to keep users engaged. But this is not speed for speed's sake. Studies at Google, Amazon, Microsoft, and thousands of other sites show that additional delay has a direct impact on a site's success: faster sites get more page views, higher user loyalty, and higher conversion rates.

So our target is roughly 250 ms, and yet, as we saw above, the combination of the DNS lookup, the TCP and SSL handshakes, and request delivery alone adds up to 370 ms. We are over budget by nearly 50%, and we still have not counted the server's processing time!

To most users and even web developers, the DNS, TCP, and SSL delays are completely transparent; they are negotiated at layers of abstraction that few of us ever think about. And yet each of these steps is critical to the overall user experience, since each extra network request can add tens or hundreds of milliseconds of latency.
This is the reason why Chrome's network stack is much, much more than a simple socket handler.

Now that we have discussed the problem, it's time to move on to the implementation details.

Translator's PS: since the article is quite large, I decided to split it into theory and practice; the second part is more interesting and much larger. As it turned out, preparing the translation for Habr takes about 40% of the time, along with proofreading the Russian, since for me this is a kind of double translation. Thanks for your attention.
