Better HTTP/2 Prioritization for a Faster Web

HTTP/2 promised a much faster web, and Cloudflare rolled out HTTP/2 access for all of its customers long ago. But one HTTP/2 feature, prioritization, has not lived up to expectations. Not because it is fundamentally broken, but because of how browsers implement it.

Today, Cloudflare is introducing a change to HTTP/2 prioritization that gives our servers control over prioritization decisions, which genuinely makes the web faster.

Historically, the browser has controlled how and when web content is downloaded. Today, for all paid plans, we are making a radical change to this model that hands control directly to the site owner. On the “Speed” tab in the Cloudflare dashboard, customers can enable “Advanced HTTP/2 prioritization”: it overrides the browser defaults with an improved scheduling scheme that noticeably speeds up the experience for visitors (in some cases we have seen a 50% improvement). With Cloudflare Workers, site owners can go even further and fully customize prioritization for their specific needs.

Current situation

Web pages consist of dozens (sometimes hundreds) of individual resources that the browser downloads and assembles into the final displayed content. This includes the visible content the user interacts with (HTML, CSS, images), as well as the application logic (JavaScript) for the site itself, advertising, analytics and marketing tracking beacons. From the user's point of view, the order in which these resources load matters a great deal: it determines when they see the content and when they can interact with the page.

A browser is essentially an HTML processing engine that walks through the HTML document and follows its instructions in order, from beginning to end, building the page as it goes. Stylesheet references (CSS) tell the browser how to style the page content, and the browser will delay displaying content until the stylesheet has loaded. Scripts on the page can behave in different ways. If a script is marked as “async” or “defer”, the browser can keep processing the document and simply run the script once it is available. If a script is not marked as async or defer, the browser must stop processing the document until the script has been loaded and executed. Such scripts are called "blocking" because they block the browser from continuing to process the document.

The HTML document is divided into two parts. The document head (<head>) comes first and contains stylesheets, scripts, and other instructions the browser needs in order to display the content. After the head comes the document body (<body>), which contains the actual content displayed in the browser window (although scripts and stylesheets can also appear in the body). Until the browser reaches the body of the document, it has nothing to show the user, and the page remains blank. It is therefore important to get through the head as quickly as possible. If you are interested in the details, read up on how browsers work.

The browser is usually responsible for the order in which the various resources needed to build the page and continue processing the document are loaded. In HTTP/1.x, there are limits on how many objects the browser can request from a server at once (usually 6 connections, with only one resource at a time per connection), so the order of requests is strictly controlled by the browser. In HTTP/2, the situation is completely different. The browser can request all resources at once (or at least as soon as it discovers them) and gives the server detailed instructions on how to deliver them.

Optimal resource loading order

For most pages, there is an optimal loading order that makes the page usable as early as possible (and the difference between an optimal and a non-optimal loading order can reach 50% or more).

As described above, the browser cannot display any content until it has dealt with the CSS and JavaScript in the <head> section. At this stage, it is best to devote 100% of the connection to the blocking resources and load them one at a time, in the order they appear in the HTML. This lets the browser parse and run each item while the next blocking resource is downloading, which creates an optimal pipeline.

The total download time is the same whether the scripts load in parallel or one after another, but with sequential loading the first script can be parsed and executed while the second is still downloading.
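The pipelining benefit can be checked with a little arithmetic. The sketch below is illustrative only: it assumes equally sized resources that each take a fixed time at full bandwidth, and both function names are made up for this example.

```javascript
// Completion time (in seconds) of each resource when they are sent one
// after another, each getting 100% of the bandwidth in turn.
function sequentialCompletion(count, secondsEach) {
  return Array.from({ length: count }, (_, i) => (i + 1) * secondsEach);
}

// Completion time of each resource when all of them share the bandwidth
// evenly: every resource finishes only at the very end.
function parallelCompletion(count, secondsEach) {
  return Array.from({ length: count }, () => count * secondsEach);
}

console.log(sequentialCompletion(2, 1)); // [ 1, 2 ]
console.log(parallelCompletion(2, 1));   // [ 2, 2 ]
```

In both cases the last script arrives after 2 seconds, but with sequential delivery the first script is complete after 1 second, so the browser can run it while the second is still in flight.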

Once the blocking resources have loaded, things get a little more interesting. Here the optimal order may depend on the particular site or even on business priorities (favoring user content over advertising or analytics, and so on). Fonts are a problem of their own, because the browser only discovers which fonts it needs after applying the stylesheets to the content that is about to be displayed. So by the time the browser learns about a font, that font is already needed to draw text that is otherwise ready to appear on screen. Any delay in loading the font means missing text on the screen (or text displayed in the wrong font).

As a rule, some trade-offs need to be considered:

Custom fonts and images in the visible part of the page (the viewport) should load as quickly as possible. They directly affect the user's visual experience while the page loads.
Non-blocking JavaScript should load sequentially with respect to other JavaScript resources so that each script can be processed in a pipeline. The JavaScript may include custom application logic as well as tracking beacons for analytics and marketing, and delaying it can hurt the metrics the business tracks.
Images can download in parallel. The first few bytes of an image file contain its dimensions, which the browser may need for layout, and loading progressive images in parallel can look visually complete after about 50% of the bytes have been transferred.

Given the trade-offs, in most cases, this strategy works well:

Custom fonts load sequentially and share the available bandwidth with the visible images.
Visible images load in parallel, splitting their portion of the bandwidth among themselves.
When there are no more fonts or visible images:
- Non-blocking scripts load sequentially and share the available bandwidth with the invisible images (those outside the viewport).
- Invisible images load in parallel, splitting their portion of the bandwidth among themselves.

This way, the content visible to the user loads as quickly as possible, the application logic is delayed as little as possible, and the invisible images load in a way that lets the layout settle as early as possible.

Example

To illustrate, we use a simplified product category page from a typical e-commerce site:

Blue - HTML file of the page itself.
Green - One external style sheet (CSS file).
Orange - Four external scripts (JavaScript). Two blocking scripts at the top of the page and two asynchronous ones. Blocking scripts are shown in a darker shade of orange.
Red - One custom web font.
Purple - 13 images. The viewport shows the page logo and four product images; the other 8 product images require scrolling. The five visible images are shown in a darker shade of purple.

For simplicity, suppose all resources are the same size and each takes 1 second to load. Downloading everything takes 20 seconds in total, but the order and manner of loading make a huge difference.

Here's what the optimal resource loading will look like in a browser:

The page is blank for the first 4 seconds while the HTML, CSS and blocking scripts load, each using 100% of the connection.
At the 4-second mark, the background and page structure are displayed, without text or images.
One second later, at the 5-second mark, the page text is displayed.
From 5 to 10 seconds the images load, blurry at first but sharpening very quickly. At about 7 seconds, the result is almost indistinguishable from the final version.
At the 10-second mark, all visual content in the visible part of the page has finished loading.
Over the next two seconds, the asynchronous JavaScript loads and runs, handling any non-critical logic (analytics, marketing tags, etc.).
In the last 8 seconds, the remaining images load in case the user scrolls the page.

Current browser prioritization

All current browser engines implement different prioritization strategies, none of which is optimal.

Microsoft Edge and Internet Explorer do not support prioritization, so they fall back to the HTTP/2 default, which loads everything in parallel and splits the bandwidth evenly between all resources. Future versions of Microsoft Edge will switch to the Chromium engine, which may improve the situation. For now, though, in our example the browser spends most of its time stuck in the document head, because the images slow down the delivery of the blocking scripts and stylesheets.

Visually, this makes for a rather painful experience: the user stares at a blank screen for 19 seconds, and then waits 1 more second for the text to appear. When watching the animation below, be patient: for 19 seconds it may look as if nothing is happening on the blank screen (although something is):

Safari loads all resources in parallel, splitting the bandwidth according to how important Safari considers each resource to be (blocking resources such as scripts and stylesheets are more important than images). Images load in parallel with each other, but also at the same time as the blocking content.

Although Safari is similar to Edge in the sense that everything loads at the same time, allocating more bandwidth to blocking resources allows you to display content much earlier:

After about 8 seconds, the stylesheet and scripts have loaded, so the page can start rendering. Since the images were loading in parallel, they can also be partially displayed (blurry, for progressive images). This is still twice as slow as the optimal scenario, but much better than in Edge.
After about 11 seconds, the font has loaded and the text can be displayed. By this point more image data has arrived, and the images are a bit sharper. This is comparable to the 7-second mark in the optimal scenario.
Over the remaining 9 seconds, the images become sharper as more data arrives, until the process finally completes at 20 seconds.

Firefox builds a dependency tree that groups resources, and then schedules the groups either to load one after another or to share bandwidth with each other. Within a group, resources share bandwidth and load simultaneously. Images are scheduled to load after render-blocking stylesheets and load in parallel, but render-blocking scripts and stylesheets also load in parallel and so do not benefit from pipelining.

In our example, this turns out a little faster than Safari, since the images wait for the stylesheet to load:

At around 6 seconds, the initial page content is displayed with the background and blurry versions of the product images (compared to 8 seconds for Safari and 4 seconds in the optimal case).
At 8 seconds, the font has loaded and the text can be displayed along with slightly sharper product images (compared to 11 seconds for Safari and 7 seconds in the optimal case).
Over the remaining 12 seconds, the images become sharper as the rest of the content loads.

Chrome (and all Chromium-based browsers) prioritizes resources into a list. This works very well for blocking resources, which load optimally one after another, but not so well for images. Each image is loaded to 100% before the next one starts.

In practice, this is an almost optimal download scenario, with the only difference being that the images are downloaded one at a time, and not in parallel:

For the first 5 seconds, loading in Chrome is identical to the optimal scenario, with the background displayed at 4 seconds and the text content at 5.
Over the next 5 seconds, the visible images load one at a time until they are all complete at around 10 seconds (in the optimal scenario they appear slightly blurry at around 7 seconds and sharpen over the remaining three seconds).
Once the visual part of the page is complete at 10 seconds (identical to the optimal scenario), the remaining 10 seconds are spent running the asynchronous scripts and loading the hidden images (just as in the optimal scenario).

Visual comparison

The visual impact differs considerably, even though technically loading all the content takes the same amount of time:

Server-side prioritization

With HTTP/2, the client (browser) requests a prioritization, and it is up to the server to decide what to do with that request. A good number of servers do not support prioritization at all, and the rest simply honor the client's request. Another option is for the server to decide on the best prioritization itself, taking the client's request into account.

According to the specification, HTTP/2 prioritization is a dependency tree that requires full knowledge of all in-flight requests in order to prioritize resources relative to one another. This allows for incredibly complex strategies, but it is hard to implement well on either the browser or the server side (as the differing browser strategies and varying levels of server support show). To make prioritization easier to manage, we have developed a simpler scheme that still has all the flexibility needed for optimal scheduling.

Cloudflare's prioritization scheme consists of 64 priority "levels", and within each level there are groups of resources that determine how to divide the connection among themselves:

All resources at a higher priority level are delivered first, and only then does delivery move on to the next lower level.

Within a given priority level, there are three different concurrency groups:

0: all resources in concurrency group “0” are sent sequentially, in the order they were requested, using 100% of the bandwidth. Other groups at the same level are considered only after all resources in group “0” have been delivered.
1: all resources in concurrency group “1” are sent sequentially, in the order they were requested. The available bandwidth is split evenly between concurrency group “1” and concurrency group “n”.
n: resources in concurrency group “n” are sent in parallel, sharing the bandwidth available to the group.

In practice, concurrency group “0” is useful for critical content that needs to be processed sequentially (scripts, CSS, etc.). Group “1” is useful for less important content that can share bandwidth with other resources, but where the resources themselves still benefit from sequential processing (asynchronous scripts, non-progressive images, etc.). Concurrency group “n” is useful for resources that benefit from parallel processing (progressive images, video, audio, etc.).
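The rules above can be sketched as a function that decides how the bandwidth is divided at a given moment. This is an illustration of the described scheme, not Cloudflare's implementation. The function name and record shape are hypothetical, and three points are assumptions: a larger number means a higher priority, an empty group “1” or “n” leaves all of the bandwidth to the other group, and members of group “n” divide their share equally.

```javascript
// `pending` lists unfinished responses in request order. Each entry looks like
// { id: "app.js", priority: 30, concurrency: "0" }, where concurrency is
// "0", "1" or "n". Returns a Map from id to its current fraction of bandwidth.
function bandwidthShares(pending) {
  const shares = new Map();
  if (pending.length === 0) return shares;

  // Only the highest priority level that still has pending work is served.
  // Assumption: a larger number means a higher priority.
  const top = Math.max(...pending.map((r) => r.priority));
  const level = pending.filter((r) => r.priority === top);

  // Group "0": the oldest request takes 100% until the group is drained.
  const group0 = level.filter((r) => r.concurrency === "0");
  if (group0.length > 0) {
    shares.set(group0[0].id, 1);
    return shares;
  }

  const group1 = level.filter((r) => r.concurrency === "1");
  const groupN = level.filter((r) => r.concurrency === "n");

  // Groups "1" and "n" split the bandwidth evenly.
  // Assumption: if one of them is empty, the other gets everything.
  const group1Share = group1.length === 0 ? 0 : groupN.length === 0 ? 1 : 0.5;
  const groupNShare = groupN.length === 0 ? 0 : 1 - group1Share;

  // Group "1" is sequential: its whole share goes to the oldest request.
  if (group1.length > 0) shares.set(group1[0].id, group1Share);
  // Group "n" is parallel: its share is divided equally among its members.
  for (const r of groupN) shares.set(r.id, groupNShare / groupN.length);
  return shares;
}
```

For example, with one font in group “1” and two images in group “n” at the same level, the font gets half of the bandwidth and each image gets a quarter, while anything at a lower level waits.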

Cloudflare default prioritization

With the advanced prioritization option enabled, the “optimal” loading order described above is applied. The specific priorities used are as follows:

This scheme sends render-blocking resources sequentially, then the visible images in parallel, and then the rest of the page content with some bandwidth sharing to balance loading of the application against loading of the content. The “* If Detectable” caveat is that not all browsers distinguish between the different types of stylesheets and scripts, but the result will still be much faster in all cases. Speedups of 50%, especially for visitors using Edge and Safari, would not be unusual:

Setting Prioritization with Workers

Being faster by default is great, but things get really interesting with the ability to configure prioritization from Cloudflare Workers, so that sites can override the default priority of individual resources or implement prioritization schemes of their own.

If a Worker adds a cf-priority header to the response, Cloudflare's edge servers will apply the specified priority and concurrency. The format of the header is <priority>/<concurrency>, so response.headers.set('cf-priority', '30/0'); sets priority 30 and concurrency 0 for that response. Similarly, “30/1” sets concurrency to 1, and “30/n” sets concurrency to n.

With this flexibility, a site can prioritize its resources in whatever way suits its needs. For example, it can raise the priority of some important asynchronous scripts or hero images so that they start downloading even before the browser works out that they are in the viewport.
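A minimal sketch of such a Worker follows. Only the cf-priority header and its <priority>/<concurrency> format come from the description above. The helper names, the /images/hero.jpg path and the chosen value of 30/n are hypothetical, and the integer check is an assumption, since the valid range of priority values is not spelled out here.

```javascript
// Builds a cf-priority value in the "<priority>/<concurrency>" format.
function formatPriority(priority, concurrency) {
  const group = String(concurrency);
  // Assumption: priorities are integers, as in the "30/0" example.
  if (!Number.isInteger(priority)) {
    throw new RangeError("priority must be an integer");
  }
  if (!["0", "1", "n"].includes(group)) {
    throw new RangeError('concurrency must be 0, 1 or "n"');
  }
  return `${priority}/${group}`;
}

// Returns a copy of the response with the cf-priority header set.
// Responses returned by fetch() have immutable headers, hence the copy.
function withPriority(response, priority, concurrency) {
  const copy = new Response(response.body, response);
  copy.headers.set("cf-priority", formatPriority(priority, concurrency));
  return copy;
}

// Hypothetical rule: deliver this site's hero image at priority 30,
// in parallel with other "n" resources at that level.
async function handleRequest(request) {
  const response = await fetch(request);
  const { pathname } = new URL(request.url);
  return pathname === "/images/hero.jpg"
    ? withPriority(response, 30, "n")
    : response;
}

// In a Worker, the handler would be registered with:
// addEventListener("fetch", (event) => event.respondWith(handleRequest(event.request)));
```

Keeping the header formatting in its own function makes it easy to check the value separately from the network request.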

To help inform prioritization decisions, the Workers runtime also exposes the prioritization information requested by the browser on the request object passed to the Worker's fetch event listener (request.cf.requestPriority). The incoming priority is a semicolon-separated list of attributes that looks something like this: weight=192;exclusive=0;group=3;group-weight=127.

weight: the HTTP/2 weight of the request.
exclusive: the HTTP/2 exclusive flag (1 for Chromium-based browsers, 0 for others).
group: the HTTP/2 stream ID of the request's group (non-zero for Firefox).
group-weight: the HTTP/2 weight of the request's group (non-zero for Firefox).
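A string in this format can be turned into an object with a few lines of code. The sketch below is one possible approach: both function names are hypothetical, and the browser guess only restates the hints given in the attribute list above, so treat it as a heuristic.

```javascript
// Parses a value such as "weight=192;exclusive=0;group=3;group-weight=127"
// into an object with numeric attribute values.
function parseRequestPriority(value) {
  const attributes = {};
  for (const pair of String(value || "").split(";")) {
    const [key, raw] = pair.split("=");
    if (key && raw !== undefined) attributes[key.trim()] = Number(raw);
  }
  return attributes;
}

// Heuristic based on the attribute descriptions: Chromium-based browsers set
// the exclusive flag, and Firefox uses a non-zero group.
function guessBrowserFamily(attributes) {
  if (attributes.exclusive === 1) return "chromium";
  if (attributes.group > 0) return "firefox";
  return "other";
}

const parsed = parseRequestPriority("weight=192;exclusive=0;group=3;group-weight=127");
console.log(parsed.weight, parsed["group-weight"]); // 192 127
console.log(guessBrowserFamily(parsed)); // firefox
```

Inside a Worker, the input would be request.cf.requestPriority.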

This is just the beginning.

The ability to configure and control the priority of responses is the basic building block for a lot of future work. We plan to implement our own advanced optimizations on top of it, but with Workers support every site and researcher can experiment with different prioritization strategies. Through the Apps Marketplace, companies can also build new optimization services on top of the Workers platform and make them available to other sites.

If you are on a Pro plan or higher, go to the “Speed” tab in the Cloudflare dashboard and enable “Advanced HTTP/2 prioritization” to speed up your site.
