cloud August 11, 2008 at 17:26

Using Nginx as a collector in the difficult business of caching

We often skip caching a page's output simply because, mixed in with data that could be cached easily and painlessly, there is data that changes all the time (usually tied to a specific user).
There are several solutions to this problem, but considerably fewer good ones.

Solutions

The usual approaches are:
1. Cache only the query results.
This can and should be done, but the server still does extra work on every request: parsing the URL, instantiating models and rendering the data into the template. All this increases entropy and hastens the heat death of the universe.
2. Assemble the page with JavaScript.
In my view this practice is flawed, because clients that do not support JavaScript lose data. I angrily dismiss the claim that every decent client has JavaScript turned on: the world also contains mobile browsers, screen readers for the blind and plain hardworking robots. This approach is suitable (and even desirable) only for various "informer" widgets that the page can easily do without.
So the page has to be assembled on the server, and as cheaply as possible. For that we use the deservedly popular nginx server and the undeservedly forgotten SSI technology.

I should say right away that this great discovery is not mine: it was born in the depths of the respected company mail.ru, whose specialists presented it at the last Highload conference.
However, their solution was, in my opinion, too cumbersome and somewhat different in ideology. In essence, nginx was used there as a template engine with support for loops, conditional branches and other bells and whistles that seem a bit excessive for a "small, light front end". Besides, despite the authors' cheerful assurances, the project was never released publicly, and writing such a marvel myself was not in my plans.

Idea

The idea is to separate data that caches well from data that caches poorly or not at all; the latter is usually tied to the user.
For example, take a list of communities: next to each one, depending on whether the user is a member (and whether he is logged in at all), either a “join” or a “leave” button is displayed.

So we create a cache entry that stores a ready-made HTML page containing the cacheable data plus SSI instructions that output variables holding the non-cached data. We also create a backend file that generates these variables.
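
To make the split concrete, here is a minimal Python sketch of what the two artifacts might contain. The SSI commands (include virtual with wait, set, echo) are the ones nginx provides; everything else is an assumption made for illustration: the function names, the button_<id> variable naming, the URLs of the links, and the convention that None means an anonymous visitor.

```python
from html import escape


def render_cached_page(backend_uri, communities):
    """Return the cacheable HTML: static data plus SSI placeholders."""
    lines = [
        # Pull in the per-user backend first; wait="yes" asks nginx to finish
        # this subrequest before it continues with the rest of the page.
        f'<!--# include virtual="{backend_uri}" wait="yes" -->',
        "<ul>",
    ]
    for community in communities:
        title = escape(community["title"])
        slot = f'button_{community["id"]}'  # hypothetical variable name
        lines.append(f'<li>{title} <!--# echo var="{slot}" --></li>')
    lines.append("</ul>")
    return "\n".join(lines)


def render_backend_output(communities, member_ids):
    """Return what the backend might emit for one user: only SSI set commands.

    member_ids is None for an anonymous visitor (assumed convention).
    """
    out = []
    for community in communities:
        cid = community["id"]
        if member_ids is None:
            button = ""  # not logged in: no button at all
        elif cid in member_ids:
            button = f"<a href='/communities/{cid}/leave'>leave</a>"
        else:
            button = f"<a href='/communities/{cid}/join'>join</a>"
        out.append(f'<!--# set var="button_{cid}" value="{button}" -->')
    return "\n".join(out)
```

The first function runs once, when the cache entry is built; the second stands in for the small backend script that runs on every request and does nothing except fill in the variables.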

1. Look up the cache by the normalized URL. If a cache entry for the normalized URL already exists, redirect to it; if not, create a placeholder entry that reloads the page after a short delay, i.e. "lock" the page.
/communities/new/1.html is the first page of the list of communities sorted by creation time.
2. Select the data that caches well.
In our case, the 10 newest communities.
3. Generate and "remember" the code that fetches the non-cached data.
I.e. which of the selected communities the current user belongs to.
4. Render the list. The point is that, at the level of template engine instructions, we can mark sections of code that are to be executed in the backend. That code is “remembered” and supplemented with code that generates the SSI variable, and po:ssi is replaced with an SSI instruction.
It looks roughly like this (a homegrown template engine is used; the secret reason for its existence is described here: cloud.habrahabr.ru/blog/22445 ):



{$Title}

<?php
 if (auth::is_logged_id()) {
  if (isset($memberof['{$id}']) && $memberof['{$id}']) {
   ?>leave the community<?php
  }
  else {
   ?>join the community<?php
  }
 }
?>

5. Save the output of the script to an HTML file, and the code “tied to the user” to the backend file. Add an SSI include virtual instruction to the cached file; it pulls in the backend file.
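
The five steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the cache is a plain dict instead of files on disk, the normalization rules (drop the query string, lowercase the path, default to page 1) are invented for the example, and normalize_url, LOCK_PAGE and build_page are hypothetical names.

```python
from urllib.parse import urlsplit

# Placeholder stored while the real page is being built: it reloads itself
# after a short delay, which effectively "locks" the page.
LOCK_PAGE = (
    '<html><head><meta http-equiv="refresh" content="2"></head>'
    "<body>Please wait...</body></html>"
)


def normalize_url(url):
    """Map equivalent URLs onto one cache key (assumed rules, see lead-in)."""
    path = urlsplit(url).path.lower().rstrip("/")
    if not path.endswith(".html"):
        path += "/1.html"  # a list URL without a page number means page 1
    return path


def build_page(url, cache, select, render):
    """Build the cache entry for url and return the key to redirect to."""
    key = normalize_url(url)
    if key in cache:
        return key  # step 1: already cached (or locked), just redirect
    cache[key] = LOCK_PAGE  # step 1: lock the page
    rows = select()  # step 2: select the data that caches well
    page, backend = render(rows)  # steps 3-4: HTML plus "remembered" code
    cache[key] = page  # step 5: save the page...
    cache[key + ".backend"] = backend  # ...and the per-user backend code
    return key
```

Here select and render are callbacks supplied by the application: the first returns the cacheable rows, the second returns a pair of the page with SSI instructions and the backend code.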

So when a page is served from the cache, the following happens:

1. nginx receives the request and finds the page in the cache.
2. It executes the include virtual instruction (it is important not to forget the wait parameter here), which pulls in the generated backend file. The backend determines the current user and his membership in the communities shown on this page, and builds the HTML for the “join” and “leave” buttons, which is written into the corresponding SSI variables.
3. The variables are output at the corresponding places in the template, and we get the desired page.
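
The serve-time flow can be imitated with a toy SSI interpreter in Python. It is a simplified model written for this explanation, not how nginx is implemented: it understands only the include virtual, set and echo commands, processes them strictly in document order (which is the effect the wait parameter is meant to guarantee), and takes a hypothetical fetch callback in place of a real subrequest.

```python
import re

# One SSI directive: <!--# command name="value" ... -->
DIRECTIVE = re.compile(r"<!--#\s*(\w+)\s+(.*?)\s*-->", re.S)
PARAM = re.compile(r'(\w+)="([^"]*)"')


def assemble(page, fetch, variables=None):
    """Expand SSI directives in page; fetch(uri) returns a backend's output."""
    variables = {} if variables is None else variables

    def handle(match):
        command = match.group(1)
        params = dict(PARAM.findall(match.group(2)))
        if command == "include":
            # The included output shares the variable table with the page,
            # so set commands emitted by the backend are visible afterwards.
            return assemble(fetch(params["virtual"]), fetch, variables)
        if command == "set":
            variables[params["var"]] = params["value"]
            return ""
        if command == "echo":
            return variables.get(params["var"], "")
        return match.group(0)  # leave unknown directives untouched

    # re.sub calls handle for each directive from left to right.
    return DIRECTIVE.sub(handle, page)
```

A usage example, with made-up URLs and variable names:

```python
page = (
    '<!--# include virtual="/backend/1" wait="yes" -->'
    '<li>Chess <!--# echo var="button_1" --></li>'
)
backend = '<!--# set var="button_1" value="<a href=\'/leave/1\'>leave</a>" -->'
html = assemble(page, lambda uri: backend)
```

The include comes first in the cached page so that every variable is already set by the time the echo commands are reached.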

Cache freshness

Freshness is maintained through "selection names": each selection (query result) is an object that has one or more names, usually the table name and/or the table name plus a key.
For example, a selection of communities from the “communities” table with the topic “sport” is named communities_sport. Selection names are set at the model level, and it is also at the model level that the cache can be reset on UPDATE / DELETE / CREATE.

As a result, when we save a page to the cache we record its links to the names of all the selections it used, and when a selection changes we delete all pages linked to that name.
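
A minimal Python sketch of such bookkeeping follows. The class and method names are hypothetical, and the cache is again a plain dict standing in for the files on disk.

```python
from collections import defaultdict


class SelectionIndex:
    """Remembers which cached pages depend on which selection names."""

    def __init__(self):
        self._pages_by_name = defaultdict(set)

    def remember(self, page_key, selection_names):
        """Call when a page is saved: link it to every selection it used."""
        for name in selection_names:
            self._pages_by_name[name].add(page_key)

    def invalidate(self, name, cache):
        """Call from the model on UPDATE / DELETE / CREATE: drop the pages."""
        pages = self._pages_by_name.pop(name, set())
        for key in pages:
            cache.pop(key, None)  # the page may already be gone
        return pages
```

For example, a page built from the selections communities and communities_sport is registered under both names, so a change to either one removes it from the cache.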
