Experience developing a total caching engine

    I would like to describe my experience building an engine for a specialized site whose key feature is that, in the ideal case, it does not touch the database at all. I want to share my solution to the problem of episodic high load and hear feedback on similar solutions and possible improvements.

    So, I was tasked with developing an information site built on user content: blog entries. The editorial staff collects posts from around the Internet, compiles stories from them, and backs them up with relevant reference information. The site's specifics are such that, with an average load of 5-10 thousand visitors per day, traffic to specific materials grows many times over (sometimes by orders of magnitude) whenever a socially important topic breaks and fresh information can be gleaned from the blogosphere, as in the case of a terrorist attack or an unexpected political decision. The decision was made: cache the most popular content. But first, a few assumptions:

    • The front-end is almost static: materials enter the database through the CMS, and users do not add or change anything. Content is added rarely relative to the number of views, so the CMS is allowed to be more resource-hungry than the front-end;
    • We have only one weak server at our disposal, but memory can be added to it;
    • The amount of RAM is much larger than the database volume (8 GB of RAM at the initial stage versus the current 500 MB of text data in the database);
    • Individual materials get tens or hundreds of thousands of visits, while most get only hundreds;
    • We use PHP / MySQL / Memcached.


    At the same time, there is a problem: the page structure is complex and varied enough that pages cannot be cached head-on, as whole units. A page consists of blocks: some are constant, some depend on query parameters (for example, “other materials on the subject”), and some are not cached at all (for example, site search). To avoid producing an abundance of duplicate pages, it was decided to assemble each page from a template and blocks generated by modules that take only the necessary parameters into account.

    Let's take a look at the structure of the Subject page template:

    Subject template

    We have a three-column design, with modules embedded in each column. The top_menu module does not depend on any parameters, the content_subject module depends on the material ID and the page number, and the rest depend only on the material ID.
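    For illustration, here is a minimal sketch of what such a template might look like. The {module:...} placeholder syntax is my assumption for this article, not necessarily the engine's actual markup, and the module names other than top_menu and content_subject are invented for the example:

        <!-- subject.tpl: three-column layout with module plug-in points -->
        <div class="left">
            {module:top_menu}                            <!-- no parameters -->
            {module:subject_links id=$id}                <!-- material ID only -->
        </div>
        <div class="center">
            {module:content_subject id=$id page=$page}   <!-- ID and page number -->
        </div>
        <div class="right">
            {module:related_materials id=$id}            <!-- material ID only -->
        </div>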

    Now let's look at the structure of the module that generates the HTML code of a block:

    Module

    The module interface contains three methods needed to work with the caching system (a code sketch follows the list):

    • getCode() - generates the block's code, taking into account the parameters passed from the kernel;
    • getDependencies() - returns a list of dependencies. The module receives the name of a database table, the name of an action on that table (add, delete, update), and the material ID in that table (if any). From these, the module computes the names of the dependent blocks and returns the list. Example: for the action of adding an article, it returns the list of all pages in the section so that the engine core can mark them as outdated;
    • getParameters() - returns an array of the parameters that affect code generation. It is needed to connect modules to templates correctly: otherwise some parameters would be redundant and we would get a large list of duplicates in Memcached.
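    In PHP terms, the contract described above could be captured by an interface along these lines. This is a sketch of my description, with an illustrative name, not the engine's literal source:

        <?php
        // Sketch of the module contract (the interface name is illustrative).
        interface CacheableModule
        {
            // Generate the block's code, taking into account the parameters
            // passed from the kernel.
            public function getCode(array $params);

            // Given a table name, an action on it (add / delete / update) and
            // an optional material ID, return the names of the cached blocks
            // that become outdated as a result.
            public function getDependencies($tableName, $action, $id = 0);

            // Return the list of parameters that affect code generation, so
            // the kernel can build distinct cache keys without duplicates.
            public function getParameters();
        }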


    How the display request is processed


    The engine performs the following actions (a code sketch follows the list):

    • The router determines the action name $action and its parameters from the URL. In my implementation the routes are hard-coded;
    • The template corresponding to the action is loaded from the cache under the key tpl,$action (for convenience, their names coincide); on a miss, the template is read from disk;
    • The list of modules is obtained from the cache under var,modules (on a miss, the list of module files is built);
    • The module parameters are obtained from the cache under var,params (on a miss, the getParameters() method is executed for all modules);
    • The template is traversed in search of plug-in points. Each module found is checked via in_array() against the module list to prevent errors. For each module:
      • If “nocache” is among the module's parameters, the generated block is not cached;
      • If “increment” is among the parameters, we increase the view counter, which also lives in the cache (on a miss, it is fetched from the database);
      • The parameters for calling the module are selected: those that are set, out of those that are required;
      • The block is fetched from the cache, or getCode() is executed;
        • In the resulting code, we look for the view-counter markers and substitute the actual values.
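    Put together, the dispatch loop might look roughly like the sketch below. All names here (the helper functions routeFromUrl(), scanModuleFiles(), collectParameters(), findPlaceholders(), loadModule(), the cache keys, the {module:...} syntax) follow the assumptions of this article, not the engine's actual code:

        <?php
        // Sketch of the front-end dispatch described above (assumed names).
        $mc = new Memcached();
        $mc->addServer('127.0.0.1', 11211);

        // Hard-coded routing: action name plus its parameters.
        list($action, $requestParams) = routeFromUrl($_SERVER['REQUEST_URI']);

        // Template: cache first, disk on a miss.
        $tpl = $mc->get("tpl,$action");
        if ($tpl === false) {
            $tpl = file_get_contents("templates/$action.tpl");
            $mc->set("tpl,$action", $tpl);
        }

        $modules = $mc->get('var,modules') ?: scanModuleFiles();
        $params  = $mc->get('var,params')  ?: collectParameters($modules);

        foreach (findPlaceholders($tpl) as $ph) {            // plug-in points
            $name = $ph['name'];
            if (!in_array($name, $modules)) continue;        // guard against typos

            // Call parameters: those that are set, out of those required.
            $call = array_intersect_key($requestParams, array_flip($params[$name]));
            $key  = "c,$name," . implode(',', $call);

            if (in_array('increment', $params[$name])) {
                $mc->increment('cnt,subject,' . $call['id']); // view counter
            }

            $nocache = in_array('nocache', $params[$name]);
            $block   = $nocache ? false : $mc->get($key);
            if ($block === false) {
                $block = loadModule($name)->getCode($call);  // regenerate
                if (!$nocache) $mc->set($key, $block);
            }
            $tpl = str_replace($ph['raw'], $block, $tpl);
        }

        echo substituteCounters($tpl, $mc);                  // see the next sketch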


    I want to dwell on the last point. Among all this “static paradise” safely served from the cache, we have exceptions. These are the non-cached exception modules mentioned above, but also the material view counters. When a module such as content_subject, which generates the main part of a subject page, is called, the number of views is incremented automatically and immediately in the cache (under cnt,subject,$id), but these values are also actively used when rendering material announcements. Therefore we have special markers for them, by which the values are taken from the cache and inserted on the fly.
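    Such on-the-fly substitution can be sketched as follows, assuming markers of the form {cnt:subject:123}; the marker syntax is my guess, not the engine's actual one:

        <?php
        // Replace view-counter markers in assembled HTML with live cache values.
        function substituteCounters($html, Memcached $mc)
        {
            return preg_replace_callback(
                '/\{cnt:(\w+):(\d+)\}/',
                function ($m) use ($mc) {
                    $value = $mc->get("cnt,{$m[1]},{$m[2]}");
                    // On a miss the real engine falls back to the database;
                    // the sketch simply shows zero.
                    return $value === false ? '0' : (string)$value;
                },
                $html
            );
        }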

    The overall cache structure is as follows:

    Cache structure

    Moreover, the page is “assembled” in exactly the order shown in the diagram: the code generated by the modules (c) is inserted into the template (tpl), and the counter values (cnt) are then inserted into the result.

    In addition to the engine, the server runs scripts launched from the scheduler: robots. Leaving site-specific functions aside, I will mention two robots: one dumps the counters from the cache into the database, and the other asynchronously updates various site statistics and blocks like “popular materials”. The first is needed so that the view counts of materials are not lost; the second keeps up to date the blocks whose contents must be recalculated periodically.
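    The counter-dumping robot is conceptually a few lines of cron-driven PHP. The sketch below assumes a subjects table with a views column and the cnt,subject,$id key scheme; the table, column, and connection details are all illustrative:

        <?php
        // Robot run from cron: persist view counters from the cache to MySQL
        // so they survive a Memcached restart (table and column names assumed).
        $mc = new Memcached();
        $mc->addServer('127.0.0.1', 11211);
        $db = new PDO('mysql:host=localhost;dbname=site', 'user', 'password');

        $update = $db->prepare('UPDATE subjects SET views = ? WHERE id = ?');
        foreach ($db->query('SELECT id FROM subjects') as $row) {
            $views = $mc->get('cnt,subject,' . $row['id']);
            if ($views !== false) {
                $update->execute(array($views, $row['id']));
            }
        }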

    CMS Operation Algorithm


    Everything is much simpler here. When a material is updated, added, or removed, all modules are polled: does the module have any relation to this action on this table with this ID (if there is one)? For example, on an update action for a material, only the one index page that carries the announcement of that material is invalidated. Every module checks whether it uses data from the table with the given name and, if so, how. In my implementation, blocks like “Fresh Articles” are always reset without checking whether the specific material ID appears in the list: here I keep a balance between engineering effort and common sense.

    So, for the entire list of modules from var,modules, the getDependencies($tableName, $action, $id = 0) method is executed, and the resulting list of blocks to reset is passed to the kernel, which sets their “outdated” flag. The blocks are regenerated on request from the front-end (or perhaps never, if the material lies deep and no one needs it anymore).
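    The invalidation pass might look like the sketch below; markOutdated(), loadModule(), and scanModuleFiles() are assumed helpers, as before:

        <?php
        // CMS side: after a table change, ask every module which cached blocks
        // became stale and flag them; they will be rebuilt lazily on request.
        function invalidate(Memcached $mc, $tableName, $action, $id = 0)
        {
            $modules = $mc->get('var,modules') ?: scanModuleFiles();
            foreach ($modules as $name) {
                $stale = loadModule($name)->getDependencies($tableName, $action, $id);
                foreach ($stale as $blockKey) {
                    markOutdated($mc, $blockKey);   // set the “outdated” flag
                }
            }
        }

        // Example: material 42 was updated through the CMS.
        // invalidate($mc, 'subjects', 'update', 42);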

    The engine in practice


    The site has been running successfully since 2010 and has survived a series of disasters thanks to the engine's architecture. Once the hard drives in our RAID failed, both at the same time. The editors were told to temporarily stop updating the site so as not to invalidate the cache, and the site kept working the whole time while the new disks were installed, the data was restored from backup, and the array was resynchronized. Another time there was the terrorist attack at Domodedovo, and visitors rushed to us in search of the most relevant information on the event: about 70 thousand visitors hit the topic within half an hour of the tragedy. Page delivery time rose to 10 seconds, but the server survived.

    If you are interested in how traffic growth affects CPU time and memory consumption, let's look at a recent case from September 25th. Here is what Liveinternet.ru shows about it:

    Liveinternet statistics

    Attendance grew roughly sevenfold. As I wrote above, traffic usually lands on a few individual materials, and this case was no exception:

    Per-material traffic statistics

    Memory consumption varied within the statistical error:

    Memory consumption graph

    As for CPU time, the load was slightly noticeable:

    CPU time graph

    (The two “bursts” at the end of the 20th and on the 27th are associated with the weekly full backup.)

    Memcached statistics: read misses run at about 1 in 60, with an uptime of 74 days:

    [uptime] => 6371668
    [get_hits] => 409123948
    [get_misses] => 6869860
    [incr_misses] => 1259
    [incr_hits] => 2476204
    [bytes_read] => 13353236827
    [bytes_written] => 135590836194
    [bytes] => 358927266
    [curr_items] => 1246460
    [total_items] => 1733562

    I will be glad to hear questions and opinions. How could the engine be improved? Could it be made more versatile? What similar solutions are out there?




