Caching with nginx for anonymous users, using Drupal as an example

    As you know, Drupal is an extremely heavy CMS/CMF, and building high-load sites on it is not simple. Since my company uses Drupal for most of its development, we sometimes have to optimize performance, and I would like to describe how we cope with the load.

    In this article, I will discuss one of the most effective ways to improve performance: caching content in the nginx web server for anonymous users. With this method, requests from anonymous users that are served from the cache never reach the backend (it doesn't matter which one, Apache or FastCGI). Such caching is therefore more efficient than caching done by any CMS itself.

    Problem statement

    Drupal has built-in caching for anonymous users. However, it is very inefficient and under high traffic tends to cause more problems than it solves. It is therefore reasonable to apply at least two measures:
    1. Cache content for anonymous users using nginx
    2. Store the cache_form and cache_filter tables in Cacherouter + APC


    To tell anonymous users apart from logged-in ones, we will set a cookie at login and remove it at logout. Let's write a small nginxcache module.

    nginxcache.info:
    name = Nginx cache
    description = Nginx cache for anonymous users
    package = ISFB
    version = VERSION
    core = 6.x

    nginxcache.module:
    function nginxcache_user($type, &$edit, &$user) {
      switch ($type) {
        case 'login':  // mark the browser as logged in for 30 days
          setcookie('logged', '1', time() + 60*60*24*30, '/');
          break;
        case 'logout':  // expire the cookie on the same path so it is actually removed
          setcookie('logged', '', time() - 3600, '/');
          break;
      }
    }

    Then we clear the sessions table and the cache, and enable the module.


    I will not give the whole config; I will describe only what is relevant to the article.

    In the http section, we need to declare a zone:
    proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=hrportal:10m inactive=60m;

    First, we need to create the /var/nginx/cache directory with the necessary permissions (the nginx worker user must be able to write to it). If cached data is not accessed within the time set by inactive, it is deleted regardless of its freshness.
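
    As a minimal sketch of where this declaration sits, the fragment below repeats the zone from the article inside an http block and annotates each parameter. The max_size=1g limit is not in the original config; it is an assumption added here to show how the on-disk cache size can be capped.

```nginx
http {
    # levels=1:2      - two-level directory hierarchy under the cache path
    # keys_zone=...   - shared memory zone "hrportal", 10 MB for keys and metadata
    # inactive=60m    - entries not requested for 60 minutes are removed, fresh or not
    # max_size=1g     - assumed cap on disk usage (not part of the original config)
    proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=hrportal:10m
                     inactive=60m max_size=1g;
}
```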

    In the location block of the relevant virtual host, we write the directives listed below.
    proxy_no_cache - sets the conditions under which the response is not stored in the cache; in our case, when the logged cookie is present. If you forget this directive, responses for logged-in users will be written to the cache and then served to anonymous and other logged-in users.
    proxy_cache_bypass - the response is not taken from the cache for users who have the logged cookie.
    proxy_cache hrportal;
    proxy_cache_key $host$uri?$args;
    proxy_no_cache $cookie_logged;
    proxy_cache_bypass $cookie_logged;
    proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
    proxy_pass_header Set-Cookie;
    proxy_ignore_headers "Expires" "Cache-Control";
    proxy_cache_valid 200 301 302 304 1h;

    proxy_cache_use_stale - allows nginx to serve a stale response from the cache if the backend is unavailable; very useful
    proxy_pass_header - needed so that users receive cookies
    proxy_ignore_headers - Drupal always sends these headers, so we have to ignore them
    proxy_cache_valid - sets how long responses are cached
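
    To put the directives in context, here is a hedged sketch of a complete server block. The backend address 127.0.0.1:8080, the host name example.com, and the X-Cache-Status debugging header are assumptions for illustration and are not part of the original config. $upstream_cache_status is a standard nginx variable that reports values such as HIT, MISS, or BYPASS.

```nginx
server {
    listen 80;
    server_name example.com;               # hypothetical host name

    location / {
        proxy_pass http://127.0.0.1:8080;  # assumed backend address
        proxy_set_header Host $host;

        proxy_cache hrportal;
        proxy_cache_key $host$uri?$args;
        proxy_no_cache $cookie_logged;      # do not store responses for logged-in users
        proxy_cache_bypass $cookie_logged;  # do not serve them from the cache either
        proxy_ignore_headers "Expires" "Cache-Control";
        proxy_cache_valid 200 301 302 304 1h;

        # Debugging aid (an addition, not from the article): exposes the cache result
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

    With such a header in place, one would expect a request carrying the logged cookie to report BYPASS, and repeated anonymous requests for the same page to move from MISS to HIT, provided the backend response is otherwise cacheable.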


    As you know, search engines analyze a site's performance and do not send it more users than it can handle. Here is an example of a site that search engines considered high-quality but that lacked the performance to handle the traffic they sent. As the graph shows, traffic began to grow after this solution was installed.
    This site runs on Drupal and is hosted on our VPS platform.

    P.S. This approach, of course, applies to any CMS.

    UPD: I have repeatedly been asked why not just use Boost.
    The answer is fairly involved, so I will try to explain.
    To begin with, as was rightly noted, my solution is not an out-of-the-box one. It should be applied when traffic is really high and every way of improving performance matters.

    1. The nginx frontend can be located on a separate server from the backend; the cache then lives on the frontend server and is served much faster
    2. nginx serves static files faster in any case
    3. Drupal can run behind nginx without Apache; in that case Boost's rewrite rules have to be redone for nginx - is that really worth it?
    4. With Boost, the caching task is handed to the backend, whereas it can be done directly on the frontend
