Memcached clustering and caching key selection

    The series of posts under the general title “Web, caching and memcached” continues. In the first post, we talked about memcached itself, its architecture, and possible uses.

    Today we will talk about:
    • Choosing a caching key
    • Memcached clustering and key distribution algorithms

    The next post will be devoted to atomicity of operations and counters in memcached.


    Caching key


    Suppose we have already decided that caching with memcached is the right solution for us. The first task we face is choosing a key for each cached item. A key in memcached is a string of limited length (at most 250 characters), consisting of a limited set of characters (spaces, for example, are forbidden). A cache key must have the following properties:
    • If the parameters of the query whose result we cache change, the cache key must change as well (so that with the new parameters we do not hit the stale cache of the old ones).
    • The key must be determined unambiguously by the query parameters, i.e. there must be exactly one cache key for any given query; otherwise we risk lowering the hit rate of the cache.

    Of course, we could build a key for each query by hand, for example ‘user_158’ to select information about the user with ID 158, or ‘friends_192_public_sorted_online’ for the friends of the user with ID 192 that are publicly visible and, moreover, sorted by the time of their last appearance on the site. Such an approach is error-prone and easily violates the conditions formulated above.

    Instead, the following option can be used (an example in PHP): if there is a single point in the code through which all database queries pass, and every query is completely described (i.e. contains all of its parameters) by some structure $options, the key can be built as follows:

    $key = md5(serialize($options));

    Such a key undoubtedly satisfies the first condition ($key necessarily changes whenever $options changes); the second condition is also met, provided we use all data types in $options “canonically”, i.e. avoid the string "1" where the number 1 is meant (although in PHP the two values are equal, their serialized representations differ). The md5 function is used to “compress” the data: the memcached key has a length limit, and the serialized representation could be too long.
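    For example, here is a minimal sketch of this pitfall (the contents of $options are hypothetical):

    // The string "1" and the integer 1 compare as equal in PHP,
    // but their serialized representations differ, so they produce
    // two different cache keys for what is logically the same query.
    $a = array('user_id' => 1);
    $b = array('user_id' => '1');
    var_dump($a['user_id'] == $b['user_id']);            // bool(true)
    var_dump(md5(serialize($a)) === md5(serialize($b))); // bool(false)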

    Memcached clustering


    Instead of a single memcached server, a cluster of such servers is used to distribute the load and achieve fault tolerance. The servers in the cluster can be configured with different amounts of memory, and the total cache size is the sum of the cache sizes of all memcached instances in the cluster. The memcached process can be run on any server whose processor is underused and whose network is not loaded to the limit (for example, on a file server). Under high processor load, memcached may fail to respond to requests quickly enough, which degrades the service.

    When working with a cluster, keys are distributed among the servers, i.e. each server handles part of the project's total set of keys. Fault tolerance follows from the fact that if one of the servers fails, its keys are redistributed to the remaining servers of the cluster. The contents of the failed server are, of course, lost in the process (see the section "Key loss"). If necessary, important keys can be duplicated on several servers instead of being stored on just one; such storage redundancy minimizes the consequences of a server crash.

    With clustering, the question of key distribution becomes relevant: how do we distribute keys across the servers most efficiently? To do this, we need to define a key distribution function which, given a key, returns the number of the server it should be stored on (or several server numbers, if the key is stored redundantly).

    Historically, the first distribution function was the modulo function:

    f(key) = crc32(key) % number_of_servers
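
    In PHP, such a distribution function might look like this (a minimal sketch, not a client library; the server addresses are hypothetical):

    // Modulo distribution: hash the key with crc32 and take the
    // remainder by the number of servers in the pool.
    // sprintf('%u', ...) keeps the crc32 result unsigned.
    function server_for_key($key, array $servers) {
        $hash = (int) sprintf('%u', crc32($key));
        return $servers[$hash % count($servers)];
    }

    $servers = array('10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211');
    echo server_for_key('user_158', $servers);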

    This function distributes keys evenly across the servers, but problems arise when the memcached cluster is reconfigured: changing the number of servers moves a significant portion of the keys to other servers, which is equivalent to losing that portion of the keys (the cached data does not move with them).

    An alternative to this function is the consistent hashing mechanism, which preserves the placement of most keys when the cluster is reconfigured. This approach was first implemented in memcached clients by the Last.fm developers in April 2007.



    [Figure: the hash ring with servers A, B, C and keys 1-4 (left); the same ring after server C was removed and server D was added (right).]

    The essence of the algorithm is as follows: we consider the set of integers from 0 to 2^32, “twisting” the numerical axis into a ring (gluing 0 to 2^32). Each server in the pool of memcached servers is assigned a point on this ring (servers A, B and C in the figure on the left). A key is hashed to a number in the same range (blue points 1-4 in the figure), and as the server for storing the key we choose the server whose point is closest to the key's point in the clockwise direction. If a server is removed from or added to the pool, its point disappears from or appears on the axis, and as a result only a part of the keys moves to another server. The figure on the right shows the situation after server C was removed from the pool and a new server D was added: it is easy to see that keys 1 and 2 kept their server bindings, while keys 3 and 4 moved to other servers. In practice, each server is assigned 100-200 points on the axis (in proportion to its weight in the pool), which makes the distribution of keys across the servers more even.
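    A minimal sketch of this scheme in PHP (not a complete client implementation; the number of points per server and the server names follow the description above):

    // Build the ring: each server gets several points on the ring of
    // 32-bit integers, in proportion to its weight in the pool.
    function build_ring(array $servers, $points_per_server = 160) {
        $ring = array();
        foreach ($servers as $server) {
            for ($i = 0; $i < $points_per_server; $i++) {
                $point = (int) sprintf('%u', crc32($server . '#' . $i));
                $ring[$point] = $server;
            }
        }
        ksort($ring); // order the points clockwise around the ring
        return $ring;
    }

    // Choose the first server point clockwise from the key's point;
    // if there is none, wrap around to the start of the ring.
    function server_for_key_consistent($key, array $ring) {
        $hash = (int) sprintf('%u', crc32($key));
        foreach ($ring as $point => $server) {
            if ($point >= $hash) {
                return $server;
            }
        }
        return reset($ring);
    }

    $ring = build_ring(array('A:11211', 'B:11211', 'C:11211'));
    echo server_for_key_consistent('user_158', $ring);

    Removing a server deletes only its points from the ring, so only the keys whose points fell just before those points move to other servers; all other keys keep their bindings.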

    This algorithm has been implemented in many memcached clients in various programming languages, but the implementations sometimes differ in details, which makes their hashing incompatible. This makes consistent hashing inconvenient when the same pool of memcached servers is accessed from clients in different programming languages. The simplest algorithm, crc32 with the modulo operation, is implemented identically in all clients in all languages, which guarantees the same mapping of keys to servers. Therefore, if there is no need to access memcached from clients in different programming languages, the consistent hashing approach looks more attractive.

    Materials

    1. http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
    2. http://www.lastfm.ru/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
