DmitryKoterov April 13, 2009 at 19:48

Dklab_Cache: tags in memcached, namespaces, statistics

The memcached community has made many attempts to write “native” patches for the memcached code that add tag support to it. The best known of these patches is the memcached-tag project. Unfortunately, memcached-tag is still far from stable: it is not hard to write a script that makes the patched memcached server freeze. It seems that, at the time of writing, there is no reliable solution to tagging at the level of the memcached server itself.

Dklab_Cache Library

Dklab_Cache is a library that (mostly) adds key tagging support to memcached, using the Zend Framework interfaces. The library itself is written in pure PHP. Here is the complete list of its features:

Backend_TagEmuWrapper: tags for memcached and any other Zend Framework backend caching systems;
Backend_NamespaceWrapper: namespace support for memcached and others;
Backend_Profiler: calculation of statistics on the use of memcached and other backends;
Frontend_Slot, Frontend_Tag: a framework for high-level construction of caching systems in complex projects.

Tag support itself is provided by the TagEmuWrapper class. It is a decorator ("wrapper") for Zend Framework backend caching classes. In other words, you can use it to “transparently” add tag support to any Zend Framework caching subsystem. We will focus on the memcached backend, Zend_Cache_Backend_Memcached, but if your project uses some other backend class, you can add tagging to it in exactly the same way.
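The decorator idea can be sketched in a few lines of PHP (7.4+). The SimpleBackend interface, ArrayBackend and NamespaceWrapper below are hypothetical, simplified stand-ins chosen for illustration: they are not Zend_Cache_Backend_Interface or the library's own classes. The wrapper implements the same interface as the backend it wraps and changes only what it needs to (here, it prefixes keys), so calling code cannot tell the difference.

```php
<?php
// Hypothetical, simplified backend interface (NOT the real Zend_Cache_Backend_Interface).
interface SimpleBackend
{
    public function save($data, string $id, array $tags = [], ?int $lifetime = null): bool;
    public function load(string $id);
    public function remove(string $id): bool;
}

// Plain in-memory backend standing in for memcached.
class ArrayBackend implements SimpleBackend
{
    private array $cells = [];

    public function save($data, string $id, array $tags = [], ?int $lifetime = null): bool
    {
        $this->cells[$id] = $data; // tags and lifetime are ignored by this toy backend
        return true;
    }

    public function load(string $id)
    {
        return $this->cells[$id] ?? false; // false means "no data"
    }

    public function remove(string $id): bool
    {
        unset($this->cells[$id]);
        return true;
    }
}

// Decorator: same interface, delegates every call, adds one feature (a key prefix).
class NamespaceWrapper implements SimpleBackend
{
    private SimpleBackend $inner;
    private string $prefix;

    public function __construct(SimpleBackend $inner, string $namespace)
    {
        $this->inner = $inner;
        $this->prefix = $namespace . '::';
    }

    public function save($data, string $id, array $tags = [], ?int $lifetime = null): bool
    {
        return $this->inner->save($data, $this->prefix . $id, $tags, $lifetime);
    }

    public function load(string $id)
    {
        return $this->inner->load($this->prefix . $id);
    }

    public function remove(string $id): bool
    {
        return $this->inner->remove($this->prefix . $id);
    }
}

$raw = new ArrayBackend();
$cache = new NamespaceWrapper($raw, 'site1');
$cache->save('hello', 'greeting');
```

Because NamespaceWrapper is itself a SimpleBackend, wrappers can be stacked, which is the property the article relies on when it describes adding tags "transparently".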

TagEmuWrapper implements the standard backend interface, Zend_Cache_Backend_Interface, so from the calling code's point of view it is itself a cache backend. One nice thing about Zend Framework is that its interfaces support tags from the very beginning! For example, the save() method already has a parameter for tagging the key. However, the backends themselves do not necessarily support tags: with Zend_Cache_Backend_Memcached, for example, an attempt to tag a key raises an exception.
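A short sketch of the situation described above: the tags parameter is part of the save() signature, so a backend that has nowhere to keep tags can only reject them at runtime. TaggableBackend and NoTagsBackend are hypothetical names, and the exception type is an illustrative choice, not the one Zend Framework uses.

```php
<?php
// Hypothetical simplified interface: tags are part of save() from the start.
interface TaggableBackend
{
    public function save($data, string $id, array $tags = [], ?int $lifetime = null): bool;
    public function load(string $id);
}

// A backend with no place to store tags refuses them explicitly.
class NoTagsBackend implements TaggableBackend
{
    private array $cells = [];

    public function save($data, string $id, array $tags = [], ?int $lifetime = null): bool
    {
        if ($tags !== []) {
            throw new InvalidArgumentException('This backend does not support tags');
        }
        $this->cells[$id] = $data;
        return true;
    }

    public function load(string $id)
    {
        return $this->cells[$id] ?? false;
    }
}

$backend = new NoTagsBackend();
$backend->save('plain', 'a'); // fine: no tags given
try {
    $backend->save('tagged', 'b', ['profile_1']);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true; // the tagged save is rejected at runtime
}
```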

Technical details, documentation, and usage examples can be found here: dklab.ru/lib/Dklab_Cache

What are tags?

Working with a typical caching system (including memcached) consists of three main operations:

save($data, $id, $lifetime): save $data in the cache cell with key $id. You can optionally specify the key's lifetime $lifetime; after this time, the data in the cache “goes bad” and is deleted.
load($id): load data from the cell with key $id. If there is no data, false is returned.
remove($id): clear the cache cell with key $id.
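For reference, the three operations can be sketched as a tiny in-memory cache. ArrayCache and its injectable clock are assumptions made so the example runs without a memcached server and so expiry can be tested; a real backend would delegate to memcached.

```php
<?php
// Minimal in-memory cache with the three basic operations.
// Illustrative names only; this is not the memcached or Zend_Cache API.
class ArrayCache
{
    private array $cells = []; // id => [data, expiresAt|null]
    private $clock;            // callable returning "now" as a Unix timestamp

    public function __construct(?callable $clock = null)
    {
        $this->clock = $clock ?? 'time';
    }

    // Store $data under $id; after $lifetime seconds the cell "goes bad".
    public function save($data, string $id, ?int $lifetime = null): void
    {
        $expiresAt = $lifetime === null ? null : ($this->clock)() + $lifetime;
        $this->cells[$id] = [$data, $expiresAt];
    }

    // Return the stored data, or false if the cell is empty or expired.
    public function load(string $id)
    {
        if (!isset($this->cells[$id])) {
            return false;
        }
        [$data, $expiresAt] = $this->cells[$id];
        if ($expiresAt !== null && ($this->clock)() >= $expiresAt) {
            unset($this->cells[$id]); // drop the stale cell
            return false;
        }
        return $data;
    }

    // Clear the cell with key $id.
    public function remove(string $id): void
    {
        unset($this->cells[$id]);
    }
}

$now = 1000;
$cache = new ArrayCache(function () use (&$now) { return $now; });
$cache->save('value', 'key', 60);
$fresh = $cache->load('key');  // 'value'
$now += 61;
$stale = $cache->load('key');  // false: the lifetime has passed
$cache->save('other', 'k2');
$cache->remove('k2');
$removed = $cache->load('k2'); // false
```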

Suppose we want to cache the result of a slow SQL query used to render part of a page. We check whether the cache cell corresponding to this query already has an entry. If the cell is empty, the data is loaded from the DBMS and stored in the cache for future requests.

if (false === ($data = $cache->load("key"))) {
    $data = executeHeavyQuery();
    $cache->save($data, "key");
}
display($data);

Unfortunately, this approach in its pure form is rarely applicable. The data in the database can change, and we must somehow clear the cache cell so that the user sees the changes immediately. We could call remove() with the right key, but in many cases, at the moment the data is updated, we simply don't know which cells it is cached in.

The problem is actually much harder. In heavily loaded systems, data is added to tables many times (even hundreds of times) per second. As a result, the logic that tracks dependencies and decides which cache cells to clear and which to keep becomes extremely complex, or even impossible.

Tagging provides a solution to this problem. Each time data is written to a cache cell, we mark it with tags: labels that express the dependence of this data on other parts of the system. Tags let you combine cells into multiple overlapping groups. Later, we can issue the command "clear all cells marked with a specific tag".
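A minimal sketch of how tags can be emulated on top of a plain key/value store such as memcached. It uses the "tag version" technique: every tag has a version stored in the cache, each cell remembers the versions of its tags at save time, and cleaning a tag just assigns it a new version. This is a common approach shown for illustration; TaggedCache and its methods are hypothetical, not TagEmuWrapper's actual code.

```php
<?php
// Tag emulation via "tag versions" over a plain key/value store.
// Illustrative sketch; not claimed to be TagEmuWrapper's implementation.
class TaggedCache
{
    private array $store = []; // stands in for memcached: key => value

    // Current version of a tag; a tag seen for the first time gets a random one.
    private function tagVersion(string $tag): string
    {
        $key = 'tag:' . $tag;
        if (!isset($this->store[$key])) {
            $this->store[$key] = bin2hex(random_bytes(8));
        }
        return $this->store[$key];
    }

    // Save data together with a snapshot of the current versions of its tags.
    public function save($data, string $id, array $tags = []): void
    {
        $versions = [];
        foreach ($tags as $tag) {
            $versions[$tag] = $this->tagVersion($tag);
        }
        $this->store['data:' . $id] = ['data' => $data, 'tags' => $versions];
    }

    // Return the data, or false if the cell is empty or any tag was cleaned since save().
    public function load(string $id)
    {
        $cell = $this->store['data:' . $id] ?? null;
        if ($cell === null) {
            return false;
        }
        foreach ($cell['tags'] as $tag => $savedVersion) {
            if ($this->tagVersion((string) $tag) !== $savedVersion) {
                return false; // a dependency changed, so the data is stale
            }
        }
        return $cell['data'];
    }

    // Clean a tag by giving it a new version; tagged cells become invalid lazily.
    public function cleanTag(string $tag): void
    {
        $this->store['tag:' . $tag] = bin2hex(random_bytes(8));
    }
}

$cache = new TaggedCache();
$cache->save('profile page', 'key_7', ['profile_42']);
$before = $cache->load('key_7');
$cache->cleanTag('profile_42');
$after = $cache->load('key_7');
```

Cleaning a tag costs one write no matter how many cells carry it; stale cells are detected on the next load(). If memcached evicts a tag's version key, the tag simply gets a fresh version and its cells are treated as stale, which errs on the safe side.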

Let's modify the previous example to use tags. Suppose the SQL query depends heavily on the ID of the current user, $loggedUserId, so each such user gets a separate cell named "key_{$loggedUserId}". However, the data also depends on the ID of another person, $ownerUserId, whose profile the current user is viewing. In this case, we can mark the cell with a tag associated with the user $ownerUserId:

if (false === ($data = $cache->load("key_{$loggedUserId}"))) {
    $data = loadProfileFor($loggedUserId, $ownerUserId);
    $cache->save($data, "key_{$loggedUserId}", array("profile_{$ownerUserId}"));
}
display($data);

Now, if the data in the profile of user $ownerUserId changes (for example, the person changes his name), we only need to clear the tag associated with this profile:

$cache->clean(Zend_Cache::CLEANING_MODE_MATCHING_TAG, array("profile_{$ownerUserId}"));

Note that all other cache cells are left untouched: only the cells that depend on $ownerUserId are cleared.
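The isolation property can be checked with an even simpler, eager variant that keeps an explicit index of keys per tag and deletes them on clean. IndexedTagCache is a hypothetical illustration of a different technique from version-based tagging; with a real memcached such an index can be evicted or modified concurrently, so it is used here only to make the behavior visible.

```php
<?php
// Eager variant: an explicit tag => [keys] index, keys deleted on clean.
// Illustrative only; not how a memcached-backed tag system should be built.
class IndexedTagCache
{
    private array $cells = [];    // key => data
    private array $tagIndex = []; // tag => [key => true]

    public function save($data, string $id, array $tags = []): void
    {
        $this->cells[$id] = $data;
        foreach ($tags as $tag) {
            $this->tagIndex[$tag][$id] = true;
        }
    }

    public function load(string $id)
    {
        return $this->cells[$id] ?? false;
    }

    // Remove every cell marked with $tag; other cells are left alone.
    public function cleanMatchingTag(string $tag): void
    {
        foreach (array_keys($this->tagIndex[$tag] ?? []) as $id) {
            unset($this->cells[$id]);
        }
        unset($this->tagIndex[$tag]);
    }
}

$cache = new IndexedTagCache();
// Two viewers of profile 42, one viewer of profile 99.
$cache->save('A sees 42', 'key_1', ['profile_42']);
$cache->save('B sees 42', 'key_2', ['profile_42']);
$cache->save('C sees 99', 'key_3', ['profile_99']);
$cache->cleanMatchingTag('profile_42'); // user 42 changed his name
```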

Actually, the phrase “mark cell C with tag T” means the same as the statement “cell C depends on the data described as T”. Tags are dependencies, nothing more.

A small digression: about code dependencies

Before continuing with tags, let's step back and talk about a more general concept: dependencies. What exactly are they? In a typical case (even without tags), we have to refer to the cache key several times to work with the data effectively:

if (false === ($data = $cache->load("profile_{$userId}"))) {
    $data = loadProfileOf($userId);
    $cache->save($data, "profile_{$userId}", array(), 3600 * 24); // cache for 24 hours
}
display($data);

and then in a completely different part of the program:

$cache->remove("profile_{$userId}");

As you can see, the string "profile_{$userId}" has to be repeated three times. In the first fragment we can eliminate the repetition by introducing a new variable:

$cacheKey = "profile_{$userId}";
$cacheTime = Config::getInstance()->cacheTime->profile;
if (false === ($data = $cache->load($cacheKey))) {
    $data = loadProfileOf($userId);
    $cache->save($data, $cacheKey, array(), $cacheTime);
}
display($data);

... but in the second part of the program we cannot get rid of the knowledge of how the cache key is built and which parameters it depends on.

Important note
The string “profile_{$userId}” is knowledge, and one should not underestimate the harm of spreading this knowledge across an unnecessarily large number of independent places. In our example the knowledge is very simple, but in practice a cache key may depend on dozens of different parameters, some of which even have to be loaded from the database on demand.

In reality, the situation is even worse than it might seem.

Who can guarantee that the $userId variable holds the current user's ID and not some garbage? What if someone tries to pass incorrect data there? In fact, the cache key depends not on the user ID but on the user itself. Trying to build the key from anything other than a user object is clearly an error, but the program does not express this restriction explicitly.
The caching time should not be hard-coded; it belongs somewhere in the system configuration (see the previous example), so that it can be changed without touching the code. This is one more piece of knowledge tied to the role of the cache cell, alongside the “profile_{$userId}” key string.
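Both points can be expressed directly in code. In this sketch (the User class, the CACHE_TIME array and profileCacheKey() are hypothetical), the key builder accepts only a User object, so garbage is rejected with a TypeError, and the lifetime comes from a configuration-like array rather than a literal in the caching code.

```php
<?php
// Hypothetical User class and configuration, for illustration only.
final class User
{
    public int $id;
    public function __construct(int $id) { $this->id = $id; }
}

const CACHE_TIME = ['profile' => 86400]; // seconds; would live in configuration

// The key depends on a User, and the signature says so: passing anything else
// fails with a TypeError instead of silently producing a wrong key.
function profileCacheKey(User $user): string
{
    return "profile_{$user->id}";
}

$key = profileCacheKey(new User(42)); // "profile_42"
$lifetime = CACHE_TIME['profile'];

try {
    profileCacheKey('42; garbage');
    $caught = false;
} catch (TypeError $e) {
    $caught = true;
}
```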

How it works in Dklab_Cache

Instead of a long explanation, here is an example right away: a Slot class built according to the Dklab_Cache_Frontend approach.

$slot = new Cache_Slot_UserProfile($user);
if (false === ($data = $slot->load())) {
    $data = $user->loadProfile();
    $slot->save($data);
}
display($data);

To clear the cache:

$slot = new Cache_Slot_UserProfile($user);
$slot->remove();
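A minimal sketch of what such a slot class might look like, assuming PHP 7.4+. The class names follow the article, but the Cache_Slot base class, its constructor, and the static array standing in for memcached are assumptions, not Dklab_Cache_Frontend's real code.

```php
<?php
// Hypothetical User stub.
final class User
{
    public int $id;
    public function __construct(int $id) { $this->id = $id; }
    public function loadProfile(): array { return ['id' => $this->id, 'name' => "User {$this->id}"]; }
}

// Hypothetical base class: a slot knows its key and lifetime and hides the backend.
abstract class Cache_Slot
{
    private static array $backend = []; // stands in for a shared memcached backend
    private string $id;
    private int $lifetime; // a real backend would receive this in save()

    protected function __construct(string $id, int $lifetime)
    {
        $this->id = $id;
        $this->lifetime = $lifetime;
    }

    public function getLifetime(): int { return $this->lifetime; }
    public function load() { return self::$backend[$this->id] ?? false; }
    public function save($data): void { self::$backend[$this->id] = $data; }
    public function remove(): void { unset(self::$backend[$this->id]); }
}

// All knowledge about this cache lives here: key format, lifetime, typed dependency.
final class Cache_Slot_UserProfile extends Cache_Slot
{
    public function __construct(User $user)
    {
        parent::__construct("profile_{$user->id}", 3600 * 24);
    }
}

$user = new User(7);
$slot = new Cache_Slot_UserProfile($user);
if (false === ($data = $slot->load())) {
    $data = $user->loadProfile();
    $slot->save($data);
}
$cached = (new Cache_Slot_UserProfile($user))->load(); // same key, built in one place
(new Cache_Slot_UserProfile($user))->remove();
$afterRemove = (new Cache_Slot_UserProfile($user))->load();
```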

Why is this better?

Knowledge of how the cache key is built is enclosed in a single place: the Cache_Slot_UserProfile class.
It also holds the knowledge of the cache lifetime. In our case the lifetime is set explicitly, but nothing stops you from taking it from a configuration parameter whose name matches the slot class name.
The $user parameter of the Cache_Slot_UserProfile constructor is typed. This means we cannot slip the slot class anything other than a proper user object. Naturally, a slot can depend on several objects; all of this is defined by the constructor parameters.
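One way to implement the lifetime point, sketched under assumptions: lifetimes live in a configuration array keyed by slot class name, and the base class looks itself up via static::class, falling back to a default. SLOT_LIFETIMES, DEFAULT_LIFETIME, Cache_Slot_NewsList and getLifetime() are hypothetical.

```php
<?php
// Hypothetical configuration: lifetimes keyed by slot class name, so they can
// be changed without touching code.
const SLOT_LIFETIMES = [
    'Cache_Slot_UserProfile' => 86400,
];
const DEFAULT_LIFETIME = 3600;

abstract class Cache_Slot
{
    // static::class is the concrete subclass, so each slot finds its own entry.
    public function getLifetime(): int
    {
        $lifetimes = SLOT_LIFETIMES;
        return $lifetimes[static::class] ?? DEFAULT_LIFETIME;
    }
}

final class Cache_Slot_UserProfile extends Cache_Slot {}
final class Cache_Slot_NewsList extends Cache_Slot {}

$profileLifetime = (new Cache_Slot_UserProfile())->getLifetime(); // 86400 from config
$newsLifetime = (new Cache_Slot_NewsList())->getLifetime();       // 3600, the default
```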

You write one slot class for each kind of cache storage in your program. This imposes discipline: a glance at the Cache/Slot directory shows how many different caches the program uses and what each of them depends on.

Now, finally, about tags

Slots also support tagging. Here is an example of using tags with pass-through caching (of course, the non-pass-through style shown earlier works too).

$slot = new Cache_Slot_UserProfile($user);
$slot->addTag(new Cache_Tag_User($loggedUser));
$slot->addTag(new Cache_Tag_Language($currentLanguage));
$data = $slot->thru($user)->loadProfile();
display($data);
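The pass-through call can be sketched with PHP's __call magic method: thru() returns a proxy, and any method called on the proxy is answered from the slot or forwarded to the real object and cached. The Slot class, the anonymous proxy and the User stub are assumptions for illustration, and tags are left out to keep the sketch short.

```php
<?php
// Hypothetical User stub that counts how often the expensive method really runs.
final class User
{
    public int $id;
    public int $realLoads = 0;

    public function __construct(int $id) { $this->id = $id; }

    public function loadProfile(): string
    {
        $this->realLoads++;
        return "profile of {$this->id}";
    }
}

final class Slot
{
    private static array $backend = []; // stands in for memcached
    private string $id;

    public function __construct(string $id) { $this->id = $id; }
    public function load() { return self::$backend[$this->id] ?? false; }
    public function save($data): void { self::$backend[$this->id] = $data; }

    // Wrap $target so that calling a method on the result goes through this slot.
    public function thru(object $target): object
    {
        return new class($this, $target) {
            private Slot $slot;
            private object $target;

            public function __construct(Slot $slot, object $target)
            {
                $this->slot = $slot;
                $this->target = $target;
            }

            // Any method call lands here: serve from the slot, or compute and store.
            public function __call(string $method, array $args)
            {
                $data = $this->slot->load();
                if ($data === false) {
                    $data = $this->target->$method(...$args);
                    $this->slot->save($data);
                }
                return $data;
            }
        };
    }
}

$user = new User(5);
$first = (new Slot("profile_{$user->id}"))->thru($user)->loadProfile();
$second = (new Slot("profile_{$user->id}"))->thru($user)->loadProfile(); // from cache
```

In this sketch the slot's key alone identifies the cached data; the method name and arguments are not part of the key. That fits the one-slot-per-kind-of-data idea, but it means a given slot should be used with a single method.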

You create one tag class per kind of dependency in your system. Tag classes are especially convenient when it is time to clear some tags:

$tag = new Cache_Tag_Language($currentLanguage);
$tag->clean();

As you can see, knowledge about tag dependencies is again kept in a single place. You can no longer accidentally “miss” and clear the wrong tag: the system reports an error about either a nonexistent class or a constructor parameter of the wrong type.
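A sketch of typed tag classes, assuming PHP 7.4+. Class names follow the article; the Cache_Tag base class, the in-process version registry (a stand-in for versions kept in memcached) and the Language/User stubs are hypothetical. Each tag class builds its own name from a typed argument, so passing the wrong kind of object fails immediately.

```php
<?php
// Hypothetical dependency stubs.
final class Language { public string $code; public function __construct(string $code) { $this->code = $code; } }
final class User { public int $id; public function __construct(int $id) { $this->id = $id; } }

abstract class Cache_Tag
{
    private static array $versions = []; // stands in for tag versions kept in memcached
    private string $name;

    protected function __construct(string $name) { $this->name = $name; }

    public function getName(): string { return $this->name; }

    public function getVersion(): int { return self::$versions[$this->name] ?? 0; }

    // Invalidate everything marked with this tag by bumping its version.
    public function clean(): void { self::$versions[$this->name] = $this->getVersion() + 1; }
}

// Each tag class knows how its name is built and what it may depend on.
final class Cache_Tag_Language extends Cache_Tag
{
    public function __construct(Language $language) { parent::__construct("language_{$language->code}"); }
}

final class Cache_Tag_User extends Cache_Tag
{
    public function __construct(User $user) { parent::__construct("user_{$user->id}"); }
}

$en = new Cache_Tag_Language(new Language('en'));
$userTag = new Cache_Tag_User(new User(3));
$en->clean();
$enVersion = (new Cache_Tag_Language(new Language('en')))->getVersion();
$userVersion = $userTag->getVersion(); // untouched by cleaning the language tag

try {
    new Cache_Tag_Language(new User(3)); // wrong dependency type
    $wrongTypeRejected = false;
} catch (TypeError $e) {
    $wrongTypeRejected = true;
}
```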

Conclusion

This article covers several topics at once: cache tagging, cache dependencies in the code, and the Slot and Tag abstractions over cache storage implemented in the library.

You can download library sources and examples here: dklab.ru/lib/Dklab_Cache
