Memcached - Caching Strategy
Greetings to the Habr community. One of the pleasant impressions of registering on Habr is the fairy-tale atmosphere, the kind found only in the good old Soviet films.
Now that the tears of tenderness have dried, let's proceed. Below is the topic that earned me the invite to Habr.
Memcached is used to cache data: it stores query results so the application can avoid unnecessary calls to the database. This speeds up the site and reduces page rendering time.
A cache has disadvantages as well as advantages. One of its problems is staleness. In a read-only workload there are no difficulties, but if the data changes, and changes often, caching efficiency drops sharply: the more often the data changes, the less useful the cache. Typically the cache is flushed after the first change, and all cached data is dropped at once. After the flush, queries go back to the database and the cache fills up again; at the next change it is flushed once more. It often turns out that something as good as memcached brings no performance benefit at all, while still costing extra memory and CPU time.
One way to solve this problem is to logically divide the cache into independent parts, so that when a flush occurs, only the part that has changed is cleared.
Let us consider one such approach for the memcached + database combination.
If we partition the cache by individual queries, questions arise: what exactly to split, and how often to invalidate each part. We would have to provide hints for every query, since queries serve different purposes and it is unclear which ones to invalidate on which events. That takes a lot of effort to implement, and to me, as a lazy programmer, that is not interesting.
Instead, let's partition all database calls by table.
Suppose we have a query that touches several tables. We take the query, determine which tables it refers to, and check whether the data in those tables has changed. If it has, the cached result for the query is refreshed as well. It sounds a little complicated, and questions arise about how to do all this, but in the end the implementation is quite simple.
Here is the sketch:
* Each table gets a version value (a counter) that changes every time the data in the table changes.
* On every insert, delete, or update of rows, we bump the counters of the affected tables.
* Before executing a query, we extract the list of affected tables from it, look up their counter values, concatenate those values into a single string, and append it to the query as a comment. The cache key is derived from the resulting query text, so it changes whenever any of the tables changes.
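The steps above can be sketched in a few lines. This is a language-agnostic illustration in Python, with a plain dict standing in for memcached; the function names (`table_version`, `bump_tables`, `cache_key`) are hypothetical, not part of any library:

```python
import hashlib
import uuid

# Hypothetical in-memory stand-in for memcached: the same idea works with
# any key-value store that supports get/set.
cache = {}

def table_version(table):
    """Return the current version string for a table, creating one if absent."""
    if table not in cache:
        cache[table] = uuid.uuid4().hex
    return cache[table]

def bump_tables(tables):
    """Called on INSERT/UPDATE/DELETE: give each affected table a new version."""
    for table in tables:
        cache[table] = uuid.uuid4().hex

def cache_key(query, tables):
    """Build a cache key that changes whenever any affected table changes."""
    versions = "".join(table_version(t) for t in tables)
    # The table versions are appended to the query as a comment, as the
    # article describes; the key is a hash of the resulting query text.
    keyed_query = query + "/*" + hashlib.md5(versions.encode()).hexdigest() + "*/"
    return hashlib.md5(keyed_query.encode()).hexdigest()

k1 = cache_key("select * from t1 join t2", ["t1", "t2"])
k2 = cache_key("select * from t1 join t2", ["t1", "t2"])
assert k1 == k2      # nothing changed: same key, so a cached result would be reused

bump_tables(["t2"])  # a write touched t2
k3 = cache_key("select * from t1 join t2", ["t1", "t2"])
assert k3 != k1      # key changed: the old cached entry is simply never asked for again
```

Note that nothing is deleted on a write: old entries just become unreachable and expire on their own, which is exactly what makes the scheme cheap.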
That is all there is to this approach. To switch to the new caching policy, we only need to make small changes to the code. An example demonstrating the approach is provided below. It is completely self-contained and can be run if you have PHP with the mysql and memcache extensions.
This approach increases caching efficiency. When data changes, only entries that refer to the changed tables are invalidated. Strictly speaking, the words "flushing the cache" lose their meaning: the stale entries simply become unreachable, and the cache keeps filling up with new keys for the same queries while old entries expire on their own. If you have one "ugly" table that used to force frequent flushes of the entire cache, it will no longer spoil the whole picture.
The method is viable; it was tested on one of my sites (http://www.skachatreferat.ru). Experience has shown that other caching methods should not be neglected either: for data whose freshness is not critical at a refresh rate of once every 5 minutes, it is better to use the simplest caching with a fixed lifetime, in this case 5 minutes.
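That simple time-based policy can be sketched as follows. Again a dict stands in for memcached (memcached itself accepts the lifetime directly as the expiration argument of `set`); `get_stats` and its query callback are hypothetical names for illustration:

```python
import time

cache = {}  # key -> (value, expires_at); in-memory stand-in for memcached

def cache_set(key, value, ttl):
    # memcached takes a lifetime in set(); here we store an explicit deadline.
    cache[key] = (value, time.time() + ttl)

def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:  # entry is stale: behave as if it were gone
        del cache[key]
        return None
    return value

def get_stats(run_query):
    """Serve a result from cache, recomputing at most once every 5 minutes."""
    result = cache_get("stats")
    if result is None:
        result = run_query()             # hit the database
        cache_set("stats", result, 300)  # keep for 5 minutes
    return result
```

Callers accept results up to 5 minutes stale in exchange for never invalidating explicitly.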
Take Habrahabr, which serves articles. Each article is a large text field plus a set of attributes. The text rarely changes, while the attributes change often. For this reason, it makes sense to cache only the text of the article and select the attributes directly from the tables each time. As a result, data access becomes an order of magnitude faster.
The fewer columns we select, the better for performance: MySQL handles columns of simple types an order of magnitude faster than columns of type TEXT (where the article body is stored). Exploiting this gives a significant performance gain.
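The article-splitting idea looks like this in outline. The queries and helper names (`db_fetch_text`, `db_fetch_attrs`, `get_article`) are hypothetical stand-ins for real SQL such as `SELECT body FROM articles WHERE id=?` (cached) and `SELECT title, rating, views FROM articles WHERE id=?` (always fresh):

```python
cache = {}  # stand-in for memcached

def get_article(article_id, db_fetch_text, db_fetch_attrs):
    """Cache only the heavy, rarely-changing TEXT column; read the
    frequently-changing attributes live from the database every time."""
    key = "article_text:%d" % article_id
    text = cache.get(key)
    if text is None:
        text = db_fetch_text(article_id)  # slow: reads the TEXT column
        cache[key] = text                 # rarely changes, safe to cache for long
    attrs = db_fetch_attrs(article_id)    # fast: simple-typed columns, never stale
    return dict(attrs, body=text)
```

The expensive TEXT read happens once per article; every later request pays only for the cheap attribute query.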
Below is the promised script demonstrating table-based cache partitioning. It is completely self-contained and requires no additional modules. Do not forget to fill in the mysql and memcache connection settings at the beginning of the script:
source here: www.skachatreferat.ru/demo.txt
<?php
header('Content-type: text/html; charset=UTF-8');

$mysql_host='localhost';
$mysql_username='root';
$mysql_password='12345';
$mysql_database='test';
// specify the names of two tables; these tables are not modified by this example
$mysql_table1='table1';
$mysql_table2='table2';
$memcache_host='localhost';
$memcache_port=11211;

$mysql=mysql_connect($mysql_host,$mysql_username,$mysql_password);
if(!$mysql)
    die("Cannot connect to MySQL: $mysql_username@$mysql_host/$mysql_password");
if(!mysql_select_db($mysql_database))
    die("Cannot select database: $mysql_database");

$memcache = new Memcache;
if(!$memcache->pconnect($memcache_host,$memcache_port))
    die("Memcached is not available: $memcache_host:$memcache_port");

function cacheGet($key)
{
    global $memcache;
    return $memcache->get($key);
}

function cacheSet($key,$data,$delay)
{
    global $memcache;
    return $memcache->set($key,$data,0,$delay);
}

// Extract the table names marked as <<name>> from the query
// and strip the markers, leaving plain table names in the SQL.
function sqlExtractTables(&$query)
{
    preg_match_all("/\\<\\<([A-Za-z0-9\\_]+)\\>\\>/",$query,$tables);
    if(!$tables[1])
        die("The query contains no tables recognizable in the form '<<table>>': $query");
    $query=preg_replace("/\\<\\<([A-Za-z0-9\\_]+)\\>\\>/","\\1",$query);
    return $tables[1];
}

function sqlQuery($query)
{
    $resource=mysql_query($query);
    if(!$resource)
        die("Invalid query: $query<br>".mysql_error());
    echo "Query was executed: $query<br>";
    return $resource;
}

// A modifying query: give every affected table a new version value,
// which invalidates all cached SELECTs touching those tables.
function sqlSet($query)
{
    $tables=sqlExtractTables($query);
    foreach ($tables as $table)
        cacheSet($table,uniqid(time(),true),24*3600);
    return sqlQuery($query);
}

// A SELECT: append the affected tables' versions to the query as a comment,
// so the cache key changes whenever any of those tables changes.
function sqlGet($query)
{
    $tables=sqlExtractTables($query);
    $appendix='';
    foreach ($tables as $table)
        $appendix.=cacheGet($table);
    $appendix="/*".md5($appendix)."*/";
    $query=$query.$appendix;
    $cache_key=md5($query);
    $result=cacheGet($cache_key);
    if($result!==false)
    {
        echo "Cache hit: $query<br>";
        return $result;
    }
    else
        echo "Cache miss: $query<br>";
    $resource=sqlQuery($query);
    $result=array();
    while ($row = mysql_fetch_assoc($resource))
        $result[]=$row;
    cacheSet($cache_key,$result,3600);
    return $result;
}
?>
Demonstration: partitioning cached queries by table

Run two queries:
<?php
sqlGet("select * from <<$mysql_table1>> limit 1");
// normally these would be selects like "select * from <<$mysql_table1>> where id=1";
// it is written this way here so the example does not depend on specific columns
?>
<?php
sqlGet("select * from <<$mysql_table2>> limit 1");
?>
Modify one of the tables:
<?php
sqlSet("delete from <<$mysql_table2>> where 1=0");
?>
Run the same queries again:
<?php
sqlGet("select * from <<$mysql_table1>> limit 1");
?>
<?php
sqlGet("select * from <<$mysql_table2>> limit 1");
?>
Result: the second query should be executed again, bypassing the cache; the first query is still served from the cache.