About caching resource-intensive SQL queries on a web server
In this article I will describe a common mistake made when building caching systems.
It all started back in the old days, when I managed sites hosted in a FreeBSD jail that was very limited in resources. Why there? Because I used the pdflib extension to render reports and printable forms, and it was not in the extension set offered by standard hosting. So I compiled my own Apache and PHP there, uploaded the documents, and the site worked.
Everything was fine until the site's pages had to show the top 10 best-selling products in the store. The SQL query that produced the desired data set took about 10 seconds to run. Indexes, EXPLAIN, and other shamanism did not help. Data caching was the only way out. So, following what I had seen others do, I wrote code that cached the query results.
What my code did:
1. It checked whether the cache contained an object with the required data. If so, it took the object from the cache and used it; if not, it executed the code that generated the data and saved the result in the cache.
2. After the data was used, a "garbage collection" procedure ran: objects whose lifetime had expired were deleted.
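The two steps above can be sketched roughly like this. This is a minimal Python sketch, not the original PHP; `run_expensive_query` is a hypothetical stand-in for the slow top-10 SQL query, and the cache is a plain in-process dictionary:

```python
import time

CACHE = {}   # key -> (expires_at, data)
TTL = 300    # cache object lifetime, seconds

def run_expensive_query():
    # Stand-in for the ~10-second top-10 SQL query.
    return ["item1", "item2", "item3"]

def collect_garbage():
    # Step 2: delete every object whose lifetime has expired.
    now = time.time()
    for key in [k for k, (exp, _) in CACHE.items() if exp <= now]:
        del CACHE[key]

def get_top10():
    # Step 1: use the cached object if present and still alive...
    entry = CACHE.get("top10")
    if entry is not None and entry[0] > time.time():
        data = entry[1]
    else:
        # ...otherwise rebuild it and store the result.
        data = run_expensive_query()
        CACHE["top10"] = (time.time() + TTL, data)
    collect_garbage()
    return data
```

Exactly this shape, which looks perfectly reasonable at first glance, is what caused the trouble described next.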
Everything seemed to be in order. I tested the cache on a test machine with the ab utility and got encouraging results, uploaded the code to the jail, and went to bed with a clear conscience.
However, the next day I received a letter from the hosting administration: my site had been blocked because it created too much load on the SQL server.
The server load graphs gave me the key to the puzzle. They showed spikes in load at intervals roughly equal to the lifetime of the cache objects. What was actually happening? It is very simple.
When a cache object expired, it was deleted while a request was being handled. The next HTTP request started rebuilding the cache object, which took some time and ran the expensive SQL query. While it was running, another HTTP request arrived and also launched the rebuild procedure. The load on the server doubled, which in turn doubled the execution time of the SQL queries. During that longer wait, yet another HTTP request arrived. And so on.
How to avoid this?
1. A process that finds an expired cache object must not delete it; the stale data keeps being served in the meantime.
2. The process that runs the query to re-create the cache object must set a flag so that other processes do not also start the update procedure.
3. Once fresh data has been obtained, the cache object must be replaced in one atomic (fast) operation, and only after that is the flag cleared.
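Under these three rules the read path changes roughly as follows. Again a Python sketch under stated assumptions: in a real multi-process setup the flag would live in shared storage (for example, the cache itself), but here it is simulated with an in-process lock, and `run_expensive_query` is a hypothetical stand-in for the slow SQL query:

```python
import threading
import time

CACHE = {}   # key -> (expires_at, data); expired objects are NOT deleted
TTL = 300
_rebuild_flag = threading.Lock()   # the "update in progress" flag

def run_expensive_query():
    return ["item1", "item2", "item3"]

def get_top10():
    entry = CACHE.get("top10")
    if entry is not None and entry[0] > time.time():
        return entry[1]                 # fresh object: just use it
    # Stale or missing. Try to become the single rebuilder (rule 2).
    if _rebuild_flag.acquire(blocking=False):
        try:
            data = run_expensive_query()
            # Rule 3: replace the object in one atomic assignment,
            # then clear the flag.
            CACHE["top10"] = (time.time() + TTL, data)
            return data
        finally:
            _rebuild_flag.release()
    # Another process is already rebuilding: serve the stale copy (rule 1).
    if entry is not None:
        return entry[1]
    # Cold start with no stale copy at all: wait for the rebuilder to finish.
    with _rebuild_flag:
        return CACHE["top10"][1]
```

With this scheme, at any moment at most one process runs the expensive query, while everyone else keeps getting the slightly stale but cheap cached answer.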
As homework, please check how your own caching system is built.