Multi-threaded implementation of the CART caching algorithm

Some time ago, I had the task of caching queries to a large on-disk database in a highly loaded multi-threaded application (C++). The application was intended for cloud deployment, where the disk would obviously become the bottleneck. The database itself was a huge graph that many threads traversed concurrently, so the cache also had to be multi-threaded.

I rejected the idea of using an external application such as memcached right away: it would add an unavoidable extra lag to every traversal of a graph edge. So the question became one of an in-process implementation.

The documentation helpfully suggested that there is an LRU cache implementation in the TBB library I was already using. Unfortunately, that implementation was in a preview state at the time (where it still remains).

I had a choice: rely on an insufficiently tested implementation, where tracking down any bug would be a great adventure, or write my own. A quick look at the available publications on caching algorithms also suggested that LRU may not be the most efficient scheme, and for heavily loaded applications even a few extra percent of efficiency is worth having. After going through everything I found, I settled on the CART algorithm.

This scheme combines several useful advantages:
  1. Scan resistance. A single pass over the entire database will not degrade the cache's effectiveness. Such a pass can occur, for example, when analytical tools are run.
  2. Protection against one-off repeated requests. It often happens that the same database item is requested twice within a short interval and then never accessed again. In schemes without such protection this can noticeably reduce efficiency, because any item requested that way is given priority.
  3. A cache hit requires no significant work and no maintenance of internal structures (a significant drawback of LRU). This is very important for a multi-threaded implementation, since no lock has to be taken on every access; see the sketch after this list.
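To illustrate the third point: like the CLOCK scheme it descends from, CART only needs to mark an entry with a reference bit on a hit, and a relaxed atomic store can do that without taking a lock. Below is a minimal sketch of that idea; the names and structure are hypothetical simplifications, not the actual interface of this implementation.

#include <atomic>
#include <cstddef>

struct CacheEntry {
    std::size_t key;
    const void* value;
    std::atomic<bool> referenced{false};  // the CLOCK/CART reference bit
};

// A hit only sets the reference bit; the eviction pass (the "clock hand")
// inspects and clears these bits later under its own synchronization.
inline const void* on_hit(CacheEntry& e) {
    e.referenced.store(true, std::memory_order_relaxed);
    return e.value;
}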

Let's see how effective the scheme is in practice. Since I did not have time to implement and debug other schemes, I will compare it against the same LRU implementation from the TBB library.

First, we test both schemes on a randomly generated sequence of numbers (0...10000) for three cache sizes (100, 500 and 1000). Lower is better.
Random numbers, cache size 100
CART result: 0.99, missed 994950/1005000
LRU result: 0.990057, missed 995007/1005000
Random numbers, cache size 500
CART result: 0.949862, missed 954611/1005000
LRU result: 0.95016, missed 954911/1005000
Random numbers, cache size 1000
CART result: 0.90002, missed 904520/1005000
LRU result: 0.900309, missed 904811/1005000
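The measurement behind these numbers boils down to a loop like the one below. The Cache template parameter stands in for the implementation under test, and lookup()/insert() are assumed names, not the real API of either library.

#include <cstddef>
#include <random>

template <typename Cache>
double miss_rate(Cache& cache, std::size_t requests) {
    std::mt19937 rng(42);                              // fixed seed for repeatability
    std::uniform_int_distribution<int> key(0, 10000);  // keys 0...10000
    std::size_t misses = 0;
    for (std::size_t i = 0; i < requests; ++i) {
        int k = key(rng);
        if (!cache.lookup(k)) {  // assumed interface
            ++misses;
            cache.insert(k, k);  // cache the "value" on a miss
        }
    }
    return static_cast<double>(misses) / requests;
}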

CART is slightly ahead of LRU here, but, frankly, completely random access is not a good test; moreover, in a real application with this access pattern the effectiveness of any cache would be low.

Therefore, I made the second test closer to a practical workload. Here the numbers are drawn from 6 bins, each holding progressively more elements, which makes any individual number in a larger bin less likely to be chosen. Thus numbers from 0 to 150 collectively have the same probability of being selected as numbers from 5000 to 10000. This resembles the access pattern of, say, a user database, where a small set of frequently active users accounts for most of the queries.
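A sketch of such a draw: pick one of the six bins with equal probability, then a number uniformly within it, so each key in a small front bin is drawn far more often than one in a large back bin. The middle bin boundaries used in the original tests are an assumption here.

#include <random>

// Six bins of increasing size; each bin as a whole is equally likely,
// so individual keys in the small bins are requested much more often.
int bins_draw(std::mt19937& rng) {
    static const int bounds[7] = {0, 150, 450, 1100, 2400, 5000, 10000};  // middle values assumed
    std::uniform_int_distribution<int> pick_bin(0, 5);
    int b = pick_bin(rng);
    std::uniform_int_distribution<int> in_bin(bounds[b], bounds[b + 1] - 1);
    return in_bin(rng);
}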
Bins draw, cache size 100
CART result: 0.920258, missed 924859/1005000
LRU result: 0.965795, missed 970624/1005000
Bins draw, cache size 500
CART result: 0.721484, missed 725091/1005000
LRU result: 0.84106, missed 845265/1005000
Bins draw, cache size 1000
CART result: 0.581132, missed 584038/1005000
LRU result: 0.71412, missed 717691/1005000

In this case CART shows significantly better results than LRU, which is exactly what we wanted. Anyone interested is welcome to download the sample and run it themselves.

You can find the implementation here. It was tested in a real application under load, which still does not guarantee the complete absence of bugs; use it at your own risk. To integrate it into your projects you only need the files in the Include folder. The dependency on HashMurmur3 can be eliminated by replacing it with any hash function that suits you.

The implementation's only dependency is TBB, which most people writing multi-threaded C++ applications use anyway. As a bonus, a good random number generator implementation is included. The original implementation used EASTL instead of STL, but I removed that dependency before publishing the public version.

The sample sources build under VS2010, but a Linux port should not cause problems. Since I don't have Linux at hand right now, I would be grateful if someone from the community would spend a little time and make the port.

P.S. There are plenty of uses for a good caching scheme. For example, I also use this implementation for access to memory-mapped files: instead of mapping a file in its entirety, which can consume a huge amount of virtual memory on large files, only a limited number of small 16 MB chunks are mapped, and the cache decides which chunks to evict from memory. You could also write your own memcached in a couple of hundred lines, should the desire suddenly arise.
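A minimal sketch of that chunked-mapping idea, using POSIX mmap for brevity (the sample builds under VS2010, so the real code presumably uses the Windows equivalents); the wiring of the eviction callback to the cache itself is omitted.

#include <cstddef>
#include <sys/mman.h>
#include <sys/types.h>

const std::size_t kChunk = 16 * 1024 * 1024;  // fixed 16 MB windows

// Map one 16 MB window of the file on demand (on a cache miss).
void* map_chunk(int fd, std::size_t chunk_index) {
    return mmap(nullptr, kChunk, PROT_READ, MAP_PRIVATE,
                fd, static_cast<off_t>(chunk_index * kChunk));
}

// Called when the cache evicts a chunk: releasing the mapping bounds
// virtual-memory use to (number of cached chunks) * 16 MB.
void evict_chunk(void* addr) {
    munmap(addr, kChunk);
}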
