Under heavy load: how we use Tarantool
Many of you have already heard of our Tarantool project. It is a DBMS or, simply put, a database with an application server inside. Tarantool is open source, and anyone can work with it. The project has been in development for more than eight years. More than half of Mail.Ru Group's products actively use Tarantool: Mail, Cloud, My World, Agent, and others. We commit every modification we make to the database back to GitHub, so the community has the same version that we run. There are now client libraries for almost every language, and we have added a lot in this area over the past year. Some were written by the community, some by us. If a more efficient library appears, we simply make it the official one. We try to make everything work right out of the box, both the database and the libraries.
One of Tarantool's main features is that it combines the properties of a database and a cache. A database is reliable, with transactions and a server-side query language. A cache is fast. Tarantool merges both worlds naturally. It is intended for high-load projects and for working with hot data.
Comparison with classic solutions
If you work with "traditional" databases such as MySQL or Oracle, you have probably found that your system lacks the properties of a cache: high speed, low latency, and so on. Traditional databases do not offer these. Caches have drawbacks of their own, including the lack of transactions. Even if you pair a cache with a database, for example MySQL with Memcached or PostgreSQL with Redis, you still end up compromising. You partially lose database properties: the cache layer has no transactions, stored procedures, or secondary indexes. You also lose some cache properties, such as high write throughput. At the same time new problems appear, the most serious being data inconsistency and cold starts.
If that compromise does not suit you and you need all the advantages of both a cache and a database, take a look at Tarantool. It has none of these shortcomings. Tarantool is very simple. Roughly speaking, it stores two files on disk: a snapshot of the data at some point in time and a log of all transactions since that moment. We tested the cold start speed. Files are read into memory from a magnetic disk at about 100 MB/s, so a 100 GB database, for example, loads in 1000 s, or roughly 17 minutes.
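The snapshot-plus-log idea can be illustrated with a toy store. This is only a sketch of the general technique, not Tarantool's actual file format or recovery code: the class `TinyStore`, the file names, and the JSON encoding are all assumptions made for the example. The helper `estimate_cold_start_seconds` simply reproduces the arithmetic from the cold start test (100 GB at 100 MB/s is 1000 s).

```python
import json
import os
import tempfile


def estimate_cold_start_seconds(data_bytes: float, read_bytes_per_s: float) -> float:
    """Time needed to read the whole data set sequentially from disk."""
    return data_bytes / read_bytes_per_s


class TinyStore:
    """Toy in-memory key-value store persisted as a snapshot plus an append-only log.

    Hypothetical illustration only; file names and format are assumptions.
    """

    def __init__(self, directory: str) -> None:
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "log.jsonl")
        self.data: dict[str, object] = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self) -> None:
        # Step 1: load the last snapshot, if there is one.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.data = json.load(f)
        # Step 2: replay every change logged after that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key: str, value: object) -> None:
        # Make the change durable in the log first, then apply it in memory.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value

    def checkpoint(self) -> None:
        # Write a new snapshot atomically, then start an empty log.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self) -> None:
        self._log.close()


if __name__ == "__main__":
    seconds = estimate_cold_start_seconds(100e9, 100e6)
    print(f"cold start: {seconds:.0f} s, about {seconds / 60:.1f} min")
```

Because reads are served from memory and both files are only ever read or written sequentially, startup time in this model is bounded by sequential disk throughput, which is the point the cold start measurement makes.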
For comparison, when we experimented with MySQL and PostgreSQL, things were much worse. They store data on disk, so they do not have the problem of the database not responding until everything is loaded into memory. But their cache warms up far more slowly (1-2 MB/s), so you have to resort to tricks such as pre-warming the indexes. Anyone who administers MySQL knows this well. Tarantool simply starts up and works. Its cold start time is as short as it can possibly be.
Still, not everything about this database satisfies us. The first thing we are working on is disk storage. Tarantool was originally created as an in-memory database. Thanks to its speed and modest hardware requirements, it still beats traditional disk-based databases on cost of ownership. But since Tarantool keeps everything in memory, the question arises: what to do with cold data? It works efficiently with hot data, but cold data sits in memory too. That is why we are developing disk storage. Incidentally, all our production Tarantool instances run on the cheapest SATA disks. An SSD is only worth installing for a faster start, and if you have replicas even that does not matter.
So far we do nothing special with cold data. The ballast is more than paid for by the speed and the remarkably small number of servers. For example, user profiles are served by just eight Tarantool instances, whereas with MySQL it would take a farm of thousands of servers. But if we were better at evicting cold data, four instances could do the job instead of eight.
We are also developing automatic clustering. We already have several such solutions, but none of them is universal. We want one proper, universal solution, so that you can install Tarantool on ten servers and everything inside is sharded, resharded, and replicated without giving you a headache.
In addition, we are adding support for other interfaces, such as SQL. It is still unstable, but we have high hopes for it. SQL support is needed mainly to make migration easy. Mail.Ru Mail alone has a hundred MySQL servers whose load could be moved to a couple of Tarantool instances. But without SQL support that means rewriting a ton of code, so it is easier to implement the support once.
To minimize the effect of memory fragmentation we use our own allocators, such as a slab allocator. They are still imperfect, and we are constantly working to improve them.
How to calculate the amount of memory for Tarantool
Tarantool has a very good memory footprint, meaning little overhead for storing data. The size of the data on disk (or in memory) is only slightly larger than the size of the raw data. If you need to store 1 billion rows, each with ten fields of four bytes, that comes to 4 × 10 × 1 billion bytes = 40 GB, plus 1-10% overhead for control structures.
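The same estimate as a small calculation. The function name and parameters are ours, chosen for illustration; the 1% and 10% overhead bounds are the figures quoted above, not measured values.

```python
def estimate_memory_bytes(rows: int, fields_per_row: int,
                          bytes_per_field: int, overhead_percent: int) -> int:
    """Raw data size plus a percentage overhead for control structures."""
    raw = rows * fields_per_row * bytes_per_field
    return raw + raw * overhead_percent // 100


if __name__ == "__main__":
    # 1 billion rows, ten fields per row, four bytes per field.
    low = estimate_memory_bytes(1_000_000_000, 10, 4, overhead_percent=1)
    high = estimate_memory_bytes(1_000_000_000, 10, 4, overhead_percent=10)
    print(f"between {low / 1e9:.1f} GB and {high / 1e9:.1f} GB")
```

With these inputs the raw data is 40 GB, so the total lands between 40.4 GB and 44 GB. Real tuples also carry field type and length information, so treat the result as a lower-bound planning figure.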
Our use cases
Mail.Ru Group uses Tarantool for a wide variety of tasks, with several hundred installations of the database in total. Three of them carry the heaviest load: the authentication system, the push notification system, and the ad serving system. I will describe each of them in more detail.
Authentication system: login and password, session or token
Probably every site and mobile application has such a system. It checks a login and password, or a session. This is a central system: our entire portal uses it to authenticate users. Its requirements are very interesting and may even seem incompatible:
- High demand. Every page, every Ajax request, and every mobile API call goes to this system to authenticate.
- Low response time. Users do not like to wait; they want their information quickly, so every call must be processed fast.
- High availability. The authentication system must not be the cause of a 500 error. If it cannot serve a request, the user is not served at all, because the rest of the server-side request processing cannot proceed.
- Constant access to the storage. Every hit on the authentication system is a session check or a login and password check, that is, a Select in the database and sometimes an Update as well. There are also anti-bruteforce and anti-fraud systems: on every hit you need to check whether the user is acting in good faith. Each hit can update something, for example the session's last access time. For authentication by login and password, a session has to be created, which means an Insert into the database. For anti-bruteforce checks, you need to record where the user is coming from (IP address or something else). So there is a lot of both reading and writing. People constantly try to break into the authentication system, which creates additional load, because every attempt hits the database just so that the attacker can be refused.
- Large amount of data. This system should have information about all users.
- Fast data expiration. Expirations are in effect updates too. For example, user sessions must expire.
- Persistence. Every change must be saved to disk. User sessions cannot be stored in Memcached, because you would lose them when the server crashes, and users would have to re-enter their login and password. They do not like doing that.
Some of these requirements can only be met with a cache: high load, expiration, and other things typical of a cache. Others can only be met with a database. So the system has to be built on both a cache and a database, combined into one solution. It should be reliable and durable like a truck, yet as fast as a sports car.
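To make the read and write pattern concrete, here is a minimal in-memory sketch of the operations listed above: creating a session on login (an insert), refreshing the last access time on every check (an update), expiring idle sessions, and counting failed logins per IP for anti-bruteforce. It is not our production code. The class `AuthStore`, its parameters, and the `password_ok` flag (standing in for real password verification) are assumptions for the example, and persistence is deliberately left out.

```python
import secrets
from typing import Optional


class AuthStore:
    """Toy session store with expiration and a per-IP failed-login limit."""

    def __init__(self, session_ttl: float, max_failures: int,
                 failure_window: float) -> None:
        self.session_ttl = session_ttl
        self.max_failures = max_failures
        self.failure_window = failure_window
        self.sessions: dict[str, tuple[str, float]] = {}  # token -> (user, last_seen)
        self.failures: dict[str, list[float]] = {}        # ip -> failed login times

    def is_blocked(self, ip: str, now: float) -> bool:
        # Keep only failures inside the window, then compare with the limit.
        recent = [t for t in self.failures.get(ip, [])
                  if now - t < self.failure_window]
        self.failures[ip] = recent
        return len(recent) >= self.max_failures

    def login(self, user: str, password_ok: bool, ip: str,
              now: float) -> Optional[str]:
        if self.is_blocked(ip, now):
            return None                                    # refuse without checking
        if not password_ok:
            self.failures.setdefault(ip, []).append(now)   # write on a failed login
            return None
        token = secrets.token_hex(16)
        self.sessions[token] = (user, now)                 # insert a new session
        return token

    def check_session(self, token: str, now: float) -> Optional[str]:
        entry = self.sessions.get(token)                   # select
        if entry is None:
            return None
        user, last_seen = entry
        if now - last_seen > self.session_ttl:
            del self.sessions[token]                       # expired: drop it
            return None
        self.sessions[token] = (user, now)                 # update last access time
        return user


if __name__ == "__main__":
    store = AuthStore(session_ttl=60.0, max_failures=3, failure_window=300.0)
    token = store.login("alice", True, "10.0.0.1", now=0.0)
    print(store.check_session(token, now=30.0))
```

Even in this toy version a plain session check performs a write, which is why a read-only cache in front of a database does not fit this workload.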
The login and password checking load on our authentication system is currently about 50 thousand requests per second. That may not seem like much, but each request involves a lot of work, including anti-bruteforce checks and many database transactions. Tarantool copes with all of this successfully.
Session authentications, however, reach 1 million per second. That is the traffic coming from the entire portal. Only 12 servers carry this load: four with sessions and eight with user profiles. Moreover, they are only 15-20% loaded, so the safety margin is very large. We simply like to overprovision.
Push notification system
More and more users are moving to mobile, where they mainly use applications rather than the mobile web. And applications have push notifications. When an event occurs on the server side and the mobile device needs to be notified about it, how does that usually work? The mobile application itself does not need to keep a connection to the server. That happens at the operating system level: the OS connects to the corresponding gateway and periodically checks for push notifications. In other words, the server code calls a special API for iOS or Android, and those services communicate with the operating systems on the mobile devices.
To connect to these APIs and send data you need to identify the user somehow, so a token is sent. The token has to be stored somewhere. Moreover, one user needs several tokens, because they may have several devices. And the token must be sent along with every server event you want to notify the user about. There are many such events: the more often you notify users, the more often they use your application. So a push notification system needs a fast and reliable database.
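The data model described here is small: one user maps to several device tokens, and each server event fans out to all of them. A sketch of that mapping follows. `TokenRegistry` and its method names are hypothetical, and a real system would keep this in the database rather than in a Python dictionary.

```python
from collections import defaultdict


class TokenRegistry:
    """Toy mapping from a user to the push tokens of all their devices."""

    def __init__(self) -> None:
        self._tokens: dict[str, set[str]] = defaultdict(set)

    def register(self, user_id: str, token: str) -> None:
        self._tokens[user_id].add(token)

    def unregister(self, user_id: str, token: str) -> None:
        self._tokens[user_id].discard(token)

    def fan_out(self, user_id: str, message: str) -> list[tuple[str, str]]:
        """Return one (token, message) pair per registered device."""
        # .get() avoids creating an empty entry for unknown users.
        return [(token, message)
                for token in sorted(self._tokens.get(user_id, ()))]


if __name__ == "__main__":
    registry = TokenRegistry()
    registry.register("alice", "ios-token-1")
    registry.register("alice", "android-token-7")
    for token, message in registry.fan_out("alice", "You have new mail"):
        print(token, message)
```

Every event therefore costs at least one lookup by user, and every new device or token refresh costs a write.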
We chose Tarantool precisely because we have a huge number of requests and transactions: sending a push requires many checks, and they must be done quickly. We cannot afford to be slow here, because this is the server side, and many memory-hungry processes depend on it.
Do you think it is a good idea for the server side to connect directly to Android or iOS? It is a bad one, for several reasons. First, architecturally, you lose generality: when Windows Mobile or some other platform comes along, development gets more complex and you have to rework a bunch of systems. Second, you gain an extra point of failure, and the whole interaction becomes much more complicated. Third, these mobile APIs can be slow or go down. They do not guarantee a quick response and may take several seconds to answer. So we need a layer in between, a queue where all changes are placed and from which they are sent on to the Apple and Google APIs. We cannot afford to lose these notifications, so the queue must store everything on disk and still be very fast. Tarantool fully meets these criteria. Our system withstands a fairly heavy load: 200 thousand requests per second, both writes and reads. Every call to the queue is a write, a transaction that is replicated to several replicas. Even so, everything works very quickly.
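The role of the queue can be sketched as follows: producers append notifications to a durable log and return immediately, while a separate step drains the queue toward the external push API and acknowledges only what was actually delivered. This is a simplified single-process illustration, not how our Tarantool queue is implemented. `DurableQueue`, the file format, and the `send` callback are assumptions, and replication and torn-write handling are omitted.

```python
import json
import os
import tempfile
from typing import Callable


class DurableQueue:
    """Toy disk-backed queue: unacknowledged messages survive a restart."""

    def __init__(self, path: str) -> None:
        self.pending: dict[int, dict] = {}   # message id -> payload, in put order
        self.next_id = 0
        if os.path.exists(path):
            # Recover state by replaying "put" and "ack" records.
            with open(path, encoding="utf-8") as f:
                for line in f:
                    op, msg_id, payload = json.loads(line)
                    if op == "put":
                        self.pending[msg_id] = payload
                        self.next_id = max(self.next_id, msg_id + 1)
                    else:
                        self.pending.pop(msg_id, None)
        self._file = open(path, "a", encoding="utf-8")

    def _append(self, record: list) -> None:
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())

    def put(self, payload: dict) -> int:
        msg_id = self.next_id
        self.next_id += 1
        self._append(["put", msg_id, payload])   # durable before we return
        self.pending[msg_id] = payload
        return msg_id

    def ack(self, msg_id: int) -> None:
        self._append(["ack", msg_id, None])
        self.pending.pop(msg_id, None)

    def drain(self, send: Callable[[dict], None]) -> int:
        """Try to deliver pending messages in order; stop at the first failure."""
        delivered = 0
        for msg_id, payload in list(self.pending.items()):
            try:
                send(payload)
            except Exception:
                break          # the push API is slow or down: retry later
            self.ack(msg_id)
            delivered += 1
        return delivered

    def close(self) -> None:
        self._file.close()
```

The producer never waits on Apple or Google: a slow or failing `send` only delays `drain`, and anything not yet acknowledged is picked up again after a restart. The price is at-least-once delivery, since a crash between `send` and `ack` causes a resend.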
Ad serving system
We have a large portal, and we show ads to the user on almost every page. This is handled by an ad serving system called Target. One of the main challenges for an advertising system is that it must be extremely fast and sustain an extremely heavy load, even heavier than the authentication system. A session check is a single call to the database, whereas a single ad request can involve several.
Ads are shown not only on our own pages but also on partners' pages, and that is a very heavy load as well. Say there are a dozen ad units on a page. For each of them you need to query data sources for information about the user profile, aggregate the results, decide which ad to show, and display it. And all of this must happen quickly (the current standard is 50 ms), because users do not like to wait. Besides, advertising gives the user no functionality, so it certainly cannot justify slowing the service down.
Our ad serving system is one of the largest and most heavily loaded Tarantool clusters in the world: it performs 3 million operations and about 1 million transactions (updates) every second.
Tarantool was born for high load. And if your load is low, it will still give you a good response time: one millisecond or less. Traditional databases cannot respond that fast even under low load. Often you need to make several calls, all those milliseconds add up, and the result is rather sad. Tarantool gives you high RPS, low latency, and high uptime, helps you squeeze everything you can out of your hardware, and at the same time you get a database with transactions, replication, and stored procedures.