MariaNev May 2, 2015 at 22:38

Our talk at "Strike": "How to get rid of a persistent dependence on the database"

On April 10-11, our team took part in "Strike", the largest IT conference in the Russian regions, which was held in Ulyanovsk for the fourth time. IT companies set up stands where visitors could get acquainted with their products, learn about vacancies, and take part in contests.

This year XIMAD also decided to present a stand with its products, since the conference has been developing its mobile section, which is our core area. We listened to talks, exchanged experience with colleagues, and answered many questions about the technologies used in our games.

Over two days, 130 experts spoke at 8 venues, delivering a total of 150 talks in various areas. Our developer Aleksey Klyuchnikov presented his talk, “Making Interactive in Mobile Games or How to Get Rid of a Persistent Database Dependence”, using the example of our flagship Magic Jigsaw Puzzles (2M MAU, 600K DAU, and up to 50K players online with interactivity).

The text of his talk follows:

What is the main problem with the server side of game development? The fact that it is highload! The only case where it is not highload is when the project never gets finished. Once the game reaches the market, advertising starts, players flow in, and for the first few days you get a small but quite real highload. And if you succeed ...

After attending the high-load talks at the conference, it became clear to us that everyone works in roughly the same direction. The first thing any high-load project runs into is data, and most speakers talked about how to replicate, shard, denormalize, and so on. We did not escape this fate either, but we chose a somewhat different path: minimize the work with the database. In effect, get rid of it. How? Very simply.

Idea
We write a server that the player logs in to. A dedicated process is started for the player, the profile is loaded from the database into it, and from then on all of the player's actions take place in this process. Once the player has shown no activity for some time, say 10 minutes, we save the profile to the database and terminate the process. As a result, we have one read from the database at login and one write at logout, and that's it!
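The lifecycle described above can be sketched in a few lines. This is a minimal illustration in Python rather than Erlang, and every name in it (`FakeProfileStore`, `PlayerSession`, `tick`) is hypothetical: an object stands in for the per-player process and a dictionary stands in for the database.

```python
import time

IDLE_TIMEOUT_SECONDS = 10 * 60  # the "10 minutes" from the text


class FakeProfileStore:
    """Hypothetical stand-in for the database; counts reads and writes."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def load(self, player_id):
        self.reads += 1
        return dict(self.rows.get(player_id, {"score": 0}))

    def save(self, player_id, profile):
        self.writes += 1
        self.rows[player_id] = dict(profile)


class PlayerSession:
    """Models the per-player process: the profile lives in memory while active."""

    def __init__(self, player_id, store, now=time.monotonic):
        self.player_id = player_id
        self.store = store
        self.now = now
        self.profile = store.load(player_id)  # the single read, at login
        self.last_activity = now()
        self.alive = True

    def handle(self, field, amount=1):
        # Every player action touches only the in-memory profile.
        self.profile[field] = self.profile.get(field, 0) + amount
        self.last_activity = self.now()

    def tick(self):
        # Called periodically; saves the profile and stops after the idle timeout.
        idle = self.now() - self.last_activity
        if self.alive and idle >= IDLE_TIMEOUT_SECONDS:
            self.store.save(self.player_id, self.profile)  # the single write
            self.alive = False
        return self.alive
```

However many times `handle` is called during a session, the store sees exactly one `load` and one `save`.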

What do we want?
We get one read and one write per game session. This means we can calculate and predict how many players our solution will handle. I think many of you can estimate how many reads and writes per second the simplest key/value table can deliver, for example in MySQL, and how many players per day such a database will sustain. The number turns out to be impressive, and up to that number we have fenced ourselves off from database problems. What could be better?
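The estimate is simple arithmetic and can be written down as a small helper. The throughput figure passed in below is a made-up input for illustration, not a measurement from the talk, and the result is an upper bound that ignores peak hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sessions_per_day(db_ops_per_second, ops_per_session=2):
    """Upper bound on game sessions per day for a given database throughput.

    ops_per_session defaults to 2: one read at login and one write at logout.
    """
    return db_ops_per_second * SECONDS_PER_DAY // ops_per_session


# Hypothetical input: a table that sustains 1000 operations per second.
print(sessions_per_day(1000))
```

With that hypothetical throughput, the bound comes out at 43,200,000 sessions per day, assuming the load is spread evenly.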

Implementation
We take Erlang for the implementation, since it is good at working with processes and ... and that's it.
What Erlang gives us: processes out of the box, which can be started and stopped and can send messages to each other. This means that one player's process can send a message to another player's process and, by the same principle, interact with the processes that provide game logic. Interactivity in this case also comes out of the box.

Let us go through the nuances in order.

Addressing

Everything is canonical here: each player is assigned a unique identifier at registration, and all addressing afterwards goes through it. It may sometimes be tempting to use additional keys for addressing, but this is best avoided, for the following reason. Addressing is used to send messages, and when a message is sent to a player who is offline, we must start a process, load that player's profile into it, and only then deliver the message. If we use different keys for addressing, we risk a hard-to-trace collision in which two or more processes are started for one player.

Process registrar

The registrar built into Erlang has serious drawbacks that prevent it from being used for dynamically started and stopped processes, so we take gproc as the registrar. The registrar is needed to register processes and return their Pid on request, and to deregister processes when they finish or crash.
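To make the registrar's contract concrete, here is a toy model in Python. It is not gproc's API: `Registrar`, `Proc`, `whereis` and the exit hook are hypothetical names that only illustrate the three duties listed above (register, look up, deregister on exit).

```python
class AlreadyRegistered(Exception):
    """Raised when a second process tries to take an occupied player id."""


class Proc:
    """Hypothetical stand-in for an Erlang process with exit notifications."""

    def __init__(self):
        self.alive = True
        self._exit_hooks = []

    def on_exit(self, hook):
        self._exit_hooks.append(hook)

    def exit(self):
        # Covers both a normal finish and a crash in this toy model.
        self.alive = False
        for hook in self._exit_hooks:
            hook()


class Registrar:
    def __init__(self):
        self._by_player = {}

    def register(self, player_id, proc):
        if player_id in self._by_player:
            raise AlreadyRegistered(player_id)
        self._by_player[player_id] = proc
        # Deregister automatically when the process goes away.
        proc.on_exit(lambda: self.unregister(player_id, proc))

    def whereis(self, player_id):
        return self._by_player.get(player_id)

    def unregister(self, player_id, proc):
        # Only remove the entry if it still points at this process.
        if self._by_player.get(player_id) is proc:
            del self._by_player[player_id]
```

The `AlreadyRegistered` error corresponds to the exception that a second, competing start attempt runs into.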

Process start

As mentioned above, when a message arrives for a player, we ask the registrar which Pid to send it to; if there is no process for that player, one has to be started. Each operation takes time, however short, so it is possible that two messages arrive for the same player at about the same time, both get a negative answer from the registrar, and both try to start a process. As a result, one of them starts first, while the second gets an exception and its message is lost. So we cannot start processes concurrently and must queue the starts. To do this, we launch a process through which all requests to the registrar are routed and which starts the player processes. But this creates a bottleneck, so we need not one such “process_starting_worker” but a pool of them, for example 100 workers.
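A sketch of the serialized start, in Python with threads standing in for concurrent senders. One assumption goes beyond the text: requests for the same player are always routed to the same worker (here by `hash(player_id) % size`), since otherwise a pool would bring the race back. `StarterPool` and `get_or_start` are hypothetical names.

```python
import threading


class StarterPool:
    """Pool of start workers; a player id always maps to the same worker."""

    def __init__(self, size=100):
        # One lock per worker models that worker's serialized request queue.
        self._workers = [threading.Lock() for _ in range(size)]
        self._registrar = {}  # player_id -> process handle
        self._count_lock = threading.Lock()
        self.started = 0  # how many processes were actually started

    def get_or_start(self, player_id, start_process):
        # Assumption: routing by player id keeps one player on one worker.
        worker = self._workers[hash(player_id) % len(self._workers)]
        with worker:
            proc = self._registrar.get(player_id)
            if proc is None:
                # Lookup and start happen under the same lock, so two
                # messages for one player cannot both see "not running".
                proc = start_process(player_id)
                self._registrar[player_id] = proc
                with self._count_lock:
                    self.started += 1
            return proc
```

Messages for different players mostly land on different workers and do not wait for each other, which is the reason for using a pool instead of a single starter.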

Process stop

Stopping processes is no less interesting. When it is time to terminate a process, we must perform a number of actions: save the profile to the database, deregister from the registrar, send a farewell message to all friends, and actually terminate the process. These actions cannot simply be run back to back, because the player may suddenly come back to life, or a message may arrive while we are busy saving the profile. Therefore, after each operation we need to read the message queue and, if something is found in it, process it. If this happens before deregistration, we return the process to its normal state; if it happens after deregistration, we honestly reply to the sender that the process is no longer registered and that the message will have to be resent.
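The staged shutdown can be modeled as follows. This is a single-threaded Python sketch with hypothetical names (`PlayerProcess`, `shutdown`, `bounced`); the `save_profile` and `notify_friends` callbacks are passed in so that a message arriving in the middle of a step can be simulated.

```python
from collections import deque


class PlayerProcess:
    def __init__(self, player_id, registrar):
        self.player_id = player_id
        self.registrar = registrar  # plain dict: player_id -> process
        self.mailbox = deque()      # (sender, message) pairs
        self.handled = []           # messages processed normally
        self.bounced = []           # senders told to resend
        self.state = "running"
        registrar[player_id] = self

    def shutdown(self, save_profile, notify_friends):
        """Run the stop steps, checking the mailbox after each one.

        Returns True if the process stopped, False if it resumed.
        """
        self.state = "stopping"
        save_profile(self)
        if self._resume_if_mail():
            return False  # still registered, so go back to normal work
        del self.registrar[self.player_id]
        self._bounce_mail()  # too late to resume: ask senders to resend
        notify_friends(self)
        self._bounce_mail()
        self.state = "stopped"
        return True

    def _resume_if_mail(self):
        if not self.mailbox:
            return False
        while self.mailbox:
            _sender, message = self.mailbox.popleft()
            self.handled.append(message)
        self.state = "running"
        return True

    def _bounce_mail(self):
        while self.mailbox:
            sender, _message = self.mailbox.popleft()
            self.bounced.append(sender)
```

A message that arrives during the save aborts the shutdown, while one that arrives after deregistration is bounced to its sender, who can then look the player up again and trigger a fresh start.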

Caching and further denormalization

As you can see, the registrar is consulted for every message, which makes it a bottleneck, so it makes sense to cache Pids in the processes. After the first exchange of messages between two processes, each remembers the other's Pid, and from then on they communicate without contacting the registrar. This is why one more action is added to process termination: notifying all Pids in the cache, so that each of them can remove the terminating process from its own cache.
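A minimal Python model of the Pid cache and its invalidation on termination. Object references play the role of Pids, and all names (`Registrar`, `Player`, `peer_cache`) are hypothetical.

```python
class Registrar:
    def __init__(self):
        self.procs = {}
        self.lookups = 0  # counts how often the registrar is consulted

    def whereis(self, player_id):
        self.lookups += 1
        return self.procs.get(player_id)


class Player:
    def __init__(self, player_id, registrar):
        self.player_id = player_id
        self.registrar = registrar
        self.peer_cache = {}  # player_id -> Player, stands in for cached Pids
        self.inbox = []
        registrar.procs[player_id] = self

    def send(self, to_id, text):
        peer = self.peer_cache.get(to_id)
        if peer is None:
            peer = self.registrar.whereis(to_id)  # only on a cache miss
            if peer is None:
                return False  # not running; starting it is out of scope here
            self.peer_cache[to_id] = peer
        peer.receive(self, text)
        return True

    def receive(self, sender, text):
        # The receiver remembers the sender, so replies skip the registrar.
        self.peer_cache[sender.player_id] = sender
        self.inbox.append((sender.player_id, text))

    def terminate(self):
        del self.registrar.procs[self.player_id]
        # Tell every cached peer to forget this process.
        for peer in list(self.peer_cache.values()):
            peer.peer_cache.pop(self.player_id, None)
        self.peer_cache.clear()
```

After the first message between two players, `lookups` stops growing for that pair until one of them terminates.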

The second thing to think about is read optimization. Should a player see how his friends are doing? And how should he receive this information? He can poll all of his friends every time, or each friend, on achieving a result, can “brag” to all of his friends, who write this result into their own profiles; displaying friends' results then generates no queries and is served straight from the player's own profile. Which approach to choose depends on how the data is used. If reads are more frequent than writes, the second approach makes sense.
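The "brag" approach amounts to pushing a denormalized copy of each result into the friends' profiles. A small Python sketch, with hypothetical names (`Profile`, `friend_scores`, `leaderboard`) and direct method calls standing in for messages between player processes:

```python
class Profile:
    def __init__(self, player_id):
        self.player_id = player_id
        self.best_score = 0
        self.friends = []        # other Profile objects
        self.friend_scores = {}  # denormalized copy: friend id -> best score

    def add_friend(self, other):
        self.friends.append(other)
        other.friends.append(self)
        self.friend_scores[other.player_id] = other.best_score
        other.friend_scores[self.player_id] = self.best_score

    def record_score(self, score):
        if score > self.best_score:
            self.best_score = score
            # "Brag": one write per friend, paid once when the result changes.
            for friend in self.friends:
                friend.friend_scores[self.player_id] = score

    def leaderboard(self):
        # Served entirely from the player's own profile, no queries to friends.
        return sorted(self.friend_scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each new result costs one message per friend, and every later read of the friends list costs nothing, which is the trade-off that pays off when reads dominate.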

Thoughts on scaling

Firstly, we can estimate our load and find that one server for the database plus one server for our code will handle several hundred thousand players per day. Erlang is fairly honest about memory use, so at an average of 100 KB per player profile, you need 5 GB of RAM to serve 50K players at once. In other words, we take a server with 32-64 GB and, with high probability, forget about the need for scaling until the project becomes a resounding success.
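The memory figure is easy to check. The helpers below are a rough sketch with hypothetical names; they use decimal units to match the round numbers in the text and ignore per-process overhead and everything else running on the server.

```python
KB_PER_GB = 1_000_000  # decimal units, matching the round figures in the text


def ram_needed_gb(players_in_memory, profile_kb=100):
    """RAM needed to keep this many player profiles in memory."""
    return players_in_memory * profile_kb / KB_PER_GB


def players_that_fit(ram_gb, profile_kb=100):
    """How many profiles of the given size fit into the given RAM."""
    return ram_gb * KB_PER_GB // profile_kb


print(ram_needed_gb(50_000))
print(players_that_fit(32))
```

At 100 KB per profile this gives 5 GB for 50,000 players, and a 32 GB server has room for about 320,000 profiles under the same simplifications.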

Secondly, if the resounding success does come, nothing prevents us from sharding the database by player id and distributing the players across different Erlang nodes using CSN. The only problem here is the registrar: it must be able to work in cluster mode. Gproc can do this but, as tests have shown, not completely. It needs to be patched a bit or replaced with another registrar, but that is a separate topic, perhaps for a separate article.

Conclusion
The solution turned out to be not as simple as it might seem. There are still many open questions about messaging: how to guarantee delivery, how to roll back, for example, chains of messages, which messages to send synchronously and which asynchronously, and so on.

But the most important conclusion from using this architecture is that we squeezed in what would not fit and ended up with a workable service. That would have been impossible with the classic implementation, where a player's every sneeze is answered with a database query.
