
Do databases belong in Kubernetes?

Somehow it has happened historically that over almost any question the IT industry divides into two notional camps: those who are for and those who are against. The subject of the dispute can be completely arbitrary. Which OS is better, Windows or Linux? Android or iOS on your phone? Keep everything in the cloud, or dump it onto cold RAID storage and lock the drives in a safe? Do PHP developers have the right to call themselves programmers? These disputes are, at times, purely existential and have no basis beyond sporting interest.
It just so happened that with the arrival of containers and the whole beloved kitchen around Docker and the proverbial k8s, the same arguments "for" and "against" using these new capabilities began in various areas of the backend. (Let us note in advance that although Kubernetes will most often be named as the orchestrator in this discussion, the choice of tool does not matter: you can substitute any other that seems most convenient and familiar to you.)
At first glance it looks like a simple argument about two sides of the same coin, as senseless and merciless as the eternal Windows vs Linux confrontation, in which sensible people quietly exist somewhere in between. But with containerization things are not so simple. Usually there is no right side in such disputes, yet when it comes to whether or not to put the database in containers, everything turns upside down, because in a certain sense both the supporters and the opponents of this approach are right.
Bright side
The arguments of the bright side can be summed up in one phrase: "Hello, it's 2019 out there!" It sounds like populism, of course, but if you dig into the situation in detail, it has real advantages. Let's go through them.
Suppose you have a large web project. It may have been built on a microservice architecture from the start, or it may have arrived there by evolution - that is not really important. You have split the project into separate microservices, set up orchestration, load balancing and scaling. And now you sip a mojito in a hammock with a clear conscience during a Habr effect instead of reviving fallen servers. But all actions must be consistent. Very often only the application itself - the code - is containerized. What else do we have besides the code?
Right: data. The heart of any project is its data, whether a classic DBMS - MySQL, PostgreSQL, MongoDB - a search store such as ElasticSearch, or a key-value store used for caching such as Redis, and so on. We are not going to discuss crooked backend implementations where the database falls over because of poorly written queries; instead, let's talk about ensuring the fault tolerance of that very database under client load. After all, when we containerize our application and allow it to scale freely to handle any number of incoming requests, the load on the database naturally grows with it.
In fact, the channel to the database and the server it runs on become the eye of the needle of our beautiful containerized backend. Meanwhile, the main motive for container virtualization is the mobility and flexibility of the structure, which allow peak load to be distributed across all the infrastructure available to us as efficiently as possible. So if we do not containerize all the available elements of the system and roll them into a cluster, we are making a very serious mistake.
It is much more logical to cluster not only the application itself, but also the services responsible for data storage. By clustering and deploying independent web servers that share the load in k8s, we already solve the problem of data consistency - the same comments on posts, if you take a media or blog platform as an example. One way or another, we end up with an in-cluster, even if virtual, representation of the database as an ExternalService. The catch is that the database itself is still not clustered: the web servers deployed in the cluster pull changes from our static production database, which runs on its own.
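For reference, here is a minimal sketch of what such an in-cluster representation might look like in Kubernetes - a Service of type ExternalName that lets pods reach the standalone database by a cluster-internal name. The service name and external hostname are placeholders, not taken from any real project:

```yaml
# Pods inside the cluster resolve "db" to the external hostname
# of the standalone database server running outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: db                            # hypothetical name the application connects to
  namespace: default
spec:
  type: ExternalName
  externalName: db.example.internal   # hypothetical address of the standalone DB host
```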
Feel the catch? We use k8s or Swarm to distribute the load and to keep the main web server from going down, but we do nothing of the kind for the database. Yet if the database falls over, our whole clustered infrastructure becomes pointless - what good are empty web pages that return a database access error?
That is why not only the web servers need to be clustered, as is usually done, but the database infrastructure as well. Only then do we get a structure whose elements work fully as one team while remaining independent of each other. Then even if half of our backend collapses under load, the rest survives, while in-cluster synchronization of the databases with one another, plus the ability to scale and deploy new clusters without limit, helps us reach the required capacity quickly - as long as there are enough racks in the data center.
In addition, a database model distributed across clusters lets you move the database to wherever it is needed. If we are talking about a global service, it is rather illogical to spin up a web cluster somewhere around San Francisco while shipping packets to and from a database sitting in the Moscow region, and vice versa.
Database containerization also puts all the elements of the system at the same level of abstraction. That, in turn, makes it possible to manage the system directly from code, by the developers, without actively involving the admins. The developers decided they need a separate DBMS for a new subproject? Easy: write a yaml file, push it to the cluster, and you are done.
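Purely as an illustration of that "write a yaml file" step, here is a minimal sketch, assuming the subproject wants its own single-instance PostgreSQL. All names, the image tag and the sizes are placeholders, and the Secret referenced for the password is assumed to exist separately:

```yaml
# Headless Service: gives the database pod a stable in-cluster DNS name.
apiVersion: v1
kind: Service
metadata:
  name: subproject-db                  # hypothetical name
spec:
  clusterIP: None
  selector:
    app: subproject-db
  ports:
    - port: 5432
---
# Bare-bones StatefulSet running one PostgreSQL instance with its own volume.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: subproject-db
spec:
  serviceName: subproject-db
  replicas: 1
  selector:
    matchLabels:
      app: subproject-db
  template:
    metadata:
      labels:
        app: subproject-db
    spec:
      containers:
        - name: postgres
          image: postgres:15           # placeholder version
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: subproject-db-secret   # hypothetical Secret, created separately
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi              # placeholder size
```

One `kubectl apply -f` later, the new subproject has its own DBMS without anyone touching the shared production database.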
And, of course, day-to-day operations become much simpler. Tell me, how many times have you winced when a new team member stuck his hands straight into the production database? Which one, by the way, is running right now? Of course, we are all adults here, and somewhere we have a fresh backup, and even further away - behind the shelf with grandmother's pickles and the old skis - another backup, possibly even in cold storage, because your office once caught fire. Still, every time a new team member is given access to the production infrastructure, and above all to the production database, everyone around reaches for the sedatives. Who knows this newcomer - what if he breaks something? Scary, you must agree.
Containerization, and essentially the distributed physical topology of your project's database, helps avoid such nerve-racking moments. Don't trust the newcomer? Fine: we spin up a separate cluster for him and disconnect it from the rest of the database clusters - synchronization only by manual push, with two keys turned simultaneously (one by the team lead, the other by the admin). And everyone is happy.
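The "disconnect it from the rest" part can be sketched, for example, with a NetworkPolicy - a hypothetical illustration assuming the newcomer's copy lives in its own namespace; in practice you would also have to allow DNS and whatever the manual push mechanism needs:

```yaml
# Deny all egress from the sandbox namespace except to pods in the same
# namespace, so nothing running there can reach the production databases.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-isolation
  namespace: sandbox            # hypothetical namespace for the separate cluster copy
spec:
  podSelector: {}               # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}       # same-namespace traffic only
```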
And now it is time to switch sides and give the floor to the opponents of database containerization.
Dark side
When arguing why the database should not be containerized and should keep running on a single central server, we will not stoop to orthodox rhetoric and statements like "our grandfathers ran databases on bare metal, and so will we!" Instead, let's try to come up with a situation in which containerization would actually bring tangible dividends.
You must admit that projects which genuinely need their database in a container can be counted on the fingers of one hand - and not the hand of the best milling machine operator at that. For the most part, even the use of k8s or Docker Swarm itself is redundant: quite often these tools are reached for because of the general hype around the technology, and because the higher-ups, in the person of the CEOs, order everything driven into clouds and containers. Well, because it is fashionable now and everyone is doing it.
In at least half the cases, using Kubernetes or even plain Docker on a project is redundant. The trouble is that not every team, or every outsourcing company hired to maintain the client's infrastructure, is aware of this. Worse still is when containers are imposed on the client because they add a certain number of coins to the bill.
In general, there is an opinion that the Docker/Kubernetes mafia simply squeezes the clients who outsource these infrastructure questions to it. Indeed, working with clusters requires engineers who are capable of it and who understand the architecture of the implemented solution as a whole. We once described our case with the Republic publication: there we trained the client's team to work in the realities of Kubernetes, and everyone was satisfied. And it was done properly. More often, though, k8s "implementers" take the client's infrastructure hostage - afterwards only they understand how everything works there, and the client has no specialists of their own.
Now imagine that we hand over to the outsourcer not only the web server part but also the maintenance of the database. We said that the database is the heart, and losing the heart is fatal for any living organism. In short, the prospects are not the best. So instead of hyped Kubernetes, many projects should simply stop turning up their noses at a normal AWS plan, which would solve all the problems with the load on their site or project. But AWS is no longer fashionable, and showing off is worth more than money - unfortunately, in the IT world too.
Okay. Perhaps the project really does need clustering. But while everything is clear with stateless applications, how do we organize decent network connectivity for a clustered database?
If we are talking about a seamless engineering solution, which is what the move to k8s is supposed to be, then our main headache is data replication in a clustered database. Some DBMSs are quite tolerant out of the box of distributing data between their individual instances. Many others are not so welcoming. And quite often, the ability to replicate with minimal resource and engineering costs is not the main argument at all when a DBMS is chosen for a project - especially if the project was not planned as a microservice one from the start, but simply evolved in that direction.
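To make the replication point concrete: the usual Kubernetes pattern is a StatefulSet behind a headless Service, which gives each instance a stable DNS name (subproject-db-0.subproject-db, subproject-db-1.subproject-db and so on, reusing the earlier placeholder names). A replication-friendly DBMS can simply point its replicas at those names; one without built-in replication gets no such help from the orchestrator. A sketch with assumed names and a real PostgreSQL standby parameter, not a complete HA setup:

```yaml
# Fragment of a standby's configuration, delivered via a ConfigMap:
# the primary is addressed by the stable DNS name of pod 0.
apiVersion: v1
kind: ConfigMap
metadata:
  name: subproject-db-replica-conf     # hypothetical
data:
  primary_conninfo: "host=subproject-db-0.subproject-db port=5432 user=replicator"
```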
There is no need to dwell on the speed of network drives - they are slow. In other words, we still have no real option of, say, moving a DBMS instance somewhere with more CPU power or free RAM when needed: we run into the performance of the virtualized disk subsystem very quickly. Accordingly, the DBMS either has to be nailed to its own personal set of machines standing in close proximity, or sufficiently fast synchronization of data to the presumed standby capacity has to be sorted out separately.
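For what it's worth, "nailing" a database to a particular machine and its local disk is expressible in Kubernetes too, for example with a local PersistentVolume; the node name, path and sizes below are placeholders:

```yaml
# A "local" PersistentVolume tied to one node's disk. A database pod that
# claims it is effectively pinned to that machine, trading the mobility that
# containers promise for local-disk performance.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-local-pv
spec:
  capacity:
    storage: 100Gi                     # placeholder size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/nvme0/postgres          # placeholder path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - db-node-01           # placeholder node name
```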
Continuing the theme of virtual file systems: Docker Volumes, unfortunately, are not hassle-free either. In general, for something like long-term, reliable data storage one would like to get by with the simplest possible technical schemes. Adding a new abstraction layer between the container's file system and the file system of the parent host is a risk in itself. But when, on top of that, the containerization stack has trouble passing data between those layers, it becomes a real disaster. At the moment, most of the problems known to progressive humanity seem to have been eradicated. But you understand yourself: the more complex the mechanism, the more easily it breaks.
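For context, this is the layer being discussed - a named Docker volume that keeps the database files outside the container's writable layer, yet still between the DBMS and the physical disk. A docker-compose sketch with placeholder values:

```yaml
services:
  db:
    image: postgres:15                 # placeholder image
    environment:
      POSTGRES_PASSWORD: example       # placeholder, not for production
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume managed by the Docker engine
volumes:
  db-data:
```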
In light of all these "adventures", it is much more profitable and simpler to keep the database in one place. Even if the application itself needs containerization, let it run on its own and, through a distribution gateway, talk to a database that is read and written in one place only. This approach reduces the probability of errors and desynchronization to a minimum.
What are we getting at? That containerizing the database is appropriate where there is a real need for it. You cannot cram in the database of a full-blown monolithic app and spin it as if you had two dozen microservices - that simply does not work. And this must be clearly understood.
Instead of a conclusion
If you were waiting for a clear-cut verdict on whether or not to virtualize the database, we have to disappoint you: there will not be one. Because when building any infrastructure solution you should be guided not by fashion and progress but, first of all, by common sense.
There are projects where the principles and tools that come with Kubernetes fit perfectly, and on such projects peace reigns, at least in the backend area. And there are projects that need not containerization but a normal server infrastructure, because they fundamentally cannot be rescaled to a microservice cluster model - they would simply fall over.