Was MongoDB the right choice at all?
I recently found out that Red Hat is removing MongoDB support from Satellite (they say because of license changes). It made me think: over the last few years I have seen a stream of articles about how terrible MongoDB is and how no one should ever use it. Yet over the same period MongoDB has become a much more mature product. What happened? Is all the hatred really due to marketing mistakes made when the DBMS was new? Or do people simply use MongoDB in the wrong places?
If it starts to feel like I am defending MongoDB, please read the disclaimer at the end of the article.
New trend
I have been working in the software industry for longer than I care to admit in polite company, and yet I have witnessed only a small fraction of the trends that have hit our industry. I have seen the rise of 4GL, AOP, Agile, SOA, Web 2.0, AJAX, blockchain ... the list is endless. New trends appear every year. Some fade quickly, while others fundamentally change the way software is developed.
Every new trend generates a certain excitement: people either jump on the bandwagon themselves, or notice the noise made by others and follow the crowd. Gartner codified this process as the hype cycle. Although it is controversial, the curve roughly describes what happens to technologies before they eventually become useful.
But from time to time an innovation appears (or makes a second coming, as in this case) that is driven by one specific implementation. In the case of NoSQL, the hype was driven heavily by the arrival and rapid rise of MongoDB. MongoDB did not start the trend: large Internet companies had begun to struggle with processing very large amounts of data, which led to the return of non-relational databases. The movement started with projects such as Bigtable from Google and Cassandra from Facebook, but it was MongoDB that became the best-known and most accessible NoSQL database, the one most developers could actually get their hands on.
Note: you might think that I am conflating document databases with column-oriented databases, key-value stores, and the many other kinds of data stores that fall under the general definition of NoSQL. And you are right. But chaos reigned at the time. Everyone was obsessed with NoSQL and everyone absolutely needed it, although many did not see the differences between the technologies. For many, MongoDB became synonymous with NoSQL.
And developers pounced on it. A schemaless database that magically scales to solve any problem was a very tempting idea. Around 2014, it seemed that everywhere a relational database such as MySQL, Postgres or SQL Server had been used a year earlier, MongoDB was being deployed instead. If you asked why, the answers ranged from the banal “it's web scale” to the more thoughtful “my data is very loosely structured and fits well into a schemaless database”.
It is important to remember that MongoDB, and document databases in general, solve a number of problems that traditional relational databases have:
- Rigid schema: with a relational database, if your data is dynamically shaped, you are forced to create a bunch of miscellaneous “misc” columns, to shove blobs of data into the table, or to use an EAV (entity-attribute-value) design ... all of which have significant drawbacks.
- Difficulty of scaling: if there is so much data that it does not fit on a single server, MongoDB offered mechanisms for spreading it across multiple machines.
- Difficult schema changes: no migrations! In a relational database, changing the structure of the database can be a huge problem (especially when there is a lot of data). MongoDB greatly simplified the process, to the point where you could just change the schema on the fly and move on very quickly.
- Write performance: MongoDB's write performance was good, especially with proper tuning. Even the out-of-the-box configuration, for which MongoDB was often criticized, showed impressive numbers.
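To make the first point concrete, here is a minimal sketch contrasting an EAV layout with a document-style record. It uses Python's built-in `sqlite3` and `json` modules as stand-ins; the `product_attr` table and the attribute names are hypothetical and do not come from any real system.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Entity-attribute-value layout: one row per (entity, attribute) pair.
conn.execute("CREATE TABLE product_attr (product_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO product_attr VALUES (?, ?, ?)",
    [(1, "color", "red"), (1, "weight_kg", "1.5"), (2, "pages", "320")],
)


def load_eav(product_id):
    """Reassemble one entity from its attribute rows; every value comes back as text."""
    rows = conn.execute(
        "SELECT name, value FROM product_attr WHERE product_id = ?", (product_id,)
    )
    return dict(rows.fetchall())


# Document layout: each entity is a single self-describing record with typed values.
documents = {
    1: json.dumps({"color": "red", "weight_kg": 1.5}),
    2: json.dumps({"pages": 320}),
}


def load_document(product_id):
    """Decode the stored JSON document for one entity."""
    return json.loads(documents[product_id])


eav_product = load_eav(1)
doc_product = load_document(1)
```

In the EAV version the weight comes back as the string `"1.5"` and the entity has to be pieced together from several rows, while the document keeps its shape and types. That convenience is a large part of what made document stores attractive for irregular data.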
Buyer beware
The potential benefits of MongoDB were huge, especially for certain classes of problems. If you read the list above without context or experience, you might get the impression that MongoDB is a truly revolutionary DBMS. The only problem is that these advantages came with a number of caveats, some of which are listed below.
In fairness, no one at 10gen / MongoDB Inc. would claim that the following is untrue; these are simply trade-offs.
- Loss of transactions: transactions are a core feature of many relational databases (not all, but most). Transactional means that you can perform several operations atomically and be guaranteed that the data stays consistent. Of course, with a NoSQL database you can get atomicity within a single document, or you can use two-phase commits to get transactional semantics. But you have to implement that yourself ... which can be a complex and time-consuming task. Often you are not aware of the problem until you see data in the database end up in invalid states, because the atomicity of operations cannot be guaranteed. Note: many people have told me that transactions appeared in MongoDB 4.0 last year, although with a number of limitations. The conclusion of the article stays the same: evaluate how well the technology meets your needs.
- Loss of relational integrity (foreign keys): if your data has relationships, you have to enforce them in the application. Having the database enforce those relationships takes a significant amount of work off the application and, therefore, off your programmers.
- No way to enforce data structure: strict schemas are sometimes a big problem, but they are also a powerful mechanism for keeping data well structured when used correctly. Document databases such as MongoDB provide incredible schema flexibility, but that flexibility removes the obligation to keep data clean. If you do not take care of it, you will end up writing a lot of application code to handle data that is not stored in the form you expect. As we often say at Simple Thread: the application will be rewritten someday, but the data will live forever. Note: MongoDB supports schema validation. It is useful, but it does not provide the same guarantees as a relational database. Above all, adding or changing a schema validation rule does not affect data that already exists in the collection; you have to make sure yourself that the data is updated to match the new schema. Decide for yourself whether that is enough for your needs.
- Proprietary query language / loss of the tool ecosystem: the arrival of SQL was an absolute revolution, and it has remained the standard ever since. It is an incredibly powerful language, but also a fairly complex one. People with SQL experience regard having to build database queries in a new language made of JSON fragments as a big step backwards. There is a whole universe of tools that work with SQL databases, from IDEs to reporting tools. Moving to a database that does not support SQL means that you either cannot use most of those tools or have to convert the data to SQL in order to use them, and that can be harder than you think.
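The first two trade-offs in the list above are easiest to appreciate by seeing what a relational database does for free. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for any relational database; the `account` and `payment` tables are hypothetical. It illustrates the guarantees that would otherwise have to be rebuilt in application code, and says nothing about how MongoDB itself behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE payment (id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL REFERENCES account(id), amount INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100), (2, 0)")
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts; return False if the transfer was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def record_payment(account_id, amount):
    """Insert a payment; return False if it points at an account that does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payment (account_id, amount) VALUES (?, ?)",
                (account_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balance of every account, keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(1, 2, 60)         # both updates are committed together
overdraft = transfer(1, 2, 60)  # the debit violates the CHECK, so the credit is undone too
orphan = record_payment(999, 10)  # rejected: account 999 does not exist
```

The failed transfer leaves no half-applied state behind, and the orphan payment never reaches the table. Without transactions and foreign keys, both checks become the application's job.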
Many developers who turned to MongoDB did not really understand these trade-offs and often dived in headfirst, making it their primary data store. After that, it was often incredibly difficult to go back.
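The cost of unenforced structure tends to show up as defensive code. The following is a hypothetical illustration in plain Python: the documents and field names are invented, and are meant only to resemble what a collection can look like after several releases have each written users in a slightly different shape.

```python
# Hypothetical user documents written by three generations of the same application.
stored_users = [
    {"_id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"_id": 2, "first_name": "Alan", "last_name": "Turing", "emails": ["alan@example.com"]},
    {"_id": 3, "name": "Grace Hopper"},  # email was optional in the first release
]


def normalize_user(doc):
    """Coerce any historical document shape into the shape current code expects."""
    if "name" in doc:
        name = doc["name"]
    else:
        name = f"{doc.get('first_name', '')} {doc.get('last_name', '')}".strip()

    emails = doc.get("emails")
    if emails is None:
        # Older documents stored at most one address under "email".
        emails = [doc["email"]] if "email" in doc else []

    return {"id": doc["_id"], "name": name, "emails": emails}


normalized = [normalize_user(doc) for doc in stored_users]
```

Every reader of the collection needs logic like `normalize_user`, and it grows with each new shape that gets written. A one-off migration that rewrites old documents avoids this, but it is exactly the kind of work that “no migrations” was supposed to eliminate.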
What could have been done differently?
Not everyone jumped in headfirst and hit the bottom. But many projects adopted MongoDB where it simply did not fit, and they will have to live with it for many years to come. If those organizations had taken some time to weigh their technology choice methodically, many of them would have chosen differently.
How do you choose the right technology? There have been several attempts to create a systematic framework for evaluating technologies, such as “Framework for introducing technologies into software organizations” and “Framework for evaluating software technologies”, but to me they seem needlessly complex.
Many technologies can be evaluated sensibly by asking just two basic questions. The hard part is finding people who will answer them responsibly, taking the time to look for answers and setting their biases aside.
Question 1: What problems am I trying to solve?
If you do not have a problem, you do not need a new tool. Period. There is no need to find a solution first and then invent a problem for it. If you do not have a problem that the new technology solves significantly better than your existing technology, there is nothing to discuss. If you are considering a technology because you have seen others use it, work out what problems they are facing and ask whether you have the same ones. Adopting a technology because others use it is easy; the hard part is understanding whether you face the same problems.
Question 2: What am I losing?
This is certainly the harder question, because you have to dig in and understand both the old and the new technology well. Sometimes you cannot truly understand a new technology until you have built something with it, or until you have someone on the team with that experience.
If you have neither, it makes sense to think about the smallest possible investment that would let you determine the value of the tool. And once you have made the investment, how difficult will it be to reverse the decision?
People always spoil everything
As you try to answer these questions as impartially as possible, remember one thing: you are fighting human nature. There are a number of cognitive biases you have to overcome in order to evaluate a technology effectively. Here are just a few:
- The bandwagon effect - everyone knows about it, but it is still hard to resist. Just make sure the technology really matches your actual needs.
- The novelty effect - many developers tend to undervalue technologies they have worked with for a long time and to overvalue the benefits of a new one. This is not limited to programmers; everyone is subject to this bias.
- The feature-positive effect - we tend to see what is there and lose sight of what is missing. Combined with the novelty effect this can cause havoc, because you not only overvalue the new technology but also overlook its shortcomings.
An objective assessment is not easy, but understanding basic cognitive biases will help make more rational decisions.
Summary
When an innovation appears, two questions must be answered with great care:
- Does this tool solve a real problem?
- Do we understand the trade-offs well?
If you cannot confidently answer these two questions, take a few steps back and think.
So was MongoDB ever the right choice? Of course it was, for some; as with most engineering decisions, it depends on many factors. Many of those who answered these two questions have benefited from MongoDB and continue to do so. As for those who did not, I hope they received a valuable and not too painful lesson about moving along the hype cycle.
Disclaimer
I want to make clear that I have neither love nor hatred for MongoDB. We simply have not had the kind of problems for which MongoDB is best suited. I know that 10gen / MongoDB Inc. was very aggressive at first, shipping unsafe defaults and promoting MongoDB everywhere (especially at hackathons) as a universal solution for working with any data. That was probably a bad decision. But it supports the approach described here: these problems could have been spotted very quickly, even with a superficial evaluation of the technology.