DevConf: Uber Migrating from PostgreSQL to MySQL
On May 18, 2018, Digital October will host DevConf 2018. We decided to retell some of the interesting talks from last year's conference. One of them had a somewhat holy-war title: "What the Political Instrument Is Silent About: On the Discussion of Uber's Move from PostgreSQL to MySQL". In it, MySQL developer Alexey Kopytov examined the differences between InnoDB and PostgreSQL at the lowest level, including data organization, memory, and replication. Here is a brief retelling of that talk.
Uber switched from MySQL to Postgres in 2013. The reasons they list were, first, PostGIS, a geospatial extension for PostgreSQL, and second, the hype. PostgreSQL carries a certain halo of a serious, solid DBMS, perfect and without flaws, at least compared to MySQL. They did not know much about PostgreSQL, fell for the hype, switched, and three years later had to move back. The main reason, if you summarize their report, was poor performance in production.
Clearly, this was a fairly big reputational hit for the PostgreSQL community. There were a lot of discussions on a variety of sites, including the mailing lists. I read all these comments and will try to look at some of the most detailed, most comprehensive responses.
Here are today's contestants. The first is Robert Haas, one of the two key PostgreSQL developers. He gave the most balanced answer. He does not blame Uber for anything; on the contrary, he says: well done, thank you for telling us about the problems, we have something to work on, we have already done a lot for PostgreSQL, but there is still plenty of room for improvement. The second response is from Christophe Pettus, CEO of a PostgreSQL consulting company. What I saw in his talk shocked me to the depths of my soul, because it feels like the person did not even try to be objective. From the start he set himself the task of making Uber look like fools and PostgreSQL look white and fluffy. Everything is turned inside out. I asked Uber engineers what they thought of that talk, and they said it was nonsense: "We didn't say that at all." It is the exact opposite of Robert Haas's response.
There is also Simon Riggs, CTO of the well-known company 2ndQuadrant, which does consulting and contributes a great deal to PostgreSQL itself. His response is somewhere in the middle: Simon answers some questions in detail, gives no answer to others, and keeps quiet about a few things. The fourth response is by Alexander Korotkov, development director at the Russian company Postgres Professional. It is detailed and thorough, as you would expect, but it seems to me that Alexander also sweeps some things under the carpet.
And my moment of glory. A year ago, at last year's DevConf, I also gave a talk about Postgres problems, a month before Uber published its report. I looked at PostgreSQL purely from an architectural point of view, with no real experience of running PostgreSQL in production, yet I was able to identify most of the problems that Uber describes from actual operation. The only Uber issue I left out of my talk was MVCC on replicas. I saw that there were problems there, read a bit on the topic, but decided it was not that important for production use and omitted it. As it turned out, for Uber this problem mattered too.
There will be many pictures. To explain Uber's problems, we need to talk a little about the difference in MVCC implementation and in data organization between MySQL, specifically InnoDB, and PostgreSQL.
InnoDB uses a clustered index, which means that user data is stored together with the index values of one chosen index, usually the PRIMARY KEY. Secondary indexes do not point to records by some physical offset; they point to records via the primary key value.
PostgreSQL organizes data on disk differently. The user data itself is stored separately from the indexes, in an object called the heap, and all indexes, both primary and secondary, point to the data with, roughly speaking, a page number and an offset within that page. That is, they use a kind of physical pointer.
Essentially, this means that all indexes in PostgreSQL are secondary. The primary index is just an alias for a secondary one; organizationally it is arranged the same way, there is no difference.
Going down one level, I am reusing a simplified picture from the Uber report of how the data is laid out in pages, or blocks. Say we have a table with the first and last names of famous mathematicians and their year of birth; a secondary index is built on the year-of-birth column, and there is a primary key on an ID column, just a number. This is how it looks in InnoDB, schematically, and this is how it looks in PostgreSQL.
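For reference, here is a minimal sketch of a schema like the one in that example. This is not from the original talk or the Uber report: the table, column and database names are made up for illustration, and I assume PostgreSQL with the psycopg2 driver.

```python
import psycopg2

# Illustrative schema: an integer primary key and a secondary index
# on a non-key column, as in the "famous mathematicians" example.
conn = psycopg2.connect("dbname=test")  # assumed connection string
conn.autocommit = True
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS mathematicians (
        id          integer PRIMARY KEY,
        first_name  text,
        last_name   text,
        birth_year  integer
    )
""")
# Secondary index on the year of birth.
cur.execute("CREATE INDEX IF NOT EXISTS idx_birth_year ON mathematicians (birth_year)")

cur.execute("INSERT INTO mathematicians VALUES (1, 'Blaise', 'Pascal', 1623)")

cur.close()
conn.close()
```

The same CREATE statements in MySQL/InnoDB would give a clustered primary key with the secondary index storing primary-key values; in PostgreSQL both indexes store physical (page, offset) pointers into the heap.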
What happens on a non-indexed update, that is, an update that does not touch any indexed columns? In InnoDB, the data is updated in place, directly in the primary key where it is stored, but other transactions must still be able to see the old rows, so some history is needed. That historical data goes to a separate segment called the UNDO log, and not the whole row goes there, only what we changed: the so-called update vector.
Accordingly, the pointer in the secondary index, if there is one, does not need to change, because it points via the PRIMARY KEY. The PRIMARY KEY has not changed, we changed only non-indexed columns, so nothing in the index needs updating: the old data is moved to the UNDO log and the new data is written directly into the table. PostgreSQL takes a fundamentally different approach. On a non-indexed update it inserts another full copy of the row with the new data, and then the indexes must change as well. Pointers in all indexes must point to the latest version of the row, so a new entry is inserted into each of them, pointing to the new row version rather than the old one.
What happens on an indexed update, when we update an indexed column, say the year of birth? InnoDB again makes a new entry in the UNDO log, again recording only the data that changed. The secondary index pointer does not need rewriting; rather, since we have now changed the indexed column, a new secondary index entry is inserted, pointing to the same row. In PostgreSQL the same thing happens as with a non-indexed update: it inserts another full copy of the row and again updates all indexes, inserting new entries into each of them, the primary and every secondary index.
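Here is a sketch of the two kinds of updates just discussed, continuing the illustrative schema from above (again, the names are my own, not from the talk):

```python
import psycopg2

conn = psycopg2.connect("dbname=test")
conn.autocommit = True
cur = conn.cursor()

# Non-indexed update: only last_name changes, and no index covers it.
# InnoDB: in-place update plus an UNDO record; secondary indexes untouched.
# PostgreSQL: a whole new row version in the heap, plus new entries in
# every index (unless the Heap-Only Tuples optimization, discussed
# further below, happens to apply).
cur.execute("UPDATE mathematicians SET last_name = 'Pascal' WHERE id = 1")

# Indexed update: birth_year is covered by the secondary index.
# InnoDB: an UNDO record plus one new secondary-index entry.
# PostgreSQL: a new row version plus new entries in all indexes.
cur.execute("UPDATE mathematicians SET birth_year = 1624 WHERE id = 1")

cur.close()
conn.close()
```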
Both InnoDB and PostgreSQL accumulate this multi-version garbage: old row versions that must be removed over time, once no transactions remain that could possibly see them. In InnoDB, the purge system is responsible for this: it walks through the UNDO log and deletes the entries that are no longer visible to any transaction.
In PostgreSQL, a process called vacuum is responsible for this. Since the multi-version garbage is mixed in with the data itself, vacuum scans the data, going through all of it and deciding for each row version whether it can already be marked as available for reuse. The same goes for indexes: old row versions have to be removed from them too.
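As a small aside from me, not from the talk: one way to watch this garbage accumulate is the standard pg_stat_user_tables statistics view. A minimal sketch, reusing the illustrative table name:

```python
import psycopg2

conn = psycopg2.connect("dbname=test")
conn.autocommit = True  # VACUUM cannot run inside a transaction block
cur = conn.cursor()

# n_dead_tup estimates how many dead row versions are waiting for vacuum;
# last_autovacuum shows when autovacuum last processed the table.
cur.execute("""
    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'mathematicians'
""")
print(cur.fetchone())

# Trigger a manual vacuum of just this table.
cur.execute("VACUUM mathematicians")

cur.close()
conn.close()
```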
What is bad about vacuum? In my opinion, vacuum is the most archaic, most inefficient subsystem in PostgreSQL. There are many reasons for this, and not because it is poorly written; no, it is well written. It follows from how PostgreSQL organizes data on disk.
Firstly, the problem is solved by scanning the data, not by scanning its own history, as InnoDB does with the UNDO log. There are, of course, optimizations that avoid scanning all of the data and look only at segments updated recently: a bitmap marks pages that are clean, not updated since the last vacuum, so they do not need to be visited. But all these optimizations only help when the data is mostly static. If you have a large, frequently updated data set, they are of little help: you still have to go through all the recently updated pages and decide, for each record, whether it can be thrown out or not. Vacuum is also single-threaded.
InnoDB purge, by contrast, can be parallelized across an arbitrary number of threads; by default four threads are used. Moreover, the efficiency of PostgreSQL itself is tied to how efficiently and how often vacuum runs. Various query-execution optimizations, such as HOT (we will get to it) or index-only scans, and replication efficiency as well, depend on whether vacuum has run recently, that is, on whether this multi-version garbage is still lying around or not.
If anyone wants the gory details, Alexey Lesovsky gave a talk called "Nine Circles of Hell, or PostgreSQL Vacuum". It has a huge diagram showing the vacuum algorithm; you can break your brain on it if you like.
Now to the actual Uber problems. The first problem they describe is write amplification: even a small non-indexed update can lead to a large amount of data written to disk. You need, first, to write a new version of the row into the heap, then update all the indexes, which is in itself an expensive operation from a computational point of view. It is fine if it is a btree index, but it could be a GIN index, which updates very slowly. All the pages changed, both in the heap and in the indexes, will eventually go to disk, and in addition all of this must be written to the transaction log. A lot of writes for a relatively small update.
Here we should talk about an optimization that has existed in PostgreSQL for a long time, called Heap-Only Tuples, or HOT updates. Uber does not mention it in the report; it is not clear whether they knew about it or whether it simply did not work in their case. They just say nothing about it. But the optimization exists.
The idea is that if both the old and the new version of the record end up on the same page during an update, then instead of updating the indexes we put a link from the old record to the new one. The indexes continue to point to the old version; we can follow the link and get the new version without touching the indexes. Everything seems fine. If yet another version appears, we add another link and get a chain, the so-called HOT chain. There is no complete happiness, though, because all of this only works until the first indexed update. As soon as an update affects the indexes on this page, the whole HOT chain must be collapsed, because all indexes have to point to the same record; we cannot change the link for one index and leave it for the others.
Accordingly, the more indexes, the smaller the chance that HOT will kick in, and even Robert Haas writes in his review that yes, this problem really exists and many companies suffer from it. Many developers are therefore careful about creating indexes: theoretically an index might even be useful, but if we add it, updates will no longer benefit from the Heap-Only Tuples optimization. Secondly, the optimization obviously only works if there is free space on the page. The link can only be created if the old and new versions of the record are on the same page; if the new version moves to another page, that's it, no Heap-Only Tuples.
For this, PostgreSQL has yet another crutch: instead of waiting until vacuum gets around to freeing space on the page, a mini-vacuum, so-called single-page pruning, is performed. Each time a specific page is accessed, it is also vacuumed a little: records that are no longer visible to any transaction are cleaned out, space on the page is freed, and the chance that the HOT optimization will work increases accordingly.
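A sketch of how one might check whether HOT is actually working for a table, and leave extra free space on its pages to improve the odds. This is my illustration, not advice from the talk; the table name and the fillfactor value are assumptions:

```python
import psycopg2

conn = psycopg2.connect("dbname=test")
conn.autocommit = True
cur = conn.cursor()

# n_tup_hot_upd out of n_tup_upd shows how many updates were HOT,
# i.e. did not have to touch any index.
cur.execute("""
    SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'mathematicians'
""")
total, hot = cur.fetchone()
print(f"HOT updates: {hot} of {total}")

# Leaving some free space on every page (fillfactor below 100) increases
# the chance that a new row version fits on the same page as the old one.
cur.execute("ALTER TABLE mathematicians SET (fillfactor = 80)")

cur.close()
conn.close()
```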
All of this is inefficient, again, if you have many concurrent transactions: the chance that the current row versions are invisible to every running transaction decreases, and the HOT optimization stops working. It also stops working when there are not many parallel transactions but one long one that can still see old row versions on the page. We cannot delete them, new records start spilling onto the next page, and again, no HOT optimization.
What does the community reply? Robert Haas, quite evenhandedly: yes, there is a problem, HOT is not a panacea, many companies suffer from this, not only Uber; and in general, as I have long been saying, we need storage engines as in MySQL, in particular a storage engine based on undo rather than what we have now. Christophe Pettus and Simon Riggs say: there is the HOT optimization; since Uber does not mention it, they know nothing about it, so they are fools. Christophe is a well-mannered man: when he wants to say that Uber are fools, he puts a shrugging smiley into the presentation, as in "what can you expect from them".
Alexander Korotkov says that the HOT optimization exists and has certain limitations, and then starts listing various experimental patches that will not make it into the upcoming major version, PostgreSQL 10. After listing these experimental patches he arrives at the conclusion that, really, a storage engine based on undo is needed, and that all the other optimizations are rather props, crutches for PostgreSQL's inefficient on-disk data organization.
What do all the respondents miss? There is a belief, which I often see including in Russian-speaking communities, that Uber's was some peculiar, specific workload. In fact there is nothing specific about it: these are ordinary operations for OLTP, for Internet projects, queues, counters, metrics, where you need to update non-indexed columns. I see nothing specific in that.
Secondly, none of the respondents said plainly that MVCC and the on-disk data organization in PostgreSQL are poorly suited for OLTP workloads.
Nor did anyone state the simple fact that there is no complete solution to the problem and apparently there will not be one soon. Even the experimental patches will appear, at best, in PostgreSQL 11, and even then they are more likely to be crutches and props than a radical solution. Also, the write amplification problem that Uber describes is a special case of a more general problem, and MySQL has been working on that more general problem for a long time: there are the write-optimized TokuDB and MyRocks engines, which are now gaining popularity. In this respect PostgreSQL is worse still; there are no alternatives even on the theoretical horizon.
The second problem Uber describes is replication and MVCC on replicas. Say we have replication, master -> replica. Replication in PostgreSQL is physical, that is, it works at the file level, and all changes to the files that happen on the master are transferred to the replica one to one; in effect we end up with a byte-for-byte copy of the master. A problem arises if we also want to use the replica to run queries. The replica is read-only: you cannot make changes there, but you can read. What do we do if a select running on the replica needs old row versions that have already been deleted on the master? The master knows nothing about those selects; vacuum came along and removed the old versions, unaware that queries on the replicas still needed them. There are not many options. You can hold replication back, pausing apply on the replica until the select finishes, but since replication cannot be delayed indefinitely there is a default timeout, max_standby_streaming_delay, 30 seconds, after which transactions that still need the old row versions simply start getting killed.
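As an aside from me, here is a sketch of how this behaviour can be inspected on a standby: the delay setting itself, and the per-database counters of queries cancelled due to replication conflicts. The host and database names are assumptions:

```python
import psycopg2

# Connect to the standby (read-only replica); connection details are assumed.
conn = psycopg2.connect("dbname=test host=replica.example.com")
cur = conn.cursor()

# How long the standby may delay WAL apply for the sake of running queries.
cur.execute("SHOW max_standby_streaming_delay")
print("max_standby_streaming_delay:", cur.fetchone()[0])

# Queries cancelled on this standby because of replication conflicts.
# confl_snapshot counts cancellations caused by vacuum removing row
# versions that a standby query still needed.
cur.execute("""
    SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
    FROM pg_stat_database_conflicts
    WHERE datname = current_database()
""")
print(cur.fetchone())

cur.close()
conn.close()
```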
Uber also notes that they simply had long transactions, largely through developer mistakes. There were cases where code opened a transaction and then, say, sent an e-mail or performed some other blocking I/O. In theory, Uber says, you should not do that, but there are no ideal developers, and besides, they often use frameworks and ORMs that hide what is really going on: it is not so easy to tell where a transaction begins and where it ends when high-level frameworks are in use.
Such cases did occur, and for Uber they were a big problem: either replication lagged, or transactions got killed at a moment that, from the developer's point of view, was unpredictable. Because, you see, if we wait 30 seconds, delaying replication, and only then kill the query, then by the time the query is killed replication is already 30 seconds behind. The next transaction can then be killed almost instantly, because that 30-second budget is exhausted very quickly. From the developer's point of view, a transaction is interrupted at some random moment: we have only just started, have not done anything yet, and we are already being cut off.
The community's response: everyone says they should have enabled the hot_standby_feedback = on option. Christophe Pettus adds that Uber are simply fools.
But this option has a price. There are negative consequences, otherwise it would simply be on by default. Going back to the picture: what does it do? The replica starts telling the master which transactions are open and which row versions are still needed, and the master then essentially just holds vacuum back so that it does not delete those row versions. This brings its own problems. For example, replicas are often used for testing, so that developers experiment not on the production machine but on a replica. Imagine a developer starts some transaction during testing and goes to lunch; meanwhile vacuum on the master is held back, garbage accumulates, the optimizations stop working, disk space runs out, and in the end the master can simply die.
Or another case: replicas are often used for analytics, for long-running queries. That is also a problem. We run a long query with hot_standby_feedback = on, and vacuum on the master stalls as well. It then turns out that for long analytical queries there is no good option at all: either they simply kill the master with hot_standby_feedback = on, or, with it off, they get cancelled and never finish.
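A sketch of how this trade-off can be observed, assuming a master/standby pair (host names are illustrative, and this is my own illustration rather than anything from the talk): check the feedback setting on the standby, then watch on the master which transaction horizon the standby is holding back.

```python
import psycopg2

# On the standby: is feedback enabled? (Enabling it means editing
# postgresql.conf or using ALTER SYSTEM on the standby plus a reload.)
standby = psycopg2.connect("dbname=test host=replica.example.com")
scur = standby.cursor()
scur.execute("SHOW hot_standby_feedback")
print("hot_standby_feedback:", scur.fetchone()[0])
scur.close()
standby.close()

# On the master: with feedback on, backend_xmin shows the oldest
# transaction the standby reports as still needed; vacuum on the master
# cannot remove row versions newer than this, which is exactly the price.
master = psycopg2.connect("dbname=test host=master.example.com")
mcur = master.cursor()
mcur.execute("SELECT application_name, state, backend_xmin FROM pg_stat_replication")
for row in mcur.fetchall():
    print(row)
mcur.close()
master.close()
```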
What is everyone silent about? As I said, the option everyone recommends holds vacuum back on the master. At one point there was a proposal to make it on by default in PostgreSQL, and the proposers were promptly told why that should not be done. So recommending it without any reservations is not entirely honest, I would say.
Developers make mistakes. Christophe Pettus says that since Uber was unable to hire the right developers, they are fools. But things break where they are weakest: developers make mistakes where they are allowed to make mistakes. What nobody said is that physical replication is poorly suited for scaling reads. If we use replicas to scale reading, to run selects on them, physical replication in PostgreSQL simply does not fit architecturally. It was a hack that was easy to implement, but it has its limits, and this again is not some exotic case: replication is used for different purposes, and using it to scale reads is a fairly common one. Another issue Uber reports is replication and write amplification. As we discussed, certain updates on the master cause write amplification: too much data is written to disk. And since replication in PostgreSQL is physical, all those changes to the files on the master also go over the network. Physical replication is quite redundant by nature. That creates bandwidth problems between data centers: if we replicate from one coast to the other, buying a channel wide enough to carry the entire replication stream coming from the master is problematic and not always possible.
What does the community respond? Robert Haas says one could try compressing the WAL that is sent to the replicas over the network, or enable compression at the SSL connection level; one could also try the logical replication solutions available for PostgreSQL, and he mentions Slony, Bucardo and Londiste. Simon Riggs gives a whole lecture on the pros and cons of physical replication; I would reproduce it here, but lately I have talked about this so often that I am a little tired of it, and I assume everyone understands how logical replication differs from physical. He also adds that pglogical, the built-in logical replication solution for PostgreSQL that his company is developing, is coming soon.
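For reference, one possible reading of that advice in terms of concrete knobs. This is my sketch, not Robert Haas's wording: the wal_compression parameter (which compresses full-page images in WAL, and appeared in versions later than the ones Uber was running) and client-side compression of the replication connection.

```python
import psycopg2

conn = psycopg2.connect("dbname=test")
conn.autocommit = True
cur = conn.cursor()

# Compress full-page images in the WAL stream (and hence in what is
# shipped to physical replicas). Requires superuser and a config reload.
cur.execute("ALTER SYSTEM SET wal_compression = on")
cur.execute("SELECT pg_reload_conf()")

# Compressing the replication connection itself is configured on the
# client side (e.g. sslcompression=1 in the replica's primary_conninfo),
# not through SQL, so it is only mentioned here as a comment.

cur.close()
conn.close()
```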
Christophe Pettus says there is no need to compare logical and physical replication. That is strange, because this is exactly what Uber does: they compare the pros and cons of logical and physical replication, saying that physical replication did not suit them well, while the logical replication in MySQL suits them just fine. Why one must not compare them is unclear. He adds that Slony and Bucardo are hard to install and hard to manage, but come on, it's Uber; since they could not manage it, they are fools. And besides, for version 9.4 there is pglogical, a third-party extension for PostgreSQL that provides logical replication. It must be said that Uber was running older PostgreSQL versions, 9.2 and 9.3, for which pglogical does not exist, and they could not move to 9.4. I will explain why below.
Alexander Korotkov says that Uber compares logical replication in MySQL with physical replication in PostgreSQL. True, that is exactly what they do. He also says there is no physical replication in MySQL and that Alibaba is working on adding it. That is also true, only Alibaba is doing it for a completely different use case: to provide high availability for their cloud offering within a single data center. They do not use cross-region replication and they do not read from replicas; it is used only so that, if a cloud customer's server dies, they can quickly switch over to a replica. That's all. This use case is different from Uber's.
Another issue Uber talks about is cross-version upgrades: you need to move from one major version to another, preferably without stopping the master. PostgreSQL with physical replication does not solve this. In MySQL it is usually done like this: first the replicas (or the single replica, if there is only one) are upgraded to the newer version, then queries are switched over to one of them, effectively making it the new master, and then the old master is upgraded. Downtime turns out to be minimal. But with physical replication in PostgreSQL you cannot replicate from an older major version to a newer one; in general the versions must match on both the master and the replica. Pglogical, again, was not an option for them, because they were on older versions and could not move to 9.4, where pglogical became available, precisely because there was no way to replicate across versions. A circular dependency.
Some of their legacy services still remain at Uber, and for this simple reason they still run on PostgreSQL 9.2.
The community's response. Robert Haas does not comment on this at all. Simon Riggs says that there is in fact pg_upgrade, an in-place upgrade procedure, and with the -k option (hard links) it takes less time; still, there will be downtime, the master has to be stopped and all queries have to be stopped too, just not for very long. In addition, Simon Riggs says his company has a commercial solution that bridges old versions with new ones.
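For reference, a sketch of the pg_upgrade invocation being referred to, wrapped in Python only for consistency with the other examples here. All paths and version numbers are illustrative assumptions; both clusters must be stopped before running it.

```python
import subprocess

# --link (-k) hard-links data files instead of copying them, which is
# what makes the "takes less time" variant Simon Riggs mentions.
subprocess.run(
    [
        "pg_upgrade",
        "--old-bindir", "/usr/lib/postgresql/9.2/bin",   # assumed layout
        "--new-bindir", "/usr/lib/postgresql/9.4/bin",
        "--old-datadir", "/var/lib/postgresql/9.2/main",
        "--new-datadir", "/var/lib/postgresql/9.4/main",
        "--link",
    ],
    check=True,
)
```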
Christophe Pettus says that pg_upgrade is not a panacea; in particular PostGIS, the geospatial extension for which Uber moved to PostgreSQL in the first place, causes a lot of problems when upgrading between major versions. I am willing to believe him here, since he is a PostgreSQL consultant. He also says Uber failed to master, once again, Slony, Bucardo and pglogical, which means they are fools. Alexander Korotkov shows how to upgrade between major versions using pglogical.
The report is called "An Answer to Uber", but for Uber it was not an option, and as an answer to current users it has its own nuances too.
What is everyone silent about? Yes, there really are various third-party solutions that provide logical replication for PostgreSQL: Slony, Bucardo, Londiste. But they are all trigger-based: a trigger is hung on every table, on every update the changes are captured and written into a special relational journal table, and once they have been shipped to the replicas, those changes are deleted from that table. And here the typical problems begin: PostgreSQL's sore spot, vacuum. When data is constantly inserted and deleted there is a lot of work for vacuum and, accordingly, a lot of assorted problems. Not to mention that a trigger-based solution is not very efficient in itself, and that the data is pulled out of those tables by external scripts rather than inside the DBMS itself.
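A minimal sketch of the trigger-based scheme these tools rely on, purely for illustration: the table, function and journal names are made up, and real Slony/Bucardo/Londiste schemas are considerably more involved.

```python
import psycopg2

conn = psycopg2.connect("dbname=test")
conn.autocommit = True
cur = conn.cursor()

# Journal table: every captured change is appended here, shipped to the
# replica by an external process, and then deleted -- a constant stream
# of inserts and deletes that vacuum has to clean up after.
cur.execute("""
    CREATE TABLE IF NOT EXISTS replication_log (
        id        bigserial PRIMARY KEY,
        tbl       text,
        op        text,
        row_data  jsonb,
        captured  timestamptz DEFAULT now()
    )
""")

# Trigger function that records the new row version as JSON.
cur.execute("""
    CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
    BEGIN
        INSERT INTO replication_log (tbl, op, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql
""")

# Hang the trigger on the illustrative table.
cur.execute("""
    DROP TRIGGER IF EXISTS mathematicians_capture ON mathematicians;
    CREATE TRIGGER mathematicians_capture
        AFTER INSERT OR UPDATE ON mathematicians
        FOR EACH ROW EXECUTE PROCEDURE capture_change()
""")

cur.close()
conn.close()
```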
With pglogical everything is rather strange. Everyone advised that pglogical could be used to upgrade from an older version to a newer one. I read the documentation: the developers do not promise that such cross-version replication will work. They say it may work, but something may diverge in future versions, and in general they promise nothing.
And its maturity is low even now, in 2017. It is a relatively new technology; it is, of course, progressive in that it potentially uses built-in PostgreSQL facilities rather than external machinery, but I do not really see how it could have been recommended to Uber for production use back in 2015, when pglogical was only just starting to be developed. One good thing about pglogical is that, unlike core PostgreSQL, it has a public bug tracker, so you can go and assess the stability and maturity of the project and how widely it is used. Honestly, I was not impressed; it did not strike me as a stable project.
Activity is very low; there are bugs that have been hanging there for years, someone has performance problems, someone has data-loss problems. It does not look like a stable project. My hobby project (sysbench) has more activity on GitHub than pglogical does.
What else is not talked about? That all these solutions, both the third-party ones and the built-in pglogical, have problems with sequences: you cannot hang a trigger on a sequence, and sequence changes are not carried through, so you have to perform special contortions to replicate sequences. DDL works only through wrapper functions, because, again, you cannot hang triggers on DDL, and it does not make it into the logical replication stream in PostgreSQL either. None of the three solutions has a log as such: you cannot take a backup and roll logical changes forward from a log, as you can in MySQL. PITR is possible only from a physical backup; you cannot roll it onto a logical backup. There is no GTID or the other recent additions to MySQL replication. I still have not figured out how to do repositioning with these solutions: say, switch a replica from one master to another and work out which transactions have already been applied and which have not.
Finally, there is no parallelism. The only way to speed up logical replication is to parallelize it, and a lot of work has been done in MySQL in that direction. None of these solutions supports parallelism in any form whatsoever; it simply is not there.
What is everyone missing? Firstly, there is only one truly sound scenario for physical replication, in PostgreSQL or in MySQL when it eventually appears there: high availability within a single data center, where there is a fast enough channel and no need to read from the replicas. That is the only correct use case. Everything else may or may not work, depending on the load and many other parameters.
As of today, PostgreSQL cannot offer a complete alternative to replication in MySQL. Pglogical is a strange thing. Many say that built-in logical replication, pglogical, will appear in PostgreSQL 10; in fact, few people in the Postgres community realize that only a trimmed-down version of pglogical is going into 10. I have not found any intelligible description of what exactly will go in and what will be left overboard. In addition, the syntax in 10 will differ slightly from the one used by the third-party pglogical extension, and it is unclear whether they can be used together, whether you can replicate with pglogical from older versions to PostgreSQL 10 or not; I have not found that anywhere. The pglogical documentation is peculiar too.
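For reference, a rough sketch of the syntax that built-in logical replication in PostgreSQL 10 ended up using. Connection strings and object names are illustrative assumptions, and, in line with the limitations discussed above, DDL and sequence changes are not carried by this mechanism.

```python
import psycopg2

# On the publishing (source) server.
pub = psycopg2.connect("dbname=test host=master.example.com")
pub.autocommit = True
pub.cursor().execute("CREATE PUBLICATION uber_pub FOR TABLE mathematicians")

# On the subscribing (target) server; the table must already exist there.
sub = psycopg2.connect("dbname=test host=newserver.example.com")
sub.autocommit = True
sub.cursor().execute("""
    CREATE SUBSCRIPTION uber_sub
        CONNECTION 'host=master.example.com dbname=test'
        PUBLICATION uber_pub
""")
```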
And finally, the inefficient cache. What is the difference between the cache implementations in InnoDB and PostgreSQL? InnoDB uses unbuffered I/O and does not rely on the operating system cache, so all the memory, the maximum available memory on the machine, can be given to the InnoDB buffer pool. InnoDB uses the ordinary pread and pwrite system calls to perform its unbuffered I/O.
PostgreSQL has a different architecture. For some reason, support for direct (unbuffered) I/O has never been implemented, so PostgreSQL relies on its own cache and on the operating system cache at the same time. In addition, for some reason it does not use the system calls that combine positioning with reading or writing; instead it first issues a separate system call to seek to the desired position, and then reads or writes. Two system calls per operation. And there is duplication of data: pages end up stored both in PostgreSQL's own cache and in the operating system cache.
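A small sketch of the difference being described, expressed in raw system calls. Python's os module is used purely for illustration (PostgreSQL and InnoDB are written in C, of course); the file name is made up.

```python
import os

fd = os.open("datafile.bin", os.O_RDONLY)  # illustrative data file
offset, length = 8192, 8192                # read one 8 KB page

# The pattern described for PostgreSQL: two system calls per read,
# first position the file descriptor, then read from it.
os.lseek(fd, offset, os.SEEK_SET)
page = os.read(fd, length)

# The pattern described for InnoDB: a single positioned read,
# one system call that takes the offset directly.
page = os.pread(fd, length, offset)

os.close(fd)
```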
Robert Haas and Simon Riggs do not respond to this at all. Christophe Pettus, something happened to him here: he does not call Uber fools, he agrees that there is a problem, but says the practical effect is unclear. Alexander Korotkov says that using pread instead of lseek plus read gives only about a 1.5% performance improvement, but that is only one of the problems Uber lists. He says nothing more.
What is everyone omitting? That this 1.5% was obtained in a single benchmark on the laptop of the developer who wrote the patch. There was some discussion on the pgsql-hackers mailing list, and the patch ended up stuck in it. Robert Haas said the patch should go in: even 1.5% is worthwhile, and on production systems the gain is likely to be larger. Tom Lane, another key PostgreSQL developer, was against it, saying stable code should not be touched for the sake of some 1.5%. In short, the two key developers did not reach agreement and the patch got stuck; it went nowhere.
And this is far from the only problem in the design of shared buffers; my talk at last year's DevConf covered this in detail. There are many more problems than Uber lists: data duplication, double checksum calculation, the problem of data being flushed out of the operating system cache, which InnoDB with its own cache can prevent. And in the future, if compression and encryption appear in PostgreSQL, this architecture will cause problems there as well.
What do we see in the end? I have talked to Uber, and they have also given talks at conferences, at Percona Live. They say: "We are generally happy with MySQL and are not going to move anywhere. We have old legacy systems that still run on PostgreSQL, but all the main processing is done in MySQL. PostGIS, the geospatial extension, was the main reason we switched to PostgreSQL. It is cool, but it has its pitfalls, and besides, it does not scale well for OLTP loads. We found a way to do without it in MySQL. Yes, MySQL has nothing comparable yet."
My advice: do not believe the hype, the halo of a perfect system devoid of flaws. Be engineers, be skeptics, try to make decisions based on objective things, not on rumors and mass hysteria. And most importantly, test. Before you plan to replace the DBMS in your project, run at least some tests to see how that DBMS behaves in your specific case. That's all from me!
So, a talk with a clear desire to show the advantages of InnoDB :) Of course, in the near future we will balance our blog with a talk from the opposite camp. Come to DevConf on May 18 (special prices are valid until April 30). Alexey promised to be there with a new talk, though this time it will be more about MySQL. And Ivan Panchenko from Postgres Professional will talk about the very logical replication that went into PostgreSQL 10.
Background
Uber switched from MySQL to Postgres in 2013, and the reasons they list were first: PostGIS is a geoinformation extension for PostgreSQL and hype. That is, PostgreSQL has a certain halo of serious, solid DBMS, perfect, without flaws. At least when compared to MySQL. They did not know much about PostgreSQL, but they fell for all this hype and switched, and after 3 years they had to move back. And the main reasons, if you summarize their report, are poor performance during operation in production.
It is clear that this was a fairly large reputation hit for the PostgreSQL community. There were a lot of discussions on a variety of sites, including mailing lists. I read all these comments and try to consider some of the most detailed, most comprehensive answers.
Here are our contestants today. The first is Robert Haas. Robert Haas is one of two key PostgreSQL developers. He gave the most balanced, balanced answer. He does not blame Uber for anything, on the contrary, says that, well done, thank you for telling us about the problems, we have something to work on, we have already done a lot for PostgreSQL, but there is still a lot of room for improvements. The second answer from Christophe Pettus is the CEO of some PostgreSQL consulting company. This is what I saw in this report that shocked me to the depths of my soul, because it feels like a person did not even try to be objective. Initially, he set himself the task of making Uber fools, and PostgreSQL white and fluffy. Everything turns inside out and I asked the opinion of Uber engineers, what they think about this report. They said it was some kind of nonsense. We didn’t say that at all. This is the exact opposite of Robert Haas's answer.
There is also Simon Riggs. This is CTO of the famous company 2ndQuadrant, which also does consulting and develops a lot in PostgreSQL itself. It is somewhere in between, in the middle. Simon answers some questions in detail, and some does not give any answer and hides some things. And the fourth report by Alexander Korotkov, development director of the Russian company PostgreSQL professional. The expected report is detailed and detailed, but it seems to me that Alexander also sweeps some things under the carpet.
And my moment of glory. A year ago, at last year’s DevConf, I also made a report about Postgres problems. And this report was a month before Uber published its own. Moreover, I considered PostgreSQL exclusively from an architectural point of view, having no real experience in the industrial operation of PostgreSQL, but I was able to identify most of the problems that Uber describes based on actual operation. The only Uber issue that I generally ignored in my talk was the issue with MVCC on replicas. I saw that there were some problems, I read something on this topic, but decided that it was not so important in industrial applications and decided to omit all this. As it turned out, Uber shows that for them this problem was also important.
There will be many pictures. To explain the Uber problem, we need to talk a little about the difference in MVCC, the implementation in MVCC, and the difference in data organization between MySQL, specifically InnoDB and PostgreSQL.
Data organization
InnoDB uses a clustered index, which means that user data is stored along with index values for a single index selected. The primary key is usually “PRIMARY KEY”. Secondary indexes use pointers to records, but they use not just some kind of offset, they use a primary key to point to records.
PostgreSQL has a different organization of data on disk. The user data itself is stored separately from the index in an object called “heap” or “heap”, and all indexes, both primary and secondary, use pointers to the data in the form of page numbers, roughly speaking, and offsets on this page. That is, they use some kind of physical pointer.
Essentially, this means that all indexes in PostgreSQL are secondary. The primary index is just some kind of alias for the secondary. Organizationally it is arranged in the same way, there is no difference.
If you go down one level, I’m using a simplified Uber report here to see how the data is stored in pages or in blocks. So let's say we have a table with the names and surnames of famous mathematicians, just the year of their birth on this column with the year of birth built a secondary index and there is a primary key for some ID, just a number. This is how it looks like in InnoDB, schematically, this is how it looks in PostgreSQL.
What happens with non-index upgrade? That is, we perform an update without affecting any columns for which there are any indices, non-indexable columns. In the case of InnoDB, the data is updated directly in: the primary key, where they are stored locally, but we need lines that are visible to other transactions, we need some history for lines. This historical data leaves for a separate segment called the UNDO log. Moreover, not the whole line leaves, but only what we have changed. The so-called update vector.
Accordingly, the pointer in the secondary index, if we have one, does not need to be changed, because we use the PRIMARY KEY. PRIMARY KEY has not changed, we only change non-indexed columns, which means that you do not need to change anything, we just transfer the old data to the UNDO log and enter the new data directly into the table. PostgreSQL has a fundamentally different approach. if we do an non-indexed update, we insert another full copy of the line with the new data, but, accordingly, then the indices must also be changed. Pointers in all indexes must point to the latest version of the string. Accordingly, another record is also inserted in them, which already indicates not the old version of the line, but the new one.
What happens with an indexed update? What if we update some sort of indexed column, here is the year of birth here. In this case, InnoDB again makes a new entry in the UNDO log and again indicates only the data that has changed. Again, the secondary index does not change. Well, more precisely, a new one is inserted there, since we have now replaced the indexed column, a new record is inserted - a secondary index that points to the same line. In PostgreSQL, the same thing happens as with non-indexed update. We insert another full copy of the line and again update all indexes. Insert new entries into each index. And in the primary and secondary. If there are several secondary, then in each secondary.
Both InnoDB and PostgreSQL accumulate such multi-version garbage. Old versions of lines that need to be removed over time when there are no transactions left that theoretically can see them. In the case of InnoDB, the purge system is responsible for this operation. It passes through the UNDO log and deletes those entries from the UNDO log that are no longer visible to any transaction.
In the case of PostgreS, a process called “vacuum” is responsible for this, which scans the data, since we have multi-version garbage mixed with the data itself, it goes through all the data and decides whether this line can already be marked as available for reuse or not . Same thing with indexes. From them, too, you need to remove the old version of the lines.
What is bad vacuum? In my opinion, vacuum is the most archaic, most inefficient PostgreSQL system. There are many reasons for this, and not because it is poorly written. No, it is well written. This follows from how PostgreS organizes data on disk.
Firstly, it is solved by data scanning. Not by scanning your own history, as is the case with InnoDB in the UNDO log, but by scanning data. There, of course, there are some optimizations that allow at least avoiding the entire data scan, but only certain segments that have been updated recently, there is some bit mask that notes that this data is clean, it has not been updated since the last vacuum. You don’t need to look there, but all these optimizations work only for the case when the data is mostly chaotic. If you have some kind of large set of frequently updated data, all these optimizations are dead poultices. Anyway, you need to go through all the recently updated pages, and determine whether this record can be thrown out or not, for each record. It is also performed in one thread. I.e,
In InnoDB purge, for example, can be parallelized to an arbitrary number of threads. By default, 4 threads are currently used. And besides, on vacuum, how efficiently and often vacuum works, the efficiency of PostgreSQL itself is tied. Various optimizations, query execution, like HOT, we’ll talk about this, or index-only scans, the replication efficiency is also tied to whether the vacuum has passed recently or not. Do we have this multi-version garbage or not.
If anyone needs bloody details, then Alexey Lesovsky made the report “Nine Circles of Hell or PostgreSQL Vacuum”. There is such a big huge diagram that shows the algorithm of operation of vacuum. And you can break the brain about it if you want.
Now, actually, to the Uber problem. The first problem they show is write amplification. The bottom line is that even a small non-index update can lead to a large write to disk. Because you need, firstly, to write a new version of the string in heap, in a bunch, update all new indexes, which in itself is an expensive operation. From a computational point of view, updating an index is expensive. It’s good if it’s also a btree index, and it could be some gin indexes that update very slowly. Add new records, all these pages changed both in the heap itself and in the indexes, will eventually go to disk, in addition, all this must be written to the transaction log. A lot of entries for a relatively small update.
Here you can talk more about optimization in PostgreSQL, which has existed for a long time, it is called “Heap-Only Tuples”, NOT updates. Uber does not mention in the report, it is not clear whether they knew about it or, maybe, in their case it did not work simply. They just don’t say anything about her. But there is such an optimization.
The point is that if both the old and the new version of the record during the index update are on the same page, instead of updating the indexes, we make a link to the new one from the old record. Indexes continue to point to the old version. We can jump to this link and get the new version without touching the indexes. Everything seems to be fine. If another line appears, we make another link and get just such a chain called “Hot chain”. In fact, there is no complete happiness, because this all works until the first index update. As soon as we get the first update, which affects the indexes for this page, you need to collapse all this hot chain, because all indexes must point to the same record. We cannot change the link for one index, but leave for others.
Accordingly, the more indices, the less chance that NOTs will work and even Robert Haas, he writes in his review that yes, there really is such a problem, and many companies suffer from it. Many developers are therefore careful about creating indexes. Theoretically, the index might even be useful, but if we add it, then the update will no longer use this optimization of Heap-Only Tuples. Secondly, it is obvious that this optimization only works if there is space on the page. The link can be put only if the old and new version of the record are on the same page. If the new version of the record moves to another page, that's all, no Heap-Only Tuples optimization.
For this, again, PostgreSQL made a crutch that, while we wait until vacuum reaches it, will free us space on the page, let's do such a mini vacuum or single gauge vacuum this is called. Each time you access a specific page, we will also vacuum a little more. See which records can be removed if they are not already visible to other transactions. Free up space on the page and, accordingly, increase the chance that NOT optimization will work.
This is all inefficient, again, if you have many concurrent transactions. Then the chance that the current version of the lines are not visible to any performing transaction, it decreases. Accordingly, NOT optimization ceases to work. It also does not work when there are not many parallel, but one long transaction that can see old versions of lines on the page. Accordingly, we cannot delete them, then this record begins to move to the next page and, again, no NOT optimization.
What does the community respond to? Robert Haas says in a rather balanced way: “Yes, there is a problem, NOT optimization is not a panacea, many companies suffer from this, not only Uber.” And in general, in a good way, I have been talking about this for a long time, we need to do the storage engine as in MySQL. In particular, we need a storage engine based on undo, not what we now have. Christoph Petus and Simon Riggs say: “There is NOT optimization. Since Uber does not mention it, it means that they do not know anything about her. So they are fools. ” Christophe, he is a man brought up when he wants to say that Uber are fools, he inserted a smiley in the presentation, shrugging his shoulders. “What to take from them.”
Alexander Korotkov says that there is NOT optimization, there are certain limitations, but then he starts listing the list of different experimental patches that will not be included in the top ten upcoming mager versions and after listing these experimental patches he comes to the conclusion that, in fact, in a good way, need to do a storage engine based on undo. All other optimizations are, rather, props, crutches for an inefficient data organization system in PostgreSQL.
What do all respondents miss? There is such a belief, I often see it, including in some Russian communities, that Uber was just some specific case. In fact, this is nothing specific. These are the usual operations for OLTP, for Internet projects, queues, some counters, metrics, when you need to update non-indexable columns. I do not see anything specific in this.
Secondly, none of the respondents said that MVCC and disk organization in PostgreSQL are poorly suited for OLTP loads.
Also, no one formulated the simple thing that there is no complete solution to the problem and apparently will not be soon. Even experimental patches, they will appear, at best, in PostgreSQL 11 and even, they are more likely to be crutches and backups, and not a radical solution to the problem. Also, that the write amplification problem that Uber describes is a rather special case of a more general problem. And MySQL has been working on a more general problem for a long time, there are write-optimized TokuDB / MyRocks engines now gaining popularity. In this regard, PostgreSQL is still worse. There are no theoretical alternatives even on the horizon.
MMVC and replication
The second problem that Uber notes is replication and problems with MMVC on replicas. Let's say we have replication, master -> replica. Replication in PostgreSQL is physical, that is, it works at the file level and all changes in the files that occur on the wizard are transferred to the replica one to one. In fact, we end up building a byte copy of the master. The problem arises if we also use a replica in order to twist some queries. And on read only, you can’t make changes there, but you can read. What to do if we have a select on the replica, if this select needs some old versions of lines that are already deleted on the wizard, because the wizard does not have these selects. Vacuum walked around and deleted these old versions without knowing that some queries were being executed on the replica or on the replicas, which versions are still needed. There are not many options. You can hold replication, stop replication on the replica until this select is completed, but since replication cannot be delayed for an infinite time, there is some default timeout - this is max_standby_streaming_delay. 30 seconds, after which transactions that use some old versions of lines begin to simply shoot back.
Uber also notes that they generally had long transactions, mainly due to the fault of the developer. There were transactions that, for example, opened a transaction, then opened an e-mail or committed some kind of blocking input / output. Of course, in theory, Uber says that this cannot be done, but there are no ideal developers, and besides, they often use some form that hides a general fact. It’s not so easy to determine when a transaction begins, when it ends. High-level forms are used.
Such cases nevertheless arose and for them it was a big problem. Either replication was delayed, or transactions were shot back, from the developer's point of view, at an unpredictable moment. Because, you understand, if, for example, we wait 30 seconds and delay replication, and after that we shoot the request, then by the time the request is shot, replication is already 30 seconds behind. Accordingly, the next transaction can be shot almost instantly, because these 30 seconds, this limit will be depleted very quickly. From the point of view of the developer, a transaction at some random moment ... we just started, haven’t done anything yet, but they’re already interrupting us.
Community Response. Everyone says that you had to use the hot_standby_feedback = on option, you had to enable it. Christoph Petus adds that Uber is just fools.
This option has a price. There are some negative consequences, otherwise it would be included simply by default. In this case, if returning to this picture, what does she do. The replica begins to transmit information to the master about which versions, which transactions are open, which version of the lines are still needed. And in this case, the wizard actually just holds the vacuum so that he does not delete these versions of the lines. There are also all sorts of problems. For example, replicas are often used for testing, so that developers play not on a combat vehicle, but not some replica. Imagine, the developer, during testing, started some kind of transaction and went to lunch, and at that time the vacuum stops at the master and as a result the place ends and the optimization stops working and as a result the master may simply die.
Or another case. Replicas are often used for some kind of analytics, long-running queries. This is also a problem. We run a long-running query with the hot_standby_feedback = on option enabled, vacuum also stops on the wizard. And then it turns out that for some long analytic queries there is no alternative at all. Either they simply kill the master with the hot_standby_feedback = on option turned on, or if it is turned off, they will simply shoot back and never finish.
What is everyone silent about? As I already said, the option that everyone advises, it delays the vacuum on the wizard and in PostgreSQL at one time they were going to make it turned on by default, but those offering immediately explained why this was not necessary. So just advising her without any reservations is not very honest, I would say.
Developers make mistakes. Christophe Petus says that since Uber was unable to hire the right developers, they are fools. It breaks where it is thin. Developers make mistakes where they are allowed to make mistakes. What no one said was that physical replication is poorly suited for scaling reads. If we use replicas in order to scale the reading, twist some selections, physical replication in PostgreSQL simply does not suit the architecture. It was a hack that was easy to implement, but it has its limits. And this again is not a specific case. Replication can be used for different purposes. Using this replication to scale reads is also a fairly common case. Another issue that Uber reports is replication and write amplification. As we have discussed, on the master with some specific updates, write amplification also occurs. Too much data is written to disk, but since replication in PostgreSQL is physical, and all changes in files go to replicas, all these changes that occur in files on the wizard go to the network. It is quite redundant, physical replication, by nature. There are speed problems between different data centers. If we are replicating from one coast to another, then to buy a sufficiently wide channel that would contain all this replication flow that comes from the master is problematic and not always possible. which occur in the files on the master go to the network. It is quite redundant, physical replication, by nature. There are speed problems between different data centers. If we are replicating from one coast to another, then to buy a sufficiently wide channel that would contain all this replication flow that comes from the master is problematic and not always possible. which occur in the files on the master go to the network. It is quite redundant, physical replication, by nature. There are speed problems between different data centers. If we are replicating from one coast to another, then to buy a sufficiently wide channel that would contain all this replication flow that comes from the master is problematic and not always possible.
What the community is responding to. Robert Haas said that one could try WAL compression, which is transmitted to replications over a network or SSL, there you can also enable compression at the network connection level. And also, you could try the logical replication solution, which is available in PostgreSQL. He mentions Slony / Bucardo / Londiste. Simon Riggs is giving a lecture on the pros and cons of physical replication, I would write all of this here too, but lately I have talked about this so often that I’ve already gotten a little tired of it, I believe that everyone understands how logical replication differs from physical. They also add that pglogical will come soon - this is the built-in logical replication solution in PostgreSQL that his company is developing.
Christoph Petus says there is no need to compare logical and physical replication. But this is strange, because that’s exactly what Uber does. He compares the pros and cons of logical and physical replication, saying that physical replication is not very suitable for them, but logical replication in MySQL is just fine. It is not clear why it is not necessary to compare. He adds that Slony, Bucardo is hard to install and hard to manage, but it's Uber. Since they could not do this, then they are fools. And besides, in version 9.4 there is pglogical. This is a third-party extension for PostgreSQL that organizes logical replication. I must say that Uber used older versions of PostgreSQL, 9.2, 9.3, for which there is no pglogical, and they could not switch to 9.4. I will continue to explain why.
Alexander Korotkov says Uber compares logical replication in MySQL with physical replication in PostgreSQL. That's right, that is exactly what they do. There is no physical replication in MySQL, and Alibaba is working on adding it; this is also true, except that Alibaba does it for a completely different use case. Alibaba does it to provide high availability for its cloud offering within a single data center. They do not use cross-region replication, and they do not read from replicas. It is used only so that if a client's cloud server crashes, they can quickly switch over to a replica. That's all. That use case is different from what Uber needs.
Another issue Uber talks about is cross-version upgrades. You need to move from one major version to another, and you would like to do it without stopping the master. PostgreSQL with physical replication does not solve this problem. In MySQL the usual procedure is this: first the replicas, or the single replica if there is only one, are upgraded to the newer version, then requests are switched over to it, effectively making it the new master, and the old master is then upgraded to the new version. Downtime turns out to be minimal. But with physical replication in PostgreSQL you cannot replicate from a lower version to a higher one; in general the versions must be the same on both the master and the replica. Pglogical, again, was not an option for them, because they ran older versions and could not move to 9.4, where pglogical becomes available, precisely because there was no way to replicate across versions. A kind of recursion.
Some of their legacy services still remain at Uber, and for this simple reason they still run on PostgreSQL 9.2.
Community response. Robert Haas does not comment on this at all. Simon Riggs says that there is in fact pg_upgrade, the built-in upgrade procedure, and with the -k option the process takes less time. There will still be downtime: the master has to be stopped and all requests have to be stopped as well, just not for as long. In addition, Simon Riggs says that his company has a commercial solution that replicates between old and new versions.
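For reference, a minimal sketch of what pg_upgrade with the -k (hard links) option looks like; the paths are placeholders, not anything Uber actually used, and both clusters must already be stopped when this runs, which is exactly the downtime being discussed.

```
# run as the postgres user, with both the old and the new binaries installed
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/9.2/bin \
  --new-bindir=/usr/lib/postgresql/9.4/bin \
  --old-datadir=/var/lib/postgresql/9.2/main \
  --new-datadir=/var/lib/postgresql/9.4/main \
  -k    # use hard links instead of copying data files, so the upgrade is fast
```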
Christophe Pettus says that pg_upgrade is not a panacea; in particular PostGIS, the geoinformation extension because of which Uber switched to PostgreSQL in the first place, causes a lot of problems when upgrading between major versions. I don't know; I believe him here, because he is a PostgreSQL consultant. He says that Uber, again, did not master Slony / Bucardo / pglogical, which means they are fools. Alexander Korotkov shows how to upgrade between major versions using pglogical.
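To give an idea of what that kind of recipe looks like, here is a rough pglogical sketch; host and database names are made up, and the real procedure (schema sync, sequence handling, cutover) has more steps than shown here.

```sql
-- On the old (provider) cluster:
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn       := 'host=old-master dbname=app');
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- On the new-version (subscriber) cluster, after restoring the schema:
CREATE EXTENSION pglogical;
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn       := 'host=new-master dbname=app');
SELECT pglogical.create_subscription(
    subscription_name := 'upgrade',
    provider_dsn      := 'host=old-master dbname=app');
-- Once the subscriber has caught up, the application is switched over to it.
```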
So what is the answer? For Uber it was not an option, and as an answer to current users it also has its nuances.
What is everyone silent about? Yes, there really are all sorts of third-party solutions that provide logical replication for PostgreSQL: Slony / Bucardo / Londiste. But they are all trigger-based: a trigger is hung on every table, on each change we capture what was modified and write it to a special relational journaling table, and once the changes have been shipped to the replicas we delete them from that table. And here the typical problems show up. PostgreSQL's sore spot, vacuum: when data is constantly inserted and deleted, there is a lot of work for vacuum and, accordingly, a lot of all sorts of problems. Not to mention that a trigger-based solution is not very efficient in itself, and the data is pulled out of the journaling tables by external scripts, not by the DBMS itself. Schematically it looks roughly like the sketch below.
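A minimal sketch of that trigger-based scheme, assuming an invented table and function name for illustration; the real Slony/Londiste triggers are considerably more involved.

```sql
-- Journaling table that an external daemon reads from and then cleans up
CREATE TABLE repl_log (
    id       bigserial PRIMARY KEY,
    tbl      text NOT NULL,
    op       text NOT NULL,
    row_data json
);

CREATE OR REPLACE FUNCTION repl_log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO repl_log (tbl, op, row_data)
    VALUES (TG_TABLE_NAME, TG_OP,
            CASE WHEN TG_OP = 'DELETE' THEN row_to_json(OLD)
                 ELSE row_to_json(NEW) END);
    RETURN NULL;  -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

-- One such trigger is hung on every replicated table ("users" is hypothetical)
CREATE TRIGGER users_repl AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH ROW EXECUTE PROCEDURE repl_log_change();
```

The constant INSERT-then-DELETE churn on repl_log is exactly what creates the extra vacuum work described above.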
With pglogical, everything is very strange. Everyone advised that pglogical can be used to upgrade from an older version to a newer one. I read the documentation: the developers do not promise that such cross-version replication will work. They say it may work, but something may break in future versions, and, in general, they promise nothing.
Low "roundness" even now, in 2017. This is a relatively new technology, it is, of course, progressive, it potentially uses some built-in things in PostgreSQL, and not some external solutions, but I don’t really understand how Uber could be recommended for use in productions in 2015, when pglogical generally just started to be developed. But the pglogical is good, because unlike the main PostgreSQL, you can go and see a bug tracker to assess the stability of the project, the maturity of the project, how widely it is used. Honestly, I was not impressed. This did not impress me on any stable project.
Activity is very low; some bugs have been hanging there for years. Some people have performance problems, some have data loss. That does not look like a stable project. My hobby project (sysbench) has more activity on GitHub than pglogical does.
What else are they not talking about? That all these solutions, both the third-party ones and pglogical, have problems with sequences, because you cannot hang a trigger on a sequence and its changes are not written to the transaction log, which means you have to do special, complicated contortions to replicate sequences. DDL works only through wrapper functions, because, again, triggers cannot be hung on DDL and for some reason DDL is not written to the transaction log in PostgreSQL either. None of the three solutions has a log as such, that is, you cannot take a backup and roll logical changes forward from a log, as you can in MySQL. PITR is possible only from a physical backup; you cannot apply it on top of a logical backup. There is no GTID or the other recent features that appeared in MySQL replication. I still have not figured out how to do repositioning with these solutions: say, switch a replica from one master to another and understand which transactions have already been applied and which have not.
Finally, there is no parallelism. The only way to speed up logical replication is to parallelize it, and a lot of work has been done in MySQL in this direction. None of these solutions supports parallelism in any form whatsoever. It simply is not there.
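For comparison, this is roughly how a parallel applier is enabled on a MySQL 5.7 replica; the values are an illustrative sketch of the relevant my.cnf settings, not a tuned configuration.

```
[mysqld]
# apply transactions in parallel based on the master's group-commit ordering
slave_parallel_type         = LOGICAL_CLOCK
slave_parallel_workers      = 8
# keep the master's commit order among the parallel workers
slave_preserve_commit_order = 1
# slave_preserve_commit_order requires the replica's own binary log
log_bin                     = mysql-bin
log_slave_updates           = 1
```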
What is everyone missing? First, there is only one truly correct scenario for using physical replication, whether in PostgreSQL or in MySQL once it appears there: High Availability within a single data center, when we have a fast enough channel and do not need to read from replicas. That is the only correct use case. Everything else may or may not work, depending on the load and many other parameters.
To date, PostgreSQL cannot offer a complete alternative to replication in MySQL. pglogical is a strange thing. Many say that built-in logical replication based on pglogical will appear in PostgreSQL 10; in fact, few in the Postgres community know that only a cut-down version of pglogical will go into 10. I have not found any intelligible description of what exactly will go in and what will remain overboard. In addition, the syntax in 10 will differ somewhat from what the third-party pglogical extension uses, and it is not clear whether the two can be used together, whether you can replicate with pglogical from older versions into PostgreSQL 10 or not. I have not found that anywhere. The pglogical documentation is also peculiar.
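For what it is worth, the built-in logical replication that did go into PostgreSQL 10 uses its own publication/subscription syntax rather than the pglogical function calls shown earlier; roughly like this (names are placeholders):

```sql
-- On the publisher (PostgreSQL 10):
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On the subscriber (PostgreSQL 10), after creating the same tables there:
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=publisher dbname=app user=replicator'
    PUBLICATION app_pub;
```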
Inefficient cache
And finally, the inefficient cache. What is the difference between the cache implementations in InnoDB and PostgreSQL? InnoDB uses unbuffered I/O and does not rely on the operating system cache, so practically all of the memory available on the machine can be given to the InnoDB buffer pool. To perform that unbuffered I/O it uses ordinary pread and pwrite system calls.
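The usual InnoDB setup reflecting this looks something like the following my.cnf fragment; the values are purely illustrative, not a recommendation.

```
[mysqld]
# bypass the OS page cache for data files; InnoDB does its own caching
innodb_flush_method     = O_DIRECT
# give most of the machine's RAM to the buffer pool, e.g. on a 64 GB box:
innodb_buffer_pool_size = 48G
```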
PostgreSQL has a different architecture. For some reason, support for direct (unbuffered) I/O has still not been implemented, so PostgreSQL relies on both its own cache and the operating system cache at the same time. In addition, for some reason it does not use the system calls that let you seek and read or write in one step; instead it first issues a separate system call to move to the desired position, and then reads or writes. Two system calls per operation. And there is duplication of data: the same pages end up both in PostgreSQL's own cache and in the operating system cache.
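A minimal illustration of the difference, not PostgreSQL or InnoDB source code: reading an 8 KB page at a given offset either with lseek() plus read() (two system calls) or with a single pread() call; the file name is a placeholder.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    char page[8192];
    int fd = open("datafile", O_RDONLY);   /* "datafile" is a placeholder */
    if (fd < 0) { perror("open"); return 1; }

    /* The lseek + read style: two system calls per page read. */
    if (lseek(fd, 8192, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }
    if (read(fd, page, sizeof(page)) < 0) { perror("read"); return 1; }

    /* The pread style: one system call does both the seek and the read. */
    if (pread(fd, page, sizeof(page), 8192) < 0) { perror("pread"); return 1; }

    close(fd);
    return 0;
}
```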
Robert Haas and Simon Riggs do not respond to this at all. As for Christophe Pettus, something happened to him here: he does not call Uber fools this time, he agrees that there is a problem, but says the effect in practice is unclear. Alexander Korotkov says that pread instead of lseek gives an optimization of only 1.5 percent of performance, although this is only one of the problems Uber lists. He says nothing more.
What is everyone omitting? That the 1.5% was obtained in a single benchmark, on the laptop of the developer who wrote the patch. There was some discussion on the Postgres hackers mailing list, and the patch ended up stuck in those discussions. Robert Haas said the patch should go in: on production systems the gain is likely to be larger than 1.5%, and even 1.5% is already good. Tom Lane, another key PostgreSQL developer, was against it, saying there is no need to touch stable code for the sake of some 1.5%. In the end the two key developers did not reach an understanding and the patch got stuck. It went nowhere.
This is far from the only problem with the design of shared buffers; my last year's DevConf talk covered this in detail. There are many more problems than Uber lists: data duplication, double checksum calculation, and the eviction of data from the operating system cache, which InnoDB, with its own cache, can prevent. And in the future, if compression and encryption appear in PostgreSQL, this architecture will cause problems there as well.
What do we see in the end? Uber, I talked to them, and they have also given presentations at conferences, at Percona Live. They say: "We are generally happy with MySQL, we are not going to move anywhere. We have old legacy systems that still run on PostgreSQL, but all the main processing is done in MySQL. PostGIS, the geoinformation extension, was the main reason we switched to PostgreSQL. It is cool, but it has its pitfalls, and besides, it does not scale well under OLTP loads. We found a way to do without it in MySQL. Yes, MySQL still has nothing comparable."
My advice is not to believe in this hype, in the halo of a perfect system devoid of flaws. Be engineers, be skeptics, try to make decisions based on objective things rather than rumors and mass hysteria. And most importantly: test. Before you plan to replace the DBMS in your project, run at least some tests to see how that DBMS behaves in your specific case. That's all from me!
So, that was a report with a clear desire to show the advantages of InnoDB :) Of course, in the near future we will balance it on Habr with a report from the opposite camp. Come to DevConf on May 18th (special prices valid until April 30). Alexei promised to be there with a new talk, although this time it will be more about MySQL. And Ivan Panchenko from Postgres Professional will talk about the very logical replication that was included in PostgreSQL 10.