A PostgreSQL evangelist's memo: replicants versus replication



    Continuing the series of publications "A PostgreSQL Evangelist's Memo ..." ( 1 , 2 ), your humble correspondent is back in touch, this time with the promised review of replication mechanisms in PostgreSQL and MySQL. The main reason for writing it was the frequent criticism of MySQL replication. As often happens, the typical criticism is a heady mixture of truth, half-truth, and evangelism, replicated over and over by different people without much attempt to understand what they have heard. And since this is a rather broad topic, I decided to put the analysis into a separate publication.

    So, in the spirit of cultural exchange, and in anticipation of HighLoad++, where MySQL will probably get its usual share of criticism, let us look at replication mechanisms. To start, a few boring basics for those not yet familiar with them.

    Replication Types


    Replication can be logical or physical. Physical replication is a description of changes at the data-file level (simplified: write such-and-such bytes at such-and-such an offset on such-and-such a page). Logical replication describes changes at a higher level, without reference to a specific representation of the data on disk, and here several options are possible. You can describe changes in terms of table rows: for example, an UPDATE statement can be reflected as a sequence of (old values, new values) pairs for each changed row. In MySQL this type is called row-based replication. Or you can simply write out the text of all SQL queries that modify data. This type is called statement-based replication in MySQL.
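To make this concrete, here is a minimal MySQL sketch of choosing between the two logical formats; binlog_format is a real MySQL variable, while the accounts table is hypothetical:

```sql
-- Log the text of data-modifying statements (statement-based):
SET GLOBAL binlog_format = 'STATEMENT';

-- Log before/after row images for each changed row (row-based):
SET GLOBAL binlog_format = 'ROW';

-- The same UPDATE is recorded very differently in the two formats:
UPDATE accounts SET balance = balance - 100 WHERE group_id = 7;
-- STATEMENT: the query text above is written to the binary log once.
-- ROW: one (old values, new values) pair is logged per matched row.
```

With one matched row the formats are comparable in size; with a million matched rows the row-based log is vastly larger, which matters later in this article.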

    Physical replication is often called binary (especially in the PostgreSQL community), which is inaccurate. The data format of both logical and physical replication can be either textual (i.e., human-readable) or binary (requiring processing before a human can read it). In practice, all formats in both MySQL and PostgreSQL are binary. Obviously, in the case of statement-based replication in MySQL the query texts can be read with the naked eye, but all the service information is still in binary form. Hence the log used in replication is called binary regardless of the replication format.

    Features of logical replication:


    • storage-format independence: the master and slave can have different representations of the data on disk, different processor architectures, different table structures (subject to schema compatibility), different configurations and data-file locations, different storage engines (in MySQL), different server versions; in fact, the master and slave can even be different DBMSs (and such cross-platform replication solutions exist). These properties are used often, especially in large projects, for example in rolling schema upgrades.
    • readable slaves: data can be read from every node in the replication topology without any restrictions. With physical replication this is not so easy (see below)
    • multi-source: changes from several masters can be combined on one slave. Usage example: aggregating data from several shards to build statistics and reports. Wikipedia, for one, uses multi-source for exactly this.
    • multi-master: in any topology you can have more than one writable server, if necessary
    • partial replication: the ability to replicate only individual tables or schemas
    • compactness: the amount of data transmitted over the network is smaller. In some cases, much smaller.


    Features of physical replication:


    • easier to configure and use: the task of byte-by-byte mirroring of one server to another is inherently much simpler than logical replication with its many use cases and topologies. Hence the famous "set up and forget" in all the MySQL vs. PostgreSQL holy wars.
    • lower resource consumption: logical replication requires extra resources, because the logical description of the changes still has to be "translated" into the physical one, i.e. the server must work out what exactly to write to disk and where
    • 100% node-identity requirement: physical replication is possible only between absolutely identical servers, down to the processor architecture, paths to tablespace files, and so on. This can become a problem for large replication clusters, since a change in any of these factors requires a complete stop of the cluster.
    • no writes on the slave: follows from the previous point. Not even a temporary table can be created.
    • reading from a slave is problematic: reading from a slave is possible, but not without problems. See "Physical Replication in PostgreSQL" below
    • limited topologies: neither multi-source nor multi-master is possible. Cascading replication at best.
    • no partial replication: again a consequence of the 100% data-file identity requirement
    • higher traffic overhead: all changes to the data files have to be transferred (index maintenance, vacuum, and other internal bookkeeping). This means the network load is higher than with logical replication. As usual, though, everything depends on the number and kind of indexes, the workload, and other factors.

    Orthogonally to format, replication can be synchronous, asynchronous, or semi-synchronous. That fact has little relevance to the things discussed here, so we leave it aside.

    Physical Replication in MySQL


    There is none as such, at least not built into the server itself. There are architectural reasons for this, but that does not mean it is impossible in principle. Oracle could implement physical replication for InnoDB with relatively little effort, and that alone would cover the needs of most users. A more sophisticated approach would require creating an API that alternative engines could implement to support physical replication, but I do not think that will ever happen.

    In some cases, however, physical replication can be implemented by external means, for example with DRBD or hardware solutions. The pros and cons of this approach have been discussed repeatedly and in detail.

    In the context of comparison with PostgreSQL, it should be noted that using DRBD and similar MySQL physical replication solutions is most similar to PostgreSQL's warm standby. But the amount of data transmitted over the network in DRBD will be higher, because DRBD works at the block device level, which means that not only entries in the REDO log (transaction log) are replicated, but also records in data files and file system meta-information updates.

    Logical replication in MySQL


    This topic causes the most excitement. Most of the criticism traces back to the talk "Asynchronous MySQL replication without censorship, or why PostgreSQL will conquer the world" by Oleg Tsarev ( zabivator ), as well as a related article on Habr.

    I would not single out one specific talk if I were not referred to it in roughly every tenth comment on the previous articles. So I have to answer, but I want to emphasize that there are no perfect talks (my own are far from it), and all the criticism is directed not at the speaker, but at the technical inaccuracies in the talk. I would be happy if it helps improve its future versions.

    Overall, the talk contains quite a few technically inaccurate or simply incorrect statements, some of which I addressed in the first part of the evangelist's memo. But I do not want to drown in trifles, so I will go over the main points.

    So, in MySQL, logical replication is represented by two subtypes: statement-based and row-based.

    Statement-based is the most naive way to organize replication ("let's just send the SQL statements to the slave!"), which is why it appeared first in MySQL, a very long time ago. It even works, as long as the SQL statements are strictly deterministic, i.e. lead to the same changes regardless of timing, context, triggers, etc. Tons of articles have been written about this, so I will not dwell on it here.
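A couple of hypothetical statements (the jobs and audit_log tables are invented for illustration) show where statement-based replication breaks down:

```sql
-- The row picked by LIMIT without ORDER BY depends on storage order,
-- which may differ between master and slave:
UPDATE jobs SET state = 'taken' WHERE state = 'free' LIMIT 1;

-- UUID() and RAND() are computed at execution time, so the slave may
-- write different values than the master did (MySQL compensates for
-- some non-deterministic functions, such as NOW(), but not for all):
INSERT INTO audit_log (id, noted_at) VALUES (UUID(), NOW());

-- Row-based replication sidesteps the problem: the actual row values
-- produced on the master are shipped, not the statement.
```

MySQL itself warns about such "unsafe" statements and, in MIXED format, falls back to row-based logging for them.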

    In my opinion, statement-based replication is a hack and MyISAM-style "legacy". No doubt someone somewhere still finds a use for it, but avoid it if you can.

    Interestingly, Oleg himself mentions using statement-based replication in his talk. The reason: row-based replication would generate terabytes of data per day. Which is perfectly logical, but how does that square with the claim that "PostgreSQL will conquer the world", given that PostgreSQL has no asynchronous statement-based replication at all? That is, PostgreSQL would likewise generate terabytes of updates per day, the disk or network would become the bottleneck, as expected, and world domination would have to wait.

    Oleg points out that logical replication is usually CPU-bound, i.e. limited by the processor, while physical replication is usually I/O-bound, i.e. limited by the network or disk. I am not entirely sure of this claim: a CPU-bound load turns into I/O-bound with a flick of the wrist as soon as the active data set stops fitting in memory (a typical situation for Facebook, for example), and with that, most of the difference between logical and physical replication evaporates. But in general I agree: logical replication needs relatively more resources (its main drawback), and physical replication less (almost its only advantage).

    There can be many reasons for replication "lagging": it is not only single-threadedness or a lack of CPU; it can be the network, the disk, inefficient queries, or inadequate configuration. The main harm of the talk is that it lumps everything together, explaining every problem by a certain "architectural mistake in MySQL" and leaving the impression that these problems have no solution. That is exactly why evangelists of all stripes gladly took it into service. In fact, I am sure that 1) most of the problems have a solution, and 2) all of these problems exist in the logical replication implementations for PostgreSQL too, possibly in an even more severe form (see "Logical Replication in PostgreSQL").

    From Oleg's talk it is very hard to understand what actually became the problem in his tests: there is no attempt at analysis, no metrics at either the OS or the server level. For comparison, see the publication by Booking.com engineers on the same topic, but with a detailed analysis and without "evangelical" conclusions. I especially recommend the "Under the Hood" section. That is the right way to run and present benchmarks. In Oleg's talk, benchmarks got 3 slides.

    I will briefly list the possible problems and their solutions. I foresee plenty of comments in the spirit of "but on the elephant everything works fine without any shamanism!" I will answer them once and for all: physical replication is easier to configure than logical, but its capabilities are not enough for everyone. Logical replication can do more, but it also has drawbacks. Here are ways to minimize those drawbacks in MySQL.

    If we rest against the disk


    Weak machines are often allocated for the slave, on the reasoning that "well, it's not the main server, this old clunker will do". The old clunker usually has a weak disk, and that is where everything bottlenecks.

    If the disk is a bottleneck during replication, and it is not possible to use something more powerful, you need to reduce the disk load.

    First, you can control the amount of information the master writes to the binary log, and therefore what is sent over the network and written/read on the slave. Settings worth looking at: binlog_rows_query_log_events, binlog_row_image.

    Second, you can disable the binary log on the slave. It is only needed if the slave is itself a master (in a multi-master topology, or as an intermediate master in cascading replication). Some keep the binary log enabled to speed up promoting the slave to master in case of failover, but if disk performance is a problem, it can and should be disabled.

    Third, you can relax the durability settings on the slave. By definition a slave lags behind (due to asynchrony) and is not the only copy of the data, so if it dies it can be recreated from a backup, from the master, or from another slave. There is therefore no reason to keep strict durability settings on it. Keywords: sync_binlog, innodb_flush_log_at_trx_commit, innodb_doublewrite.

    Finally, the general tuning of InnoDB for write-intensive workloads still applies. Keywords: innodb_max_dirty_pages_pct, innodb_stats_persistent, innodb_adaptive_flushing, innodb_flush_neighbors, innodb_write_io_threads, innodb_io_capacity, innodb_purge_threads, innodb_log_file_size, innodb_log_buffer_size.
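Put together, the knobs above might look roughly like this in a slave's my.cnf; this is a hedged sketch, with illustrative values rather than recommendations, and the right numbers depend on the hardware, workload, and MySQL version:

```ini
[mysqld]
# Reduce binary log volume (set on the master):
binlog_row_image             = minimal  # log only changed columns + a PK image
binlog_rows_query_log_events = OFF      # no informational query-text events

# On a leaf slave, leave log-bin unset (or use --disable-log-bin on
# newer versions) so the slave keeps no binary log of its own.

# Relax durability: this slave is not the only copy of the data.
sync_binlog                    = 0
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite             = 0

# General InnoDB tuning for write-intensive work (size to the hardware):
innodb_log_file_size    = 2G
innodb_io_capacity      = 1000
innodb_write_io_threads = 8
```

If the slave later has to be promoted to master, the durability settings should of course be tightened back up.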

    If all else fails, you can look at the TokuDB engine, which is, first, optimized for write-intensive workloads, especially when the data does not fit in memory, and, second, provides the ability to organize read-free replication. This can solve the problem under both I/O-bound and CPU-bound loads.

    If we run into the processor


    With sufficiently intensive writes on the master and no other bottlenecks on the slave (network, disk), you can hit a CPU limit. Parallel replication, also known as multi-threaded slave (MTS), comes to the rescue here.

    In 5.6, MTS was implemented in a very limited way: only updates to different databases (schemas, in PostgreSQL terminology) were applied in parallel. But there is surely a non-empty set of users in the world for whom that is quite enough (hello, hosting providers!).

    In 5.7, MTS was extended to apply arbitrary updates in parallel. In early pre-release versions of 5.7, parallelism was limited to the transactions committed simultaneously within one group commit. That limited parallelism, especially on systems with fast disks, which most likely led to unimpressive results for those who benchmarked those early versions. Which is quite normal; that is what early versions are for, so that interested users can test them and complain. But not every user has the good sense not to turn that into a talk concluding "PostgreSQL will conquer the world".
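A sketch of enabling MTS on a 5.7 slave; all variables named here are real 5.7 options, the values are illustrative:

```ini
[mysqld]
# 5.6-style MTS: parallelize only across databases/schemas:
# slave_parallel_type = DATABASE

# 5.7: parallelize transactions that group-committed together on the master:
slave_parallel_type    = LOGICAL_CLOCK
slave_parallel_workers = 4

# Apply in parallel but commit in the master's order, so readers of the
# slave never observe a state the master never had (requires the binary
# log and log_slave_updates to be enabled on the slave):
slave_preserve_commit_order = ON
```

The sysbench numbers below were obtained with exactly this kind of configuration, varying slave_parallel_workers.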

    Nevertheless, here are the results of the same sysbench tests that Oleg used for his talk, but on the GA release of 5.7. The bottom line:

    • MTS on the slave achieves a 10x performance increase over single-threaded replication
    • slave_parallel_workers=4 alone increases slave throughput more than 3.5-fold
    • row-based replication performance is almost always higher than statement-based, but MTS has a greater effect on statement-based, which somewhat evens out the two formats on OLTP loads.

    Another important conclusion from Booking.com testing: the smaller the transaction size, the more parallelism can be achieved. Prior to the advent of group commit in 5.6, developers tried to make transactions as large as possible, often unnecessarily from the point of view of applications. Starting with 5.6, this is not necessary, but for parallel replication in 5.7 it is better to reconsider the transactions and break them into smaller ones where possible.

    In addition, you can tune the binlog_group_commit_sync_delay and binlog_group_commit_sync_no_delay_count parameters on the master, which can yield more parallelism on the slave even with long transactions.
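For example (a master-side sketch with illustrative values; both variables exist in MySQL 5.7):

```ini
[mysqld]
# Hold each commit briefly so more concurrent transactions join the same
# group commit, and can therefore be applied in parallel on the slave:
binlog_group_commit_sync_delay          = 1000  # wait up to 1000 microseconds...
binlog_group_commit_sync_no_delay_count = 20    # ...or until 20 transactions queue up
```

The trade-off is a small added commit latency on the master in exchange for larger commit groups and more slave parallelism.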

    With that, I consider the topic of MySQL replication and the popular talk closed; let us move on to PostgreSQL.

    Physical Replication in PostgreSQL


    In addition to all the pros and cons of physical replication listed earlier, the PostgreSQL implementation has another significant drawback: replication compatibility is not guaranteed between major PostgreSQL releases, since WAL compatibility is not guaranteed. For loaded projects and large clusters this is a genuinely serious problem: it requires stopping the master, upgrading it, and then completely recreating the slaves. For comparison: problems with replication from old versions to new ones do happen in MySQL, but they get fixed, in most cases it works, and nobody renounces compatibility. Which is exactly what is used when upgrading large clusters: the advantages of "flawed" logical replication.

    PostgreSQL offers the ability to read data from a slave (so-called Hot Standby), but it is far from as simple as with logical replication. From the Hot Standby documentation one can learn that:

    • SELECT ... FOR SHARE | UPDATE is not supported, because it requires modifying data files
    • 2PC commands are not supported for the same reason
    • explicitly marking a transaction "read write" (BEGIN READ WRITE, etc.), LISTEN, UNLISTEN, NOTIFY, and sequence updates are not supported. Understandable in general, but it means some applications will have to be rewritten when migrating to Hot Standby, even if they do not modify any data
    • even read-only queries can conflict with DDL and vacuum operations on the master (hello, "aggressive" vacuum settings!). In that case queries can either delay replication or be forcibly aborted; configuration parameters control this behavior
    • the slave can be configured to provide "feedback" to the master (the hot_standby_feedback parameter). Which is good, but one wonders what the overhead of this mechanism is on loaded systems.
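The standby-side parameters mentioned above might be combined roughly like this (all three are real postgresql.conf parameters; the values are illustrative):

```ini
hot_standby = on                   # allow read-only queries during recovery

# How long WAL replay may wait for a conflicting standby query before
# cancelling it (replication lags instead of queries being aborted):
max_standby_streaming_delay = 30s

# Report to the master which rows standby queries still need, so vacuum
# does not remove them; fewer query cancellations, at the cost of
# delayed cleanup (bloat) on the master:
hot_standby_feedback = on
```

The choice between lag, query cancellations, and master-side bloat is exactly the trade-off these knobs expose.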

    In addition, I found a marvelous warning in the same documentation:
    • "Operations on hash indexes are not presently WAL-logged, so replay will not update these indexes" - wait, what? And what about physical backups?


    There are some features with failover that may seem strange to the MySQL user, such as the inability to return to the old master after failover without re-creating it. I quote the documentation:
    Once failover to the standby occurs, there is only a single server in operation. This is known as a degenerate state. The former standby is now the primary, but the former primary is down and might stay down. To return to normal operation, a standby server must be recreated, either on the former primary system when it comes up, or on a third, possibly new, system.


    There is one more peculiarity of physical replication in PostgreSQL. As I wrote above, the traffic overhead of physical replication is generally higher than that of logical replication. But in PostgreSQL's case, full images of pages first updated after a checkpoint (full_page_writes) are also written to the WAL, and therefore transmitted over the network. I can easily imagine a workload where such behavior would be a disaster. Several people will no doubt rush to explain the purpose of full_page_writes to me. I know; it is simply implemented a bit differently in InnoDB, not through the transaction log.

    Updated 09/28/2016: the same replication problems, described in English, in the article by Uber engineers on their reasons for switching from PostgreSQL to MySQL: eng.uber.com/mysql-migration

    Updated 10/30/2017: an interesting problem that arose precisely because PostgreSQL replication is physical: thebuild.com/blog/2017/10/27/streaming-replication-stopped-one-more-thing-to-check

    Otherwise, physical replication in PostgreSQL is probably really a reliable and easy-to-configure mechanism for those to whom physical replication is generally suitable.

    But PostgreSQL users are human too and nothing human is alien to them. Someone wants sometimes multi-source. And someone really likes multi-master or partial replication. Perhaps that is why it exists ...

    Logical Replication in PostgreSQL


    I tried to understand the state of logical replication in PostgreSQL and came away disheartened. There is nothing built in, and there is a pile of third-party solutions ( who said "confusion"? ): Slony-I (by the way, where is Slony-II?), Bucardo, Londiste, BDR, pgpool 1/2, Logical Decoding, and that is not counting dead or proprietary projects.

    Each has problems of its own: some look familiar ( MySQL replication is often criticized for them ), some look strange to a MySQL user. There is a general mess with DDL replication, which is not supported even in Logical Decoding (I wonder why?).

    BDR requires a patched version of PostgreSQL ( who said “forks”? ).

    I have some doubts about performance. I am sure that someone in the comments will begin to explain that replication on triggers and scripts in Perl / Python is fast , but I will only believe this when I see comparative load tests with MySQL on the same hardware.

    Logical Decoding looks interesting. But:

    1. It is not replication as such, but a constructor/framework/API for building third-party logical replication solutions
    2. Using Logical Decoding requires writing additional information to the WAL (it requires setting wal_level=logical). Hello to the critics of the binary log in MySQL!
    3. Some of the third-party solutions have already moved to Logical Decoding, and some have not.
    4. From reading the documentation I got the impression that it is an analogue of row-based replication in MySQL, but with a pile of limitations: no parallelism at all, no GTID (how do slave cloning and failover/switchover work in complex topologies?), and cascading replication is not supported.
    5. If I understand these slides correctly, the SQL interface in Logical Decoding uses a poll model to propagate changes, while the replication protocol uses a push model. If that is true, what happens in the push model when a slave temporarily drops out of replication, for example because of network problems?
    6. There is support for synchronous replication, which is good. What about semi-synchronous replication, which is more relevant in heavily loaded clusters?
    7. You can select how much information is logged with the per-table REPLICA IDENTITY option. It is a kind of analogue of the binlog_row_image variable in MySQL. But the MySQL variable is dynamic: it can be set per session, or even per statement. Is that possible in PostgreSQL?
    8. In short, where can I see the talk "Asynchronous Logical Replication in PostgreSQL Without Censorship"? I would love to read and watch it.
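To illustrate the REPLICA IDENTITY comparison from the list above (the accounts table is hypothetical; the statements themselves are real PostgreSQL and MySQL syntax):

```sql
-- PostgreSQL: REPLICA IDENTITY is a per-table DDL property:
ALTER TABLE accounts REPLICA IDENTITY FULL;     -- log all old column values
ALTER TABLE accounts REPLICA IDENTITY DEFAULT;  -- log only the primary key

-- MySQL: the comparable knob is a dynamic variable that can be changed
-- per session, without touching the table definition:
SET SESSION binlog_row_image = 'FULL';
UPDATE accounts SET balance = 0 WHERE id = 1;   -- logged with full row images
SET SESSION binlog_row_image = 'MINIMAL';
UPDATE accounts SET balance = 0 WHERE id = 2;   -- logged with changed columns + PK
```

In PostgreSQL the setting is thus a schema-level decision; in MySQL it can follow the workload.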


    As I said, I do not pretend to be knowledgeable about PostgreSQL. If any of this is incorrect or inaccurate - let me know in the comments and I will definitely fix it. It would also be interesting to get answers to questions that I had along the way.

    But my overall impression is that logical replication in PostgreSQL is at an early stage. In MySQL, logical replication has existed for a long time; all its pros, cons, and pitfalls are well known, studied, discussed, and shown in various talks. Besides, it has changed a great deal in recent years.

    Conclusion


    Since this publication contains some criticism of PostgreSQL, I predict another explosion of comments in the style of: "my friend and I used to run everything on MySQL and it was all bad, but now we run the elephant and life is good". I believe you. No, seriously. But I am not urging anyone to migrate anywhere or to change anything at all.

    The article has two goals:

    1) a response to not entirely accurate criticism of MySQL, and
    2) an attempt to systematize the numerous differences between MySQL and PostgreSQL. Such comparisons take a lot of work, but that is what the comments often expect of me.

    In the next post, I'm going to continue the comparison, this time in the light of performance.
