How we built a reliable PostgreSQL cluster on Patroni

    Today, high availability of services is expected always and everywhere, not only in large, expensive projects. Temporarily unavailable sites with the message "Sorry, maintenance in progress" still occur, but they usually provoke a condescending smile. Add to this life in the clouds, where spinning up an additional server takes just one API call and nobody has to think about "bare metal" operations, and there is no longer any excuse for not building a critical system reliably with cluster technologies and redundancy.

    We will describe which solutions we considered for making the databases behind our services reliable, and what we settled on. Plus a demo with far-reaching conclusions.

    Legacy in high availability architecture

    This is especially visible in how various open-source systems have evolved. Older solutions were forced to bolt on high-availability technologies as demand grew, and their quality varied. Next-generation solutions put high availability at the core of their architecture. For example, MongoDB positions the cluster as its main use case; the cluster scales horizontally, which is a strong competitive advantage of that DBMS.

    Back to PostgreSQL. It is one of the oldest popular open-source projects; its first release took place back in 1995. For a long time the project team did not consider high availability a problem the system itself needed to solve. Built-in replication for creating copies of data appeared only in version 8.2 in 2006, and it was file-based (log shipping). Streaming replication arrived in 2010 with version 9.0, and it is the basis for building a wide variety of clusters. This genuinely surprises people who come to PostgreSQL from enterprise SQL databases or modern NoSQL: the standard solution from the community is just a master-replica pair with synchronous or asynchronous replication, the master is switched over manually, and the question of switching clients over is also left for you to solve yourself.

    How we decided to make reliable PostgreSQL and what we chose for this

    However, PostgreSQL would not have become so popular if there were not a huge number of projects and tools that help build a fault-tolerant solution that does not demand constant attention. In the Mail.ru Cloud Solutions (MCS) cloud, the DBaaS offering initially included single PostgreSQL servers and master-replica pairs with asynchronous replication.

    Naturally, we wanted to simplify everyone's life by offering a PostgreSQL installation that could serve as the basis for highly available services, one you would not have to constantly monitor or wake up at night to switch over. This segment has both old, proven solutions and a generation of new utilities that rely on the latest developments.

    Today, the problem of high availability no longer comes down to redundancy (that goes without saying) but to consensus, that is, the leader election algorithm. Most major outages happen not because of a lack of servers, but because of consensus problems: a new leader was not elected, two leaders appeared in different data centers, and so on. A good example is the crash of the GitHub MySQL cluster, about which they wrote a detailed post-mortem.

    The mathematical foundation here is very solid. On the one hand, there is the CAP theorem, which imposes theoretical limits on building HA solutions; on the other hand, there are mathematically proven consensus algorithms such as Paxos and Raft. On top of them are built the quite popular DCS (distributed consensus systems): ZooKeeper, etcd, Consul. So if a decision-making system runs on some home-grown algorithm of its own, you should treat it with extreme caution. After analyzing a huge number of systems, we settled on Patroni, an open-source system mainly developed by Zalando.
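    To make the role of a DCS concrete, here is a toy, in-memory sketch of lease-based leader election, the compare-and-set-with-TTL primitive that stores like etcd, Consul and ZooKeeper provide. This is only an illustration of the idea; it is not how Patroni or etcd is actually implemented.

```python
import time


class ToyDCS:
    """A toy in-memory key-value store with create-if-absent and TTL,
    mimicking the primitive a real DCS exposes for leader election."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def create_if_absent(self, key, value, ttl):
        """Atomically create the key unless a live (non-expired) value exists."""
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # lease still held by another node
        self._data[key] = (value, now + ttl)
        return True

    def get(self, key):
        current = self._data.get(key)
        if current is None or current[1] <= time.monotonic():
            return None
        return current[0]


def elect(dcs, node, ttl=1.0):
    """Each candidate tries to create the 'leader' key; the winner leads
    until it stops refreshing the key and the lease TTL expires."""
    return dcs.create_if_absent("leader", node, ttl)


dcs = ToyDCS()
assert elect(dcs, "node-a")      # the first candidate wins the lease
assert not elect(dcs, "node-b")  # a second candidate loses while it is live
```

The key property is that create-if-absent is atomic in the DCS, so two nodes can never both believe they won; everything else (refreshing the lease, demoting a stale leader) builds on this primitive.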

    As a lyrical digression, I will note that we also considered multi-master solutions, that is, clusters that scale horizontally for writes. We decided against building such a cluster for two main reasons. First, such solutions are highly complex and, accordingly, more fragile; making them stable for every use case would be hard. Second, PostgreSQL then ceases to be pure (native): some features become unavailable, and some applications may hit hidden bugs.


    So how does Patroni work? The developers did not reinvent the wheel and proposed using one of the proven DCS solutions as the foundation. All questions of configuration synchronization, leader election and quorum are delegated to it. We chose etcd for this.

    Patroni then takes care of correctly applying all PostgreSQL and replication settings, as well as executing switchover and failover commands (that is, planned and emergency master switches). Specifically, in the MCS cloud you can create a cluster consisting of a master, a synchronous replica, and one or more asynchronous replicas. Having a synchronous replica guarantees that data is stored on at least 2 servers, and this replica is the primary "candidate for master."

    Since etcd is deployed on the same servers, the recommended number of servers is 3 or 5, for an optimal quorum. Such a cluster scales horizontally for reads (I covered write scaling above). Nevertheless, keep in mind that asynchronous replicas lag, especially under high load.
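    The arithmetic behind the "3 or 5 servers" recommendation is simple majority quorum. A quick sketch (plain math, not any etcd API):

```python
def quorum(n: int) -> int:
    """Smallest number of votes that forms a majority of n members."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can be lost while a quorum still exists."""
    return n - quorum(n)


for n in (3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
# 3 members tolerate 1 failure and 5 tolerate 2, while 4 members still
# tolerate only 1: even cluster sizes add cost without adding resilience,
# which is why odd sizes of 3 or 5 are the usual recommendation.
```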

    The use of such read standby replicas is justified for reporting or analytic tasks and offloads the master server.

    If you want to make such a cluster yourself, then you will need:

    • prepare 3 or more servers, configure IP addressing and firewall rules between them;
    • install packages for services etcd, Patroni, PostgreSQL;
    • configure etcd cluster;
    • configure the Patroni service to work with PostgreSQL.

    That is, in total you need to write a dozen configuration files correctly and not make a mistake anywhere. For that, it is definitely worth using a configuration management tool such as Ansible. Note that this still leaves you without a highly available TCP balancer; building one is a separate piece of work.
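    For orientation, a minimal per-node patroni.yml might look roughly like the sketch below. All addresses, node names and passwords here are illustrative placeholders, not values from our cloud, and a production config needs more settings than this.

```yaml
scope: pgcluster              # cluster name, identical on all nodes
name: pg-node1                # unique name of this node

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008

etcd:
  hosts: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379

bootstrap:
  dcs:
    ttl: 30                   # leader key TTL in the DCS
    loop_wait: 10
    synchronous_mode: true    # keep one synchronous standby

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/postgresql/data
  authentication:
    replication:
      username: replicator
      password: replicator-pass
    superuser:
      username: postgres
      password: postgres-pass
```

Each node gets the same scope and etcd endpoints but its own name and connect addresses; Patroni stores the rest of the dynamic configuration in the DCS itself.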

    For those who need a ready-made cluster but do not want to tinker with all of this, we tried to make life simpler and built a ready-made Patroni cluster in our cloud; it can be tested for free. In addition to the cluster itself, we built:

    • a TCP balancer listening on several ports that always point, respectively, to the current master, the synchronous replica, or the asynchronous replicas;
    • an API for a planned switchover of the Patroni master.

    Both are accessible through the MCS cloud API as well as the web console.


    To test the capabilities of the PostgreSQL cluster in the MCS cloud, let's see how a live application behaves in case of problems with the DBMS.

    Below is the code of an application that logs artificial events and reports them to the screen. On errors it reports them as well and keeps running in a loop until we stop it with Ctrl+C.

    from __future__ import print_function
    from datetime import datetime
    from random import randint
    from time import sleep

    import psycopg2


    def main():
        try:
            connection = psycopg2.connect(user="admin",
                                          password="P@ssw0rd",
                                          host="",
                                          port="5432",
                                          database="myproddb")
            cursor = connection.cursor()
            cursor.execute("SELECT version();")
            record = cursor.fetchone()
            print("Connection opened to", record[0])
            cursor.execute(
                "INSERT INTO log VALUES ({});".format(randint(1, 10000)))
            connection.commit()
            cursor.execute("SELECT COUNT(event_id) from log;")
            record = cursor.fetchone()
            print("Logged a value, overall count: {}".format(record[0]))
        except Exception as error:
            print("Error while connecting to PostgreSQL", error)
        finally:
            # if connect() failed, 'connection' is unbound here and the
            # resulting error is caught and reported by the outer loop
            if connection:
                cursor.close()
                connection.close()
                print("Connection closed")


    if __name__ == '__main__':
        try:
            while True:
                try:
                    print(datetime.now())
                    main()
                    sleep(3)
                except Exception as e:
                    print("Caught error:\n", e)
                    sleep(1)
        except KeyboardInterrupt:
            print("Exit")

    The application needs PostgreSQL to work, so let's create a cluster in the MCS cloud using the API. In a regular terminal, with the OS_TOKEN variable holding an API access token (you can get one with the openstack token issue command), type the commands below.

    Create a cluster:

    cat <<EOF > pgc10.json
    ...
    EOF

    curl -s -H "X-Auth-Token: $OS_TOKEN" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d @pgc10.json

    When the cluster reaches the ACTIVE status, all fields receive their current values and the cluster is ready.

    The cluster can also be viewed in the GUI.

    Let's try to connect and create a table:

    psql -h -U admin -d myproddb
    Password for user admin:
    psql (11.1, server 10.7)
    Type "help" for help.

    myproddb=> CREATE TABLE log (event_id integer NOT NULL);
    CREATE TABLE
    myproddb=> INSERT INTO log VALUES (1),(2),(3);
    INSERT 0 3
    myproddb=> SELECT * FROM log;
     event_id
    ----------
            1
            2
            3
    (3 rows)

    In the application, we specify the current PostgreSQL connection settings. We use the address of the TCP balancer, eliminating the need to switch manually to the master's address. Run it: as you can see, events are successfully logged to the database.

    Planned master switchover

    Now let's test how our application behaves during a planned master switchover.

    Watching the application, we see that it is indeed interrupted, but only for a few seconds (in this particular run, at most 9).
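    For reference, on a self-managed Patroni cluster a planned switchover is typically triggered with the patronictl utility; the cluster, node names and config path below are illustrative placeholders.

```shell
# ask Patroni to promote a candidate and gracefully demote the current master
patronictl -c /etc/patroni/patroni.yml switchover pgcluster \
    --master pg-node1 --candidate pg-node2 --force
```

In the MCS cloud the same operation is exposed through the cluster API instead, so no direct access to the nodes is required.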

    Virtual machine crash

    Now let's simulate a crash of the virtual machine that is the current master. We could simply power the VM off through the Horizon interface, but that would be a graceful shutdown, handled cleanly by all services, including Patroni.

    We need an unpredictable shutdown, so I asked our administrators, for testing purposes, to kill the virtual machine running the current master in an abnormal way.

    At the same time, our application continued to work. Naturally, such an emergency switch of the master cannot pass unnoticed.

    2019-03-29 10:45:56.071234
    Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
    Logged a value, overall count: 453
    Connection closed
    2019-03-29 10:45:59.205463
    Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
    Logged a value, overall count: 454
    Connection closed
    2019-03-29 10:46:02.661440
    Error while connecting to PostgreSQL server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    Caught error:
     local variable 'connection' referenced before assignment
    ……………………………………………………….. (a number of similar errors here)
    2019-03-29 10:46:30.930445
    Error while connecting to PostgreSQL server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    Caught error:
     local variable 'connection' referenced before assignment
    2019-03-29 10:46:31.954399
    Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
    Logged a value, overall count: 455
    Connection closed
    2019-03-29 10:46:35.409800
    Connection opened to PostgreSQL 10.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
    Logged a value, overall count: 456
    Connection closed

    As you can see, the application resumed its work in less than 30 seconds. Yes, some users of the service will have time to notice the problem. However, this was a serious server failure, which does not happen often, and a human administrator would hardly have reacted as quickly, unless he happened to be sitting at a console with a switchover script at the ready.


    It seems to me that such a cluster gives administrators a tremendous advantage. Serious breakdowns and failures of database servers effectively become invisible to the application and, accordingly, to the user. You no longer have to hurriedly repair things or fail over to temporary configurations and servers, and if you consume this solution as a ready-made cloud service, you do not even spend time setting it up. You will be able to do something more interesting instead.
