Extreme testing of streaming replication PostgreSQL 9.1
The problem arose of introducing streaming replication of Postgresql 9.1 on production. And since had no business with her before, they decided to conduct her “extreme testing”.
To do this, two VMWare Player virtual machines with Ubuntu Server were launched. Installed and configured streaming replication on Postgresql 9.1.
A script was launched from the host machine that added a lot of rows to the test table of the replicated database in a loop. The script was run in 3 threads.
On virtual machines, a load was created both on the processor (an infinite loop in bash) and by reading data from the test table. The uneven load was also checked - it was either disconnected on the slave, then on the master.
The result was a very nice picture. Although they were evaluated "by eye", but no synchronization was noticed for more than 1-2 seconds. And then, such a desynchronization occurred only at a very high load (several parallel load processes) and for a short time.
To speed up update and insert operations, the previous version of Postgresql was used with the fsync = off option. The behavior of replication was tested with fsync disabled and hard shutdown of the virtual machine (simulation of power failure).
The results were quite predictable. After starting and restoring the database, it turned out that the current point in the binary log on master is earlier than the processed point on the slave. Those. the database is completely out of sync and server synchronization is necessary from scratch (according to the usual procedure for creating a slave database - with copying the entire data directory).
After synchronizing the servers, we tested the replication behavior when fsync is enabled.
When the salve was turned off “hard”, nothing special happened. He rose and “caught up” in the magazine with master.
A more interesting situation is when you turn off master. When it was disconnected accurately, it turned off, waited, turned on, checked, everything went fine. The base rose, recovered, and turned out to be synchronized with the slave.
Next, a simulation of "blinking electricity" in the data center was performed. The master server was “hard-reset” several times at arbitrary points in time. Typically, these moments fell at the time the Postgresql server started or a little later. As a result, the base on the master rose normally, but there were some minor problems with slave. In the slave log, messages began to pop up about the wrong size of some transaction record and it did not synchronize with master. However, when restarting Postgresql on slave, everything worked fine again.
The dependence of the insertion speed on the fsync value was tested in several cycles. According to the results, it turned out that with fsync = off the speed is higher by about 3.3%. Is it worth it to accelerate the loss of the base when the power is turned off - it is for everyone to decide. I personally intend to use Postgresql exclusively with fsync = on.
Switching works well. But there is one “but.” With this switch, slave becomes a completely independent master. Those. it is impossible to return it back to slave state without full synchronization with master. This is a consequence of the fact that a new time line is created on it - timeline. Also, as a result of this, if you use several slaves, all the others are disconnected from the new master. And to connect them, their full synchronization is required.
As a result of these tests, I personally had the confidence that streming replication in Postgresql 9.1 was fully functional and reliable.
To do this, two VMWare Player virtual machines with Ubuntu Server were launched. Installed and configured streaming replication on Postgresql 9.1.
Sync speed
A script was launched from the host machine that added a lot of rows to the test table of the replicated database in a loop. The script was run in 3 threads.
On virtual machines, a load was created both on the processor (an infinite loop in bash) and by reading data from the test table. The uneven load was also checked - it was either disconnected on the slave, then on the master.
The result was a very nice picture. Although they were evaluated "by eye", but no synchronization was noticed for more than 1-2 seconds. And then, such a desynchronization occurred only at a very high load (several parallel load processes) and for a short time.
Power outage stability
fsync = off
To speed up update and insert operations, the previous version of Postgresql was used with the fsync = off option. The behavior of replication was tested with fsync disabled and hard shutdown of the virtual machine (simulation of power failure).
The results were quite predictable. After starting and restoring the database, it turned out that the current point in the binary log on master is earlier than the processed point on the slave. Those. the database is completely out of sync and server synchronization is necessary from scratch (according to the usual procedure for creating a slave database - with copying the entire data directory).
fsync = on
After synchronizing the servers, we tested the replication behavior when fsync is enabled.
When the salve was turned off “hard”, nothing special happened. He rose and “caught up” in the magazine with master.
A more interesting situation is when you turn off master. When it was disconnected accurately, it turned off, waited, turned on, checked, everything went fine. The base rose, recovered, and turned out to be synchronized with the slave.
Next, a simulation of "blinking electricity" in the data center was performed. The master server was “hard-reset” several times at arbitrary points in time. Typically, these moments fell at the time the Postgresql server started or a little later. As a result, the base on the master rose normally, but there were some minor problems with slave. In the slave log, messages began to pop up about the wrong size of some transaction record and it did not synchronize with master. However, when restarting Postgresql on slave, everything worked fine again.
Insertion speed depending on fsync value
The dependence of the insertion speed on the fsync value was tested in several cycles. According to the results, it turned out that with fsync = off the speed is higher by about 3.3%. Is it worth it to accelerate the loss of the base when the power is turned off - it is for everyone to decide. I personally intend to use Postgresql exclusively with fsync = on.
Switch slave to master mode using a file trigger
Switching works well. But there is one “but.” With this switch, slave becomes a completely independent master. Those. it is impossible to return it back to slave state without full synchronization with master. This is a consequence of the fact that a new time line is created on it - timeline. Also, as a result of this, if you use several slaves, all the others are disconnected from the new master. And to connect them, their full synchronization is required.
Summary
As a result of these tests, I personally had the confidence that streming replication in Postgresql 9.1 was fully functional and reliable.