Half-HA cluster PostgreSQL on Windows 2012

Thanks


Thank you, Habr, for what you are and for those wonderful hours that I spent in you! Thank you, brave habropisis, for the articles of high quality , subtle humor and wide outlook. Only thanks to you and Mary Ivanovna (who taught me how to read in the first grade) I achieved serious success in IT, and now I want to share a story when out-of-the-box thinking helped solve the idiomatic problem from the customer.

Introduction


Some time ago I was working on a proprietary product. This software stored its data in an external DBMS. Initially, the work was carried out with MS SQL, but later, especially for users “I only install * nix-like systems” they made compatibility with PostgreSQL . Then they took me to solve the problems of the class "the database has fallen - you need to restore it from transaction logs."

One fine day, the team went to implement another project. It was initially agreed that the facility already has two Windows 2012 servers on which a fail-safe MS SQL cluster will be deployed. However, at the last moment, the customer did not have enough money for the instances, and I was instructed to put the growth databases, as well as somehow solve the problem of fault tolerance. Thanks to the community of postgres developers who compiled the installer for Windows. Tasks:

  1. Install PostgreSQL DBMS on Windows;
  2. Provide data replication between databases;
  3. Solve the issue of fault tolerance.

1. PostgreSQL installation


Everything is simple here:

  • We download the distribution kit for Windows - version 9.4 for x64 is suitable for us;
  • We launch the installer, select the Russian locale, install it into the default directory and play the enikeyschik by clicking the "Next" button.

Note: Stack bilder at the end you can not run - we do not need its elements.

2. Data replication


By default, we will be set to Slony-I, however, asynchronous replication is already in the “box”, which we will use. Sometimes access to the linux command line and configuration files were refused by Ibniks, and I had to configure the DBMS via psql, so mad skillz will go on. Here and in the future, all utilities are taken from the C: \ Program Files \ PostgreSQL \ 9.4 \ bin \ folder (it would be correct to specify the $ PGDATA variable, but for some reason it was not set by the installer by default). On the wizard:

Connect to the DBMS command:

psql.exe -U postgres

Turn on data duplication mode:

alter system set wal_level=hot_standby;

Disable archiving mode:

alter system set archive_mode=off;

We allow to create replication slots:

alter system set max_replication_slots=1;

We set the maximum number of data replication processes (the second process will serve the operation of the pgadmin utility ):

alter system set max_wal_senders=2;

We execute the command "\!" and ... haha! We find ourselves at the Windows command prompt. We edit the pg_hba.conf file (hereinafter, we mean the files from the C: \ Program Files \ PostgreSQL \ 9.4 \ data folder ). He is responsible for the access of hosts \ users to the DBMS. Add lines:

# В дальнейшем этот файл перекачует на слейв
host    all             all                         md5 
# Разрешаем всем пользователям с указанного хоста подключаться к любым БД 
host    all             all                         md5
host    replication     postgres                         trust
# Разрешаем пользователю postgres делать репликацию с указанного хоста без пароля
host    replication     postgres                         trust

On the slave:

1. Stop the postgresql service.

2. Delete the contents of the folder C: \ Program Files \ PostgreSQL \ 9.4 \ data

3. Make a backup from the wizard using the pg_basebackup command . As parameters we use:

pg_basebackup -h  -D "C:\Program Files\PostgreSQL\9.4\data" -U postgres

At the end of the execution, the utility is outraged that WAL archiving is not configured, but we basically do not need it because of the next stage (as a bonus - saving disk space).

4. Go to the postgresql.conf file and set the parameters:

# Флаг "горячей замены"
hot_standby=on
# Включаем отлуп от слейва
hot_standby_feedback=on
# Отключаем интервал ожидания WAL-архива
wal_receiver_status_interval=0

We return to the master:

1. Assign the host as the master:

alter system set hot_standby=off;

2. Disable waiting for response from the slave:

alter system set hot_standby_feedback=off;

3. Disable the interval in the transfer of the WAL log:

alter system set wal_receiver_status_interval=0;

4. Restart the DBMS service on the wizard (you can without leaving psql):

\! pg_ctl restart -D "C:\Program Files\PostgreSQL\9.4\data"

5. Create a replication slot:

select pg_create_physical_replication_slot('slot_1');

3. Fault tolerance


Slony-I was not part of the project specification, and it was lazy to invent PowerShell scripts for it. The customer completely refused to install an additional linux server, so the option with pgpool-II or pgbouncer was no longer available (and it is not clear how they work with Windows). Therefore, there was a transition to the recording mode by the trigger file. Configure the slave to enter the wizard mode. To do this, create a recovery.conf file and add lines to it:

standby_mode=on # Активируем режим ожидания
primary_conninfo='host= port=5432 user=postgres' # Настройки подключения к мастеру
primary_slot_name=slot_1 # Имя слота на мастере
trigger_file=startmaster # Имя файла, создание которого приведёт к запуску слева в режим записи

Note: if the slave enters wizard mode, the file will change its name to recovery.done .

We start the postgres service on the slave. In case of successful setup, it will start correctly and will pull data from the wizard. You can check replication:

- On the wizard using the command:

select (active) from pg_replication_slots;

- On the slave, you can try to create some object and get a read-only transaction error. For the client, you can create a test table on the wizard, which is automatically duplicated on the slave.

If the wizard fails, the potential system administrator must perform the following sequence of actions:

a) Make sure that the recording wizard is not available;
b) Verify that the IP address of the DBMS cluster is free. For example, ping ip or knock out the network interface of the wizard;
c) Change the ip slave to the master;
d) Indicate the slave to work as a master. To do this, you can use pg_ctl promote (a service privilege elevation error fell out on the object) or, since we configured the reconfiguration trigger file, create an empty startmaster file .

Additionally (something that was not within the scope of the task): you can make a slave from the fallen master by completing the slave setup items from the second stage.

My boss joked: " You know Sharp - write! " Two days of trying in the field on my knee - I took it and washed it down . About the basics of OOP and thread safety, please write all comments here ).

The program uses the Npgsql library to work with the DBMS. Unpack the archive on the slave and run the program for execution.


Next, enter:

- ip slave;
- ip masters;
- cluster ip (the main ip should in principle coincide with the ip-master, but anything happens);
- the name of the database to which the connection will be made (a testofcluster test table is created in it )

To perform a cluster check, click the "Test" button. The program will verify that it is running on the slave, the wizard allows you to record to the test table, and replication occurs as usual. As a bonus , the recovery.conf, postgresql.conf, postgresql.auto.conf files will be saved on the slave, which will simplify the procedure of transferring the wizard to slave mode.

Start monitoring: the software cyclically checks the availability of the wizard and if it fails, it puts the left into recording mode and changes the ip on the left to the cluster one.

Specification:
- Must have installed dotnet 4.5
- Compatibility checked on Windows 7 x64, Windows 2012 with PostgreSQL 9.4 (x64).
- The reaction time to turn off the wizard is from 20 to 35 seconds (a phased check of the availability of the wizard and the main ip is performed).
- The transfer time on the left to the master is less than 5 seconds (service restart is not required).
- Note: on virtual machines (on VMware Workstation - 100%) the IP address does not change.
- Windows should be Russified; PostgreSQL should have Russian locale selected.

Download: You can download the archive with the compiled program and two libraries from here .

Also popular now: