Logical replication between versions of PostgreSQL
- Transfer
There are different approaches to updating PostgreSQL, but some lead to downtime. If you want to avoid downtime, use replication to upgrade - logical or physical (streaming), depending on the scenario. In this article, we will look at the difference between logical and physical replication in PostgreSQL. Then we’ll talk in detail how to update the version using logical replication while avoiding application downtime. The next article will discuss physical replication.
In previous articles, we already talked about methods for updating PostgreSQL ( Upgrading PostgreSQL using pg_dumpall and Upgrading PostgreSQL using pg_dump / pg_restore ) as part of the Upgrading or Migrating Old PostgreSQL Versions series . But both of these methods do not exclude downtime.
Logical Replication Types
Here we discuss 2 types of replication:
- Replication between PostgreSQL 10 and 11 using built-in logical replication.
- Replication between PostgreSQL 9.4 (or prior to PG 11) and PostgreSQL 11 using the pglogical extension .
To minimize downtime, you can upgrade using replication. When all relevant data is replicated to another latest PostgreSQL server, you simply transfer the application to the new server with minimal downtime - although, of course, it all depends on the complexity of the application stack.
Logical replication in PostgreSQL allows users to selectively replicate tables and open a backup server for write operations. Physical replication in PostgreSQL is done in blocks. In this case, each database in the wizard is replicated to the backup server, inaccessible to write operations. Next, we will call physical replication streaming .
When using logical replication on a standby server, you can enable replication from multiple masters. This is useful in situations where you need to replicate data from multiple PostgreSQL (OLTP) databases to a single PostgreSQL server for reporting and data storage.
The main advantage of logical replication over streaming is that with logical replication, you can replicate changes from the old version of PostgreSQL to the new one. Stream replication works only when the master and the backup server have the same major version. Ideally, additional versions should also match.
Replication between PostgreSQL 10 and 11
Starting with PostgreSQL 10, logical replication is available by default. Therefore, you can easily replicate the PostgreSQL 10 database in PostgreSQL 11. Logical replication uses the publish and subscribe model. The node submitting the changes becomes the publisher. And the node subscribing to these changes becomes a subscriber. There can be several subscriptions per publication.
Publication
A publication is an array of changes created by a group of tables. It is called a change set or replication set . Publications can only contain tables, but not other objects. DML in these tables can be replicated, but DDL cannot.
In the publication, you can choose which type of DML to replicate: INSERT, DELETE, UPDATE, or ALL. By default, ALL is selected. The table must have a replica identifier in order to replicate UPDATE and DELETE operations to the subscriber. Replica identifiers help you find rows that are updated or deleted.
The primary key of the table is the default replica identifier. Or you can make the identifier a unique index with NOT NULL values. If you do not have a primary key or a unique index with NO NULL values, set replica_identity to FULL. In this case, Postgres uses the entire string as the key. But this is not very rational.
If a table without a primary key and replica identifier is added to the publication by default after an UPDATE or DELETE operation, errors may occur.
Subscription
A subscriber may subscribe to one or more publications. Before adding a subscription, make sure that the replicated tables are created on the subscriber node. To do this, dump only the schemes from the publisher to the subscriber.
Logical Replication Example
The following example describes logical replication only between PostgreSQL versions 10 and 11.
Create a publication on the publisher site. Add all or only some tables to the publication.
-- For adding ALL Tables in Database
CREATE PUBLICATION percpub FOR ALL TABLES;
-- For adding Selected Tables in Database
CREATE PUBLICATION percpub FOR TABLE scott.employee scott.departments;
On the subscriber site, create a subscription to this publication. Do a DDL dump of the tables to the subscriber before creating the subscription, as mentioned above.
$ pg_dump -h publisher_server_ip -p 5432 -d percona -Fc -s -U postgres | pg_restore -d percona -h subscriber_node_ip -p 5432 -U postgres
CREATE SUBSCRIPTION percsub CONNECTION 'host=publisher_server_ip dbname=percona user=postgres password=secret port=5432' PUBLICATION percpub;
This command also copies existing data in the tables. If you want to disable copying existing data, use the following command and only the changes to the publisher will be copied.
CREATE SUBSCRIPTION percsub CONNECTION 'host=publisher_server_ip dbname=percona user=postgres password=oracle port=5432' PUBLICATION percpub WITH (copy_data = false);
Track replication using the following command in the publisher node:
$ psql
\x
select * from pg_stat_replication;
Replication between PostgreSQL 9.4 and PostgreSQL 11
What to do with versions prior to PostgreSQL 10? For versions 9.4 to 11 there is a special extension - pglogical
. Using pglogical, you can replicate PostgreSQL 9.4 to PostgreSQL 11 in two ways.
The following are general instructions for setting up replication between PG 9.4 and PG 11 using the pglogical extension .
Step 1. Consider pgserver_94 as the source server with the percona_94 database on PostgreSQL 9.4. Create the following extension.
the code
[pgserver_94:] $psql -d percona_94 -c "CREATE EXTENSION pglogical_origin"
CREATE EXTENSION
[pgserver_94:] $psql -d percona_94 -c "CREATE EXTENSION pglogical"
CREATE EXTENSION
Step 2. Now add all or some of the tables to the schema or several schemas for replication. In the following example, you see an error because one of the tables does not have a primary key.
[pgserver_94:] $psql -d percona_94
psql (9.4.21)
Type "help" for help.
percona_94=# SELECT pglogical.create_node(node_name := 'provider1',dsn := 'host=192.168.0.24 port=5432 dbname=percona_94');
create_node
-------------
2976894835
(1 row)
percona_94=# SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
ERROR: table pgbench_history cannot be added to replication set default
DETAIL: table does not have PRIMARY KEY and given replication set is configured to replicate UPDATEs and/or DELETEs
HINT: Add a PRIMARY KEY to the table
percona_94=# ALTER TABLE pgbench_history ADD PRIMARY KEY (tid,aid,delta);
ALTER TABLE
percona_94=# SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);
replication_set_add_all_tables
--------------------------------
t
(1 row)
Step 3. On the subscriber node, that is, in the PostgreSQL 11 database, run the following commands.
[pgserver_11:] $psql -d percona_11
psql (11.2)
Type "help" for help.
percona_11=# SELECT pglogical.create_node(node_name := 'subscriber1',dsn := 'host=127.0.0.1 port=5432 dbname=percona_11 password=secret');
create_node
-------------
330520249
(1 row)
percona_11=# SELECT pglogical.create_subscription(subscription_name := 'subscription1',provider_dsn := 'host=192.168.0.24 port=5432 dbname=percona_94 password=secret');
create_subscription
---------------------
1763399739
(1 row)
Step 4. Then check the replication status by sending a request to several tables, which pglogical always updates:
percona_11=# select * from pglogical.local_sync_status;
sync_kind | sync_subid | sync_nspname | sync_relname | sync_status | sync_statuslsn
-----------+------------+--------------+------------------+-------------+----------------
f | 1763399739 | public | pgbench_accounts | r | 0/2EB7D48
f | 1763399739 | public | pgbench_history | r | 0/2EB7D48
f | 1763399739 | public | pgbench_tellers | r | 0/2EB7D48
f | 1763399739 | public | pgbench_branches | r | 0/2EB7D48
d | 1763399739 | | | r | 0/0
(5 rows)
percona_11=# select * from pglogical.subscription;
sub_id | sub_name | sub_origin | sub_target | sub_origin_if | sub_target_if | sub_enabled | sub_slot_name | sub_rep
lication_sets | sub_forward_origins | sub_apply_delay
------------+---------------+------------+------------+---------------+---------------+-------------+----------------------------------------+----------------
-----------------------+---------------------+-----------------
1763399739 | subscription1 | 2976894835 | 330520249 | 2402836775 | 2049915666 | t | pgl_percona_11_provider1_subscription1 | {default,defaul
t_insert_only,ddl_sql} | {all} | 00:00:00
(1 row)
Primary Key Selection
In the second step, you saw how all tables in the public schema were added to the replication set by creating a primary key for a table that did not have one. I may have chosen the wrong primary key for this table, but this is just for demonstration. When choosing a primary key, make sure it is correct. It must be unique and use columns that do not contain NULL values. If you do not find the correct primary key, this can lead to downtime of the application. Here is an example of an error that may occur:
[pgserver_94:] $pgbench -c 10 -T 300 -n percona_94
Client 7 aborted in state 12: ERROR: duplicate key value violates unique constraint "pgbench_history_pkey"
DETAIL: Key (tid, aid, delta)=(7, 63268, 2491) already exists.
Here's how to use pglogical to create replication between the old and new versions of PostgreSQL. After setting up replication, simply switch applications to the latest version so that the downtime is minimal.