Replication from Percona Server for MySQL to PostgreSQL using the pg_chameleon tool

Original author: Avinash Vallarapu
  • Transfer

Replication is one of the well-known functions that allow you to create an identical copy of the database. It is supported in virtually any relational database management system (RDBMS). The ability to replicate provides significant benefits, especially high availability and load sharing. But what if replication is required between two databases (DB) with different structure, such as MySQL and PostgreSQL? Is it possible to continuously replicate changes from a MySQL database to a PostgreSQL database? The answer to this question is the replication tool pg_chameleon .


image


For continuous replication of changes, pg_chameleon uses the MySQL replication library, which allows you to get logical copies of rows from a MySQL database, which are converted to a jsonb object. The pl / pgsql function in Postgres decodes the jsonb object and reproduces the changes in the Postgres database. To configure this type of replication, the binlog_format variable for the MySQL database must be ROW (string).


A few things to know before setting up this tool:


  1. Tables that need to be replicated must have a primary key.
  2. The tool works in PostgreSQL versions above 9.5 and MySQL systems above 5.5
  3. To set up such replication, the binlog_format variable must be set to ROW.
  4. Python version must be higher than 3.3

When replication starts, pg_chameleon retrieves data from MySQL in CSV format with grouping into groups of a certain length to avoid memory overload. This data is reset to Postgres with the COPY command (copy). If the copy fails, the INSERT (paste) command is executed, which can slow down the process. If the INSERT command fails, the string is lost.


To replicate changes from MySQL, pg_chameleon simulates the behavior of a replica (slave) of MySQL. This creates a schema in Postgres, performs initial data loading, connects to the MySQL replication protocol, copies of rows are stored in the Postgres table. At the same time, the corresponding Postgres functions provide decoding of strings and making changes. This is similar to storing transfer logs in Postgres tables and applying them to the Postgres schema. Creating a Postgres database schema using any data description languages ​​is not required. For tables specified during replication setup, the pg_chameleon tool does this automatically. If you need to convert any types in a specific way, you can specify this in the configuration file.


The following is an exercise that can be experimented with. Use the suggested options if it fully meets your requirements. We carried out such tests on CentOS Linux OS version 7.4.


Environment preparation


Configure Percona Server for MySQL


Install MySQL version 5.7 and add the appropriate parameters for replication.


In this exercise, I installed Percona Server for MySQL version 5.7 using the YUM repository.


yum installhttp://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum install Percona-Server-server-57
echo "mysql ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
usermod -s /bin/bash mysql
sudo su - mysql

Pg_chameleon requires configuration of the following parameters in the my.cnf file (file with MySQL server parameters). You can add the following parameters to the /etc/my.cnf file


binlog_format= ROW
binlog_row_image=FULLlog-bin = mysql-bin
server-id = 1

Now, after including the above parameters in the my.cnf file, start the MySQL server.


$ service mysql start

Obtain a temporary password for the root account from the mysqld.log file and reset the root password using the mysqladmin command.


$ grep "temporary" /var/log/mysqld.log
$ mysqladmin -u root -p password'Secret123!'

Now connect to your own MySQL database instance and create a sample schema / tables. I also created an emp table for validation.


$ wget http://downloads.mysql.com/docs/sakila-db.tar.gz
$ tar -xzf sakila-db.tar.gz
$ mysql -uroot -pSecret123! < sakila-db/sakila-schema.sql
$ mysql -uroot -pSecret123! < sakila-db/sakila-data.sql
$ mysql -uroot -pSecret123! sakila -e "create table emp (id int PRIMARY KEY, first_name varchar(20), last_name varchar(20))"

Create a user to configure replication using the pg_chameleon tool and grant it permissions by following the steps below.


$ mysql -uroot -p
createuser'usr_replica'@'%'identifiedby'Secret123!';
GRANT ALL ON sakila.* TO'usr_replica'@'%';
GRANT RELOAD, REPLICATIONCLIENT, REPLICATIONSLAVEON *.* TO'usr_replica'@'%';
FLUSHPRIVILEGES;

When creating a user on the MySQL server ('usr_replica' @ '%'), you may need to replace the β€œ%” character with the corresponding IP address or host name of the server that pg_chameleon is running on.


Configure PostgreSQL


Install PostgreSQL and run a copy of the database.


To install PostgreSQL version 10.x, follow these steps.


yum install https://yum.postgresql.org/10/redhat/rhel-7.4-x86_64/pgdg-centos10-10-2.noarch.rpm
yum install postgresql10*
su - postgres
$/usr/pgsql-10/bin/initdb
$ /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start

As you can see from the following logs, you need to create a user in PostgreSQL, with which pg_chameleon will be able to write the changed data to PostgreSQL. Also create a target database.


postgres=# CREATEUSER usr_replica WITHENCRYPTEDPASSWORD'secret';
CREATEROLE
postgres=# CREATEDATABASE db_replica WITHOWNER usr_replica;
CREATEDATABASE

Steps for installing and configuring replication using pg_chameleon


Step 1. In this exercise, I installed the Python language interpreter version 3.6 and pg_chameleon version 2.0.8 by following these steps. If you already have the necessary version of the Python interpreter installed, you can skip the steps to install it. We can create a virtual environment if the OS does not include Python version 3.x by default.


yum install gcc openssl-devel bzip2-devel wget
cd /usr/src
wget https://www.python.org/ftp/python/3.6.6/Python-3.6.6.tgz
tar xzf Python-3.6.6.tgz
cd Python-3.6.6
./configure --enable-optimizations
make altinstall
python3.6 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon

Step 2. This tool requires a configuration file that stores information about the source and target servers, and a folder for storing logs. To make the pg_chameleon tool create a template for the configuration file and the corresponding folders, use the following command.


$ chameleon set_configuration_files

When executing this command, the following results are displayed. They show that this command has created several folders and files in the place from which you started it.


creating directory /var/lib/pgsql/.pg_chameleon
creating directory /var/lib/pgsql/.pg_chameleon/configuration/
creating directory /var/lib/pgsql/.pg_chameleon/logs/
creating directory /var/lib/pgsql/.pg_chameleon/pid/
copying configuration example in /var/lib/pgsql/.pg_chameleon/configuration//config-example.yml

Copy the sample configuration file to another file, say, default.yml


$ cd .pg_chameleon/configuration/
$ cp config-example.yml default.yml

This is what my default.yml file looks like after including all the necessary parameters. In this file, you can optionally specify a data type conversion, tables that should be skipped during replication, and data manipulation language events that should be ignored for a selected list of tables.


---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"#postgres  destination connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "secret"
  database: "db_replica"
  charset: "utf8"
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "Secret123!"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: sch_sakila
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
#        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
#        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

Step 3. Create a replica (target database) using the command:


$ chameleon create_replica_schema --debug

The above command creates the schema and nine tables in the PostgreSQL database specified in the .pg_chameleon / configuration / default.yml file. These tables are required to manage replication from the source database to the target database. The same can be seen in the next journal.


db_replica=# \dn
List ofschemasName | Owner---------------+-------------public | postgres
sch_chameleon | target_user
(2rows)
db_replica=# \dt sch_chameleon.t_*
List of relations
Schema | Name | Type | Owner---------------+------------------+-------+-------------
sch_chameleon | t_batch_events | table | target_user
sch_chameleon | t_discarded_rows | table | target_user
sch_chameleon | t_error_log | table | target_user
sch_chameleon | t_last_received | table | target_user
sch_chameleon | t_last_replayed | table | target_user
sch_chameleon | t_log_replica | table | target_user
sch_chameleon | t_replica_batch | table | target_user
sch_chameleon | t_replica_tables | table | target_user
sch_chameleon | t_sources | table | target_user
(9rows)

Step 4. Add the source database data to pg_chameleon using the following command. Specify the name of the source database as specified in the configuration file. In this example, the source database name is mysql, and the target is the Postgres database, defined as pg_conn.


$ chameleon add_source --config default --source mysql --debug

After executing this command, you will see that the source database data is added to the t_sources table.


db_replica=# select * from sch_chameleon.t_sources;
-[ RECORD 1 ]-------+----------------------------------------------
i_id_source | 1
t_source | mysql
jsb_schema_mappings | {"sakila": "sch_sakila"}
enm_status | ready
t_binlog_name |
i_binlog_position |
b_consistent | t
b_paused | f
b_maintenance | f
ts_last_maintenance |
enm_source_type | mysql
v_log_table | {t_log_replica_mysql_1,t_log_replica_mysql_2}
$ chameleon show_status --config default
Source id Source name Type Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------
1 mysql mysql ready Yes N/A N/A

Step 5. Initialize the replica (target database) using the following command. Specify the source database from which changes are replicated to the PostgreSQL database.


$ chameleon init_replica --config default --source mysql --debug

Initialization includes the following tasks on the MySQL server (source).


  1. Clear the table cache and set a read-only lock
  2. Get the coordinates of the source database
  3. Copy the data
  4. Unlock

The above command automatically creates a schema of the target Postgres database.
In the default.yml file, we mentioned the following schema mappings (schema_mappings).


schema_mappings:
sakila: sch_sakila

A new scott scheme is now created in the target database db_replica.


db_replica=# \dn
List ofschemasName | Owner---------------+-------------public | postgres
sch_chameleon | usr_replica
sch_sakila | usr_replica
(3rows)

Step 6. Now start the replication using the following command.


$ chameleon start_replica --config default --source mysql

Step 7: Check replication status and errors using the following commands.


$ chameleon show_status --config default
$ chameleon show_errors

Here is the state of replication:


$ chameleon show_status --source mysql
Source id Source nameType Status Consistent Read lag Last read Replay lag Last replay
----------- ------------- ------ -------- ------------ ---------- ----------- ------------ -------------1 mysql mysql running No N/A N/A
== Schema mappings ==
Origin schema Destination schema--------------- --------------------
sakila sch_sakila
== Replica status ==
--------------------- ---Tablesnot replicated 0Tables replicated 17Alltables17
Last maintenance N/A
Next maintenance N/A
Replayed rows
Replayed DDL
Skipped rows

Now you can see that the changes are constantly replicated from the MySQL database to the PostgreSQL database.


Step 8. To verify, you can insert an entry into the MySQL database table that we created to verify replication in the Postgres database.


$ mysql -u root -pSecret123! -e "INSERT INTO sakila.emp VALUES (1,'avinash','vallarapu')"
mysql: [Warning] Using a passwordon the command line interface can be insecure.
$ psql -d db_replica -c "select * from sch_sakila.emp"
 id | first_name | last_name
----+------------+-----------1 | avinash    | vallarapu
(1row)

The above log shows that the record inserted into the MySQL table was replicated to the table in the Postgres database.


You can also add multiple source databases for replication to the target Postgres database.


Link


From this documentation, you can get information about the many additional features available in pg_chameleon.


Also popular now: