Data replication from MySQL to MongoDB

  • Tutorial

There are times when you need to change the system architecture on the go. Perhaps you found a bottleneck in your project or decided that at the current growth rate, difficulties with scaling or fault tolerance may soon arise. Just for such cases there is a Tungsten Replicator.
Tungsten Replicator is a free, open-source, Java-based application that extends MySQL database replication functionality. Tungsten's capabilities are wide, including multi-master replication, parallel replication, heterogeneous data replication between MySQL and Oracle, PostgreSQL, MongoDB. In this article, heterogeneous replication of the MySQL wizard with a subordinate MongoDB server will be considered, CentOS 6.5 will act as the OS.

Dependency Installation

Since Tungsten Replicator is written in java, at least you will need to install OpenJDK 1.6 or Oracle Java 6. Also, some of the scripts are written in ruby, so it will also need to be installed (version => 1.8) if you do not already have it:
# yum install java-1.6.0-openjdk ruby

Install MySQL

If MySQL You already have Server installed, you only need to tweak its configuration, as described below, and get the user to replicate.
If MySQL is not already installed, click here.
Connect the necessary repositories:
# rpm -Uvh download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# rpm -Uvh rpms.famillecollet.com/enterprise/remi-release-6.rpm

Install MySQL Server 5.5:
# yum --enablerepo=remi install mysql mysql-server

Adjust the MySQL configuration for replication needs:
# vim /etc/my.cnf

…
# Назначаем сервер “Мастером”
server-id = 1
# Куда писать бинарный лог
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
# Для реликации с MongoDB обязательно выставлять формат бинарного лога в “ROW”
binlog_format = ROW
# Максимальный размер файла лога
max_binlog_size = 256M
# Частота синхронизации операций из бинарного лога с диском
sync_binlog = 1
...

We activate the automatic start of MySQL at system startup and start the service:
# chkconfig --level 35 mysqld on
# service mysqld start

Run the initial MySQL installation script and set the password for root.
# mysql_secure_installation

Enter current password for root (enter for none):
Change the root password? [Y/n] y
New password: SomeSecretPasswD
Re-enter new password: SomeSecretPasswD
Remove anonymous users? [Y/n] y
Disallow root login remotely? [Y/n] y
Remove test database and access to it? [Y/n] y
Reload privilege tables now? [Y/n] y

For replication, it is desirable to have a separate user:
# mysql -u root -p
# mysql > grant all on *.* to tungsten identified by 'password' with grant option;
Query OK, 0 rows affected (0.01 sec)


Installing MongoDB

If you already have MongoDB, you can proceed to the next step.
If not, Mongo installation instructions are here.
Connect the MongoDB repository:
# vim /etc/yum.repos.d/mongodb.repo

[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1

And install the MongoDB Server 2.4 packages:
# yum install mongo-10gen mongo-10gen-server

Activate the automatic launch of MongoDB, when the system boots up and start the service:
# chkconfig --level 35 mongodb on
# service mongodb start

To check the service’s health, look at the list of databases:
# mongo
MongoDB shell version: 2.4.3
connecting to: test
> show dbs
local 0.078125GB
By default, authorization is not required.

Installing the master replicator for MySQL The

current, at the time of writing, version of the replicator is located at this link.
To begin with, we will
# cd /opt
# mkdir replicator
# cd replicator
# mkdir mysql # Здесь мы разместим репликатор для мастер-ноды
# mkdir mongodb # Здесь будет репликатор для слейва

create several directories for our replicator, for example, in / opt: Download the archive with the replicator in the current directory:
# wget tungsten-replicator.googlecode.com/files/tungsten-replicator-2.2.0-292.tar.gz
# tar -xzf tungsten-replicator-2.2.0-292.tar.gz
# mv tungsten-replicator-2.2.0-292/ tungsten-replicator/

Now, prepare the installation script for the master replicator:
# vim master-installer.sh

cd /opt/replicator/tungsten-replicator
./tools/tungsten-installer --master-slave -a \
  --datasource-type=mysql \
  --master-host=127.0.0.1  \
  --datasource-user=tungsten  \
  --datasource-password=password  \
  --datasource-mysql-conf=/etc/my.cnf
  --datasource-log-directory=/var/log/mysql/
  --datasource-port=3306  \
  --service-name=mongodb \
  --home-directory=/opt/replicator/mysql \
  --cluster-hosts=127.0.0.1 \
  --thl-port=10001 \
  --rmi-port=11001 \
  --java-file-encoding=UTF8 \
  --mysql-use-bytes-for-string=false \
  --mysql-enable-enumtostring=true \
  --mysql-enable-settostring=true \
  --svc-extractor-filters=colnames,pkey \
  --svc-parallelization-type=none --start-and-report

In this configuration everything will be replicated, i.e. each base along with all tables.
If you need to replicate strictly defined tables, you can immediately identify the filter by replacing the penultimate line (svc-extractor-filters) in the script with:
  --svc-extractor-filters=replicate \
  "--property=replicator.filter.replicate.do=db1.table1,db2.table2,dbN.tableN" \
Accordingly, instead of db1.table1, etc., indicate your databases and tables that you will need to replicate.
Everything is ready for the wizard, we can run the installation script:
# sh master-installer.sh
INFO >> 127_0_0_1 >> Getting services list
INFO >> 127_0_0_1 >> ...
Processing services command ...
NAME VALUE
- - appliedLastSeqno: 0
appliedLatency: 1.218
role: master
serviceName: mongodb
serviceType: local
started: true
state: ONLINE
Finished services command ...
The wizard is ready, now you can connect one or more slaves to it - MySQL or PostgreSQL for example. But our goal is replication with MongoDB, let's move on to it.

Installing a slave replicator for MongoDB We will

prepare a script for installing a slave replicator that will pump data into MongoDB.
# vim slave-installer.sh
cd /opt/replicator/tungsten-replicator
tools/tungsten-installer --master-slave -a \
  --datasource-type=mongodb \
  --master-host=127.0.0.1  \
  --service-name=mongodb \
  --home-directory=/opt/replicator/mongodb \
  --cluster-hosts=127.0.0.1 \
  --datasource-port=27017 \
  --master-thl-port=10001 \
  --thl-port=10002 \
  --rmi-port=11002 \
  --java-file-encoding=UTF8 \
  --skip-validation-check=InstallerMasterSlaveCheck \
  --svc-parallelization-type=none --start-and-report
Note, I replicate on one host, if you need to replicate to another host, do not forget to specify the correct host address on which you have a master and open the corresponding ports in the firewall (thl-port, rmi-port).
If all conditions are met, you can run the installation script.
# sh slave-installer.sh
WARN >> 127.0.0.1 >> Currently unable to check for the THL schema in mongodb
INFO >> 127_0_0_1 >> Getting services list
INFO >> 127_0_0_1 >> Processing services command ...
NAME VALUE
- - appliedLastSeqno: 0
appliedLatency: 0.5
role: slave
serviceName: mongodb
serviceType: local
started: true
state: ONLINE
Finished services command ...
If you see the status “ONLINE” in the “state” line, then the installation was successful and you can try to check if replication is working.

Verifying Replication

Before you begin verifying, you need to understand how you can transfer data from a relational database to a non-relational database such as MongoDB. Indeed, unlike traditional DBMSs such as MySQL, Oracle, and others that store data in tables, Mongo stores data in a simplified form - “key-value”. Therefore, data from MySQL tables must be translated into MongoDB “collections”. In this case, only “insert”, “update” and “delete” tables can be replicated, the rest of the operations (CREATE / DROP / ALTER) will be ignored by the replicator.
Let's check how the inserts will be replicated. To do this, we make a test circuit and a table:
mysql> create schema testdb;
mysql> use testdb;
mysql> create table testrepl (id int not null primary key, name char(20), date date);
Query OK, 0 rows affected (0.00 sec)
Let's make two arbitrary insertions: Now let's see what the replicator generated on the slave side: Check how the updates are replicated: Result in Mongo: The replicator also processes the deletion: The record also deleted on the receiver: We examined several examples showing how replication works. To finally make sure that the replicator is doing its job, you can try to fill in some large dump, and then count the number of tables and collections in MySQL and MongoDB, respectively. You can check the replication status, restart the replicator or stop replication using the trepctl command from the directory of the corresponding service: For the slave (MongoDB): And for the master (MySQL):
mysql> insert into testrepl values (1, 'Vasya', '1965-01-01');
Query OK, 1 row affected (0.00 sec)
mysql> insert into testrepl values (2, 'Petya', '1991-02-02');
Query OK, 1 row affected (0.00 sec)


> show dbs
local 0.078125GB
testdb 0.203125GB
tungsten_mongodb 0.203125GB
> use testdb
switched to db testdb
> show collections
testrepl
system.indexes
> db.testdb.find()
{ "_id" : “01”, "id" : "1", "name" : "Vasya", "date" : "1965-01-01" }
{ "_id" : “02”, "id" : "2", "name" : "Petya", "date" : "1991-02-02" }


mysql> update testrepl set name = 'Vasya P' where id =1;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0


> db.testrepl.find()
{ "_id" : “01”, "id" : "1", "name" : "Vasya P", "d" : "1965-01-01" }
{ "_id" : “02”, "id" : "2", "name" : "Petya", "date" : "1991-02-02" }


mysql> delete from testrepl where id =2;
Query OK, 1 row affected (0.00 sec)

> db.myfirst.find()
{ "_id" : “01”, "id" : "1", "name" : "Vasya P", "d" : "1965-01-01" }




# cd /opt/replicator/

# ./mongodb/tungsten/tungsten-replicator/bin/trepctl status

# ./mysql/tungsten/tungsten-replicator/bin/trepctl status

For details on how to manage the replication service, the “help” option will tell you.

What for?

The logical question is, why might you need replication with MongoDB or some other DBMS? There may be several scenarios. For example, a developer’s desire to experience the benefits of a non-relational DBMS. Or maybe use the data to create a new application designed to use MongoDB.

Tungsten Replicator Application Options


In conclusion, it is worth noting that Tungsten Replicator may turn out to be for you the same useful tool for various MySQL replication options as it has become for us . Its functionality is wide and the description may take more than one article. However, on the official website there is a fairly detailed documentation. Someone might be interested in the paid enterprise version from Continuent. In the enterprise edition, in addition to the benefits of the community version, it is announced: automatic recovery, fault tolerance, upgrade without downtime and other goodies, but for a lot of money.

Tungsten Replicator Documentation:
Heterogeneous Replication
MySQL to MongoDB Replication
Administration
Useful articles:
Installing and Administering Tungsten Replicator
Getting started with replication from MySQL to MongoDB

MongoDB Manual

Also popular now: