Percona XtraDB Cluster. Installation and testing

Some time ago I started thinking about improving the availability of my project's servers. Most things were solved relatively easily, with the exception of the MySQL database. Master-master replication creates problems with index synchronization, and the NDBcluster solution, although developing rapidly, is still not suitable for migrating an existing project to it because of the long list of differences and limitations.

But there is an alternative solution: Galera Cluster, on which Percona XtraDB Cluster is based. Below I describe its installation, configuration and testing on Ubuntu.


How is it better than NDBcluster?


First: fewer restrictions (the list of NDBcluster restrictions: dev.mysql.com/doc/refman/5.5/en/mysql-cluster-limitations.html, the list of XtraDB Cluster restrictions: www.percona.com/doc/percona-xtradb-cluster/limitation.html).
In my case I only had to convert a few MyISAM tables to InnoDB.
Second: the numerous Percona patches to the MySQL engine code.
Third: the ability to run in a two-node configuration (though I will say right away that this is only suitable for testing).
And finally, MySQL forks are trendy now :)

How is it better than plain Galera Cluster bolted onto a MySQL fork?


The xtrabackup utility, which can take backups on the fly. This matters because when a new node joins the cluster, you do not have to stop any of the working nodes in order to pull a copy of the database from it.

Installation


Installation under Ubuntu, as usual, is elementary:
gpg --keyserver  hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | sudo apt-key add -
sudo apt-get update
sudo apt-get install percona-xtradb-cluster-server-5.5
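Note: the commands above assume the Percona apt repository is already configured on the machine. If it is not, add it first; the repository URL below is the usual one for Percona packages, but the release codename ("precise" here) is only an example, substitute your own:
echo "deb http://repo.percona.com/apt precise main" | sudo tee /etc/apt/sources.list.d/percona.list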

Sometimes apt complains about unresolvable dependencies. In that case it is enough to install the dependencies first, and then the server package itself.

First, without changing anything in the settings, open the mysql console (doing this only on the node that will be started first is enough) and add a user for backups:
grant RELOAD, LOCK TABLES, REPLICATION CLIENT, FILE on *.* to backup@localhost identified by 'password';
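To make sure the new credentials actually work, you can log in as this user and look at its grants (the password here is the one from the GRANT statement above):
mysql -u backup -ppassword -e "SHOW GRANTS;"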


Then one small tweak is needed. In Ubuntu, /bin/sh is a symlink to dash, while the cluster startup script expects bash. If there are no objections, you can fix this system-wide:
dpkg-reconfigure dash

Answer "No" to the question about using dash as the default system shell.
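After that, /bin/sh should point to bash, which is easy to confirm:
ls -l /bin/sh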

Since the databases will become completely identical after the cluster starts, the password of the system user (debian-sys-maint) on all nodes will become the same as on the first one. Therefore, copy the /etc/mysql/debian.cnf file from the server that will be started first to all the others.
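For example, pushing it from the first node to the second (the address is taken from the config below, adjust to your setup):
scp /etc/mysql/debian.cnf root@192.168.1.2:/etc/mysql/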

My configs look like this:
[mysqld_safe]
wsrep_urls=gcomm://192.168.1.1:3400,gcomm://192.168.1.2:3400,gcomm://
[mysqld]
port=3306
socket=/var/run/mysqld/mysqld.sock
datadir=/var/lib/mysql
basedir=/usr
user=mysql
log_error=error.log
binlog_format=ROW
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_sst_receive_address=192.168.1.1:3500
wsrep_node_incoming_address=192.168.1.1
wsrep_slave_threads=2
wsrep_cluster_name=cluster0
wsrep_provider_options="gmcast.listen_addr=tcp://192.168.1.1:3400;"
wsrep_sst_method=xtrabackup
wsrep_sst_auth=backup:password
wsrep_node_name=node0
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
innodb_buffer_pool_size=5000M
innodb_log_file_size=256M
innodb_log_buffer_size=4M
[client]
port=3306
socket=/var/run/mysqld/mysqld.sock

Check the actual location of libgalera_smm.so on your system.
The recommended value for wsrep_slave_threads is the number of CPU cores * 4.
wsrep_sst_auth holds the username and password for backups.
innodb_buffer_pool_size, innodb_log_file_size and innodb_log_buffer_size affect performance and are chosen experimentally. In my case each node has 8 GB of RAM.

To add nodes, list them in the wsrep_urls line (note the empty gcomm:// entry at the end of the line).
All IP addresses in the file (except those in the wsrep_urls line) refer to the current node and must be changed when copying this file to the other nodes, as must the node name in wsrep_node_name; see the example below.
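For example, on the second node (192.168.1.2) the lines that differ would look like this (the node name node1 is just an illustration):
wsrep_sst_receive_address=192.168.1.2:3500
wsrep_node_incoming_address=192.168.1.2
wsrep_provider_options="gmcast.listen_addr=tcp://192.168.1.2:3400;"
wsrep_node_name=node1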

In my setup, port 3400 is used for group communication (synchronization), port 3500 for receiving the state transfer (the database dump), and the standard port 3306 for client connections.
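If there is a firewall between the nodes, all three ports have to be reachable from the other node. A minimal iptables sketch for 192.168.1.1 (assuming you manage the firewall with raw iptables rules) could look like this:
iptables -A INPUT -p tcp -s 192.168.1.2 --dport 3306 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.2 --dport 3400 -j ACCEPT
iptables -A INPUT -p tcp -s 192.168.1.2 --dport 3500 -j ACCEPT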

You can run multiple nodes on the same machine by giving them different ports. If you decide to do this, you need to create several configuration files in /etc/mysql and start each server with a command like:
/usr/bin/mysqld_safe --defaults-file=/etc/mysql/my0.cnf

Note that xtrabackup can only connect through the standard socket /var/run/mysqld/mysqld.sock (it ignores the parameters from the config), so in this case you have to switch to a different SST method: wsrep_sst_method=rsync


Finally, restart the daemon:
sudo service mysql restart
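To check that the node has actually joined the cluster, look at the wsrep status variables: wsrep_cluster_size should equal the number of nodes, and the local state should be Synced:
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'; SHOW STATUS LIKE 'wsrep_local_state_comment';"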

If something went wrong, look at /var/lib/mysql/error.log.
Usually you need to delete /var/lib/mysql/ib_logfile* because the log file size was changed in the config.

Sometimes it is easier to erase the whole /var/lib/mysql/ (if there is nothing valuable in the databases) and recreate the default databases:
mysql_install_db


Some other possible errors are listed at the end of the article.

Number of nodes


The vendor recommends at least three, but unlike NDB, all nodes here are equal in rank and functionality from the start, so nothing prevents you from running just two.

With a two-node configuration everything is simple and sad: it works until the connection between the nodes is lost. When that happens, the cluster refuses to do anything, in order to protect against split-brain. There are a couple of config options that will keep it working even when the link is down, but they naturally do more harm than good: once connectivity returns, you end up with two diverged copies of the database and no clear way to merge them. An example of such an option is shown below.
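For reference, the kind of setting meant here is the Galera provider option pc.ignore_sb, which tells a node to keep accepting writes even after it loses the rest of the cluster; shown only as an illustration, not a recommendation:
wsrep_provider_options="gmcast.listen_addr=tcp://192.168.1.1:3400; pc.ignore_sb=true;"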

A three-node setup did not work for me, since I only have two colocated servers, and adding a third one somewhere on the office link would badly hurt performance, because the cluster runs at the speed of its slowest node.

But there is a solution: Galera Arbitrator. It is a stripped-down node that stores no databases and needs no fast channel; as long as a real node keeps in touch with it, that node continues to work. It is so minimal that it does not even use a config file: all parameters are passed on the command line.
Only two parameters are needed: the address of one of the normal nodes and the cluster name:
garbd -a gcomm://192.168.1.1:3400 -g cluster0
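On a real server you would most likely want it running in the background; garbd has a daemon mode for that (check the flag against your version's --help):
garbd -d -a gcomm://192.168.1.1:3400 -g cluster0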


Tests


I ran the standard sql-bench suite from the test package, first against the nodes individually and then against the cluster. The software was the same in both cases: percona-xtradb-cluster.

Install and run the test package:
apt-get install percona-server-test-5.5 libdbd-mysql-perl
cd /usr/share/sql-bench/sql-bench/
perl run-all-tests --server=mysql --user=root --password=
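If you want to point the benchmark at another node over the network instead of the local socket, run-all-tests also accepts a host option (assumption: behavior of the sql-bench version shipped in this package):
perl run-all-tests --server=mysql --host=192.168.1.1 --user=root --password=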


First I tested on two identical fast machines with SSDs.
I discarded the results of the first run, since they were much better than the subsequent ones (the first run writes to fresh, unused SSD cells).

Single node:
alter-table: Total time: 17 wallclock secs ( 0.04 usr  0.00 sys +  0.00 cusr  0.00 csys =  0.04 CPU)
ATIS: Total time:  7 wallclock secs ( 2.48 usr  0.16 sys +  0.00 cusr  0.00 csys =  2.64 CPU)
big-tables: Total time: 10 wallclock secs ( 1.87 usr  0.34 sys +  0.00 cusr  0.00 csys =  2.21 CPU)
connect: Total time: 64 wallclock secs (19.80 usr  5.68 sys +  0.00 cusr  0.00 csys = 25.48 CPU)
create: Total time: 548 wallclock secs ( 3.35 usr  1.66 sys +  0.00 cusr  0.00 csys =  5.01 CPU)
insert: Total time: 531 wallclock secs (155.04 usr 19.15 sys +  0.00 cusr  0.00 csys = 174.19 CPU)
select: Total time: 168 wallclock secs (17.93 usr  1.90 sys +  0.00 cusr  0.00 csys = 19.83 CPU)
transactions: Test skipped because the database doesn't support transactions
wisconsin: Total time:  5 wallclock secs ( 1.31 usr  0.18 sys +  0.00 cusr  0.00 csys =  1.49 CPU)


Cluster:
alter-table: Total time: 21 wallclock secs ( 0.04 usr  0.05 sys +  0.00 cusr  0.00 csys =  0.09 CPU)
ATIS: Total time: 21 wallclock secs ( 2.76 usr  0.30 sys +  0.00 cusr  0.00 csys =  3.06 CPU)
big-tables: Total time: 17 wallclock secs ( 1.98 usr  0.40 sys +  0.00 cusr  0.00 csys =  2.38 CPU)
connect: Total time: 67 wallclock secs (21.13 usr  5.59 sys +  0.00 cusr  0.00 csys = 26.72 CPU)
create: Total time: 597 wallclock secs ( 3.55 usr  1.55 sys +  0.00 cusr  0.00 csys =  5.10 CPU)
insert: Total time: 1710 wallclock secs (164.66 usr 35.25 sys +  0.00 cusr  0.00 csys = 199.91 CPU)
select: Total time: 187 wallclock secs (19.49 usr  2.44 sys +  0.00 cusr  0.00 csys = 21.93 CPU)
transactions: Test skipped because the database doesn't support transactions
wisconsin: Total time: 47 wallclock secs ( 1.62 usr  0.88 sys +  0.00 cusr  0.00 csys =  2.50 CPU)

A threefold drop in write operations and a serious (if not epic) drop in the Wisconsin test, where reads and writes are supposed to alternate.
I do not know why it complains about transactions; in fact, they do work.

My assumption was that network speed was the limiting factor, so I repeated the test on (no longer identical) machines with ordinary hard drives.

First node on its own:
alter-table: Total time: 55 wallclock secs ( 0.04 usr  0.02 sys +  0.00 cusr  0.00 csys =  0.06 CPU)
ATIS: Total time: 10 wallclock secs ( 2.40 usr  0.14 sys +  0.00 cusr  0.00 csys =  2.54 CPU)
big-tables: Total time:  7 wallclock secs ( 1.23 usr  0.15 sys +  0.00 cusr  0.00 csys =  1.38 CPU)
connect: Total time: 53 wallclock secs (16.31 usr  7.65 sys +  0.00 cusr  0.00 csys = 23.96 CPU)
create: Total time: 3215 wallclock secs ( 2.58 usr  0.83 sys +  0.00 cusr  0.00 csys =  3.41 CPU)
insert: Total time: 541 wallclock secs (142.41 usr 22.53 sys +  0.00 cusr  0.00 csys = 164.94 CPU)
select: Total time: 154 wallclock secs (12.66 usr  1.34 sys +  0.00 cusr  0.00 csys = 14.00 CPU)
transactions: Test skipped because the database doesn't support transactions
wisconsin: Total time:  4 wallclock secs ( 1.15 usr  0.29 sys +  0.00 cusr  0.00 csys =  1.44 CPU)


Second node on its own:
alter-table: Total time: 59 wallclock secs ( 0.03 usr  0.03 sys +  0.00 cusr  0.00 csys =  0.06 CPU)
ATIS: Total time: 11 wallclock secs ( 2.35 usr  0.23 sys +  0.00 cusr  0.00 csys =  2.58 CPU)
big-tables: Total time: 11 wallclock secs ( 1.92 usr  0.30 sys +  0.00 cusr  0.00 csys =  2.22 CPU)
connect: Total time: 64 wallclock secs (19.67 usr  5.84 sys +  0.00 cusr  0.00 csys = 25.51 CPU)
create: Total time: 4592 wallclock secs ( 3.90 usr  1.39 sys +  0.00 cusr  0.00 csys =  5.29 CPU)
insert: Total time: 581 wallclock secs (148.16 usr 19.80 sys +  0.00 cusr  0.00 csys = 167.96 CPU)
select: Total time: 168 wallclock secs (18.45 usr  2.07 sys +  0.00 cusr  0.00 csys = 20.52 CPU)
transactions: Test skipped because the database doesn't support transactions
wisconsin: Total time:  5 wallclock secs ( 1.18 usr  0.25 sys +  0.00 cusr  0.00 csys =  1.43 CPU)


Cluster:
alter-table: Total time: 110 wallclock secs ( 0.04 usr  0.02 sys +  0.00 cusr  0.00 csys =  0.06 CPU)
ATIS: Total time: 496 wallclock secs ( 1.61 usr  0.17 sys +  0.00 cusr  0.00 csys =  1.78 CPU)
big-tables: Total time: 116 wallclock secs ( 1.02 usr  0.16 sys +  0.00 cusr  0.00 csys =  1.18 CPU)
connect: Total time: 34 wallclock secs (10.98 usr  2.49 sys +  0.00 cusr  0.00 csys = 13.47 CPU)
create: Total time: 4638 wallclock secs ( 2.42 usr  0.91 sys +  0.00 cusr  0.00 csys =  3.33 CPU)
insert: Estimated total time: 43470.8 wallclock secs (106.50 usr 15.34 sys +  0.00 cusr  0.00 csys = 121.84 CPU)
select: Total time: 631 wallclock secs (11.02 usr  1.02 sys +  0.00 cusr  0.00 csys = 12.04 CPU)
transactions: Test skipped because the database doesn't support transactions
wisconsin: Total time: 1576 wallclock secs ( 1.37 usr  0.44 sys +  0.00 cusr  0.00 csys =  1.81 CPU)


However, the assumption was not confirmed: on the hard-drive machines, writes in the cluster slowed down by as much as a factor of ten. I cannot explain this.

Whether to use this in real production is up to each of you. For me, such a relative drop in performance is not critical, because the absolute numbers are small.

Possible problems and methods for solving them


1.
WSREP: Process completed with error: wsrep_sst_xtrabackup 'donor' '192.168.1.1:6000/xtrabackup_sst' 'backup:password' '/var/lib/mysql2/' '/etc/mysql/my2.cnf' '9bdd7773-0cb4-11e2-0800-8e876ebc6b70' '0' '0': 22 (Invalid argument)

xtrabackup could not connect to the database. Are you sure you are not trying to run several nodes on the same machine? See innobackup.backup.log for details.

2.
121002 15:19:54 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql0
/usr/bin/mysqld_safe: 172: [: gcomm://: unexpected operator

The script is being run by an interpreter other than bash. Either change the shebang line in mysqld_safe, or fix the /bin/sh symlink (see above).

3. If MyISAM tables are created but not populated when loading a dump, that is expected: only InnoDB is replicated.

4. If the MySQL client answers "Unknown command" to every statement that reads or writes data, this is the split-brain protection kicking in because communication between the nodes has been lost.
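You can confirm that this is what is happening by checking the wsrep_ready status variable, which drops to OFF while the node is cut off from the primary component:
mysql -e "SHOW STATUS LIKE 'wsrep_ready';"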
