Backing up and restoring a Graylog server

Greetings, Habr users!
It was evening, there was nothing to do, and then I remembered that I wanted to share some recent hands-on experience with the community.
My task was to automate the backup procedure and put together a recovery procedure for a Graylog server.


The server was new to me; I had not worked with it before.
So I sat down, read up on it, and thought: nothing complicated here. However, Google searches showed that this task does not come up every day, because there was practically no information on it.
"Nothing ventured, nothing gained," I thought; it should all be extremely simple: copy the configuration files and voilà.
Let me make a small digression to describe the Graylog server and its components.

What is a Graylog server?



Graylog2 is an open-source system for collecting and analyzing log data that allows you to process that data quite flexibly. It uses syslog as its agent: data is sent from the nodes via syslog and aggregated by the Graylog server. MongoDB is used as the database for storing content and settings. Finally, the bulkiest part of the server is ElasticSearch, a powerful tool for indexing and searching the data.
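
For example, forwarding logs from a node to Graylog can be as simple as one rsyslog rule. This is a minimal sketch under my own assumptions: the host name graylog.example.com and the syslog UDP input on port 514 are placeholders, adjust them to your setup.

# Forward all local syslog messages to the Graylog syslog input over UDP.
# "graylog.example.com" and port 514 are placeholders for your own input.
cat <<'EOF' > /etc/rsyslog.d/90-graylog.conf
*.* @graylog.example.com:514
EOF
service rsyslog restart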


Backup process



The task began to take shape: I needed to copy the contents of MongoDB and the ElasticSearch indexes, as well as the configuration files of each Graylog component.
Having stopped the graylog-server and elasticsearch services beforehand (and chef-client, so that it would not bring them back up mid-backup), I started the backup.
/etc/init.d/graylog-server stop
/etc/init.d/elasticsearch stop
/etc/init.d/chef-client stop


In my case, MongoDB contained a database called graylog2. To get a copy of it, I created a dump of the database with the following commands:
logger -s -i "Dumping MongoDB"
mkdir -p path-to-backup
mongodump -h 127.0.0.1 -d graylog2 -o path-to-backup/

This creates a dump of the graylog2 database located on localhost in the path-to-backup directory (a remote node can also be specified instead).
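
Since the restore script later in this article expects the dump as a single archive named graylog2-mongodump.tar.gz, it can be packed right away. A small sketch, assuming the dump landed in path-to-backup/graylog2 as in the command above:

# Pack the MongoDB dump into the archive name that the restore script expects.
# path-to-backup/graylog2 is where mongodump placed the graylog2 database above.
tar -zcf path-to-backup/graylog2-mongodump.tar.gz --directory=path-to-backup graylog2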

The next step is backing up and compressing the ElasticSearch indexes. In our case, about 12 GB of indexes had accumulated over 7 months of operation. Compression was not configured by default, even though it could cut the required storage space severalfold.
The directory holding the indexes was, in our case, on a mounted partition. The path.data parameter in /etc/elasticsearch/elasticsearch.yml specifies where the indexes are stored. Another important parameter (nothing will work without it) is the cluster name, set in the same configuration file by the cluster.name parameter.
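
To avoid hard-coding these values, they can be read straight from the configuration file. A minimal sketch; the parsing assumes plain "key: value" lines in elasticsearch.yml and may need adjusting for your layout:

# Pull path.data and cluster.name out of elasticsearch.yml instead of hard-coding them.
# Assumes uncommented "key: value" lines; adjust if your config is laid out differently.
ES_CONF=/etc/elasticsearch/elasticsearch.yml
ES_DATA=$(awk -F': *' '/^path.data:/ {print $2}' "$ES_CONF")
ES_CLUSTER=$(awk -F': *' '/^cluster.name:/ {print $2}' "$ES_CONF")
echo "Indexes live in $ES_DATA, cluster name is $ES_CLUSTER"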
To back up the indexes, I used the following command, which compressed and packed the contents of the index directory:
logger -s -i "Archiving ElasticSearch indices"
tar -zcf path-to-backup/elasticsearch.tar.gz --directory=path-to-indices graylog2

As a result, the 12 GB of source data turned into an archive of 1.8 GB. Not bad at all...

Next, it remained to copy the configuration files of Graylog, MongoDB, and ElasticSearch. It is worth noting that the ElasticSearch configuration file, elasticsearch.yml, also contains the node.name parameter, which holds the hostname of our server. This matters if the Graylog server will be restored on a node with a different hostname. Similarly, the Graylog configuration file, graylog2.conf, contains the settings for our specific MongoDB database, including the user and password used to access it.
I mention all this because thoughtless copying of configuration files will not end well, and those are "not our methods, Shurik" (c).
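
The copying itself can look something like this. A sketch only: the paths below are the usual Debian/Ubuntu package locations and are assumptions that may differ on your system.

logger -s -i "Copying configuration files"
# The paths below are typical package locations and are assumptions; adjust as needed.
cp /etc/graylog2.conf path-to-backup/
cp /etc/elasticsearch/elasticsearch.yml path-to-backup/
cp /etc/mongodb.conf path-to-backup/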

Once all the configuration files have been packed and copied, all that remains is to transfer everything to the backup server. Here everyone is free to do as they see fit and as their infrastructure requires.

In my case, the copying was done with scp, authenticating with a key:

logger -s -i "Copying backups to Backup server"
scp -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -r -i /root/.ssh/id_rsa path-to-backup backup-user@backup-server: 

logger -s -i "Copying backups to Backup server: DONE"

To summarize the backup process, I would like to highlight the steps to take (a consolidated script sketch follows the list):
  • Stopping the Graylog and ElasticSearch services
  • Creating a dump (copy) of the MongoDB database
  • Archiving and copying the ElasticSearch index directory
  • Copying the configuration files
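
Put together, the whole thing fits into a short script. This is a consolidated sketch under the same assumptions as above: the paths, the database and cluster name graylog2, and the backup server address are placeholders to adjust.

#!/bin/bash
# Consolidated backup sketch; paths and host names below are placeholders.
BACKUP=path-to-backup
INDICES=path-to-indices        # path.data from elasticsearch.yml
mkdir -p "$BACKUP"

# Stop the services so the data does not change under our feet
/etc/init.d/graylog-server stop
/etc/init.d/elasticsearch stop
/etc/init.d/chef-client stop

logger -s -i "Dumping MongoDB"
mongodump -h 127.0.0.1 -d graylog2 -o "$BACKUP"/
tar -zcf "$BACKUP"/graylog2-mongodump.tar.gz --directory="$BACKUP" graylog2

logger -s -i "Archiving ElasticSearch indices"
tar -zcf "$BACKUP"/elasticsearch.tar.gz --directory="$INDICES" graylog2

# Configuration files (paths are typical package locations)
cp /etc/graylog2.conf /etc/elasticsearch/elasticsearch.yml "$BACKUP"/

logger -s -i "Copying backups to Backup server"
scp -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -r -i /root/.ssh/id_rsa "$BACKUP" backup-user@backup-server:
logger -s -i "Copying backups to Backup server: DONE"

# Bring everything back up
/etc/init.d/chef-client start
/etc/init.d/elasticsearch start
/etc/init.d/graylog-server start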


Graylog server recovery process



Not surprisingly, the recovery process is a mirror image of the backup process.
Below I will give a small bash script that restores the Graylog server:

# Stop the services before restoring
/etc/init.d/graylog-server stop
/etc/init.d/elasticsearch stop
# Fetch the archives from the backup server
scp -r user@backup-server:graylog-backup/* ./
# Unpack the MongoDB dump and the ElasticSearch indexes
tar zxf graylog2-mongodump.tar.gz
tar zxf elasticsearch.tar.gz
# Restore the graylog2 database in MongoDB
mongorestore -d graylog2 ./graylog2
# Move the indexes into the ElasticSearch data directory (path.data)
mv ./elasticsearch/* /opt/elasticsearch/data/
# Put the configuration files back in place
mv ./graylog2.conf /etc/
mv ./elasticsearch.yml /etc/elasticsearch/elasticsearch.yml
# Start the services again
/etc/init.d/graylog-server start
/etc/init.d/elasticsearch start


The script copies the archives from the backup server, unpacks them, restores the graylog2 database in MongoDB, and moves the ElasticSearch indexes to the default data directory. The ElasticSearch and Graylog server configuration files are also put back in place. After that, the elasticsearch and graylog-server services are started.

To verify the integrity of the recovery, you can do the following:
  • go to the web interface of the server and make sure that all Messages, Hosts, Streams, and settings are present and identical to the original
  • compare the output of curl -XGET "localhost:9200/graylog2_0/_mapping" with the output from the original server
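
The second check can be scripted. A minimal sketch, assuming the mapping was saved to a file on the original server before the backup; mapping-before.json is a hypothetical name:

# Compare the index mapping on the restored server with the one saved beforehand.
# mapping-before.json is assumed to have been captured on the original server.
curl -s -XGET "localhost:9200/graylog2_0/_mapping" > mapping-after.json
diff mapping-before.json mapping-after.json && echo "Mappings match"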


The process is simple and has been tested on more than one instance, but it is poorly documented. It is also worth noting that with the release of ElasticSearch v1 it becomes simpler thanks to the new procedure for taking snapshots of indices, but this does not change the essence.
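
For reference, the ElasticSearch 1.x snapshot API mentioned above looks roughly like this; the repository name graylog_backup and its location are placeholders for illustration.

# Register a filesystem snapshot repository and take a snapshot (ElasticSearch 1.x).
# "graylog_backup" and the location path are placeholders.
curl -XPUT "localhost:9200/_snapshot/graylog_backup" -d '{
  "type": "fs",
  "settings": { "location": "/mnt/backups/elasticsearch" }
}'
curl -XPUT "localhost:9200/_snapshot/graylog_backup/snapshot_1?wait_for_completion=true"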
I hope this article helps someone. Thanks for your attention.

P.S. Special thanks to my colleague Siah, who made this script beautiful and amenable to automation. Well, I'm just a lazy topic starter :)
