MongoDB Replication on Amazon EC2

system.indexes


  • Foreword
  • Amazon EC2 setup
  • Install MongoDB
  • Replication setup
  • What to read

local.abstract


In this article I will describe how to set up MongoDB replication on Amazon EC2 as painlessly as possible. There is excellent documentation on working with Amazon EC2, on configuring MongoDB in general, and on replication in particular. But, as you know, the devil is in the details. Here I will highlight the "little things" that bothered me the most.

{step: 1, title: "Amazon EC2 configure", devilCount: 2}


Let's start from the beginning - with setting up instances.

The first step is to create two security groups: one for web instances and one for database instances.
For web instances, open access for SSH (port 22), HTTP (port 80) and HTTPS (port 443).


For db instances, open the same SSH access, plus access to port 27017 from the web and db security groups.


Now we can launch the instances themselves: one small instance for the web application, plus two large instances and one micro instance for the database. I chose Ubuntu Server 12 as the Amazon Machine Image (AMI). Important: for elections to work reliably, a replica set should have an odd number of voting members. To this end, we will use the third instance, the micro one, as an arbiter. I will explain what an arbiter is below. Of course, we could simply run 5, 7, or 2n + 1 large instances. But with this example I want to show a good way to minimize Amazon EC2 costs, and to stress once again that a replica set should have an odd number of members.
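
The odd-number rule comes from majority voting: a Primary can be elected only while more than half of the voting members can see each other. A minimal sketch of that arithmetic in plain JavaScript follows; the function names are my own illustration and are not part of any MongoDB API.

```javascript
// Hypothetical helpers illustrating majority arithmetic; not a MongoDB API.

// Smallest number of voting members that forms a majority.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail while a majority still remains.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// 2 members: majority 2, tolerates 0 failures
// 3 members: majority 2, tolerates 1 failure
// 4 members: majority 3, tolerates 1 failure (no better than 3)
```

Under these assumptions a fourth member costs money without improving fault tolerance, which is why the cheap micro-instance arbiter is enough to bring two data-bearing members up to three votes.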

There is another non-obvious but quite significant nuance: instance IP addresses are dynamic. Tying the replication setup to them is therefore not ideal. It is better to use aliases configured in the /etc/hosts file. On each instance, let's bring the hosts file to something like this:
127.0.0.1 localhost
10.40.120.30 db1
10.40.120.31 db2
10.40.120.32 db3
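
Since the same alias lines are needed on every instance, they can be generated from one list instead of being typed by hand each time. The sketch below is only an illustration: renderHostsEntries is a hypothetical helper, and the addresses are the example ones from above.

```javascript
// Hypothetical helper: renders /etc/hosts alias lines from [alias, privateIp] pairs.
function renderHostsEntries(pairs) {
  var lines = [];
  for (var i = 0; i < pairs.length; i++) {
    var alias = pairs[i][0];
    var ip = pairs[i][1];
    lines.push(ip + " " + alias);
  }
  return lines.join("\n");
}

// Example addresses taken from the hosts file above.
var hostsText = renderHostsEntries([
  ["db1", "10.40.120.30"],
  ["db2", "10.40.120.31"],
  ["db3", "10.40.120.32"]
]);
```

When an instance gets a new private address, only the pair list changes; the replica set configuration keeps referring to db1, db2 and db3.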

Now we have instances ready for further use.

{step: 2, title: "MongoDB install", devilCount: 1}


Now let's install MongoDB. The installation process is well described in the official MongoDB manual, so we will follow its instructions closely. Create the file mongo_install.bash and put the following script into it:
apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | tee -a /etc/apt/sources.list.d/10gen.list
apt-get -y update
apt-get -y install mongodb-10gen

We execute our script:
sudo bash ./mongo_install.bash

If everything went well, then we will see the PID of the launched MongoDB:
mongodb start/running, process 2368

If for some reason the service is not running, start the mongod process manually:
sudo service mongodb start

Finally, a little trick: to avoid repeating this tedious installation on every instance, you can create an Amazon Machine Image (AMI) from an instance that is already set up and launch the remaining instances from it.

{step: 3, title: "Replication", devilCount: 2}


Now we come to the main point: replication setup. First, set the replSet parameter in the configuration file (/etc/mongodb.conf) on all db instances. This parameter must contain the name of the replica set:
replSet = myproject

After that, restart the service:
sudo service mongodb restart

Next, connect to MongoDB on db1 using the mongo shell:
mongo

Initiate the replica set:
rs.initiate()

Add the second instance to the replica set:
rs.add("db2:27017")

The third instance, and this is important, we add as an arbiter:
rs.addArb("db3:27017")

The arbiter does not keep a copy of the database and is not involved in writing or reading data. It exists solely to vote in the election of a Primary. This is why we can run the arbiter on minimal hardware.
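
The three shell calls above can also be written as one configuration document and passed to rs.initiate(). Here is a minimal sketch: buildReplicaSetConfig is a hypothetical helper of mine, while the document fields (_id, members, host, arbiterOnly) are the same ones rs.config() prints.

```javascript
// Hypothetical helper: builds a replica set config document.
// dataHosts hold data; arbiterHosts only vote in elections.
function buildReplicaSetConfig(setName, dataHosts, arbiterHosts) {
  var members = [];
  var i;
  for (i = 0; i < dataHosts.length; i++) {
    members.push({ _id: members.length, host: dataHosts[i] });
  }
  for (i = 0; i < arbiterHosts.length; i++) {
    members.push({ _id: members.length, host: arbiterHosts[i], arbiterOnly: true });
  }
  return { _id: setName, members: members };
}

var initialConfig = buildReplicaSetConfig(
  "myproject",
  ["db1:27017", "db2:27017"],
  ["db3:27017"]
);

// In the mongo shell on db1, instead of rs.initiate() + rs.add() + rs.addArb():
// rs.initiate(initialConfig)
```

Keeping the whole topology in one document makes it easier to review and to reuse when the set has to be rebuilt.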

Let's see the current status of the replica set:
myproject:PRIMARY> rs.status()
{
        "set" : "myproject",
        "date" : ISODate("2013-02-04T12:17:42Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "db1:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 1139012,
                        "optime" : Timestamp(1359738450000, 12),
                        "optimeDate" : ISODate("2013-02-01T17:07:30Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "db2:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1138953,
                        "optime" : Timestamp(1359738450000, 12),
                        "optimeDate" : ISODate("2013-02-01T17:07:30Z"),
                        "lastHeartbeat" : ISODate("2013-02-04T12:17:42Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 2,
                        "name" : "db3:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 442498,
                        "optime" : Timestamp(1359738450000, 12),
                        "optimeDate" : ISODate("2013-02-01T17:07:30Z"),
                        "lastHeartbeat" : ISODate("2013-02-04T12:17:40Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}
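
The full status document is verbose, so for a quick check it can help to reduce it to the few fields that matter. The sketch below assumes only the fields visible in the output above (members, name, health, stateStr); summarizeStatus is a hypothetical helper, not a shell built-in.

```javascript
// Hypothetical helper: condenses an rs.status() document.
function summarizeStatus(status) {
  var summary = { primary: null, unhealthy: [], states: {} };
  for (var i = 0; i < status.members.length; i++) {
    var member = status.members[i];
    summary.states[member.name] = member.stateStr;
    if (member.stateStr === "PRIMARY") {
      summary.primary = member.name;
    }
    if (member.health !== 1) {
      summary.unhealthy.push(member.name);
    }
  }
  return summary;
}

// In the mongo shell:
// printjson(summarizeStatus(rs.status()))
```

An empty unhealthy list and a non-null primary are the two things to look for after every configuration change.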

A description of each field can be found here: docs.mongodb.org/manual/reference/replica-status/#fields

Check the config:
myproject:PRIMARY> rs.config()
{
        "_id" : "myproject",
        "version" : 12,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "db1:27017"
                },
                {
                        "_id" : 1,
                        "host" : "db2:27017"
                },
                {
                        "_id" : 2,
                        "host" : "db3:27017",
                        "arbiterOnly" : true
                }
        ]
}

We see that everything looks exactly as we intended. Hurrah!

And for dessert, one more thing. In the replica set configuration each member has, among other settings, a priority property. It defaults to 1 and, being the default, is not displayed in the config. This value affects the likelihood that a member will be elected Primary. Let's make db1 the preferred Primary (for example, because it has more memory):
config = rs.config()
config.members[0].priority = 2
rs.reconfig(config)

Now the config will look like this:
myproject:PRIMARY> rs.config()
{
        "_id" : "myproject",
        "version" : 12,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "db1:27017",
                        "priority" : 2
                },
                {
                        "_id" : 1,
                        "host" : "db2:27017"
                },
                {
                        "_id" : 2,
                        "host" : "db3:27017",
                        "arbiterOnly" : true
                }
        ]
}
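
Addressing the member as members[0] works only while db1 happens to be first in the array. A slightly more defensive sketch looks the member up by host name; setMemberPriority is a hypothetical helper, and it modifies the config object in place just as the snippet above does.

```javascript
// Hypothetical helper: sets priority on the member with the given host.
// Mutates and returns the config so it can be passed straight to rs.reconfig().
function setMemberPriority(config, host, priority) {
  for (var i = 0; i < config.members.length; i++) {
    if (config.members[i].host === host) {
      config.members[i].priority = priority;
      return config;
    }
  }
  throw new Error("No replica set member with host " + host);
}

// In the mongo shell on the Primary:
// rs.reconfig(setMemberPriority(rs.config(), "db1:27017", 2))
```

Failing loudly on an unknown host is a deliberate choice here: a typo then stops the reconfiguration instead of silently leaving priorities unchanged.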

A very important note from StamPit:
Problem: the oplog size is not specified. On 64-bit systems it defaults to 5% of the available disk space, but not less than 1 GB. If the disk is large and insert/update activity is moderate, it makes sense to limit the size in the config; 2000 MB is enough:
oplogSize = 2000

If there is not much data and the number of inserts/updates is not too large, you can slightly reduce disk activity as follows:
1) Disable preallocation: noprealloc = true
2) Reduce file size (both data and journal files will shrink): smallfiles = true
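
To see why the default oplog can be wasteful on a large disk, here is the rule from the note written out as arithmetic. This is only a sketch of the rule as stated above (5% of free space, at least 1 GB); the exact limits depend on the MongoDB version and platform, and defaultOplogSizeMb is a hypothetical name.

```javascript
// Hypothetical helper following the rule quoted in the note above:
// 5% of free disk space, but not less than 1 GB (1024 MB).
function defaultOplogSizeMb(freeDiskMb) {
  var fivePercent = Math.floor(freeDiskMb / 20);
  return Math.max(fivePercent, 1024);
}

// 10 GB free  (10240 MB)  -> 1024 MB (the minimum applies)
// 500 GB free (512000 MB) -> 25600 MB, versus 2000 MB with oplogSize = 2000
```

On the 500 GB example the explicit oplogSize setting saves more than 20 GB of disk.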

That is probably all you need to set up MongoDB replication on Amazon EC2. If necessary, you can easily add new instances to this configuration.

local.links


  1. Amazon EC2: aws.amazon.com/documentation/ec2
  2. MongoDB Installation: docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu
  3. MongoDB Getting Started: docs.mongodb.org/manual/tutorial/getting-started
  4. MongoDB Replication: docs.mongodb.org/manual/replication
  5. MongoDB Replica Set Arbiters: docs.mongodb.org/manual/administration/replica-sets/#replica-set-arbiters
  6. MongoDB Replica Set Configuration: docs.mongodb.org/manual/reference/replica-configuration
