AWS: The Good, the Bad and the Ugly

Original author: Laurie Voss (seldo)
We have used Amazon AWS for hosting right from the start. Over the past three years we have learned what works well and what does not, and we have formulated our own set of rules for running a highly available, high-performance system. In some cases these rules differ from what Amazon recommends.

We are writing for two related groups of readers:
  1. For people who have heard of Amazon but have not yet had the chance to use it, we will describe the advantages and disadvantages of the service that we have encountered in our work.
  2. For those who already use AWS, we will clarify some details and share best practices for running high-performance services such as ours, where uptime is the highest priority.

It is no exaggeration to say that Amazon has radically changed the economics of launching IT startups. The change has been slow and gradual, but it is now a fact. Nobody realizes how many companies use Amazon EC2 somewhere in their infrastructure until it has an outage and it seems as if half of the Internet has stopped working. This is not because Amazon got lucky: they have a very good product. Everyone uses the service because it has greatly simplified launching applications and services, significantly reducing the knowledge, the number of steps, and the money needed to start a company.

EC2 is a new way to run software.
The first and most important thing you need to know about EC2 is that it is not just shared hosting. It is better to think of it as hiring a part-time system and network administrator. Instead of employing a highly paid full-time person to do all of that work for you, you pay a little more for each server and a whole class of problems goes away. Power supply, network topology, hardware costs, incompatibilities between equipment from different vendors, network storage: these were all things I had to think about back in 2004 (or put out of my mind). With AWS and its competitors, whose number is growing rapidly, you no longer need to think about any of this until you want something more.

The main difference and advantage of EC2 is flexibility. We can start a new server very quickly: it takes about 5 minutes from the thought “I need new hardware” to the first login. This lets us do things that seemed impossible a few years ago, for example:

  • we can deploy major upgrades on new hardware. When we have a big update, we launch new servers, install all the necessary software and dependencies on them, copy over the configuration files, and then simply add these servers to our load balancer. If everything is in order, we remove the old servers from the balancer and turn them off; if something goes wrong, we can easily switch the balancer back. We can keep old and new servers running side by side in whatever quantity and for however long we need, and then turn off the ones we no longer need, without having to buy new equipment.
  • for some non-critical systems where up to an hour of downtime is acceptable, our failover plan was simply to monitor the server and, if it had problems, bring up a new host manually.
  • we can expand our infrastructure when the load actually increases, instead of having to install new equipment in advance. When the load grows, we launch exactly the capacity we need at that moment to cope with it.
  • we do not have to worry about calculating capacity in advance. When we need a server we launch one; if it cannot cope with the load, we launch a more powerful one, and if a weaker server would be sufficient, we switch to that instead. This flexibility at the hardware level is one of the best features of AWS, and it is only possible because provisioning new servers and retiring old ones is almost instantaneous.
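The upgrade procedure from the first bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: `LoadBalancer` is a hypothetical in-memory stand-in for a real balancer's server pool, and `is_healthy` is a placeholder for whatever health check you actually run. Neither is an AWS API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a real load balancer's pool."""
    pool: List[str] = field(default_factory=list)


def swap_servers(
    lb: LoadBalancer,
    new_servers: List[str],
    is_healthy: Callable[[str], bool],
) -> bool:
    """Put new servers into rotation next to the old ones, then either
    retire the old servers or switch the balancer back.

    Returns True if the upgrade was kept, False if it was rolled back.
    """
    old_servers = list(lb.pool)

    # Step 1: run old and new servers side by side.
    lb.pool = old_servers + list(new_servers)

    # Step 2: if every new server passes the check, drop the old ones.
    if all(is_healthy(server) for server in new_servers):
        lb.pool = list(new_servers)
        return True

    # Step 3: otherwise switch the balancer back to the old servers.
    lb.pool = old_servers
    return False


if __name__ == "__main__":
    lb = LoadBalancer(pool=["old-1", "old-2"])
    kept = swap_servers(lb, ["new-1", "new-2"], is_healthy=lambda s: True)
    print(kept, lb.pool)
```

The point of the sketch is that the old servers are never touched until the new ones have proven themselves, so rolling back is just a pool change rather than a reinstall.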

EC2 is financially beneficial for startups.
The most obvious economic benefit is that you can literally start at zero cost. You use the same Amazon account that you use to buy all sorts of unnecessary things online, press a button, and start playing with your servers for an hour. You pay only for the servers that are running and only for the disks that are in use, so the cost of getting started is minimal. This makes it possible to experiment with hardware: start up to 10 times more capacity than you need, run load tests, and then turn everything off until you really need that much capacity. This is not just a convenience, it is a revolutionary breakthrough: together with the other advantages of AWS, this quantitative difference becomes a qualitative one.

As I mentioned, AWS dramatically reduces operating costs. Until 2012, more than two years after we launched our company, we did not have a dedicated system administrator. In hindsight this was a mistake: we should have hired at least one person in 2011 or earlier. Now we have just one full-time system administrator, who manages our entire infrastructure of hundreds of servers. That is a very high number of machines per person. The effect is amplified by the fact that we do not have to worry about the network, the power supply, and much more, and as soon as you get used to that, you begin to take it for granted.

Of course, since this is more than just hosting, it also costs more than conventional hosting. But Amazon is working on this shortcoming and periodically reduces its prices: by 18% in October and by 10% in March, and that is in this year alone. To save money, you can also use spot instances, which run only when spare capacity is available and are cheaper than on-demand instances. For long-term use, you can pay in advance for reserved instances and save up to 50%. We are obsessed with reliability and run redundant hardware, so this was a big win for us.
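To make the trade-off between on-demand and prepaid servers concrete, here is a small arithmetic sketch. The prices in the example are made up purely for illustration and are not Amazon's actual rates; the function names are hypothetical as well.

```python
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(hourly_rate: float, hours: float) -> float:
    """Total cost when paying only the on-demand hourly rate."""
    return hourly_rate * hours


def reserved_cost(upfront_fee: float, hourly_rate: float, hours: float) -> float:
    """Total cost when paying an upfront fee plus a lower hourly rate."""
    return upfront_fee + hourly_rate * hours


def break_even_hours(
    on_demand_hourly: float, upfront_fee: float, reserved_hourly: float
) -> float:
    """Running hours after which the prepaid option becomes cheaper."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront_fee / saving_per_hour


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the calculation.
    od = on_demand_cost(0.10, HOURS_PER_YEAR)
    rs = reserved_cost(200.0, 0.04, HOURS_PER_YEAR)
    print(f"on-demand for a year: {od:.2f}")
    print(f"reserved for a year:  {rs:.2f}")
    print(f"break-even after {break_even_hours(0.10, 200.0, 0.04):.0f} hours")
```

The general pattern is that prepaying only wins for servers that run most of the time, which is exactly the case for redundant hardware that is always on.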

EC2 has a number of problems.
This is where the letter of praise ends and it is time to be critical of Amazon. While we love EC2 and cannot imagine life without it, it is important to be honest: this path is neither cloudless nor strewn with rose petals. EC2 has serious performance and reliability limitations that you need to be aware of and account for in your plans.

First of all, there is the declared independence of the infrastructure and the failures that nevertheless occur within availability zones. AWS services are hosted in several locations around the world called regions. Each region consists of several availability zones, which in theory are isolated from one another: independent data centers with their own network infrastructure, power supply, and so on. There are several important facts to consider when using regions and availability zones:

  1. Virtual hardware is not real physical hardware.
    Over three years of observation, the average lifetime of a virtual machine on EC2 has been about 200 days. After that, the chance that the server will be retired increases greatly. The process is unpredictable: sometimes tech support warns us 10 days in advance that a machine will be shut down, and sometimes the notice arrives two hours after the machine has already been shut down. Suddenly disappearing hardware is not the biggest problem, since you can easily start a replacement, but it is important to take this into account and invest in automating the process in advance, so that you do not keep spending time launching new servers by hand.
  2. Your servers must be located in more than one availability zone, with every necessary service present in each zone. In our experience, an entire zone is more likely to fail than an individual server. So if you keep your primary and backup servers in the same availability zone, a problem that takes down the primary will also take down the backup, because the whole zone is affected. The zone then becomes a single point of failure for your system: you will not be able to restore your data from backup or pull your files off the servers, because when a zone has problems you cannot even see your servers, let alone your data.
  3. Problems that affect several zones within a region also happen, so if you can afford it, use different regions as well. The US-East region, which is the most popular because it is the oldest and cheapest, had outages in June 2012 and March 2012, and the most severe one, in April 2011, was dubbed the cloud apocalypse. Our opinion, which may lose us friends at Amazon, is that region-wide instability happens fairly often, and for the same reason each time. This led us to our next decision.
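The placement rule from point 2 can be illustrated with a short sketch. It spreads servers across zones in round-robin order and then checks what survives the loss of one zone. The server and zone names are hypothetical, and nothing here calls AWS; it only models the placement decision.

```python
from typing import Dict, List


def spread_across_zones(servers: List[str], zones: List[str]) -> Dict[str, str]:
    """Assign servers to zones in round-robin order.

    Adjacent entries in `servers` land in different zones, so listing a
    primary directly before its backup keeps the pair apart.
    """
    if len(zones) < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    return {server: zones[i % len(zones)] for i, server in enumerate(servers)}


def survivors(placement: Dict[str, str], failed_zone: str) -> List[str]:
    """Servers that are still reachable when one whole zone is down."""
    return sorted(s for s, zone in placement.items() if zone != failed_zone)


if __name__ == "__main__":
    placement = spread_across_zones(
        ["db-primary", "db-backup", "web-1", "web-2"],
        ["zone-a", "zone-b"],
    )
    for zone in ("zone-a", "zone-b"):
        print(zone, "down, still up:", survivors(placement, zone))
```

The same check can be run with regions instead of zones if you want to model the failure described in point 3.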

To ensure high reliability, we must stop trusting EBS.
This is the point on which we strongly disagree with Amazon's marketing and advice. Amazon presents EBS as fundamental to using EC2: you are supposed to store all your data on EBS volumes, which you can attach to new servers, and to take snapshots of those volumes to back up your databases and restore them later. Amazon also wants you to use EBS as the root device of your system by launching EBS-backed images. EBS caused us several major problems:

  1. I/O speed on EBS is unsatisfactory.
    I/O on virtual hardware is already much slower than on bare metal, but in our experience EBS performance is much lower still than that of the local drives on a virtual machine, which Amazon calls ephemeral storage. EBS drives are essentially network drives, and you should not expect much performance from any network drive. AWS also offers drives with a guaranteed amount of I/O, but they are expensive enough to make them an unattractive trade-off.
  2. EBS fails at the region level, not at the disk level.
    In our experience EBS has two modes of behavior: either all drives are available or all drives are unavailable. Two of the three region-wide outages described earlier were caused by problems at the EBS level; the problems started in one zone and spread to others. If your recovery plan depends on EBS disks and the outage is caused by EBS itself, the plan will fail. We have run into this several times.
  3. EBS failures on Ubuntu are extremely severe.
    Because EBS is a network drive presented to the system as a real hard drive, a failure disrupts things at the operating system level. The consequences for us were terrible: as soon as EBS has problems, the entire server to which the EBS disk is attached becomes completely inaccessible, including functionality that has nothing to do with disk activity.

For these reasons, and because our main goal is maximum uptime, we abandoned EBS completely about 6 months ago. We had to spend some time reworking more complex operations, mainly backup and recovery, but it was worth it given the increase in uptime.

Be careful. Other Amazon services may also use EBS.
Because some Amazon services are themselves built on EBS, they also become unavailable when EBS has problems. This is true of the ELB load balancer, the RDS database service, the Elastic Beanstalk application service, and others.

Based on our experience, we have concluded that whenever Amazon has serious problems, EBS is almost always unavailable too. So if EBS is down and you need to switch your load balancer to another region, you will not be able to, because the balancer depends on EBS. You will not be able to launch new servers either, because the Amazon console depends on EBS. So we love EC2 and really love S3, but we do not use any of the additional services.
A side benefit of our approach is that we are not tightly bound to AWS and can easily switch to another provider.

The lessons we have learned.
If we were starting tomorrow, I would choose Amazon without thinking twice. For a startup with a small team and a limited budget, it is exactly what you need to get going quickly. AWS does not actually pose any threat; it is not something scary or bad.

IaaS providers such as Joyent and Rackspace are hot on Amazon's heels: we have good friends at both companies and we are going to work with them. When the number of our servers grows from 100 to 1000, we will have to diversify our infrastructure across these providers, as well as others such as Carpathia, which use AWS Direct Connect to offer hosting with low-latency access to AWS and so make hybrid cloud infrastructures easier to build.

I hope this information has been helpful to you.
