irriss July 30, 2012 at 13:35

Web Architecture of the Internet Map

Hello everybody!

You've probably already heard about the Internet Map. If not, you can look at it here, and you can read about it in my previous post.

In this article I would like to explain how the Internet Map site is built, which technologies keep it running, and what steps were needed to withstand the large flow of visitors who wanted to look at the map.

The Internet Map runs on modern technologies from Internet giants: the Google Maps engine displays the map, Microsoft .NET handles web requests, and Amazon Web Services provides hosting and content delivery. All three components are vital for the normal operation of the map.

What follows is a long write-up on the internal architecture of the map: mostly praise for AWS, but performance and hosting costs are also covered. If that does not scare you, read on.

Amazon CloudFront and Google Maps

Google Maps technology relies on tiles: small 256x256 pixel images from which the map image is assembled. The main thing about these images is that there are really a lot of them. When you see a high-resolution map on your screen, it consists entirely of these small images. This means that the server must handle a great many requests very quickly and return tiles in parallel, so that the client does not notice the mosaic. The total number of tiles required to display the map is sum(4^i), where i runs from 0 to N and N is the deepest zoom level. For the Internet Map N is 14, so the full set would be approximately 358 million tiles. Fortunately, this astronomical figure was reduced to 30 million by not generating empty tiles. If you open the browser console, you will see many 403 errors. Those are exactly the missing tiles, but this is not visible on the map, because a square with no tile is filled with a black background. One way or another, 30 million tiles is still a significant number.
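To make the arithmetic concrete, here is a small Python check of the tile count. It assumes only what is stated above: zoom level i is a grid of 4^i tiles and the levels run from 0 to 14. The function name is mine, chosen for illustration.

```python
def total_tiles(max_zoom: int) -> int:
    """Number of 256x256 tiles in a full pyramid with zoom levels 0..max_zoom."""
    # Zoom level i is a 2^i x 2^i grid, i.e. 4^i tiles.
    return sum(4 ** i for i in range(max_zoom + 1))


full = total_tiles(14)
print(full)  # 357913941, i.e. about 358 million

# Closed form of the same geometric series: (4^(N+1) - 1) / 3
assert full == (4 ** 15 - 1) // 3

# Share of the pyramid that remains after skipping empty tiles
# (30 million is the rounded figure quoted above), roughly 8%.
print(f"{30_000_000 / full:.1%}")
```

In other words, skipping empty tiles leaves less than a tenth of the full pyramid to generate, store and serve.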

Therefore, the standard scheme of hosting content on a dedicated server does not fit here. There are a lot of tiles and a lot of users, so there should be a lot of servers, and they should be close to the users so that nobody notices the delay. Otherwise users from Russia will get a good response time, while users from Japan looking at your map will be reminded of the dial-up era. Fortunately, Amazon has a solution for this case (Akamai has one too, but this is not about them). It is called CloudFront, and it is a global content delivery network (CDN). You place your content somewhere (this is called the Origin) and create a Distribution in CloudFront. When a user requests your content, CloudFront automatically finds the network node closest to the user and serves the content from it; if that node does not have the data yet, it is fetched from the Origin first.

As a result, your data is replicated many times and is most likely delivered from CloudFront servers rather than from your own expensive, weak and unreliable storage. For the Internet Map, connecting CloudFront meant in practice that the data from my hard drive was physically copied to the Singapore segment of the Simple Storage Service (S3), and then a Distribution was created through the AWS console with S3 specified as the data source (Origin). If you look at the source of the Internet Map page, you can see that the tiles are loaded from the CloudFront address d2h9tsxwphc7ip.cloudfront.net. Finding the nearest node, keeping the content up to date and everything else is done by CloudFront automatically. Hurrah!
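The idea of an Origin plus caching nodes can be shown with a toy model in Python. This is only a sketch of the general CDN pattern, not how CloudFront is implemented: the class names, the region names and the tile key "0/0/0.png" are all made up for illustration, and a real CDN picks the node by network proximity rather than by a dictionary lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Origin:
    """Stand-in for the S3 bucket that holds the generated tiles."""
    objects: Dict[str, bytes]
    requests: int = 0

    def get(self, key: str) -> Optional[bytes]:
        self.requests += 1
        return self.objects.get(key)


@dataclass
class EdgeLocation:
    """Stand-in for one CDN node; caches whatever it has fetched."""
    name: str
    origin: Origin
    cache: Dict[str, bytes] = field(default_factory=dict)

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.cache:
            body = self.origin.get(key)  # cache miss: go to the Origin
            if body is None:
                return None              # no such tile (the client draws black)
            self.cache[key] = body
        return self.cache[key]


def nearest_edge(edges: Dict[str, EdgeLocation], user_region: str) -> EdgeLocation:
    # Hypothetical routing: the region name is simply used as a lookup key.
    return edges[user_region]


origin = Origin({"0/0/0.png": b"tile-bytes"})  # hypothetical tile key
edges = {r: EdgeLocation(r, origin) for r in ("moscow", "tokyo")}

for _ in range(3):
    nearest_edge(edges, "tokyo").get("0/0/0.png")
nearest_edge(edges, "moscow").get("0/0/0.png")

print(origin.requests)  # 2: one fetch per node, the rest came from cache
```

Four user requests reached the Origin only twice, which is the whole point: the more popular a tile, the less often the storage behind the CDN is touched.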

In the picture you can see how the original map is divided into tiles, the tiles are stored in S3, and from there they are picked up by CloudFront and delivered to users from its nodes.

Amazon RDS

To support searching for sites on the map, you need a database that stores information about the sites and their coordinates. In this case it is MS SQL Express running in the Amazon cloud, via what is called the Relational Database Service (RDS). We don't really need the relational part, since we only have one table, but it is better to have a full database than to reinvent the wheel. RDS lets you use not only MS SQL but also Oracle, MySQL and probably something else.
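A single-table lookup of this kind might look roughly like the Python sketch below. It uses the standard-library sqlite3 module as a stand-in for MS SQL, and the table name, column names and sample rows are assumptions made for illustration; the real schema is not described here.

```python
import sqlite3

# In-memory SQLite database standing in for the MS SQL instance in RDS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (domain TEXT PRIMARY KEY, x REAL NOT NULL, y REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO sites (domain, x, y) VALUES (?, ?, ?)",
    [("example.com", 12.5, -40.0), ("example.org", -3.25, 7.75)],  # made-up rows
)


def find_site(conn, domain):
    """Return the (x, y) map coordinates of a site, or None if it is unknown."""
    return conn.execute(
        "SELECT x, y FROM sites WHERE domain = ?",
        (domain.strip().lower(),),  # parameterized query, normalized input
    ).fetchone()


print(find_site(conn, "Example.com"))   # (12.5, -40.0)
print(find_site(conn, "missing.test"))  # None
```

With the domain as the primary key, each search is a single indexed lookup, which is why one table is enough for this job.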

In the picture you can see how the source map turns into a table in the RDS database.

Amazon Elastic Beanstalk

This is probably the feature of the Amazon cloud family that impressed me the most. Elastic Beanstalk lets you release a project under load with literally one click, with minimal or no downtime for the site. Knowing how hard releases can be, especially when the infrastructure contains several servers and a load balancer, I was simply amazed at how easily and elegantly Elastic Beanstalk handles this! On the first deployment it creates the entire infrastructure your application needs (the environment): a load balancer (Elastic Load Balancer, ELB) and compute instances (Elastic Compute Cloud, EC2), and it sets the scaling parameters. Roughly speaking, if you have one server and all requests go directly to it, then once a certain threshold is reached the server stops coping with the load and most likely crashes. Sometimes it cannot even come back up under a load it used to handle perfectly well, because reaching normal operating mode usually takes some time, and the constant stream of requests does not allow for that. In short, anyone who has been through it knows.

Elastic Beanstalk takes care of all the infrastructure issues. In fact, you can install the plugin for MS Visual Studio and forget about the details. It handles version control, deployment and so on. And if the load increases, it creates as many EC2 instances as needed.
In the diagram, Elastic Beanstalk is outlined with a dotted line; inside it you can see the ELB, which accepts incoming requests and distributes them to IIS on the EC2 instances.
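The scaling behaviour described above can be sketched as a simple threshold rule. The Python below is a toy model only: the metric (average CPU), the thresholds and the instance limits are made-up values, not my actual Elastic Beanstalk settings, and the real service evaluates its triggers over time windows rather than per sample.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical autoscaling rule: add or remove one instance per evaluation."""
    min_instances: int = 1
    max_instances: int = 10
    upper_threshold: float = 70.0  # average CPU %, scale out above this
    lower_threshold: float = 30.0  # average CPU %, scale in below this

    def next_count(self, current: int, avg_cpu: float) -> int:
        if avg_cpu > self.upper_threshold:
            current += 1
        elif avg_cpu < self.lower_threshold:
            current -= 1
        # Never go outside the configured fleet size.
        return max(self.min_instances, min(self.max_instances, current))


policy = ScalingPolicy()
count = 1
for cpu in [85, 90, 95, 60, 20, 10]:  # made-up load samples
    count = policy.next_count(count, cpu)
    print(f"cpu={cpu}% -> {count} instance(s)")
```

The fleet grows while the load stays above the upper threshold, holds steady in between, and shrinks again once the wave has passed, so you pay for extra instances only while they are needed.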

Performance and price

Immediately after the article was published on Habrahabr.ru, a stream of visitors headed to the Internet Map. On the graph you can see a very sharp increase in traffic: 30,000 people visited the site in the first 6 hours and almost 50,000 on the first day, mainly from Russia and the countries of the former USSR. Sensing that something was up, Elastic Beanstalk created 10 EC2 instances, and they did a good job. No complaints about problems accessing the site were reported. The map could be viewed freely. RDS, however, died right away: first the search became very slow, then intermittent, and then it stopped working completely. The bill for the first day was about $200: about 100 for S3 + CloudFront and 50 each for EC2 and RDS.

Having studied this experience, I optimized things and reconfigured the autoscaling parameters. And that helped. Over the following week the site was visited by an average of 30-50 thousand people a day from around the world, and nothing fell over. True, there was no influx as sharp as on the first day.

Then someone posted about the Map on reddit.com, and this caused an explosive increase in traffic. About half a million people visited the site on Sunday, and only one small EC2 instance and one small RDS instance were running. True, there was one complaint that the map would not load, but I think that is normal for such a wave.

And here is the bill for the first week

Conclusion

I got into information technology back when the word cloud had nothing to do with IT. Much has changed since then, and standalone servers are living out their days. Of course, hosting in the cloud has its drawbacks (you can ask Instagram, for example). But the ability to hand most of the worries over to a cloud service, in my opinion, more than makes up for all the risks. If you are starting a project and quality, availability, reliability and scalability matter to you, then the cloud is most likely the place for you.
