Public and private computing clouds - real-world experience

Recently, Box.net and Zynga gave presentations on how they use public computing clouds in their infrastructure. The topic interested me, especially in light of the April 2011 failure of several Availability Zones in the Amazon EC2 cloud, which made a number of large websites and Facebook games unavailable for several days. The presentations were very brief, and the speakers did not disclose specific implementation details. Still, even this high-level information is interesting.


Box.net provides a business-grade online storage service. Serving 300 million documents and more than 100 TB of disk space takes more than 2,500 virtual machines, over 500 of which run MySQL servers. Box.net uses Scalr to manage and scale its cloud infrastructure. Opscode (Chef) and Puppet are used to manage software versions and configurations.

Scalr is responsible for monitoring, load balancing, and adding new virtual machines. The virtual machines are distributed across three public clouds (Amazon EC2, Rackspace and OpenStack), which allows Box.net to survive the failure of any two of them. Scalr adds copies of virtual machines automatically through each cloud's API. The hardest part of scaling a site is scaling the database, and Scalr handles this too. If a MySQL replica fails in one of the clouds, it is simply recreated in the same cloud by copying another replica. If the MySQL master fails, the application is switched to read-only mode, one of the replicas clones itself and then declares itself the new master, all replicas switch over to the new master, and the application resumes normal operation.
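The failover sequence described above can be sketched as a small in-memory model. This is not Scalr's implementation or API: the `Node` and `Cluster` classes, the `handle_replica_failure` and `handle_master_failure` functions, and the rule of promoting the first healthy replica are hypothetical, since the talk did not go into such details.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of a MySQL master/replica set spread across clouds;
# NOT Scalr's API, only an illustration of the steps described in the talk.
@dataclass
class Node:
    name: str
    cloud: str                    # e.g. "ec2", "rackspace", "openstack"
    role: str = "replica"         # "master" or "replica"
    healthy: bool = True
    master: Optional[str] = None  # name of the master a replica follows


@dataclass
class Cluster:
    nodes: list
    read_only: bool = False       # application-level read-only flag

    def master(self) -> Node:
        return next(n for n in self.nodes if n.role == "master")

    def healthy_replicas(self) -> list:
        return [n for n in self.nodes if n.role == "replica" and n.healthy]


def handle_replica_failure(cluster: Cluster, failed: Node, clone_name: str) -> Node:
    """Replace a failed replica with a copy of a healthy one, in the failed node's cloud."""
    source = next(n for n in cluster.healthy_replicas() if n is not failed)
    cluster.nodes.remove(failed)
    clone = Node(clone_name, cloud=failed.cloud, master=source.master)
    cluster.nodes.append(clone)
    return clone


def handle_master_failure(cluster: Cluster, clone_name: str) -> Node:
    """Promote a replica after the master fails (assumed choice: first healthy replica)."""
    cluster.read_only = True                  # 1. switch the application to read-only
    cluster.nodes.remove(cluster.master())    # 2. drop the failed master
    new_master = cluster.healthy_replicas()[0]
    # 3. the chosen replica clones itself so the replica count is preserved
    cluster.nodes.append(Node(clone_name, cloud=new_master.cloud))
    new_master.role, new_master.master = "master", None  # 4. it declares itself master
    for n in cluster.nodes:                   # 5. every other node follows the new master
        if n is not new_master:
            n.master = new_master.name
    cluster.read_only = False                 # 6. back to normal read-write mode
    return new_master
```

Keeping the read-only switch around the whole promotion is the point of the sequence: no writes can land on a replica while the master role is changing hands.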

Zynga's speaker, CTO Allan Leinwand, began the presentation by describing the company's main infrastructure requirement: lightning-fast scaling after a new game launches. Zynga's operations department was the last to celebrate FarmVille's success in 2009. In the first 26 weeks after the game launched, the number of virtual farmers grew by a million instead of the expected 200 thousand. Zynga simply ran out of space in its data center; there was nowhere left to grow. At that point the company already had tooling that let it quickly move the application to virtual servers in the Amazon EC2 cloud. This move, together with automatic scaling in Amazon EC2, allowed the user base to grow to 70 million, making FarmVille one of the most popular online games.

The flip side of fame was huge bills from Amazon EC2. Zynga decided to move the popular application into its own data centers, but, drawing on the experience gained, into its own cloud modeled on Amazon EC2. The requirements for this cloud, ZCloud, turned out to be the following:
ZCloud must run on the x86 architecture.
Support for at least 1,000 servers.
Use of established virtualization technologies (XenServer, KVM).
ONLY one virtual machine per physical server.
CentOS support.
Support for availability zones, similar to Amazon Availability Zones.
Integration with RightScale, which was already in use at the time.
Operation over a routed IP network, eliminating the dependence on inter-rack VLANs that is traditional in data centers.
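ZCloud's placement logic was not described in the talk. To illustrate how the "one VM per physical server" and availability-zone requirements interact, here is a hypothetical sketch: `place_vms` gives each VM its own free host and rotates across zones so that a group of VMs does not all land in one zone. The function name, the zone/host data layout and the round-robin policy are assumptions for illustration only.

```python
def place_vms(vm_names: list, hosts_by_zone: dict) -> dict:
    """Hypothetical placement: one VM per physical host, round-robin across zones.

    hosts_by_zone maps a zone name to a list of free host names.
    Returns {vm: (zone, host)}; raises RuntimeError when no free host is left.
    """
    free = {zone: list(hosts) for zone, hosts in hosts_by_zone.items()}
    zones = sorted(free)
    placement = {}
    start = 0  # zone index to try first for the next VM
    for vm in vm_names:
        for step in range(len(zones)):
            zone = zones[(start + step) % len(zones)]
            if free[zone]:
                # pop() removes the host, so it can never receive a second VM
                placement[vm] = (zone, free[zone].pop(0))
                start = (start + step + 1) % len(zones)
                break
        else:
            raise RuntimeError(f"no free physical host left for {vm}")
    return placement


print(place_vms(["a", "b", "c"], {"east": ["e1", "e2"], "west": ["w1"]}))
```

Removing a host from the free list as soon as it is used is what enforces the one-VM-per-server rule; the zone rotation is a simple way to spread risk, not necessarily what Zynga did.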

All these requirements were implemented in ZCloud, which runs in two data centers: one on the US East Coast and one on the West Coast. The data centers are loosely coupled: the unavailability of one must not affect the availability or performance of the application. Allan declined to answer a direct question about the number of servers in ZCloud, revealing only that on one occasion they had to bring 1,000 new servers into the cloud within 24 hours.
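For a sense of scale, bringing 1,000 servers online in 24 hours works out to roughly the following rate, assuming the work was spread evenly over the day:

```latex
\frac{1000\ \text{servers}}{24\ \text{h}} \approx 41.7\ \text{servers/h},
\qquad
\frac{24 \times 3600\ \text{s}}{1000\ \text{servers}} = 86.4\ \text{s per server}
```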

As with Box.net, a third-party tool is used to manage and scale the cloud, in this case RightScale. Zynga implemented load balancing and monitoring in-house; at least, no further details on this could be obtained.

Zynga continues to use Amazon EC2 as before: new games are launched there first, while their traffic and popularity are studied. Successful games that reach a certain traffic level are moved mostly to ZCloud, which reduces costs and improves application performance.
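The talk did not say what "a certain traffic level" means in practice. A minimal sketch of such a promotion rule, assuming a hypothetical per-day traffic metric, a threshold, and a "sustained for N days" condition (none of which come from the talk), might look like this:

```python
def should_move_to_zcloud(daily_traffic: list, threshold: float, sustain_days: int = 7) -> bool:
    """Return True if traffic stayed at or above `threshold` for the last `sustain_days` days.

    Hypothetical rule: a one-day spike is not enough; a game must hold the
    traffic level for a sustained period before it is moved off EC2.
    """
    if sustain_days <= 0 or len(daily_traffic) < sustain_days:
        return False  # not enough history to judge yet, so stay on EC2
    return all(t >= threshold for t in daily_traffic[-sustain_days:])


# Example: a new game whose (hypothetical) daily traffic climbs past the threshold.
history = [10, 40, 80, 120, 130, 125, 140, 150, 160]
print(should_move_to_zcloud(history, threshold=100, sustain_days=5))
```

Requiring a sustained period rather than a single reading is one way to avoid paying for private capacity on the strength of a launch-week spike.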

Finally, Allan shared his thoughts on the future of public clouds: they still have room to grow and improve, and performance in the public cloud tends to be poor. On the other hand, running your own data center or cloud makes sense only once you reach a certain level of traffic, since it implies capital investment in hardware and the cost of developing your own cloud.
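That last point can be made concrete with a simple break-even model. The model is illustrative and not from the talk: it assumes the public-cloud bill grows linearly with the number of servers, while your own cloud has a per-server monthly cost (amortized hardware, power, space) plus a fixed monthly cost (for example, amortized development of the cloud software). Break-even is the server count N where N * public_price = N * own_unit_cost + own_fixed_monthly. All parameter names and figures are hypothetical.

```python
import math


def break_even_servers(public_price: float, own_unit_cost: float, own_fixed_monthly: float) -> float:
    """Server count at which your own cloud costs the same per month as the public cloud.

    public_price:      monthly public-cloud price of one server-equivalent
    own_unit_cost:     monthly cost of one own server (amortized hardware, power, space)
    own_fixed_monthly: fixed monthly cost of running your own cloud (e.g. amortized development)
    Returns math.inf if your own servers are not cheaper per unit, i.e. there is no break-even.
    """
    margin = public_price - own_unit_cost  # saving per server per month
    if margin <= 0:
        return math.inf
    return own_fixed_monthly / margin


# Hypothetical figures: $500 vs $300 per server-month, $100,000/month fixed cost.
print(break_even_servers(500, 300, 100_000))
```

Below the break-even point the fixed cost dominates and the public cloud wins; above it, every additional server widens the gap in favor of your own cloud, which matches the argument that an own cloud pays off only past a certain traffic level.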

For my part, I would only add that the hybrid public/private cloud model seems quite interesting to me. There is also a middle ground in terms of cost, scalability and performance: renting servers (dedicated server hosting).
