
How do you measure the performance of the “cloud”?

The graph shows performance tests of virtual machines from the five largest players on the IaaS “cloud” market, taken from cloudspectator.com; the speed of our CROC cloud was measured using the same methodology. The results are broken down by day, from 06/20/13 to 06/29/13.
Despite our pleasantly high result, the conclusions of this testing seemed somewhat one-sided to us, and the task itself sparked plenty of debate among our colleagues: after all, evaluating the performance of a “cloud” is a non-trivial question. So we decided to dig deeper.
Roughly speaking, a “cloud” consists of the equipment that makes the “cloud” itself work and the equipment on which customers' virtual machines run. The performance of the virtual machines depends on the performance of the physical servers hosting them. There is no single standard: everyone on the market offers completely abstract virtual processors and, accordingly, completely different configurations of these virtual machines.
Nobody reveals the technologies they use to build their “cloud”. Some go as far as developing their own data center architecture for it, but only very large players such as Amazon do that. If you have seen their data centers, you know they are the size of an airfield.
To begin with, we asked our colleagues and clients how and on what basis they choose a “cloud”. It turned out that there is no “single number” as such, but the vast majority follow a similar scenario. First, the customer wants to make sure that a single virtual machine is fast enough (that they get, for example, a whole core rather than a quarter of one: the processor, memory, and disk subsystem all matter here). Then come the network questions: ping to the “cloud”, bandwidth of the channels to the Internet, bandwidth at the junction between the cloud and physical equipment, and bandwidth of the channel between our data centers.
For large businesses, the main things are the speed of interaction between virtual networks and the speed of the data center interconnect. On top of that there are questions of SLA, backup as a service, security, monitoring, data center reliability, and so on. We have all of that well covered, so performance remains the main deciding factor. So, to compare ourselves fairly with the other “clouds”, we decided not to deviate from the method and testing conditions proposed by the startup and began measuring the speed of our own virtual machines. And here, too, there are nuances.
Looking at the situation through the customer's eyes, the results of such tests of individual virtual machines are useful for estimating the size of the virtual server to which existing physical servers can be migrated.
The essence of the original test is simple: in every “cloud”, a virtual machine of the same, most common size (1 vCPU, 4 GB RAM) is created, and the same utility is run on it for some time to take performance measurements. In this case it is the open-source UnixBench suite, which gives a numerical estimate of the performance of the physical-processor core allocated to the virtual machine. Fine, we will do the same.
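To make this concrete, here is a minimal sketch of automating such a run (our illustration, not part of the cloudspectator.com methodology): it launches UnixBench and pulls out the final index score. The checkout path is an assumption.

```python
# Illustrative sketch: run UnixBench and parse out the final index score.
# Assumes the suite is checked out at ./byte-unixbench/UnixBench
# and that gcc, make, and perl are installed on the virtual machine.
import re
import subprocess

def run_unixbench(path="./byte-unixbench/UnixBench"):
    """Run the full UnixBench suite and return the final index score."""
    result = subprocess.run(["./Run"], cwd=path,
                            capture_output=True, text=True, check=True)
    # The suite prints a line like: "System Benchmarks Index Score  1468.9"
    match = re.search(r"System Benchmarks Index Score\s+([\d.]+)", result.stdout)
    if not match:
        raise RuntimeError("index score not found in UnixBench output")
    return float(match.group(1))

if __name__ == "__main__":
    print(f"UnixBench index score: {run_unixbench():.1f}")
```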
Examples of virtual machine configurations from the same report:

"Cheat" test
Since we wanted to get real results for our “cloud”, we tried to bring the conditions as close as possible to production ones. I don't think the difference between tests under real load and synthetic ones needs explaining.
First, there is the question of tuning the infrastructure for the test. With the right settings you can, of course, “tweak” the results by several tens of percent. It is equally possible to tune the cloud's physical infrastructure for the specific tasks of a particular customer: for example, knowing exactly a large customer's CPU load and the read/write ratio of their applications, you can change the settings to squeeze out more performance. However, whenever the choice is between buying additional servers and tuning for a specific task, we choose to add servers. The reason is that the bigger the “cloud”, the more physical equipment there is to support, and keeping the infrastructure uniform is much easier. If you get carried away with tuning, you lose flexibility and glitches start to appear. So for the test we performed no optimization of the physical servers or the virtual machines.
Second, it obviously makes no sense to measure a virtual machine's performance at, say, three in the morning, since in the “cloud” the performance of a physical server is shared among all the customers whose virtual machines are hosted on it. At night all the virtual machines are certainly running, but they are hardly loaded, so the test would show much higher values than at a moment when the “cloud” is under a more or less heavy load. Our peak load falls at approximately 5 p.m., and this pattern is quite stable, so that is when we took our measurements. We tested for a week and a half; each test takes about 10-15 minutes.
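For reference, a simple scheduler along these lines (a sketch under our assumptions, reusing run_unixbench from the previous snippet) waits for 5 p.m. each day, runs the suite, and logs the score:

```python
# Sketch: take one UnixBench measurement per day at 17:00 local time,
# when the cloud is near its typical peak load, and append it to a CSV log.
import csv
import time
from datetime import datetime, timedelta

def seconds_until(hour):
    """Seconds from now until the next occurrence of hour:00 local time."""
    now = datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()

def daily_benchmark(days=10, logfile="scores.csv"):
    for _ in range(days):
        time.sleep(seconds_until(17))
        score = run_unixbench()  # from the sketch above
        with open(logfile, "a", newline="") as f:
            csv.writer(f).writerow([datetime.now().date().isoformat(), score])
```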
Network testing
So, we obtained practical data on virtual machine speed and compared ourselves with the other “clouds”. Our next step is network tests: they measure the speed of the network between virtual machines, the stability of the virtual infrastructure, the speed and stability of the network at the junction of our “cloud” and the customer's physical equipment located at another site (or, for example, the customer's physical storage systems attached to their virtual infrastructure), and the speed of the communication channels that interconnect the data centers.
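As an illustration of the first of these checks (our sketch, not the exact tooling used in the tests), throughput between two virtual machines can be sampled with iperf3: start `iperf3 -s` on one machine and run something like the following on the other. The peer address is a placeholder.

```python
# Sketch: measure TCP throughput between two virtual machines with iperf3.
# The server side must be started first with: iperf3 -s
import json
import subprocess

def tcp_throughput_mbps(server, seconds=10):
    """Return mean sender throughput in Mbit/s from iperf3's JSON report."""
    out = subprocess.run(["iperf3", "-c", server, "-t", str(seconds), "-J"],
                         capture_output=True, text=True, check=True)
    report = json.loads(out.stdout)
    return report["end"]["sum_sent"]["bits_per_second"] / 1e6

print(f"throughput: {tcp_throughput_mbps('10.0.0.2'):.0f} Mbit/s")  # placeholder peer
```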
Why is this needed? Because if, for example, you run two systems active-active in two different data centers, or a primary system and a high-availability copy for failover, then the channel between these points is critical. We specialize in such solutions for large businesses, and soon my colleagues will describe in a separate post how, for one client, we fought to cut transaction latency between data centers from 7 ms to a critical 5 ms.
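Round-trip latency between sites can be sampled just as simply; a minimal sketch using the system ping, assuming Linux output format:

```python
# Sketch: average round-trip time to a remote site via the system ping (Linux).
import re
import subprocess

def avg_rtt_ms(host, count=20):
    """Average RTT in milliseconds over `count` ICMP echo requests."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True)
    # Summary line looks like: "rtt min/avg/max/mdev = 4.112/5.047/7.201/0.633 ms"
    match = re.search(r"rtt [^=]+= [\d.]+/([\d.]+)/", out.stdout)
    if not match:
        raise RuntimeError("no RTT summary in ping output")
    return float(match.group(1))
```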
Summary
The tests ran in our “cloud” in the Compressor data center: UnixBench on a 1 vCPU, 4 GB RAM configuration. The results pleasantly surprised us: they showed that virtual machine performance in the CROC “cloud” is no worse than that of the leader among the results we collected, Windows Azure, Microsoft's infrastructure cloud platform.
The results are shown in the final test table (UnixBench index scores):
Date | CROC Cloud | Amazon | HP Cloud | Rackspace | SoftLayer | Windows Azure |
06/20/2013 | 1468.9 | 337.9 | 1350.4 | 561.6 | 1137.2 | 1437.6 |
06/21/2013 | 1454.4 | 375 | 1338.6 | 569.5 | 1139.4 | 1446.1 |
06/22/2013 | 1459.8 | 421.2 | 1352.7 | 568.7 | 1140.7 | 1444.5 |
06/23/2013 | 1446.7 | 378.6 | 1333.5 | 469.1 | 1137.1 | 1451 |
06/24/2013 | 1470.1 | 378.5 | 1344.9 | 568 | 1137.5 | 1433.2 |
06/25/2013 | 1467.8 | 381.5 | 1326.4 | 565.2 | 1142.3 | 1447.1 |
06/26/2013 | 1465.8 | 380.2 | 1338.2 | 563.6 | 1134.1 | 1449.3 |
06/27/2013 | 1439.9 | 375.8 | 1328.8 | 574.1 | 1140 | 1442.6 |
06/28/2013 | 1430.2 | 376.5 | 1327.7 | 537.3 | 1139 | 1434 |
06/29/2013 | 1419.1 | 373.2 | 1341.6 | 575.2 | 1135 | 1442.2 |
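For a quick read of the table, per-provider means over the ten days can be computed straight from these numbers; a throwaway snippet with the values copied from the table above:

```python
# Mean UnixBench index score per provider over the ten test days.
scores = {
    "CROC Cloud":    [1468.9, 1454.4, 1459.8, 1446.7, 1470.1,
                      1467.8, 1465.8, 1439.9, 1430.2, 1419.1],
    "Amazon":        [337.9, 375.0, 421.2, 378.6, 378.5,
                      381.5, 380.2, 375.8, 376.5, 373.2],
    "HP Cloud":      [1350.4, 1338.6, 1352.7, 1333.5, 1344.9,
                      1326.4, 1338.2, 1328.8, 1327.7, 1341.6],
    "Rackspace":     [561.6, 569.5, 568.7, 469.1, 568.0,
                      565.2, 563.6, 574.1, 537.3, 575.2],
    "SoftLayer":     [1137.2, 1139.4, 1140.7, 1137.1, 1137.5,
                      1142.3, 1134.1, 1140.0, 1139.0, 1135.0],
    "Windows Azure": [1437.6, 1446.1, 1444.5, 1451.0, 1433.2,
                      1447.1, 1449.3, 1442.6, 1434.0, 1442.2],
}
for provider, daily in scores.items():
    print(f"{provider:>13}: mean {sum(daily) / len(daily):.1f}")
```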