An uninvited guest: why virtual machines are not the best solution for tomorrow's applications
- Translation
Hello dear readers!
Despite the ongoing holidays, we keep following foreign technical thought and checking in with O'Reilly Radar from time to time. In particular, we were interested in an article published on May 4 by Dinesh Subhraveti on the prospects and problems of virtualization. It touches on the proper use of virtualization, the performance of distributed systems, and correct work with big data. The author examines whether virtual machines are really so indispensable and whether they will have a place tomorrow. Since there are highly rated books on the market related in one way or another to this vast topic, we hope the article will prove informative and interesting. If you have suggestions for publishing such books, we will be glad to hear them.

Today the top tier of distributed systems is the data center operating system. Hadoop, for example, is evolving from the MapReduce framework into YARN, a general-purpose platform for scaling applications horizontally.
For a rich ecosystem of applications to coexist on such platforms, those applications must be properly isolated from one another. The isolation mechanism should enforce limits on consumed resources, eliminate unwanted software dependencies between applications and the host, guarantee security and privacy, localize failures, and so on. This problem is solved easily and elegantly with containers. Still, the question often arises: why not use a virtual machine (VM)? After all, these systems face the same set of problems that virtualization already solves for traditional enterprise applications.
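As a concrete illustration of the resource-limit part of that list (our own minimal sketch, not from the original article), here is roughly the kernel primitive that container runtimes lean on: a cgroup v2 group that caps a task's memory and CPU before the task is exec'ed. The group name, paths, and limits are purely illustrative, and the code assumes a cgroup v2 hierarchy at /sys/fs/cgroup with the memory and cpu controllers available, plus root privileges.

```c
/* Hypothetical sketch: capping a task's memory and CPU with cgroup v2,
 * roughly the primitive container runtimes use to enforce limits.
 * Assumes a cgroup v2 hierarchy at /sys/fs/cgroup with the memory and
 * cpu controllers enabled for child groups; needs root. The group name
 * "demo-task" and the limits are illustrative only. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *value) {
    FILE *f = fopen(path, "w");
    if (!f || fputs(value, f) == EOF || fclose(f) != 0) {
        perror(path);
        exit(1);
    }
}

int main(void) {
    /* Create a dedicated cgroup for the workload. */
    if (mkdir("/sys/fs/cgroup/demo-task", 0755) != 0 && errno != EEXIST) {
        perror("mkdir");
        return 1;
    }

    /* Cap memory at 512 MiB and CPU at half a core (50 ms per 100 ms). */
    write_file("/sys/fs/cgroup/demo-task/memory.max", "536870912");
    write_file("/sys/fs/cgroup/demo-task/cpu.max", "50000 100000");

    /* Move this process into the group; the exec'ed task inherits it. */
    char pid[32];
    snprintf(pid, sizeof pid, "%d", getpid());
    write_file("/sys/fs/cgroup/demo-task/cgroup.procs", pid);

    /* From here on, exec the actual workload in place of this process. */
    execlp("sleep", "sleep", "60", (char *)NULL);
    perror("execlp");
    return 1;
}
```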
“Any problem in computer science can be solved by adding another level of indirection - except, of course, the problem of too many levels of indirection.” - David Wheeler
Why not use virtual machines?
Although YARN and similar systems face approximately the same problems that are traditionally solved in practice using virtual machines, VMs are poorly suited for horizontal scaling for a number of reasons.
Costs
The resources consumed by the virtualization layer can easily become a significant factor in the overall cost of the system. Such overhead may not matter much for traditional applications, but with large distributed applications the fraction of resources lost to it adds up quickly. The slice of host memory lost on every node of a horizontally scaled cluster amounts to an enormous waste of capacity. Moreover, the heavy resource appetite of virtual machines rules out dense configurations: as a rule, only a handful of virtual machines can run on one physical machine.
High start-up latency is another major source of overhead for virtual machines. Unlike traditional applications, which start once and then keep running, the new ecosystems often execute very short-lived tasks. If a typical task within a large, highly parallel job finishes in a couple of minutes, it is unacceptable to spend a substantial fraction of that time just booting a virtual machine.
Despite extensive optimizations across the stack, from hardware to the application layer, the runtime overhead introduced by virtual machines remains a problem. Hardware support largely covers the cost of CPU virtualization, but the overhead is still acute for workloads dominated by I/O. In the case of Hadoop, for example, the virtualized I/O stack consists of HDFS, the guest file system, the guest driver, a virtual device, an image format interpreter, the host file system, the host driver, and finally the physical device. The cumulative cost is quite significant compared with native execution.
Interestingly, experiments measuring the performance of jobs on a virtualized distributed framework such as Hadoop can lead to incorrect conclusions. With a carelessly configured job it may even appear that the virtual infrastructure is sometimes faster than native hardware. This, however, is due only to better overall utilization of resources across tasks, not to any speed-up of individual tasks from virtualization itself. Properly configured jobs are ultimately limited by the amount of resources the underlying hardware provides.
How the hypervisor and the application play hide-and-seek
Applications and operating systems are generally designed to interact with each other. For a virtualized application, the hypervisor takes on the role of the ordinary operating system that manages the physical hardware. This destroys the symbiosis between application and OS, because an opaque virtualization layer now sits between them. In effect, the host system, the guest system, and the hypervisor each perform only a subset of the functions of a regular operating system. It does not much matter whether a Type 1 or Type 2 hypervisor is involved. In the case of Xen, for example, the Xen core is the hypervisor, Dom0 is the host system, and guest systems run in DomUs. On Linux, the Linux OS itself is the host and Qemu/KVM is the hypervisor, which in turn runs the guest kernels. The result is a layered collection of programs performing low-level system functions, none of which has a complete view of either the application or the hardware.
An application running in a virtual machine cannot see the topology and configuration of the underlying physical resources. A component may appear to the application as a directly attached block device yet actually be a file on a remote NFS server. This obfuscation of compute and network topology complicates resource planning at the application level. In the case of Hadoop, the resource manager will make suboptimal scheduling decisions because it operates on a mistaken view of the physical resources. Losing data and task locality is the lesser evil; the primary and replica blocks may end up in the same failure domain, making data loss unrecoverable.
Similarly, the hypervisor cannot peek inside the application. A crude view of resources, without any information about their application-level semantics, rules out many optimizations. For example, reading a particular configuration value from a file appears at the virtual hardware level as nothing more than a block device read. Without semantic context, optimizations such as prefetching or caching are ineffective. Another example: the hypervisor keeps large regions of physical memory reserved even when the guest application is not using them, simply because the hypervisor cannot tell which pages are unused inside the guest.
Maintenance
A large fleet of virtual machines, each with its own guest operating system, is onerous to maintain. Applying security patches in a timely manner to every individual virtual machine in a vast, dynamic infrastructure, where VMs are created and destroyed literally on the fly, can be an impossible task for a large enterprise. VM sprawl is another problem. On top of that, the actual cost of licensing guest operating systems can be staggering, especially at horizontal scale.
Unwanted coupling between application and operating system
It is commonly held that virtualization "decouples" applications from hardware. In reality, however, virtualization forges new tight bonds between applications and their guest operating systems. The application runs as an appendage of the guest OS, which in turn is sealed inside the black box of the virtual machine image. You can migrate the whole virtual machine, to service the hardware for example, but you cannot update the operating system without disrupting the application running on it.
Because the application is permanently tied to its guest operating system, the resources allocated to the application cannot be scaled on demand. Resources are first added to the guest operating system, which then passes them on to the application. But guest operating systems usually require a reboot before they recognize additional memory or new CPU cores.
Virtual machines: the wrong abstraction for applications
Ultimately, the customer cares about a well-functioning application, not about operating systems or virtual machines. It is the application that needs to be virtualized. A virtual machine, however, cannot virtualize an application directly; to make up for this, it needs an additional guest operating system.

To virtualize an application, a virtual machine needs an additional layer with a guest OS. Author's illustration.
Over many years, industry and the research community have devoted a great deal of joint effort to solving the problems associated with virtual machines. Numerous innovations have been proposed, and some have even grown into standalone technologies. On closer inspection, however, many of these innovations do not deliver real progress or a qualitative leap compared with containers: they are aimed primarily at curing problems created by virtual machines themselves. In essence, a huge segment of industrial development is pointed in the wrong direction: we optimize virtual machines rather than applications. Such a fundamentally flawed model allows only relative optimization. The examples below cover just a few of the widespread techniques of this kind.
Paravirtualization
Paravirtualization is one of the most pervasive ways to optimize virtual machine performance. The hypervisor cannot directly inspect or influence the guest operating system and its applications, so it relies on the guest OS, which receives hints from the hypervisor and carries out the operations it prescribes. The interface between the guest and the hypervisor is called the paravirtualization API, or hypercall interface. Naturally, this technique does not work with standard, unmodified operating systems. Making such changes is not easy, and neither is maintaining them and adapting them to ever-changing kernel versions.
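As a small side illustration (ours, not the author's) of how a guest even finds such an interface: on x86, hypervisors advertise themselves through CPUID, first via the hypervisor-present bit in leaf 1 and then via a vendor signature and paravirtual feature leaves starting at 0x40000000. The sketch below follows the de facto KVM/Xen convention and is meant only as a probe, not as a hypercall example.

```c
/* Sketch: how an x86 guest discovers a hypervisor's paravirtual interface.
 * Compile with GCC or Clang on x86; on bare metal the hypervisor bit is clear. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1, ECX bit 31: the "hypervisor present" flag. */
    __cpuid(1, eax, ebx, ecx, edx);
    if (!(ecx & (1u << 31))) {
        puts("hypervisor bit clear: most likely running on bare metal");
        return 0;
    }

    /* Leaf 0x40000000: maximum paravirt leaf in EAX, vendor signature
     * in EBX/ECX/EDX, e.g. "KVMKVMKVM" or "XenVMMXenVMM". */
    char sig[13] = {0};
    __cpuid(0x40000000, eax, ebx, ecx, edx);
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    printf("paravirtual interface: \"%s\", leaves up to 0x%x\n", sig, eax);
    return 0;
}
```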
Dynamic memory allocation
Operating systems manage physical memory very frugally. Thanks to a whole arsenal of techniques (lazy allocation, copy-on-write, and so on), physical memory is not actually committed until it is absolutely needed. To work around the hypervisor's inability to see inside the guest operating system, a technique called memory ballooning is used. A special driver inside the guest detects unused memory regions and reports them to the hypervisor; the unused pages are squeezed out of the guest operating system and handed back to the host. Unfortunately, there is an unpleasant side effect: applications periodically experience artificial memory pressure.
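The frugality mentioned above is easy to observe from user space. The minimal sketch below (our illustration, Linux-specific, with arbitrary sizes) maps a large anonymous region and reads the process's resident set from /proc/self/status before and after touching part of it: the kernel grants the mapping immediately but commits physical pages only on first touch, and this is exactly the kind of knowledge a hypervisor cannot see from the outside.

```c
/* Sketch of lazy allocation: a 1 GiB anonymous mapping costs almost no
 * physical memory until its pages are actually touched. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static long resident_kb(void) {
    /* VmRSS from /proc/self/status: physical memory actually in use. */
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    while (f && fgets(line, sizeof line, f))
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
    if (f) fclose(f);
    return kb;
}

int main(void) {
    size_t len = 1UL << 30;  /* ask for 1 GiB */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("after mmap:  RSS = %ld kB\n", resident_kb());  /* still tiny */

    memset(p, 1, len / 4);   /* touch a quarter of the pages */
    printf("after touch: RSS = %ld kB\n", resident_kb());  /* roughly 256 MiB */

    munmap(p, len);
    return 0;
}
```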
Deduplication
When many instances of the same guest operating system and its standard services each run inside the sealed address space of their own virtual machine, identical content ends up stored on many separate memory pages. To reduce this waste, an online page deduplication technique known as Kernel Same-page Merging (KSM) was developed. However, it carries a serious performance overhead, especially on hosts where memory limits are not enforced and NUMA (non-uniform memory access) configurations are in use.
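For reference, KSM only scans pages that a process explicitly opts in. The sketch below (our own, Linux-specific, with illustrative sizes) marks an anonymous region MADV_MERGEABLE so that identical pages in it become candidates for merging; KSM itself has to be switched on host-wide through /sys/kernel/mm/ksm/run.

```c
/* Sketch: opt an anonymous region into kernel same-page merging so the
 * ksmd thread may deduplicate its identical pages. Merging progress is
 * visible in /sys/kernel/mm/ksm/pages_sharing. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t len = 64UL * 1024 * 1024;            /* 64 MiB of identical pages */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    memset(buf, 0x42, len);                     /* every page now has the same content */

    /* Opt these pages into kernel same-page merging. */
    if (madvise(buf, len, MADV_MERGEABLE) != 0) {
        perror("madvise(MADV_MERGEABLE)");
        return 1;
    }

    pause();                                    /* keep the mapping alive while ksmd scans */
    return 0;
}
```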
Opening the black box
Virtual machines treat file system data as monolithic image blobs to be interpreted by the guest file system. Some work has been done to make the opaque structure of VM images legible for indexing, deduplication, concurrent offline patching of base images, and so on. It turned out, however, to be very hard to account for all the peculiarities of image formats, the device partitions inside them, file systems, and evolving on-disk structures.
Containers: Cost-effective virtualization for horizontally scalable applications
Containers are a virtualization mechanism aimed at virtualizing applications themselves rather than the operating system. Whereas a virtual machine offers a virtual hardware interface on which an operating system can run, a container offers a virtual operating system interface on which applications can run. It decouples the application from its environment through a coherent virtual OS interface, virtualizing the well-defined and semantically rich boundary between application and operating system rather than the one between operating system and hardware.
A container is made up of a set of namespaces, each of which projects a subset of host resources onto the application under virtual names. Compute resources are virtualized by the process namespace, network resources by the network namespace, the virtual file system by the mount namespace, and so on. Because containerized processes run natively on the host under the control of the virtualization layer, containerization can be applied selectively to exactly the subsystems a given practical context requires, and the degree to which the host and its resources are exposed to a containerized process can be controlled with meticulous precision. For instance, a container can be given its own view of process IDs and mount points while still sharing the host network, or it can be cut off from the host network entirely; a minimal sketch of these namespace primitives follows the figure below.

Containers project a subset of host resources onto the application through multiple namespaces. Author's illustration.
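As promised above, here is a minimal sketch of the raw Linux primitive behind those namespaces (our illustration, not the author's): clone(2) with a set of CLONE_NEW* flags starts a child process that becomes PID 1 in its own PID namespace, gets a private hostname and mount table and an empty network namespace, and then drops into a shell. Real container runtimes layer cgroups, pivot_root, and capability handling on top of this.

```c
/* Hedged sketch (Linux-specific, needs root, minimal error handling):
 * clone(2) with CLONE_NEW* flags starts a child that is PID 1 in a new
 * PID namespace, with its own UTS name, a private copy of the mount
 * table, and an empty network namespace (just a loopback device). */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

static int child(void *arg) {
    (void)arg;
    sethostname("container-demo", 14);        /* only this namespace sees it */
    printf("inside: pid=%d\n", getpid());     /* prints 1 */
    execlp("/bin/sh", "sh", (char *)NULL);    /* drop into an isolated shell */
    perror("execlp");
    return 1;
}

int main(void) {
    int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET |
                CLONE_NEWUTS | SIGCHLD;
    pid_t pid = clone(child, child_stack + sizeof child_stack, flags, NULL);
    if (pid < 0) { perror("clone"); return 1; }
    waitpid(pid, NULL, 0);
    return 0;
}
```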
Unlike virtual machines, containers have no guest operating system layer, which makes them lightweight: functionality is not duplicated, and the overhead of the intermediate layers all but disappears. Start-up latency becomes negligible, scalability improves by an order of magnitude, and system management becomes simpler as well.
The first data center operating systems built on technologies such as YARN, Mesos, and Kubernetes are already appearing, and they use containers as the primary substrate for isolation. This paves the way for a new generation of innovation, that is, for genuine progress.