Missing Building Blocks of Enterprise-Grade OpenStack, Part 1: High Availability

    Author: Dmitry Novakovsky

    Now is a great time to be a company taking part in the OpenStack initiative: you get much of the input for marketing and product management simply by talking to customers and partners every day. That said, competition in this space is fierce, so both the community and individual vendors need to build a feature backlog wisely and prioritize it, with a clear understanding of who wants what. At the risk of playing "Captain Obvious", I will say it anyway: the needs of the enterprise are very different from those of a service provider, a government agency or a web-scale IT organization.

    In this post (and in a few upcoming ones), I will share my thoughts on functionality, the "building blocks", that OpenStack still lacks but that the platform needs in order to succeed in the enterprise. I will also look at whether work is under way to close each gap and, if so, what solutions exist.

    Missing building block #1: enterprise-grade high availability / fault tolerance


    High Availability (HA): for the enterprise, these are perhaps the two most important letters in the context of virtualization and cloud. In a nutshell, HA means that if a virtual machine (VM) fails for any reason, such as a guest operating system crash or the failure of the entire hypervisor node, the data center / cloud management platform brings it back to life almost immediately. It does so either by quickly restarting the VM on the same hypervisor host or by evacuating it to another hypervisor host. The "extreme" mode for the most critical VMs is fault tolerance: two copies of the VM run on different hypervisors with their CPU and memory state mirrored, so that at least one of them always remains operational.

    Why does the enterprise need high availability?


    Historically, much of vSphere's success in the enterprise has rested on treating existing applications as pets. Such applications have been actively developed for many years; they were built to run on bare metal and are kept running by dedicated teams. As a rule, applications of this kind are not cloud-ready. They have practically no built-in intelligent failover, yet they serve the business successfully, and their development budgets are planned years ahead.

    Besides consolidating them onto fewer physical servers, vSphere improves the "quality of life" of these applications by helping them recover from failures without requiring them to be aware of the underlying virtualization / cloud layer. To succeed, OpenStack must be able to do the same.

    What about high availability in OpenStack?


    The good news is that the "pieces" needed to support high availability are already there, so building a complete "HA-as-a-service" for OpenStack requires less effort than you might expect.

    OpenStack supports several shared and distributed storage back ends that are suitable for live migration and evacuation (at Mirantis, Ceph is our favorite), and Nova even implements the "nova evacuate" command, which issues a series of API calls to rebuild a VM on another hypervisor host.

    What is missing is the control and monitoring component (plus, of course, a beautiful user interface and strong PR). Some process has to carefully watch the HA-enabled VMs at several levels (hypervisor availability, nova-compute health, whether the VM answers ping, etc.) and, once it has decided "that's it, she's dead", trigger evacuation through Nova. Such a system must also, of course, make sure that the evacuation actually succeeds.
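To make the idea of multi-level monitoring more concrete, here is a minimal sketch of how such a component might turn several probe results into a decision and then verify the outcome. Everything in it is hypothetical: `ProbeResults`, `verdict`, `handle` and the injected `evacuate`, `restart` and `is_active` callables are illustrative names, not part of Nova or any existing OpenStack project. In a real system, `evacuate` would wrap something like "nova evacuate", and the probes would come from the hypervisor, the Nova service list and an ICMP check.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ProbeResults:
    """Hypothetical results of the three monitoring levels for one HA-enabled VM."""
    hypervisor_up: bool    # is the hypervisor node itself reachable?
    nova_compute_up: bool  # is the nova-compute service on that node healthy?
    vm_pingable: bool      # does the guest answer ping?


def verdict(p: ProbeResults) -> str:
    """Map probe results to an action (an assumed policy, not an official one)."""
    if p.vm_pingable:
        return "healthy"
    if not p.hypervisor_up:
        # The whole node is gone: rebuild the VM on another host.
        return "evacuate"
    if not p.nova_compute_up:
        # Node alive but its compute agent is down: the guest may still be
        # running, so evacuating blindly could leave two copies writing
        # to the same shared disk. Escalate instead.
        return "investigate"
    # Node and agent are fine, only the guest is unresponsive: restart in place.
    return "restart"


def handle(vm_id: str,
           probes: ProbeResults,
           evacuate: Callable[[str], None],
           restart: Callable[[str], None],
           is_active: Callable[[str], bool]) -> Tuple[str, bool]:
    """Act on the verdict and check that the recovery actually worked."""
    action = verdict(probes)
    if action == "evacuate":
        evacuate(vm_id)
    elif action == "restart":
        restart(vm_id)
    else:
        # "healthy" needs no action; "investigate" is left to a human.
        return action, True
    # The system must guarantee success, so confirm the VM is back.
    return action, is_active(vm_id)
```

For example, `handle("vm1", ProbeResults(False, False, False), evacuate=..., restart=..., is_active=...)` decides to evacuate, because the hypervisor itself is unreachable. Keeping the Nova calls injectable lets the same decision logic be tested without a running cloud.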

    The bad news is that the OpenStack community was (and to some extent still is) divided over which direction OpenStack should take on application availability. Fortunately, the recent Atlanta Summit reinforced the need to "conquer the enterprise", and, while respecting OpenStack's original "DevOps / cloud-ready" principles, many community members openly support the idea of "a service that uses Nova API functions to monitor other services or all VMs and automatically performs certain actions, such as starting another instance from the last volume snapshot, creating additional copies, etc."

    The most unpleasant (or perhaps just unfortunate) part is that until the community settles on a consistent position, some potential customers who are considering OpenStack may get the wrong message and conclude: "OpenStack will never care about high availability beyond its own controller infrastructure." I wonder whether we still have time to win back these people's trust.

    And now for the moment of truth: who will write the code, and when will it become usable functionality?

    Temporary solution


    Some may argue that one possible solution is to configure Nagios or Zabbix to poll pet-class VMs intensively, together with scripts that trigger evacuation. This might work in a quirky do-it-yourself environment, but I think it is too cumbersome to manage at the enterprise level. Remember that IT is often still a cost center in the enterprise, so we need to make IT staff's work easier, not harder. You might also consider using Heat as a "state machine" and Ceilometer as the failover trigger, but at least for now there are no success stories worth mentioning.
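To show what such a do-it-yourself setup has to get right for every single pet VM, here is a minimal sketch of the core of a polling check. It is loosely modeled on the Nagios idea of "soft" and "hard" states (a failure only counts once it repeats `max_check_attempts` times in a row), but it is a simplification, not Nagios or Zabbix behavior: the `PollingCheck` class and the `on_hard_failure` hook (which would call a hypothetical evacuation script) are illustrative assumptions.

```python
from typing import Callable


class PollingCheck:
    """Simplified soft/hard failure tracking for one monitored VM (hypothetical)."""

    def __init__(self, target: str, max_attempts: int,
                 on_hard_failure: Callable[[str], None]) -> None:
        self.target = target
        self.max_attempts = max_attempts
        self.on_hard_failure = on_hard_failure  # e.g. a script that runs evacuation
        self.failures = 0
        self.fired = False

    def record(self, ok: bool) -> str:
        """Feed one poll result; return the resulting state."""
        if ok:
            # Any success resets the counter, so brief glitches never escalate.
            self.failures = 0
            self.fired = False
            return "OK"
        self.failures += 1
        if self.failures < self.max_attempts:
            # Not yet confirmed: avoid evacuating on a single lost ping.
            return "SOFT"
        if not self.fired:
            # Confirmed failure: trigger the handler exactly once.
            self.fired = True
            self.on_hard_failure(self.target)
        return "HARD"
```

With `max_attempts=3`, four consecutive failed polls produce `SOFT, SOFT, HARD, HARD`, and the handler runs only once. The threshold is a trade-off: with a hypothetical 10-second poll interval, three attempts mean roughly 30 seconds before evacuation starts, while a lower value risks evacuating a VM that only dropped a packet. Tuning, testing and maintaining this for every pet VM is exactly the management burden described above.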

    The realistic compromise is to deploy OpenStack with both the KVM and vSphere hypervisors (provided the enterprise already has vSphere licenses). OpenStack can provide self-service, multi-tenancy and orchestration, and host applications that are ready for a KVM-based cloud, while vSphere does what it does best: hosts pet-class applications and keeps them as happy on virtualization as they were on bare metal.

    Fortunately, VMware has invested heavily in the vCenter driver for Nova, and, as Kenneth Hui explained in a series of excellent posts, features like HA, DRS and vMotion keep working even when running under OpenStack. You can easily try this setup yourself: see our recent posts on using Mirantis OpenStack to build your first OpenStack + vSphere environment.

    What other features do you think OpenStack needs in order to succeed at the Enterprise level?

    PS: Keep in mind that vSphere HA is included starting with the Essentials Plus Kit, the second least expensive VMware offering after the ESXi-only Essentials Kit, but you will also need a vCenter license to use it.

    Original article in English.
