FT Clustering - everRun by Stratus Technologies

    I recently came across a manufacturer of server hardware and virtualization software called Stratus Technologies. Although this market has long been divided among the well-known vendors, their solutions have found their niche: instead of reinventing the wheel, they offer highly specialized products that have virtually no direct analogues.

    Currently, the Stratus Technologies product portfolio consists of two solutions:
    1. ftServer, a hardware appliance that replaces a cluster of two servers.
    2. everRun, a software hypervisor.

    Getting a software distribution for testing is much easier than getting a two-node hardware appliance, so the first product I was able to evaluate was the everRun hypervisor. The results are below.

    In the current difficult times, many companies face tight IT project budgets, especially for purchases in foreign currency. The requirements for reliability and fault tolerance of critical systems, however, have not gone away, and many are left asking: how do we provide redundancy?
    Consider a possible scenario: you need to run a temperature control system for one specific raw-material storage facility. Let it be ... a milk tank. In terms of performance, a single node is quite enough; even a desktop instead of a full server would do. However, the reliability of such a system must be high, and no stand-alone node will ever provide that.
    Yes, you can always move such a system into a server virtualization cluster, but what about companies that have no server virtualization at all, or no free resources in it? And expanding a VMware cluster is quite costly, both in hardware and in software licensing.

    This is where a rather logical and, at first glance, obvious idea arises: take two identical nodes from the zoo of unneeded hardware (a desktop, a decommissioned server, etc.), connect them, and cluster them. Such a system should have several distinctive features:
    1. The ability to mirror local disks so the system can migrate between nodes;
    2. The ability to mirror RAM and the processor cache to provide FT functions;
    3. A reasonable cost.

    And at first glance, Stratus everRun has all of these features.
    Let's take a closer look at what our “hero” can do.

    Description


    The solution is based on checkpointing technology, which synchronizes the cluster nodes in such a way that the failure of one of them does not cause even a split second of application downtime.
    The main idea behind the technology is replication of the guest virtual machine's state between two independent nodes using checkpoints - state snapshots that include absolutely everything: processor registers, dirty memory pages, and so on. At the moment a checkpoint is created, the guest VM is frozen, and it is thawed only after the second system acknowledges (ACK) receipt of the checkpoint. Checkpoints are created 50 to 500 times per second, depending on the current VM load; the rate is determined by the algorithm itself and cannot be tuned manually. Thus, the virtual machine effectively lives on two servers at once. If the primary node fails, the VM continues on the backup node from the last received checkpoint, with no data loss and no downtime.
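    To make the freeze/ACK/thaw cycle more concrete, here is a minimal, self-contained Python sketch of such a checkpointing loop. It is purely illustrative: the classes and methods below are simplified stand-ins of my own, not the actual everRun or KVM implementation.

# Illustrative sketch of a checkpoint/ACK loop; all classes are simplified stand-ins.
import time

class GuestVM:
    """Stand-in for a running guest: tracks whether its vCPUs are paused."""
    def __init__(self):
        self.paused = False
    def freeze(self):
        self.paused = True            # in reality: pause vCPUs and quiesce devices
    def thaw(self):
        self.paused = False           # resume guest execution
    def dirty_state(self) -> bytes:
        return b"checkpoint-delta"    # in reality: CPU registers, dirty pages, device state

class BackupLink:
    """Stand-in for the A-link to the secondary node."""
    def send(self, state: bytes) -> None:
        pass                          # ship the checkpoint to the backup node
    def wait_for_ack(self) -> None:
        pass                          # block until the backup confirms receipt

def checkpoint_once(vm: GuestVM, link: BackupLink) -> None:
    vm.freeze()                       # guest stays frozen while the checkpoint is taken
    link.send(vm.dirty_state())       # replicate state over the interconnect
    link.wait_for_ack()               # only after the ACK ...
    vm.thaw()                         # ... does the guest run again

if __name__ == "__main__":
    vm, link = GuestVM(), BackupLink()
    for _ in range(5):                # the real rate is adaptive, 50-500 checkpoints per second
        checkpoint_once(vm, link)
        time.sleep(1 / 50)

    The point of the sketch is the ordering: the guest never resumes before the backup has acknowledged the checkpoint, which is exactly why a failed primary can be picked up on the secondary without losing state.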

    Stratus everRun is a bare-metal hypervisor based on KVM. That heritage adds a few nuances to its functionality, but more on that later.
    For this product to work, we need two identical nodes interconnected by an Ethernet link.
    The requirements for these nodes are quite modest:
    • 1 or 2 sockets.
    • 8 GB of RAM or more.
    • At least 2 hard drives per node (for fault tolerance within a single node).
    • At least two 1 GbE ports per node. The first port is reserved for the inter-node interconnect (called the A-link); the second carries the management network and the data network, which can be split onto separate ports if more are available. A mandatory requirement for the interconnect is a latency of no more than 2 ms (needed for the FT function).
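    Before installation, it is easy to check a candidate node against these minimums. The helper below is my own sketch, not an everRun tool; the thresholds are simply the ones from the list above.

# Quick pre-install sanity check of a candidate node against the minimums listed above.
# This is my own helper, not part of everRun; adjust the inputs to your hardware.
from dataclasses import dataclass

@dataclass
class Node:
    sockets: int
    ram_gb: int
    disks: int
    gbe_ports: int
    alink_latency_ms: float

def meets_everrun_minimums(n: Node) -> bool:
    return (1 <= n.sockets <= 2
            and n.ram_gb >= 8
            and n.disks >= 2
            and n.gbe_ports >= 2
            and n.alink_latency_ms <= 2.0)

if __name__ == "__main__":
    node = Node(sockets=1, ram_gb=16, disks=4, gbe_ports=2, alink_latency_ms=0.4)
    print("Node qualifies:", meets_everrun_minimums(node))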

    Licensing follows a familiar scheme: one license covers two processor pairs, which means that clustering two nodes requires only one license. Clustering more than two nodes, or more than two processors per node, is currently not supported.
    There are two license levels: Express and Enterprise.
    Express has limited functionality and protects VMs only at the HA level. This means there is no mirroring at the processor cache level: the checkpointing technology described above is not used, only disk replication and network-level fault tolerance. Failure of one of the nodes stops the VM while it restarts on the second node.
    The Enterprise license is fully functional and provides FT-level protection: failure of one of the nodes causes no VM downtime. However, this puts an additional load on the A-link, so to provide FT without affecting VM performance it is recommended to use a 10 GbE channel as the A-link.

    Now let's move on to testing.

    Deployment


    To test this product, I was allocated two budget Supermicro servers in the following configuration:

    PN | Description | Qty
    SYS-5018R-M | Supermicro SuperServer 1U 5018R-M: no CPU (1), no memory (8), on-board C612 RAID 0/1/10, no HDD (4) LFF, 2x GE, 1x FH, 1x 350W Gold PSU, backplane 4x SATA/SAS | 2
    SR1YC | CPU Intel Xeon E5-2609 V3 (1.90 GHz / 15 MB) FCLGA2011-3, OEM | 2
    LSI00419 | LSI MegaRAID SAS 9341-4i (PCI-E 3.0 x8, LP) SGL, SAS 12G, RAID 0/1/10/5, 4-port (1x int SFF-8643) | 2
    KVR21R15D4/16 | Kingston DDR4 16 GB (PC4-17000) 2133 MHz ECC Reg, Dual Rank, x4, 1.2 V, w/TS | 4
    ST3300657SS | HDD SAS Seagate 300 GB, ST3300657SS, Cheetah 15K.7, 15000 rpm, 16 MB buffer | 4
    Not the best hardware, but not a desktop either.
    Having mounted them in one rack directly one above the other and connected them through a switch, I proceeded to install the software.
    Interesting nuances await already at this stage:
    • The current compatibility list is missing some common blade solutions. Why blades are incompatible is unknown; I tried to install the software on an IBM FlexSystem in my lab and failed completely.
    • The distribution must be burned to physical DVD media. No ISO mounting via IPMI, no bootable USB. Only DVD, only hardcore. Fortunately, I found an external DVD drive.

    The installation wizard is based on CentOS and has a fairly familiar interface.



    The installation process itself is quite simple:
    • Install the software on the first node. During installation, you select the target partition, choose the ports for the A-link and the management network, and set the required IP addresses and the root password.
    • Install the software on the second node. Here you select only the installation partition and the ports; all other settings are copied from the main (first) node during installation.

    After the software is installed on both nodes, connect to the main node via the IP address of its management port. There you will see an interface for configuring access to the web interface and activating the license.



    In the end we get a pool of 5 addresses: 2 for the IPMI ports of the nodes, 2 for the management ports of the nodes, and 1 for the web interface.

    Interface


    The everRun interface is quite simple and pleasant.



    As you can see in the Dashboard screenshot, after installation the second node remained in maintenance mode. I never figured out why; perhaps I made some mistake during installation. The problem was solved by manually switching the node back to normal mode through the corresponding menu section.

    As we can see in the System section, everRun “eats” 2 processor cores for its own needs. Overprovisioning, however, is allowed.



    In addition, there is core overhead that depends on the protection level: an HA VM with n vCPUs takes n + 1 cluster cores, and an FT VM takes n + 2. Thus, an HA VM with 4 vCPUs consumes 5 cluster cores, while an FT VM with 6 vCPUs consumes all 8 (in installations with multiple VMs, this overhead is added for each VM).
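    A tiny worked example of this core accounting, based only on the n + 1 / n + 2 rule above (my own illustration):

# Cluster core accounting per the n+1 (HA) / n+2 (FT) rule described above.
def cluster_cores_used(vcpus: int, mode: str) -> int:
    overhead = {"HA": 1, "FT": 2}[mode]   # extra cores consumed by the protection level
    return vcpus + overhead

if __name__ == "__main__":
    print(cluster_cores_used(4, "HA"))    # 5 cores for an HA VM with 4 vCPUs
    print(cluster_cores_used(6, "FT"))    # 8 cores for an FT VM with 6 vCPUs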
    It is noteworthy that the maximum number of vCPUs supported in an FT VM is 8, which is quite an impressive figure. For comparison, VMware vSphere 6.x allows FT VMs with at most 4 vCPUs (and 5.x with only 1 vCPU), which significantly narrows the applicability of that feature.

    Creating a VM also has a number of nuances, rather unpleasant ones. To create a new VM, we need an ISO image of the OS distribution, and it must reside on a local cluster volume. The only way to get it there is to upload it through the web interface using the corresponding menu button.



    This means you are forced to upload the ISO image from a remote PC over the network, which either saturates the entire channel or drags on indecently long. For example, uploading a 4.2 GB Windows Server 2012 R2 image took me about 4 hours (roughly 0.3 MB/s). This brings us to the second unpleasant feature: everRun does not allow USB media or PCI-E devices to be passed through to virtual machines, so USB tokens or GPUs cannot be used at all. In the solution's defense, passing USB media through to a cluster node would significantly reduce fault tolerance (if the node failed, the media would have to be reconnected manually), and passing through a GPU and then replicating GPU load in FT mode would be extremely expensive for the system. Still, this drastically narrows the scope of the solution.

    After the distribution has been uploaded to the cluster storage, we proceed to create a VM.



    As you can see in the VM creation wizard, one of the features inherited from KVM is the choice of two volume formats: QCOW2 and RAW. The main difference is that RAW volumes do not support snapshots or Disaster Recovery.
    When creating a VM, we need to specify not only the size of the volume itself but also the size of its “container”. A container is a storage object that holds both the volume and its snapshots. According to the sizing guide, the container should be 2.5 to 3.6 times the size of the VM volume, depending on how intensively the data changes:
    VolContSize = 2 * VolSize + (NumSnapshotsRetained + 1) * SnapshotSize
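    As a quick sanity check of this formula (with made-up example numbers of my own): a 30 GB volume retaining 3 snapshots of 10 GB each needs a container of 2 * 30 + (3 + 1) * 10 = 100 GB, i.e. about 3.3 times the volume size, which falls inside the recommended 2.5-3.6x range.

# Container sizing per the formula above; the example numbers are made up.
def container_size_gb(vol_size_gb: float, snapshots_retained: int, snapshot_size_gb: float) -> float:
    return 2 * vol_size_gb + (snapshots_retained + 1) * snapshot_size_gb

if __name__ == "__main__":
    vol = 30.0                                                             # VM volume, GB
    cont = container_size_gb(vol, snapshots_retained=3, snapshot_size_gb=10.0)
    print(cont, "GB container,", round(cont / vol, 2), "x the volume size")  # 100.0 GB, 3.33x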

    To create consistent snapshots, separately downloadable clients for Windows and Linux environments are provided.

    During a test snapshot of a freshly created VM with a 30 GB volume, the system consumed 8.1 GB. What that space was spent on remained a mystery to me.

    After creating VMs, the Load Balancing function becomes available. The balancing simply distributes all VMs evenly across the two cluster nodes or, in manual mode, “pins” a VM to a specific node.



    As a stress test, I created two virtual machines, one in HA mode and one in FT mode, and then forcibly powered off one of the nodes.
    The HA VM went into a reboot and automatically came back up on the second node.
    The FT VM continued running normally, while the system resource monitor showed increased load on the disk and network subsystems.



    Application


    While testing this solution, I kept thinking about where it could be applied. Since the product is quite specific, with many nuances, it will only show itself well in highly specialized projects.
    One of the most obvious is industrial use. Production sites often have facilities that cannot be connected to the existing IT infrastructure yet have their own availability and fault-tolerance requirements. In such cases, an everRun-based cluster looks quite attractive compared with other commercial clustering systems thanks to its price-to-functionality ratio.
    The second type of project, following from the first, is video surveillance, especially when a surveillance system must be deployed at remote or isolated sites such as railway junctions, oil and gas production points, and so on. The isolation of these facilities from the company's main infrastructure keeps them out of existing clusters, while stand-alone surveillance servers increase both the risk of failure and the resulting maintenance costs.

    Stratus Technologies' pricing policy implies a single recommended end-customer price for everRun licenses worldwide. Delivery works in the usual way: the license is sold under a non-exclusive rights agreement and includes a service subscription for at least one year, which covers full technical support in 5x8 or 24x7 mode, including updates to any current version for the duration of the subscription.

    Summing up, this is a narrowly focused product. For all its advantages, the limitations described above keep it from being used universally. But if those restrictions suit you, I strongly recommend taking a closer look at this solution, since the vendor readily provides a 30-60 day trial license.
    And for those interested in hands-on testing who have no spare nodes to deploy Stratus everRun on, we can always offer remote testing on our demo stand.
