How to move from ESXi to KVM / LXD and not lose your mind

    For a long time, Maxnet Systems used the free version of VMware ESXi as its hypervisor, starting with version 5.0. The paid vSphere put us off with its licensing model, and the free version had a number of drawbacks that the paid one did not, but we could live with them. But when, in newer versions of ESXi, the new web interface refused to work with the old one and RAID array monitoring stopped showing signs of life, the company decided to look for a more universal and open solution. The company already had good experience with, and a good impression of, LXC — Linux Containers. So it became clear that the dream hypervisor would be a hybrid, combining KVM and LXD (the evolutionary continuation of LXC) for different workloads. While looking for information on KVM, the company ran into misconceptions, rakes, and harmful practices.



    Lev Nikolaev (maniaque), administrator and developer of highly loaded systems and an IT trainer, will tell how to cope with the move from ESXi to KVM without stepping on every rake along the way. We will talk about networking, storage, containers, KVM, LXD, LXC, provisioning, and convenient virtual machines.

    Prologue


    We will immediately identify key thoughts, and then we will analyze them in more detail.

    Network. While your interface speeds do not exceed 1 Gbit/s, a bridge is enough for you. As soon as you want to squeeze out more, it will limit you.

    Storage. Create shared network storage. Even if you are not ready to use 10 Gbit/s inside the network, even 1 Gbit/s gives you 125 MB/s of storage throughput. For a number of workloads this is enough with a margin, and migrating virtual machines becomes trivial.

    Container or KVM? Pros, cons, pitfalls. Which types of workloads are best placed in a container, and which are best left in KVM?

    LXD or LXC. Is LXD the same as LXC? A different version? An add-on? What is it at all? Let's dispel the myths and understand the differences between LXD and LXC.

    Convenient provisioning. Which is more convenient: take the same image every time, or install the system from scratch? How do you do it quickly and accurately every time?

    Convenient virtual machine. There will be scary stories about bootloaders, partitions, LVM.

    Miscellaneous. Many small questions: how to quickly drag a virtual machine from ESXi to KVM, how to migrate well, how to virtualize disks properly?



    Reason for relocation


    Where did we get the crazy idea of moving from ESXi to KVM/LXD? ESXi is popular among small and medium-sized businesses. It is a good and cheap hypervisor. But there are nuances.

    We started with version 5.0 — convenient, everything works! Version 5.5 was the same.

    Since version 6.0 it got harder. The web interface on ESXi did not become free right away, only from version 6.5; before that, a Windows utility was required. We put up with this: whoever runs OS X buys Parallels and installs that utility. It is a well-known pain.

    Monitoring fell over periodically. We had to restart the management services in the server console — then CIM Heartbeat came back. We endured it, since it did not fall off every time.

    ESXi version 6.5 is trash, waste, and atrocity. An awful hypervisor. Here is why.

    • Angular throws an exception when you log in to the web interface. As soon as you enter your username and password — an exception right away!
    • Remote monitoring of RAID array status does not work the way we need it to. It used to be convenient, but in version 6.5 everything is bad.
    • Weak support for modern Intel network cards. Intel network cards and ESXi are a source of pain. There is an agony thread about this on the ESXi support forum. VMware and Intel are not friends, and relations will not improve in the near future. The sad thing is that even customers of the paid solutions experience problems.
    • No migration within ESXi — unless you count a pause-copy-start procedure as migration. We pause the machine, quickly copy it, and start it somewhere else. But you cannot call that migration: there is still downtime.

    Having looked at all this, we got the crazy idea of moving away from ESXi 6.5.

    Wish List


    To start, we wrote a wish list for the ideal future we were heading to.

    Management over SSH; a web interface and everything else is optional. The web interface is great, but being on a business trip and trying to go into the ESXi web interface from an iPhone to do something is inconvenient and hard. So the only way to manage everything is SSH; there will be no other.

    Windows virtualization. Sometimes customers ask for strange things, and our mission is to help them.

    Always fresh kernels and drivers, so that the network card can be configured. A reasonable wish, but unrealizable under pure ESXi.

    Live migration, not clustering . We want the ability to drag machines from one hypervisor to another without feeling any delays, downtime or inconvenience.

    The wish list was ready; then the difficult search began.

    The torment of choice


    The market revolves around KVM or LXC under various sauces. Sometimes it seems that Kubernetes is somewhere up above, where everything is fine — sun and paradise — while down at the lower level there are the Morlocks: KVM, Xen, or something like that...

    For example, Proxmox VE is Debian with an Ubuntu kernel pulled over it. It looks weird, but is it something to take to production?

    Our neighbors downstairs are Alt Linux. They came up with a beautiful solution: they packaged Proxmox VE, so you just install the package with one command. This is convenient, but we do not roll Alt Linux into production, so it did not suit us.

    Take KVM


    In the end, we chose KVM. We did not take Xen, for example, because of the community — KVM's is much bigger. It seemed that we would always find the answer to our question. We later found out that the size of a community does not affect its quality.

    Initially, we figured we would take a bare-metal machine, put the Ubuntu we work with on it, and roll KVM/LXD on top. We were counting on the ability to run containers. Ubuntu is a well-known system, with no surprises for us in terms of solving boot/recovery problems. We know where to kick if the hypervisor doesn't start. Everything is clear and convenient for us.

    KVM Crash Course


    If you are from the world of ESXi, then you will find a lot of interesting things. Learn three words: QEMU, KVM, and libvirt.

    QEMU translates the wishes of the virtualized OS into the calls of a regular process. It works almost everywhere, but slowly. QEMU itself is a standalone product that virtualizes a whole bunch of devices.

    Next on the scene comes the QEMU-KVM combination. KVM is a Linux kernel module for QEMU. Virtualizing every instruction is expensive, so the KVM kernel module translates only a small subset of them. As a result, this is significantly faster, because only a few percent of the overall instruction set is handled this way. That is the whole cost of virtualization.
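    A quick sanity check (a minimal sketch, not from the talk — plain Linux tooling): before blaming QEMU for being slow, make sure the CPU exposes hardware virtualization and that the KVM modules are actually loaded.

    # vmx = Intel VT-x, svm = AMD-V; a non-zero count means hardware virtualization is exposed
    egrep -c '(vmx|svm)' /proc/cpuinfo

    # The kvm and kvm_intel / kvm_amd modules should be loaded
    lsmod | grep kvm

    # On Ubuntu, the cpu-checker package provides a convenient wrapper
    sudo kvm-ok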

    If you have just QEMU, starting a virtual machine without any wrapper looks like this:

    $ qemu <a million parameters>

    In the parameters you describe the network and block devices. Everything is wonderful, but inconvenient. That is why libvirt exists.

    The goal of libvirt is to be a single tool for all hypervisors. It can work with anything: with KVM, with LXD. It seems all that is left is to learn the libvirt syntax, but in reality it works worse than in theory.

    These three words are all you need to bring up your first virtual machine in KVM. But again, there are nuances...

    libvirt has a config where virtual machines and other settings are stored. It keeps the configuration in XML files — stylish, fashionable, and straight from the 90s. If you want, you can edit them by hand, but why, when there are convenient commands. It is also convenient that changes to the XML files are wonderfully versioned. We use etckeeper to version the /etc directory. It is high time to use etckeeper if you aren't already.
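    For example, the usual round trip with libvirt looks like this (a sketch; "web01" is a hypothetical machine name):

    # Dump the current definition of a machine to XML
    virsh dumpxml web01 > web01.xml

    # Edit the definition in place, with validation, instead of touching files by hand
    virsh edit web01

    # Re-define a machine from an edited XML file
    virsh define web01.xml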

    LXC Crash Course


    There are many misconceptions about LXC and LXD.

    LXC is the ability of the modern kernel to use namespaces — to pretend that it is not at all the kernel it originally was.

    You can create as many of these namespaces as you like, one set per container. Formally, the kernel is one, but it behaves like many identical kernels. LXC lets you run containers, but it provides only basic tools.

    Canonical, which is behind Ubuntu and is aggressively pushing containers forward, has released LXD, an analogue of libvirt. It is a wrapper that makes it easier to run containers, but inside it is still LXC.

    LXD is a container hypervisor that is based on LXC.

    Enterprise reigns in LXD. It stores its config in its own database, in the /var/lib/lxd directory, where LXD keeps the configuration in SQLite. Copying that database makes no sense, but you can write down the commands you used to build the container's configuration.

    There is no config dump as such, but most changes can be automated with commands. It is an analogue of a Dockerfile, only with manual control.
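    A sketch of what such a "written down" sequence might look like (the container name, limits and paths are made up):

    # Create a container from the stock Ubuntu image
    lxc launch ubuntu:18.04 build-agent

    # Apply configuration step by step — a Dockerfile with manual control
    lxc config set build-agent limits.cpu 2
    lxc config set build-agent limits.memory 2GB
    lxc config device add build-agent data disk source=/srv/data path=/data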

    Production


    Here is what we ran into when we actually sailed into production.

    Network


    How much hellish trash and fuss there is on the Internet about networking in KVM! 90% of the materials say: use a bridge.

    Stop using bridge!

    What is wrong with it? Lately I have the feeling that sheer insanity is going on with containers: put Docker on top of Docker so you can run Docker in Docker while watching Docker. And most people do not understand what a bridge actually does.

    It puts your network card into promiscuous mode and receives all the traffic, because it does not know which frames are for it and which are not. As a result, all of the bridge traffic goes through the wonderful, fast Linux network stack, with a lot of copying along the way. In the end, everything is slow and bad. So do not use a bridge in production.

    SR-IOV


    SR-IOV is the ability to virtualize inside the network card. The card itself can allocate part of itself to virtual machines, which requires hardware support. This is exactly what gets in the way of migration: migrating a virtual machine to a host where SR-IOV is missing is painful.

    SR-IOV should be used where every hypervisor involved in migration supports it. If not, then macvtap is for you.

    macvtap


    This is for those whose network card does not support SR-IOV. It is the light version of a bridge: different MAC addresses are hung on one network card, and unicast filtering is used: the card does not accept everything, only frames matching its list of MAC addresses.

    More gory details can be found in Toshiaki Makita's great talk, Virtual Switching Technologies and Linux Bridge. It is full of pain and suffering.

    90% of the materials on how to build a network in KVM are useless.

    If someone says bridge is awesome, don't talk to that person anymore.

    With macvtap, the CPU saves about 30% thanks to less copying. But it has its own nuance: you cannot connect to the network interface of the guest machine from the hypervisor itself — from the host. Toshiaki's talk covers the details; in short, it will not work.

    People rarely SSH from the hypervisor into a guest anyway — it is more convenient to open a console to the machine instead. You can still "watch" traffic on the interface: you cannot connect over TCP, but the traffic is visible from the hypervisor.

    If your speeds are above 1 Gigabit - choose macvtap.

    At interface speeds up to or around 1 Gbit/s, a bridge can also be used. But if you have a 10 Gbit card and want to actually utilize it, only macvtap remains. There are no other options — except SR-IOV.
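    In libvirt terms, macvtap is an interface of type "direct". A minimal sketch of the guest XML fragment (the parent device enp1s0 is a placeholder for your physical NIC):

    <interface type='direct'>
      <!-- 'bridge' mode lets guests on the same parent NIC talk to each other -->
      <source dev='enp1s0' mode='bridge'/>
      <model type='virtio'/>
    </interface>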

    systemd-networkd


    This is a great way to store network configuration on the hypervisor. In our case this is Ubuntu, but systemd-networkd works for other systems too.

    We used to have a single /etc/network/interfaces file in which we kept everything. Editing one file every time is inconvenient; systemd-networkd lets you split the configuration into a scattering of small files. This is convenient because it works well with any version control system: push it to Git and you see when and what changed.
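    A minimal sketch of one such small file (the interface name and addresses are made up):

    # /etc/systemd/network/10-uplink.network
    [Match]
    Name=enp1s0

    [Network]
    Address=192.0.2.10/24
    Gateway=192.0.2.1
    DNS=192.0.2.53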

    There is a flaw our network engineers discovered. When a new VLAN needs to be added on the hypervisor, I go and configure it, then run "systemctl restart systemd-networkd". At that moment everything is fine for me, but if BGP sessions are established from that machine, they break. Our network engineers do not approve.

    For the hypervisor, nothing bad happens. systemd-networkd is not suitable for border routers or servers with established BGP sessions, but for hypervisors it is excellent.

    systemd-networkd is far from final and will never be finished. But it is more convenient than editing one huge file. The alternative to systemd-networkd in Ubuntu 18.04 is Netplan — a "cool" way to configure the network and step on new rakes.

    Network device


    After installing KVM and LXD on the hypervisor, the first thing you will see is two bridges. One was created by KVM for itself, and the other by LXD.

    LXD and KVM each try to deploy their own network.

    If you still need a bridge — for test machines or to play around — kill the bridge that is enabled by default and create your own, the way you want it. KVM and LXD do it terribly: they slip in dnsmasq, and the horror begins.
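    A sketch of that cleanup (the names below are the usual defaults, lxdbr0 and "default" — check what yours are called first):

    # Remove the default LXD bridge with its dnsmasq and create a plain one
    lxc network delete lxdbr0
    lxc network create br-lab ipv4.address=none ipv6.address=none

    # Remove the default libvirt NAT network, which also drags in dnsmasq
    virsh net-destroy default
    virsh net-undefine default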

    Storage


    It does not matter which implementation you like — use shared storage.

    For example, iSCSI for virtual machines. You will not get rid of the “point of failure”, but you can consolidate storage at one point . This opens up new interesting opportunities.

    Ideally, you want at least 10 Gbit/s interfaces inside the data center for this. But even if you have only 1 Gbit/s, do not worry: that is roughly 125 MB/s, which is quite enough for hypervisors that do not need a heavy disk load.

    KVM can migrate a machine and drag its storage along with it. But transferring a virtual machine of a couple of terabytes under load is a pain. With shared storage, only the RAM needs to be transferred, which is trivial. This reduces migration time.
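    A sketch of attaching such an iSCSI volume on a hypervisor with open-iscsi (the portal address and IQN are placeholders):

    # Discover targets on the storage box
    iscsiadm -m discovery -t sendtargets -p 192.0.2.100

    # Log in to the target; the LUN then shows up as a regular block device
    iscsiadm -m node -T iqn.2019-01.example.com:storage.lun1 -p 192.0.2.100 --login

    # Make the session come back automatically after a reboot
    iscsiadm -m node -T iqn.2019-01.example.com:storage.lun1 -p 192.0.2.100 \
        --op update -n node.startup -v automatic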

    In the end, LXD or KVM?


    Initially, we assumed that for all virtual machines whose kernel matches the host system we would take LXD, and where we need a different kernel, we would take KVM.

    In reality, the plans did not take off. To understand why, take a closer look at LXD.

    LXD


    The main plus is saving memory on the kernel. The kernel is shared: when we launch new containers, it is the same kernel. That is where the pros ended and the cons began.

    The block device with the rootfs must be mounted. This is harder than it sounds.

    There is effectively no migration. It exists, and it is based on the wonderful, gloomy tool CRIU, which our compatriots are developing. I am proud of them, but in simple cases CRIU does not work.

    zabbix-agent behaves strangely in a container. If you run it inside the container, you will see a number of metrics from the host system rather than from the container. So far nothing can be done about it.

    When looking at the list of processes on the hypervisor, it is impossible to quickly understand which container a given process comes from. It takes time to figure out which namespace it is in, what is where. If the load somewhere jumps higher than usual, you cannot figure it out quickly. This is the main problem — the limit on how fast you can react. A mini investigation has to be conducted for every case.

    The only plus of LXD is saving kernel memory and reducing overhead.

    But Kernel Shared Memory (KSM) in KVM already saves memory.
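    A quick way to check whether KSM is actually merging anything on the hypervisor (standard sysfs paths):

    # 1 means KSM is running
    cat /sys/kernel/mm/ksm/run

    # pages_shared / pages_sharing show how much is actually being merged
    cat /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing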

    So far I see no reason to bring LXD into serious production. Despite Canonical's best efforts in this area, LXD in production brings more problems than it solves. In the near future the situation will not change.

    But it cannot be said that LXD is evil. It is good, but only in limited cases, which I will discuss a little later.

    CRIU


    CRIU is a gloomy utility.

    Create an empty container — it comes up with a DHCP client — and tell it: "Suspend!" You get an error, because of that DHCP client: "Horror, horror! It opens a raw socket — what a nightmare!" Worse than ever.

    Impressions of containers: no migration, and CRIU works every other time.

    I "like" the recommendation from the LXD team on what to do with CRIU so there are no problems:

    - Take a fresher version from the repository!

    And can I maybe install it from a package, so I don't have to build it from the repository?

    Conclusions


    LXD is wonderful if you want to build a CI/CD infrastructure. We take LVM — Logical Volume Manager — make a snapshot from it, and start a container on top of it. Everything works great! In a second a new, clean container is created, configured for testing and for rolling out Chef — we actively use this.
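    A rough sketch of that flow (the volume group, volume and container names are made up; the raw.lxc trick is described in the "Container from scratch" section below, and the container is assumed to already exist as an empty one):

    # Snapshot the "golden" rootfs volume and mount it for a fresh CI container
    lvcreate --snapshot --name ci-42 --size 5G /dev/vg0/golden-rootfs
    mkdir -p /containers/ci-42/rootfs
    mount /dev/vg0/ci-42 /containers/ci-42/rootfs

    # Point the empty LXD container at that rootfs and start it
    lxc config set ci-42 raw.lxc "lxc.rootfs=/containers/ci-42/rootfs"
    lxc start ci-42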

    LXD is weak for serious production. We cannot figure out what to do with LXD in production when it does not work well.

    Choose KVM and only KVM!

    Migration


    I will keep this brief. For us, migration turned out to be a wonderful new world that we like. Everything is simple: there is a command for migration and two important options:

    virsh migrate <vm> qemu+ssh://<destination-host>/system --undefinesource --persistent

    If you type "KVM migration" into Google and open the first result, you will see a migration command, but without the last two flags. There will be no mention that they are important: "Just run this command!" You run the command — and it really does migrate, but how exactly?

    Important migration options.

    --undefinesource — remove the virtual machine from the hypervisor we are migrating from. If you migrate without this flag and then reboot, the hypervisor you left will start this machine again. You will be surprised, but that is normal.

    Without the second flag — --persistent — the hypervisor you moved to does not consider this a permanent migration at all. After a reboot, that hypervisor will remember nothing about the machine.

    virsh dominfo <vm> | grep persistent

    Without this flag, the virtual machine is like ripples on the water — it leaves no trace. If the first flag is specified without the second, guess what happens.

    There are many such moments with KVM.

    • Network: they always tell you about the bridge — and it is a nightmare! You read it and think: how can that be?!
    • Migration: nobody will tell you anything intelligible either, until you beat your head against this wall yourself.

    Where to begin?


    It is a bit late to be just starting — but I am talking about something else here.

    Provisioning: how to deploy it


    If you are satisfied with the standard installation options, the preseed mechanism is great.

    Coming off ESXi, we used virt-install. This is a standard way to deploy a virtual machine. It is convenient in that you create a preseed file in which you describe the image of your Debian/Ubuntu, start a new machine by feeding it an ISO of the distribution and the preseed file, and then the machine rolls itself out. You connect to it over SSH, hook it into Chef, roll out the cookbooks — that's it, off to production!
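    A hedged sketch of such a virt-install run (the name, sizes, installer URL and preseed file are placeholders):

    virt-install \
        --name web01 \
        --memory 2048 --vcpus 2 \
        --disk size=20 \
        --location http://archive.ubuntu.com/ubuntu/dists/bionic/main/installer-amd64/ \
        --initrd-inject preseed.cfg \
        --extra-args "auto=true priority=critical console=ttyS0" \
        --graphics none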

    But if virt-install is enough for you, I have bad news: it means you have not yet reached the stage where you want something more. We got there and realized that virt-install is not enough. We came to a kind of "golden image" that we clone and then launch virtual machines from.

    And how should a virtual machine be laid out?


    Why did we come to this image, and why does provisioning matter? Because the community still has a weak understanding of how big the differences are between a virtual machine and a regular machine.

    A virtual machine does not need a complicated boot process and a smart bootloader. It is much easier to attach a virtual machine's disks to a machine that has a complete set of tools than to struggle in recovery mode trying to dig your way out.

    A virtual machine needs a simple layout. Why would you need partitions on a virtual disk? Why do people take a virtual disk and put partitions on it instead of LVM?

    A virtual machine needs maximum extensibility. Virtual machines usually grow. Growing a partition in the MBR is a "cool" process: you delete it — wiping the sweat from your forehead and thinking, "just don't write anything now, just don't write!" — and recreate it with new parameters.
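    With partitionless LVM, growing a guest becomes boring in the best sense. A sketch, assuming the whole virtual disk /dev/vda is an LVM physical volume (machine, VG and LV names are made up):

    # On the hypervisor: grow the guest's virtual disk (here to 100 GB)
    virsh blockresize web01 vda 100G

    # Inside the guest: let LVM see the new size, then grow the LV and its filesystem
    pvresize /dev/vda
    lvextend -r -L +20G /dev/vg0/root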

    LVM @ lilo


    As a result, we came to LVM plus lilo. lilo is a bootloader that is configured from a single file. Where editing the GRUB config means editing a special file that drives a template engine and builds the monstrous grub.cfg, with lilo it is one file and nothing more.

    Partitionless LVM makes the system clean and simple. The problem is that GRUB cannot live without an MBR or GPT and just freezes up. We tell it, "GRUB, settle here," but it cannot, because there are no partitions.

    LVM lets you expand volumes and make backups quickly. A standard dialogue:

    - Guys, how do you back up your virtual machines?

    - ... we take a block device and copy.

    - Have you tried restoring from that?

    - Well, no, everything works for us!

    You can copy a block device off a virtual machine at any time, but if there is a file system on it, any write to it takes three movements — the procedure is not atomic.

    If you take a snapshot of the virtual machine from the inside, it can tell the file system to bring itself to the desired consistent state first. But that is not suitable for everything.

    How to build a container?


    To create and start a container there are standard tools with templates. LXD offers an Ubuntu 16.04 or 18.04 template. But if you are an advanced fighter and want not a standard template but your own custom rootfs, tuned to your taste, the question arises: how do you create a container from scratch in LXD?

    Container from scratch


    Prepare the rootfs. debootstrap will help with this: we say which packages are needed and which are not, and install.
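    A sketch of that step (the release, package list, mirror and target path are examples):

    # Build a minimal Ubuntu 18.04 rootfs in the directory the container will use
    debootstrap --variant=minbase --include=openssh-server,systemd \
        bionic /containers/my-container/rootfs http://archive.ubuntu.com/ubuntu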

    Tell LXD that we want to create a container from a specific rootfs. But first, create an empty container with a short command:

    curl --unix-socket /var/lib/lxd/unix.socket -X POST -d '{"name": "my-container", "source": {"type": "none"}}' lxd/1.0/containers

    It can even be automated.

    An attentive reader will ask: and where is the rootfs of my-container? Where did we say where it lives? But I never said that was all!

    Mount the container's rootfs where it is going to live. Then tell LXD that the container's rootfs lives there:

    lxc config set my-container raw.lxc "lxc.rootfs=/containers/my-container/rootfs"

    Again, this can be automated.

    Container life


    The container does not have its own kernel, so booting it is simpler: systemd, init — and off it goes!

    If you are not using the standard LVM tooling, then in most cases, to start a container you will need to mount the container's rootfs on the hypervisor yourself.

    I sometimes come across articles advising autofs. Do not do that. systemd has automount units that work, and autofs does not. So systemd automount units can and should be used, but autofs is not worth it.
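    A minimal sketch of such a .mount/.automount pair (the device, mount point and names are made up; the unit file names must match the escaped mount path):

    # /etc/systemd/system/containers-web01.mount
    [Unit]
    Description=Rootfs for the web01 container

    [Mount]
    What=/dev/vg0/web01-rootfs
    Where=/containers/web01
    Type=ext4

    # /etc/systemd/system/containers-web01.automount
    [Unit]
    Description=Automount for the web01 container rootfs

    [Automount]
    Where=/containers/web01

    [Install]
    WantedBy=multi-user.target

    After "systemctl enable --now containers-web01.automount", the mount happens on first access to the path.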

    Conclusions


    We like KVM with migration. LXD is not there yet, although for testing and building infrastructure we use it where there is no production load.

    We love the performance of KVM: it is more familiar to look at top, see a process there that corresponds to a particular virtual machine, and understand who is doing what. That is better than using a set of strange container utilities to find out what is knocking under the hood.

    We are delighted with migration. This is largely thanks to shared storage: if we had to migrate by dragging disks around, we would not be so happy.

    If you, like Lev, are ready to talk about overcoming the difficulties of operation, integration or support, now is the time to submit a talk to the autumn DevOpsConf conference. And we on the program committee will help you prepare a presentation as inspiring and useful as this one.

    We are not waiting for the Call for Papers deadline and have already accepted several talks into the conference program. Subscribe to the newsletter and the Telegram channel to stay up to date on preparations for DevOpsConf 2019 and not miss new articles and videos.
