Intel Virtual World. Practice

In this article, I want to explore the practical aspects of creating a simple hypervisor based on Intel VMX hardware virtualization technology.

Hardware virtualization is a highly specialized area of system programming and does not have a large community, certainly not in Russia. I hope the material in this article will help those who want to discover hardware virtualization and the opportunities it provides. As mentioned at the beginning, I want to cover the practical side without diving into theory, so I assume the reader is familiar with the x86-64 architecture and has at least a general understanding of how VMX works. Source code for the article .

Let's start by setting the requirements for the hypervisor:

  1. Start before the guest OS is loaded.
  2. Support one logical processor and 4 GB of guest physical memory.
  3. Ensure that the guest OS works correctly with devices mapped into the physical address space.
  4. Handle VM exits.
  5. Run the guest OS in the virtual environment from its very first instruction.
  6. Output debug information via the COM port (a universal method that is easy to implement).

As the guest OS I chose 32-bit Windows 7, configured with the following restrictions:

  • Only one logical CPU is used.
  • PAE, the option that lets a 32-bit OS use more than 4 GB of physical memory, is disabled.
  • The BIOS runs in legacy mode; UEFI is disabled.

Description of the loader

To make the hypervisor start when the PC boots, I chose the easiest way: I wrote my own bootloader to the MBR sector of the disk on which the guest OS is installed. The hypervisor code also had to be placed somewhere on the disk. In my case, the original MBR reads the OS bootloader starting from sector 2048, which leaves a conditionally free area of 2047 * 512 bytes (about 1 MB) for writing. This is more than enough to hold all components of the hypervisor.
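The size of that free area is easy to double-check. The following is a minimal sketch, assuming 512-byte sectors and that sectors 1 through 2047 are the ones available (sector 0 holds the MBR); the helper name gap_bytes is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* bytes per sector (assumed) */

/* Number of bytes in the sector range [first_sector, end_sector). */
static uint64_t gap_bytes(uint64_t first_sector, uint64_t end_sector)
{
    assert(end_sector >= first_sector);
    return (end_sector - first_sector) * SECTOR_SIZE;
}
```

Calling gap_bytes(1, 2048) gives the space between the MBR and the first sector the original MBR reads.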

Below is a diagram of the hypervisor's layout on the disk; all values are given in sectors.

The loading process is as follows:

  1. loader.mbr reads the loader.main code from the disk and transfers control to it.
  2. loader.main switches to long mode and then reads loader.table, the table of loadable elements, which drives the loading of the remaining hypervisor components into memory.
  3. When the loader has finished, the hypervisor code resides in physical memory at 0x100000000. This address was chosen so that the range from 0 to 0xFFFFFFFF can be mapped directly to guest physical memory.
  4. The original Windows MBR is loaded at physical address 0x7C00.

I want to draw attention to the fact that after the switch to long mode the loader can no longer use BIOS services to work with physical disks, so I used the Advanced Host Controller Interface (AHCI) to read the disk.

More details about it can be read here .

Description of the hypervisor

After the hypervisor gets control, its first task is to initialize the environment in which it will run. The following functions are called for this:

  • InitLongModeGdt() - creates and loads a table of 4 descriptors: NULL, CS64, DS64, TSS64
  • InitLongModeIdt(isr_vector) - initializes the first 32 interrupt vectors with a common handler, or rather with its stub
  • InitLongModeTSS() - initializes the task state segment
  • InitLongModePages() - initializes paging:

    [0x00000000 - 0xFFFFFFFF] - page size 2 MB, cache disabled;
    [0x100000000 - 0x13FFFFFFF] - page size 2 MB, cache write back, global pages;
    [0x140000000 - n] - not present;
  • InitControlAndSegmenRegs() - reloads the segment registers
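The paging layout above can be sketched as follows. This is an illustration, not the code of InitLongModePages: the helper names are invented, and only the bit positions come from the x86-64 page-directory entry format (P = bit 0, R/W = bit 1, PCD = bit 4, PS = bit 7, G = bit 8).

```c
#include <assert.h>
#include <stdint.h>

#define PDE_P    (1ull << 0)  /* present */
#define PDE_RW   (1ull << 1)  /* writable */
#define PDE_PCD  (1ull << 4)  /* page-level cache disable */
#define PDE_PS   (1ull << 7)  /* entry maps a 2 MB page */
#define PDE_G    (1ull << 8)  /* global page */
#define PAGE_2MB (1ull << 21)

/* Build a page-directory entry that maps one 2 MB page. */
static uint64_t make_pde_2mb(uint64_t phys, uint64_t flags)
{
    assert((phys & (PAGE_2MB - 1)) == 0); /* must be 2 MB aligned */
    return phys | PDE_PS | flags;
}

/* Fill one page directory (512 entries, 1 GB) with an identity mapping. */
static void fill_pd(uint64_t pd[512], uint64_t base, uint64_t flags)
{
    for (unsigned i = 0; i < 512; i++)
        pd[i] = make_pde_2mb(base + i * PAGE_2MB, flags);
}
```

Under these assumptions, the low 4 GB would use PDE_P | PDE_RW | PDE_PCD, the hypervisor's gigabyte at 0x100000000 would use PDE_P | PDE_RW | PDE_G, and everything above it would simply be left as zero (not present) entries.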

Next, we need to make sure that the processor supports VMX. The check is performed by the CheckVMXConditions() function:

  • CPUID.1:ECX.VMX [bit 5] must be set to 1.
  • In the IA32_FEATURE_CONTROL MSR, bit 2 (enables VMXON outside SMX operation) and bit 0 (Lock) must be set; this matters when debugging in Bochs.
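The two conditions reduce to simple bit tests. The sketch below takes the raw register values as parameters so that it can be run anywhere; in the hypervisor they would come from the cpuid and rdmsr instructions. The function names are illustrative and are not taken from CheckVMXConditions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_VMX         (1u << 5)   /* CPUID.1:ECX.VMX */
#define FC_LOCK                 (1ull << 0) /* IA32_FEATURE_CONTROL lock bit */
#define FC_VMXON_OUTSIDE_SMX    (1ull << 2) /* VMXON allowed outside SMX */

/* True if the ECX value returned by CPUID leaf 1 reports VMX support. */
static bool cpu_has_vmx(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_1_ECX_VMX) != 0;
}

/* True if IA32_FEATURE_CONTROL has both the lock bit and bit 2 set. */
static bool vmxon_allowed(uint64_t feature_control)
{
    const uint64_t need = FC_LOCK | FC_VMXON_OUTSIDE_SMX;
    return (feature_control & need) == need;
}
```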

If everything is in order and the hypervisor is running on a processor that supports hardware virtualization, we move on to the initial setup of VMX in the InitVMX() function:

  • The VMXON region and the VMCS (virtual-machine control structure) are allocated, 4096 bytes each. The VMCS revision identifier taken from the IA32_VMX_BASIC MSR is written to the first 31 bits of each region.
  • CR0 and CR4 are checked to confirm that all bits are set as VMX requires.
  • The VMXON instruction, which takes the physical address of the VMXON region as its argument, puts the logical processor into VMX root mode.
  • The VMCLEAR(VMCS) instruction sets the VMCS launch state to Clear and initializes implementation-specific data in the VMCS.
  • The VMPTRLD(VMCS) instruction makes the VMCS passed as its argument the current VMCS.
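The first two steps can be sketched in plain C. This is an illustration under stated assumptions rather than the body of InitVMX: the regions are ordinary byte arrays, the MSR values are passed in as parameters, and the fixed-bit rule follows the IA32_VMX_CR0_FIXED0/FIXED1 convention (a bit set in FIXED0 must be 1, a bit clear in FIXED1 must be 0). All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMX_REGION_SIZE 4096u

/* Bits 30:0 of IA32_VMX_BASIC hold the VMCS revision identifier. */
static uint32_t vmcs_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFFull);
}

/* Zero a VMXON/VMCS region and store the revision identifier at offset 0. */
static void init_vmx_region(uint8_t *region, uint64_t vmx_basic)
{
    uint32_t rev = vmcs_revision_id(vmx_basic);
    memset(region, 0, VMX_REGION_SIZE);
    memcpy(region, &rev, sizeof rev); /* bit 31 stays clear */
}

/* Force a control register value to satisfy the fixed-bit MSRs. */
static uint64_t apply_fixed_bits(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    return (cr | fixed0) & fixed1;
}

/* Check-only variant: true if the value already satisfies both MSRs. */
static bool cr_is_vmx_compatible(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    return apply_fixed_bits(cr, fixed0, fixed1) == cr;
}
```

The check-only variant corresponds to the verification described in the list; apply_fixed_bits is the common alternative of adjusting the register instead of rejecting it.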

Execution of the guest OS begins in real mode at address 0x7C00, where, as we remember, loader.main places win7.mbr. To recreate a virtual environment identical to the one in which the MBR normally runs, the InitGuestRegisterState() function is called; it sets the VMX non-root registers as follows:

CR0 = 0x10
CR3 = 0
CR4 = 0
DR7 = 0
RSP = 0xFFD6
RIP = 0x7C00
RFLAGS = 0x82
ES.base = 0
CS.base = 0
SS.base = 0
DS.base = 0
FS.base = 0
GS.base = 0
LDTR.base = 0
TR.base = 0
ES.limit = 0xFFFFFFFF
CS.limit = 0xFFFF
SS.limit = 0xFFFF
DS.limit = 0xFFFFFFFF
FS.limit = 0xFFFF
GS.limit = 0xFFFF
LDTR.limit = 0xFFFF
TR.limit = 0xFFFF
ES.access rights = 0xF093
CS.access rights = 0x93
SS.access rights = 0x93
DS.access rights = 0xF093
FS.access rights = 0x93
GS.access rights = 0x93
LDTR.access rights = 0x82
TR.access rights = 0x8B
ES.selector = 0
CS.selector = 0
SS.selector = 0
DS.selector = 0
FS.selector = 0
GS.selector = 0
LDTR.selector = 0
TR.selector = 0
GDTR.base = 0
IDTR.base = 0
GDTR.limit = 0
IDTR.limit = 0x3FF

Note that the limit field of the descriptor cache for the DS and ES segment registers is 0xFFFFFFFF. This is an example of using unreal mode, an x86 processor feature that lets you bypass the segment limit in real mode. More information about this can be found here .
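The access-rights values in the list are easier to read once decoded. The sketch below assumes the VMCS access-rights layout (type in bits 3:0, S in bit 4, DPL in bits 6:5, P in bit 7, G in bit 15) and the guest-state rule that G must be 1 when any of limit bits 31:20 is set and 0 when any of limit bits 11:0 is clear. That rule would explain why DS and ES, with their 4 GB limit, use 0xF093 rather than 0x93. The struct and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

struct seg_ar {
    unsigned type;     /* bits 3:0, segment type */
    bool s;            /* bit 4, 1 = code/data, 0 = system */
    unsigned dpl;      /* bits 6:5, privilege level */
    bool present;      /* bit 7 */
    bool granularity;  /* bit 15, limit counted in 4 KB units */
};

/* Split a VMCS access-rights value into its main fields. */
static struct seg_ar decode_ar(uint32_t ar)
{
    struct seg_ar d;
    d.type = ar & 0xFu;
    d.s = (ar >> 4) & 1u;
    d.dpl = (ar >> 5) & 3u;
    d.present = (ar >> 7) & 1u;
    d.granularity = (ar >> 15) & 1u;
    return d;
}

/* Consistency of the G bit with the 32-bit limit field. */
static bool limit_matches_granularity(uint32_t limit, uint32_t ar)
{
    bool g = (ar >> 15) & 1u;
    if ((limit & 0xFFFu) != 0xFFFu && g)
        return false; /* low 12 bits not all ones: G must be 0 */
    if ((limit & 0xFFF00000u) != 0 && !g)
        return false; /* limit above 1 MB: G must be 1 */
    return true;
}
```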

While in VMX non-root mode, the guest OS may run into a situation where control has to be returned to the host in VMX root mode. In that case a VM exit occurs: the current VMX non-root state is saved and the VMX root state is loaded. The VMX root state is initialized by the InitHostStateArea() function, which sets the following register values:

CR0 = 0x80000039
CR3 = PML4_addr
CR4 = 0x420A1
RSP = address of the start of the STACK64 frame
RIP = address of the VMEXIT_handler handler
ES.selector = 0x10
CS.selector = 0x08
SS.selector = 0x10
DS.selector = 0x10
FS.selector = 0x10
GS.selector = 0x10
TR.selector = 0x18
TR.base = TSS address
GDTR.base = GDT64 address
IDTR.base = IDT address
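The host selectors above map directly onto the four-entry GDT created earlier (NULL, CS64, DS64, TSS64). The sketch below shows the mapping and one VM-entry check on the host-state area that these values satisfy: the RPL and TI bits of every host selector must be zero, and the CS and TR selectors must not be null. The helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* A selector is index (bits 15:3), TI (bit 2), RPL (bits 1:0). */
static unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

/* Host selectors must have TI = 0 and RPL = 0. */
static bool host_selector_ok(uint16_t sel)
{
    return (sel & 0x7u) == 0;
}

/* CS and TR additionally must not be the null selector. */
static bool host_cs_or_tr_ok(uint16_t sel)
{
    return host_selector_ok(sel) && sel != 0;
}
```

With the values from the list, 0x08 selects GDT entry 1 (CS64), 0x10 selects entry 2 (DS64), and 0x18 selects entry 3 (TSS64).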

Next, the guest physical address space is created (the InitEPT() function). This is one of the most important steps in building a hypervisor, because an incorrect size or memory type for any region can lead to errors that do not show up immediately but later cause unexpected slowdowns or hangs in the guest OS. Such problems are unpleasant to track down, so the memory setup deserves close attention.

The following image shows the model of the guest physical address space:

So, what we see here:

  • [0 - 0xFFFFFFFF] - the whole range of the guest address space. Default type: write back
  • [0xA0000 - 0xBFFFF] - Video RAM. Type: uncacheable
  • [0xBA647000 - 0xFFFFFFFF] - Device RAM. Type: uncacheable
  • [0xC0000000 - 0xCFFFFFFF] - Video RAM. Type: write combining
  • [0xD0000000 - 0xD1FFFFFF] - Video RAM. Type: write combining
  • [0xFA000000 - 0xFAFFFFFF] - Video RAM. Type: write combining
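Because the ranges overlap (the write-combining video windows lie inside the uncacheable device range), a lookup needs a precedence rule. The sketch below assumes that later, more specific entries override earlier ones, and that the legacy video window ends at 0xBFFFF. The memory type encodings (UC = 0, WC = 1, WB = 6) and their position in bits 5:3 of an EPT leaf entry follow the EPT entry format. The table and function names are made up for illustration and are not taken from InitEPT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ept_memtype { EPT_UC = 0, EPT_WC = 1, EPT_WB = 6 };

struct gpa_range {
    uint64_t first, last;   /* inclusive bounds */
    enum ept_memtype type;
};

/* Ranges from the article; later entries override earlier ones. */
static const struct gpa_range guest_map[] = {
    { 0x00000000ull, 0xFFFFFFFFull, EPT_WB },
    { 0x000A0000ull, 0x000BFFFFull, EPT_UC },
    { 0xBA647000ull, 0xFFFFFFFFull, EPT_UC },
    { 0xC0000000ull, 0xCFFFFFFFull, EPT_WC },
    { 0xD0000000ull, 0xD1FFFFFFull, EPT_WC },
    { 0xFA000000ull, 0xFAFFFFFFull, EPT_WC },
};

/* Memory type for a guest physical address below 4 GB. */
static enum ept_memtype memtype_for(uint64_t gpa)
{
    enum ept_memtype type = EPT_WB;
    assert(gpa <= 0xFFFFFFFFull);
    for (size_t i = 0; i < sizeof guest_map / sizeof guest_map[0]; i++)
        if (gpa >= guest_map[i].first && gpa <= guest_map[i].last)
            type = guest_map[i].type; /* last match wins */
    return type;
}

/* EPT leaf entry for a 4 KB page: R/W/X in bits 2:0, type in bits 5:3. */
static uint64_t ept_leaf_4k(uint64_t hpa, enum ept_memtype type)
{
    assert((hpa & 0xFFFull) == 0);
    return hpa | ((uint64_t)type << 3) | 0x7ull;
}
```

Since 0xBA647000 is not aligned to 2 MB, a real table would need 4 KB entries around that boundary, which is why the leaf helper here works at 4 KB granularity.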

To define these regions I used the information from the RAMMap utility (Physical Ranges tab) together with the data from the Windows Device Manager. Of course, on another PC the address ranges will most likely be different. As for the guest memory type, in my implementation it is determined only by the value specified in the EPT tables. This is simple but not entirely correct: in general, the memory type that the guest OS sets in its own page tables must also be taken into account.

Once the guest address space has been created, we can move on to the VM-execution control fields (the InitExecutionControlFields() function). This is a fairly large set of options that define how the guest OS runs in VMX non-root mode. You can, for example, track accesses to I/O ports or monitor changes to MSRs. In our case, however, I only use the ability to control certain bits of the CR0 register. The point is that bits 30 (CD) and 29 (NW) are shared between VMX non-root and VMX root modes, so if the guest OS sets them to 1, performance suffers.
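The mechanism behind this control is the CR0 guest/host mask together with the CR0 read shadow. The following is a minimal model of it in plain C, assuming the mask covers only CD and NW: a guest write that changes a masked bit causes a VM exit, the handler stores the requested value in the read shadow and a cleaned value in the real guest CR0, and a guest read returns shadow bits for masked positions and real bits elsewhere. The struct and function names are hypothetical; in the hypervisor these values live in VMCS fields.

```c
#include <stdint.h>

#define CR0_NW (1ull << 29)
#define CR0_CD (1ull << 30)
#define CR0_GUEST_HOST_MASK (CR0_CD | CR0_NW)

/* Stand-ins for the guest CR0 and CR0 read shadow VMCS fields. */
struct cr0_state {
    uint64_t guest_cr0;
    uint64_t read_shadow;
};

/* What a VM exit handler could do for a guest write to CR0. */
static void handle_cr0_write(struct cr0_state *s, uint64_t requested)
{
    s->guest_cr0 = requested & ~CR0_GUEST_HOST_MASK; /* keep caching on */
    s->read_shadow = requested;                      /* what the guest sees */
}

/* Value the guest obtains when it reads CR0. */
static uint64_t guest_visible_cr0(const struct cr0_state *s)
{
    return (s->read_shadow & CR0_GUEST_HOST_MASK) |
           (s->guest_cr0 & ~CR0_GUEST_HOST_MASK);
}
```

The guest therefore believes it has disabled caching, while the processor keeps running with CD and NW cleared.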

The hypervisor setup is almost complete. All that remains is to configure the transition to guest mode (VMX non-root) and the return to host mode (VMX root). These settings are made in the following functions:

InitVMEntryControl() sets the options for switching to VMX non-root:

  • Load Guest IA32_EFER
  • Load Guest IA32_PAT

InitVMExitControl() sets the options for switching to VMX root:

  • Load Host IA32_EFER;
  • Save Guest IA32_EFER;
  • Load Host IA32_PAT;
  • Save Guest IA32_PAT;
  • Host.CS.L = 1, Host.IA32_EFER.LME = 1, Host.IA32_EFER.LMA = 1;
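These options are single bits in the VM-entry and VM-exit control fields. The sketch below lists the bit positions as defined for those fields and shows the usual way of reconciling a desired value with the corresponding capability MSR, whose low 32 bits give the bits that must be 1 and whose high 32 bits give the bits that may be 1. The macro and function names are invented for this example, and mapping the last list item to the "host address-space size" exit control is an interpretation on my part.

```c
#include <stdint.h>

/* VM-entry controls */
#define ENTRY_LOAD_IA32_PAT        (1u << 14)
#define ENTRY_LOAD_IA32_EFER       (1u << 15)

/* VM-exit controls */
#define EXIT_HOST_ADDR_SPACE_SIZE  (1u << 9)  /* host runs in 64-bit mode */
#define EXIT_SAVE_IA32_PAT         (1u << 18)
#define EXIT_LOAD_IA32_PAT         (1u << 19)
#define EXIT_SAVE_IA32_EFER        (1u << 20)
#define EXIT_LOAD_IA32_EFER        (1u << 21)

/* Adjust a desired control value to what the capability MSR permits. */
static uint32_t adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;         /* bits 31:0 */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32); /* bits 63:32 */
    return (desired | must_be_one) & may_be_one;
}
```

If a requested bit disappears after the adjustment, the processor does not support that option, and the setup code should treat this as an error rather than continue silently.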

Now that all settings are in place, the VMLaunch() function puts the processor into VMX non-root mode and starts the guest OS. As I mentioned earlier, the VM-execution controls can define conditions under which control returns to the hypervisor in VMX root mode. In my simple example I give the guest OS complete freedom of action, but in some cases the hypervisor still has to intervene and correct the guest's behavior:

  1. If the guest OS tries to change the CD and NW bits in the CR0 register, the VM exit handler corrects the value written to CR0. The CR0 read shadow field is also updated so that the guest OS reads back the value it wrote.
  2. Execution of the xsetbv instruction. This instruction always causes a VM exit regardless of the settings, so the handler simply executes it in VMX root mode.
  3. Execution of the cpuid instruction. This instruction also causes an unconditional VM exit, but I made a small change in its handler: if eax contains a value from 0x80000002 to 0x80000004, cpuid returns not the processor brand string but the line: VMX Study Core:) The result can be seen in the screenshot:
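The brand-string substitution in the cpuid handler can be sketched as follows. CPUID leaves 0x80000002 through 0x80000004 return 48 bytes in total, 16 per leaf, in the order EAX, EBX, ECX, EDX with the first character in the lowest byte. The struct and function names are assumptions made for this example and do not come from the article's source code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Pack four bytes into a register, first byte in the lowest bits. */
static uint32_t pack4(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill out with the 16-byte slice of brand for the given leaf.
 * Returns 0 if the leaf is not one of the brand-string leaves. */
static int emulate_brand_leaf(uint32_t leaf, const char *brand,
                              struct cpuid_regs *out)
{
    unsigned char buf[48] = {0}; /* zero padded, NUL terminated */
    size_t n;
    const unsigned char *p;

    assert(brand != NULL && out != NULL);
    if (leaf < 0x80000002u || leaf > 0x80000004u)
        return 0;
    n = strlen(brand);
    if (n > sizeof buf - 1)
        n = sizeof buf - 1;
    memcpy(buf, brand, n);

    p = buf + 16u * (leaf - 0x80000002u);
    out->eax = pack4(p);
    out->ebx = pack4(p + 4);
    out->ecx = pack4(p + 8);
    out->edx = pack4(p + 12);
    return 1;
}
```

A handler would call this for the leaf found in the guest's eax and fall back to executing the real cpuid instruction when it returns 0.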


The hypervisor written as an example for this article is fully capable of keeping the guest OS running stably, although of course it is not a complete solution. Intel VT-d is not used, only one logical processor is supported, and there is no control over interrupts or the operation of peripheral devices. In general, I have used almost nothing from the rich set of tools Intel provides for hardware virtualization. However, if the community is interested, I will continue to write about Intel VMX, especially since there is plenty to write about.

Yes, I almost forgot: it is convenient to debug the hypervisor and its components in Bochs. In the early stages it is an indispensable tool. Unfortunately, loading the hypervisor in Bochs differs from loading it on a physical PC. At one point I made a special build to simplify this process; I will try to tidy up its source code and upload it with the project in the near future.

That's all. Thanks for your attention.
