
Exploring the internal mechanisms of Hyper-V

If the work of a hacker, or rather of a research programmer, went the way classic films show it (sit down, tap a few keys, everything flashes green on the screen, passwords crack, and money suddenly moves from point A to point B), life would certainly be easier and more fun. In reality, though, any serious hack is always preceded by thorough and tedious analytical work. That is what we will be doing here, and we will lay the results out for your judgment as a series of two articles. Make sure you have a sufficient supply of beer and cigarettes: reading material like this is dangerous for an unprepared brain :).
The discovery of the bug that later received the number MS13-092 (an error in the Hyper-V component of Windows Server 2012 that allowed sending the hypervisor into a BSOD from a guest OS or executing arbitrary code in other guest OSs running on the vulnerable host) was a very unpleasant surprise for Microsoft engineers. Before that, no vulnerability had been found in Hyper-V for almost three years; the previous one was MS10-102, found at the end of 2010. Over the past four years the popularity of cloud services has grown dramatically, and researchers are increasingly interested in the security of the hypervisors that underlie cloud systems. However, the number of publicly available works is extremely small: researchers are reluctant to spend their time studying such complex and poorly documented architectural solutions.
INFO
Before reading this article, we recommend that you familiarize yourself with the ERNW report, the Hyper-V debugging for beginners material, and the official Hypervisor TLFS document.
VMBus
At the time of writing, Windows Server 2012 R2 Update 1 was used both as the Hyper-V server and as the guest OS (Generation 1 machines); other versions of Windows were used to show certain features of the bus, and such cases are explicitly noted in the article. It is better to deploy the test environment in VMware Workstation 2014 July TechPreview or later, since in earlier versions a bug in Workstation prevents debugging virtual machines over the network (otherwise you have to force UEFI in the virtual machine configuration). It is also assumed below that the lab runs on an Intel hardware platform and that the hypervisor functions are implemented in hvix64.exe.
Terms and Definitions
- Root partition (parent partition, root OS) - Windows Server 2012 R2 with Hyper-V component enabled;
- Guest OS - Hyper-V virtual machine with installed Windows Server 2012 R2;
- TLFS - Hypervisor Top-Level Functional Specification: Windows Server 2012 R2;
- LIS - Linux Integration Services;
- ACPI - Advanced Configuration and Power Interface.
About VMBus
MSDN article: Hyper-V Architecture
In short, VMBus is a technology for interaction between guest operating systems and the root OS. Accordingly, both the guest and the root OS contain components that implement this interaction through the interfaces provided by the hypervisor and described in TLFS 4.0. Microsoft also develops guest components for the Linux family of operating systems; they are integrated into the Linux kernel and published separately on GitHub: github.com/LIS/LIS3.5.
Starting with Windows Server 2008, functions have been added to the Windows kernel that optimize the operation of the operating system in a virtual Hyper-V environment. For comparison, the kernel of Windows Server 2008 (x64) implements a total of 25 functions with the Hvl prefix, which marks them as part of the hypervisor integration library; Windows Server 2012 R2 already contains 109 Hvl functions.
Let's see how the VMBus components interact with the hypervisor, the root OS and the guest OS. First, a look at the LIS source code shows that VMBus is presented as an ACPI device. ACPI makes it possible to standardize the hardware platform for different operating systems and is implemented in Hyper-V (as, incidentally, in other popular virtualization platforms), which allows standard utilities to be used to obtain the information needed for this research.
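Since the article keeps referring to the LIS sources, here is a paraphrased fragment showing how the Linux VMBus driver declares itself as an ACPI driver (a sketch based on drivers/hv/vmbus_drv.c in the LIS / mainline sources; exact contents differ between versions):

#include <linux/acpi.h>
#include <linux/module.h>

/* Paraphrased fragment of the VMBus driver from the LIS / mainline Linux
 * sources (drivers/hv/vmbus_drv.c); details vary between versions. The _HID
 * strings in the match table are what the driver compares against the VMBus
 * ACPI device exposed by Hyper-V; vmbus_acpi_add() (not shown here) walks the
 * device's _CRS to pick up its resources. */
static int vmbus_acpi_add(struct acpi_device *device);

static const struct acpi_device_id vmbus_acpi_device_ids[] = {
        {"VMBUS", 0},
        {"VMBus", 0},
        {"", 0},
};
MODULE_DEVICE_TABLE(acpi, vmbus_acpi_device_ids);

static struct acpi_driver vmbus_acpi_driver = {
        .name = "vmbus",
        .ids  = vmbus_acpi_device_ids,
        .ops  = {
                .add = vmbus_acpi_add,
        },
};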
ACPI devices can be viewed with the ACPI Tool utility, which was included in older versions of AIDA64 (it was removed from later versions). With its help, two devices are found in _SB.PCI0.SBRG: VMB8 and VMBS (see Fig. 1).

Fig. 1. VMB8 and VMBS devices
We dump the ACPI DSDT (Differentiated System Description Table), which contains information about peripheral devices and additional functions of the hardware platform, using the same ACPI Tool, and then disassemble the AML into ASL. We get the dump shown in Fig. 2.

Fig. 2. ASL dump
A superficial reading of the Advanced Configuration and Power Interface Specification 5.0 makes it clear that if the guest OS reports itself as Windows 6.2 (Windows 8 / Windows Server 2012) or later, the VMB8 device is used; otherwise VMBS is used. The only difference between the two devices is the _UID (Unique ID) object, which is present in VMB8. According to the ACPI specification, this object is optional and is required only when the device cannot otherwise provide the operating system with a persistent unique identifier. The resources used by the device are also known: interrupts 5 and 7.
For comparison, a Generation 2 virtual machine has only a VMBS device, located in _SB_.VMOD.VMBS (but with a _UID object), and it uses only interrupt 5 (see Fig. 3).

Fig. 3. ASL part of the Gen2 dump
Virtual Interrupt Handling
On Windows, interrupts are serviced by routines registered in the interrupt dispatch table (IDT). There is no direct link between the IRQs 5 and 7 we found in the ACPI DSDT and the IDT handlers; to match an interrupt to its handler, Windows uses an interrupt arbiter (in general there are several classes of arbiters besides IRQ: DMA, I/O, memory).
WWW
All about arbiters on the MSDN blog
goo.gl/FuvG4R
goo.gl/V3UV8z
goo.gl/h1vXaf
Information about the registered arbiters can be viewed in WinDbg using the !acpiirqarb command.
For a Windows Server 2012 R2 Gen1 guest, the output of kd> !acpiirqarb is shown in Fig. 4.

Fig. 4. !acpiirqarb in a Windows Server 2012 R2 Gen1 guest
The output of the command shows that for IRQ 7 the handler address will be in IDT entry 0x71, and for IRQ 5 in entry 0x81. Interrupt handler numbers are generated in the acpi!ProcessorReserveIdtEntries function at the stage when the PnP manager builds the device tree, before the functional device driver has been loaded. ISR registration in the IDT happens at later stages, for example when the device driver itself calls the IoConnectInterrupt routine. However, looking at the IDT entries, we see that ISRs for vectors 0x71 and 0x81 are not registered:
kd> !idt -a
...
71: fffff80323f73938 nt!KxUnexpectedInterrupt0+0x388
81: fffff80323f739b8 nt!KxUnexpectedInterrupt0+0x408
...
In Windows Server 2012 R2 Gen2, IDT entry 0x90 was mapped for IRQ 5:
kd> !acpiirqarb
Processor 0 (0, 0):
Device Object: 0000000000000000
Current IDT Allocation:
0000000000000000 - 0000000000000050 00000000 \
A:0000000000000000 IRQ(GSIV):10
0000000000000090 - 0000000000000090 D ffffe001f35eb520 (vmbus)
A:ffffc00133972660 IRQ(GSIV):5
...
However, as the debugger shows, the ISR procedure for vector 0x90 is also not defined:
kd> !idt -a
90: fffff8014a3daa30 nt!KxUnexpectedInterrupt0+0x480
In Windows 8.1 x86, we see a slightly different picture:
kd> !acpiirqarb
Processor 0 (0, 0):
Device Object: 00000000
Current IDT Allocation:
...
0000000000000081 - 0000000000000081 D 87f2f030 (vmbus) A:881642a8
IRQ(GSIV):fffffffe (such values are usually characteristic of MSI devices)
...
00000000000000b2 - 00000000000000b2 S B 87f31030 (s3cap) A:8814b840
IRQ(GSIV):5
At the same time, for interrupt number 0x81 the ISR procedure vmbus!XPartPncIsr is defined:
kd> !idt
81: 81b18a0c vmbus!XPartPncIsr (KINTERRUPT 87b59e40; see Fig. 5)
b2: 81b18c58 nt!KiUnexpectedInterrupt130
s3cap is an auxiliary driver for working with the S3 Trio video card emulated by Hyper-V.

Fig. 5. vmbus interrupt object
Thus, the ISR vmbus!XPartPncIsr is registered in the IDT only in Windows 8.1 x86 (presumably the same method is used in the other x86 operating systems that Microsoft supports as Hyper-V guests). The vmbus!XPartPncIsr procedure handles the interrupts generated by the hypervisor.
On x64 systems, starting with Windows 8 / Windows Server 2012, integration with the hypervisor is implemented slightly differently: handlers for the interrupts generated by the hypervisor were added directly to the operating system's IDT. Let us briefly look at how the IDT is formed while Windows boots.
After the Windows boot loader winload.efi has initialized, the IDT looks like this (output of a pykd script at a WinDbg breakpoint in winload.efi when the operating system is loaded with the /bootdebug option):
kd> !py D:\hyperv4\idt_winload_parse.py
isr 1 address = winload!BdTrap01
isr 3 address = winload!BdTrap03
isr d address = winload!BdTrap0d
isr e address = winload!BdTrap0e
isr 29 address = winload!BdTrap29
isr 2c address = winload!BdTrap2c
isr 2d address = winload!BdTrap2d
Then, when winload!OslArchTransferToKernel executes, control is transferred to the Windows kernel, where the nt!KiInitializeBootStructures function initializes the IDT with values from the KiInterruptInitTable table:
kd> dps KiInterruptInitTable L40
...
fffff800`1b9553c0 00000000`00000030
fffff800`1b9553c8 fffff800`1b377160 nt!KiHvInterrupt
fffff800`1b9553d0 00000000`00000031
fffff800`1b9553d8 fffff800`1b3774c0 nt!KiVmbusInterrupt0
fffff800`1b9553e0 00000000`00000032
fffff800`1b9553e8 fffff800`1b377810 nt!KiVmbusInterrupt1
fffff800`1b9553f0 00000000`00000033
fffff800`1b9553f8 fffff800`1b377b60 nt!KiVmbusInterrupt2
fffff800`1b955400 00000000`00000034
fffff800`1b955408 fffff800`1b377eb0 nt!KiVmbusInterrupt3
...
Accordingly, after initialization the handlers for system interrupts 0x30–0x34 look like this:
kd> !idt
...
30: fffff8001b377160 nt!KiHvInterrupt
31: fffff8001b3774c0 nt!KiVmbusInterrupt0
32: fffff8001b377810 nt!KiVmbusInterrupt1
33: fffff8001b377b60 nt!KiVmbusInterrupt2
34: fffff8001b377eb0 nt!KiVmbusInterrupt3
...
A second-generation virtual machine in Hyper-V can only be created from operating systems whose kernel contains the five additional handlers described above. To inject interrupts, Intel provides the virtual interrupt delivery hardware feature, but Hyper-V does not use it to transfer control to these handlers. Instead, the hypervisor sets the bit corresponding to the vector number in a special memory area using an instruction of the form lock bts [rcx+598h], rax, where rax holds the interrupt vector number (0x30–0x32). Apparently the Hyper-V developers considered registering the vmbus!XPartPncIsr procedure as a handler to be a less efficient solution than generating interrupts by means of APIC virtualization based on the data in the virtual SINTx registers.
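To make the mechanism clearer, below is a purely conceptual C sketch of what that lock bts sequence does; the real structure, the 0x598 offset and all names are hypervisor internals, so the stand-in bitmap here is hypothetical:

#include <intrin.h>

/* Conceptual sketch only: models the effect of the "lock bts [rcx+598h], rax"
 * sequence seen in hvix64.exe. The real structure, offset and names are
 * hypervisor internals; the bitmap below is a purely hypothetical stand-in,
 * with one bit per interrupt vector. */
static volatile __int64 PendingVectorBitmap[4];      /* 256 bits, one per vector */

void mark_vector_pending(unsigned vector)
{
    /* _interlockedbittestandset64 compiles to a lock bts instruction:
     * atomically set the bit that corresponds to the vector number. */
    _interlockedbittestandset64(&PendingVectorBitmap[vector / 64], vector % 64);
}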
The specified handlers are registered in the IDT even if the operating system runs outside of the Hyper-V environment. Each handler calls HvlRouteInterrupt, passing the index as a parameter (see Figure 6).

Fig. 6. Additional system interrupt handlers in the Windows kernel
HvlRouteInterrupt is as follows (Fig. 7).

Fig. 7. HvlRouteInterrupt
This function calls the handler from the HvlpInterruptCallback pointer array, depending on the index value. This array in the root OS looks like this:
5: kd> dps HvlpInterruptCallback
fffff802`fff5cc30 fffff800`dc639d50 winhvr!WinHvOnInterrupt
fffff802`fff5cc38 fffff800`dd5a9ec0 vmbusr!XPartEnlightenedIsr
fffff802`fff5cc40 fffff800`dd5a9ec0 vmbusr!XPartEnlightenedIsr
fffff802`fff5cc48 fffff800`dd5a9ec0 vmbusr!XPartEnlightenedIsr
fffff802`fff5cc50 fffff800`dd5a9ec0 vmbusr!XPartEnlightenedIsr
fffff802`fff5cc58 00000000`00000000
Depending on the index passed from KiVmbusInterruptX, XPartEnlightenedIsr queues one of two possible functions from an array of DPC structures in vmbusr: vmbusr!ParentInterruptDpc or vmbusr!ParentRingInterruptDpc (Fig. 8).

Fig. 8. DPC objects
The number of DPC structures in the array is determined by the vmbusr!XPartPncPostInterruptsEnabledParent function and depends on the number of logical processors in the root OS. For each logical processor, a DPC with vmbusr!ParentInterruptDpc and a DPC with vmbusr!ParentRingInterruptDpc are added. The vmbusr!ParentRingInterruptDpc function determines the address of the DPC procedure for nt!KeInsertQueueDpc based on which processor it is currently running on.
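For readers unfamiliar with the pattern, here is a rough WDK-style sketch (not vmbusr's actual code; all names are hypothetical) of how a driver typically prepares one DPC per logical processor and queues the one belonging to the current processor:

#include <ntddk.h>

/* Sketch only: one DPC per logical processor, each bound to its processor,
 * with the interrupt path queuing the DPC that belongs to the processor it
 * runs on. This illustrates the general pattern, not vmbusr's internals. */

KDEFERRED_ROUTINE SampleRingDpcRoutine;

VOID SampleRingDpcRoutine(PKDPC Dpc, PVOID Context,
                          PVOID SystemArgument1, PVOID SystemArgument2)
{
    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Context);
    UNREFERENCED_PARAMETER(SystemArgument1);
    UNREFERENCED_PARAMETER(SystemArgument2);
    /* here the real driver would drain its ring buffer / signal the channel */
}

static PKDPC g_PerCpuDpc;            /* array of KDPCs, one per logical CPU */
static ULONG g_CpuCount;

NTSTATUS InitPerProcessorDpcs(VOID)
{
    g_CpuCount = KeQueryActiveProcessorCount(NULL);
    g_PerCpuDpc = (PKDPC)ExAllocatePoolWithTag(NonPagedPoolNx,
                                               g_CpuCount * sizeof(KDPC),
                                               'cpdD');
    if (g_PerCpuDpc == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    for (ULONG i = 0; i < g_CpuCount; i++) {
        KeInitializeDpc(&g_PerCpuDpc[i], SampleRingDpcRoutine, NULL);
        KeSetTargetProcessorDpc(&g_PerCpuDpc[i], (CCHAR)i);
    }
    return STATUS_SUCCESS;
}

/* Called from the interrupt path: queue the DPC of the current processor. */
VOID QueueDpcOnCurrentProcessor(VOID)
{
    KeInsertQueueDpc(&g_PerCpuDpc[KeGetCurrentProcessorNumberEx(NULL)],
                     NULL, NULL);
}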
In the guest OS, VMBus registers only one handler in the HvlpInterruptCallback array:
1: kd> dps HvlpInterruptCallback
fffff803`1d171c30 fffff800`6d7c5714 winhv!WinHvOnInterrupt
fffff803`1d171c38 fffff800`6d801360 vmbus!XPartEnlightenedIsr
fffff803`1d171c40 00000000`00000000
The HvlpInterruptCallback array is populated by the nt!HvlRegisterInterruptCallback function exported by the kernel. The WinHvOnInterrupt handler is registered when the winhvr.sys driver is loaded (winhvr!WinHvpInitialize -> winhvr!WinHvReportPresentHypervisor -> winhvr!WinHvpConnectToHypervisor -> nt!HvlRegisterInterruptCallback); the vmbusr!XPartEnlightenedIsr entries are registered by means of the same function.
Let's try to figure out how the hypervisor transfers control to these system interrupt handlers. To do this, refer to the Virtual Interrupt Control section of the TLFS. In short, Hyper-V manages interrupts in the guest OS through a synthetic interrupt controller (SynIC), which is an extension of the virtualized local APIC and uses an additional set of memory-mapped registers. That is, each virtual processor has an additional SynIC alongside the usual APIC. The SynIC contains two pages: SIM (synthetic interrupt message) and SIEF (synthetic interrupt event flags). SIEF and SIM are arrays of 16 elements, each 256 bytes in size. The physical addresses (more precisely, the GPAs) of these pages are held in the SIEFP and SIMP MSRs, respectively, and differ for each logical processor. The SynIC also defines 16 SINTx registers, and each element of the SIEF and SIM arrays maps to the corresponding SINTx register. WinDbg displays the contents of the SINTx registers with the !apic command (starting with WinDbg 6.3).

Fig. 9. !apic in the root OS

Fig. 10. !apic in the guest OS
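To make the SIM layout concrete, here is one slot of the SIM page paraphrased from the TLFS in C (a sketch based on the specification, not Microsoft's header):

#include <stdint.h>

/* One slot of the SIM page, paraphrased from the Hypervisor TLFS. The SIM
 * page holds 16 such 256-byte messages, one per SINT; the SIEF page similarly
 * holds 16 blocks of 256 bytes of event flags. */
typedef struct {
    uint32_t MessageType;        /* 0 (HvMessageTypeNone) means the slot is free */
    uint8_t  PayloadSize;        /* payload size in bytes, at most 240           */
    uint8_t  MessageFlags;       /* bit 0: MessagePending                        */
    uint8_t  Reserved[2];
    union {
        uint64_t Sender;         /* partition ID for messages from a partition   */
        uint32_t PortId;         /* port ID for messages delivered via a port    */
    } u;
} HV_MESSAGE_HEADER;             /* 16 bytes */

typedef struct {
    HV_MESSAGE_HEADER Header;
    uint64_t          Payload[30];   /* 240-byte payload */
} HV_MESSAGE;                        /* 256 bytes */

typedef struct {
    HV_MESSAGE SintMessage[16];      /* slots for SINT0..SINT15 */
} HV_SYNIC_MESSAGE_PAGE;             /* the 4 KB SIM page */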
The SINT0 and SINT1 registers are configured by the nt!HvlEnlightenProcessor function, which writes the parameters to MSRs 40000090h and 40000091h respectively. SINT4 and SINT5 are configured by the vmbusr.sys driver: vmbusr!XPartPncPostInterruptsEnabledParent -> winhvr!WinHvSetSint -> winhvr!WinHvSetSintOnCurrentProcessor. SINT2 in the guest OS is configured by the vmbus.sys driver in the winhv!WinHvSetSintOnCurrentProcessor function.
Each SINTx register has an 8-bit Vector field. Its value determines which interrupt handler receives control for hypercalls whose parameters include a PortID (HvSignalEvent, HvPostMessage).
A SINTx can be assigned implicitly (for example, intercept messages are always controlled by the SINT0 register and therefore placed in the first element of the SIM page), explicitly (for timer messages), or specified in the parameters of a port created with the HvCreatePort hypercall. One of those parameters is PortTypeInfo. If the port type is HvPortTypeMessage or HvPortTypeEvent, the PortTypeInfo parameter contains a TargetSint field with the number of the SINT to which the port will be bound; its value can range from 1 to 15 (SINT0 is reserved for messages from the hypervisor and cannot be specified as TargetSint when a port is created).
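For reference, the SINTx register layout described in the TLFS can be sketched in C as follows (field names follow the specification); the Vector field is exactly what the !apic output above shows for each SINT:

#include <stdint.h>

/* Layout of the HV_X64_MSR_SINT0..SINT15 registers (MSRs 40000090h-4000009Fh),
 * paraphrased from the TLFS. Vector is the IDT vector the hypervisor raises
 * for messages/events arriving on this SINT; Masked disables delivery, and
 * AutoEoi makes the hypervisor perform the EOI on the virtual APIC itself. */
typedef union {
    uint64_t AsUINT64;
    struct {
        uint64_t Vector    : 8;
        uint64_t Reserved1 : 8;
        uint64_t Masked    : 1;
        uint64_t AutoEoi   : 1;
        uint64_t Reserved2 : 46;
    } s;
} HV_SYNIC_SINT;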
Analysis of the active SINT register values in the root OS shows that only three of the five system interrupt handlers (KiHvInterrupt, KiVmbusInterrupt0, KiVmbusInterrupt1) are actually used. What the KiVmbusInterrupt2 and KiVmbusInterrupt3 handlers were added to the kernel for could not be determined; perhaps they are needed on servers with a large number of logical processors (for example, 64), but unfortunately this could not be verified in the test environment. The SINTx values also show that the nt!KiHvInterrupt handler (vector 0x30) is called both for interrupts generated by the hypervisor and through ports created with the TargetSint parameter equal to 1.
Windows and TLFS
As an example, consider the parameters of the ports created when each of the Hyper-V Integration Services components is activated. Fig. 11 shows the characteristics of the ports created for the integration services (one port for each component).

Fig. 11. Ports of integration services
While the Integration Services components are running, interaction between the root OS and the guest OS goes through the 5th element of the SIEF array, so the handler in the root OS is KiVmbusInterrupt1.
Each newly created port gets a number one greater than the previous one. So, if you disable all integration services and then enable them again, the ports created for these services will be numbered in the range 0x22 to 0x27.
You can see the port parameters if you connect the debugger directly to the hypervisor and monitor the data passed to the HvCreatePort hypercall handler, or connect the debugger to the kernel and monitor the parameters of the WinHvCreatePort function in the winhvr.sys driver.
The remaining ports, created when the guest OS is powered on (their number depends on the configuration of the guest operating system), are shown in Fig. 12. The numbering follows the order in which the ports were created when a Windows Server 2012 R2 virtual machine with the default hardware configuration was powered on.

Fig. 12. Ports created when the virtual machine starts
It is important to note that SIM slot 0, both in the guest and in the parent OS, is reserved for messages sent by the hypervisor; the format of such messages is documented in the TLFS. Data transferred through the remaining slots uses a different format: VMBus messages are not documented, but the information needed to work with them is present in the LIS source code.
A few words about how the vmbusr.sys driver processes VMBus messages (see Fig. 13). In the root OS such messages are handled by the vmbusr!ChReceiveChannelMessage function, which analyzes the contents of the 4th SIM slot and determines the VMBus message code. If the code is 0 (CHANNELMSG_INVALID) or greater than 0x12, the function returns the error 0xC000000D (STATUS_INVALID_PARAMETER); otherwise it processes the message sent by the guest or root OS. For example, when the Guest Services component is enabled, the root OS sends the guest OS a CHANNELMSG_OFFERCHANNEL message; in response the guest OS sends CHANNELMSG_GPADL_HEADER, then the root OS sends CHANNELMSG_GPADL_CREATED, receives CHANNELMSG_OPENCHANNEL, and at the end of the dialog sends the guest OS a CHANNELMSG_OPENCHANNEL_RESULT message with the result code of the channel creation operation. It is worth noting that before processing each valid message, ChReceiveChannelMessage validates it (ChpValidateMessage), checking, in particular, the sender (root OS or guest OS) and the minimum message body size; each message type has its own validation conditions. In Fig. 13, the messages that will be processed if they are sent by the guest OS are marked (they may be interesting, for example, for building a fuzzer).
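For reference, the message codes checked by ChReceiveChannelMessage correspond to the channel message type enumeration from the LIS / Linux kernel sources; a paraphrased excerpt is shown below (values taken from the public headers; the exact upper bound depends on the version):

/* Paraphrased from the LIS / Linux kernel VMBus headers (the
 * vmbus_channel_message_type enumeration); the exact upper bound varies
 * between versions, which matches the 0..0x12 range check performed by
 * vmbusr!ChReceiveChannelMessage. */
enum vmbus_channel_message_type {
    CHANNELMSG_INVALID              = 0,
    CHANNELMSG_OFFERCHANNEL         = 1,
    CHANNELMSG_RESCIND_CHANNELOFFER = 2,
    CHANNELMSG_REQUESTOFFERS        = 3,
    CHANNELMSG_ALLOFFERS_DELIVERED  = 4,
    CHANNELMSG_OPENCHANNEL          = 5,
    CHANNELMSG_OPENCHANNEL_RESULT   = 6,
    CHANNELMSG_CLOSECHANNEL         = 7,
    CHANNELMSG_GPADL_HEADER         = 8,
    CHANNELMSG_GPADL_BODY           = 9,
    CHANNELMSG_GPADL_CREATED        = 10,
    CHANNELMSG_GPADL_TEARDOWN       = 11,
    CHANNELMSG_GPADL_TORNDOWN       = 12,
    CHANNELMSG_RELID_RELEASED       = 13,
    CHANNELMSG_INITIATE_CONTACT     = 14,
    CHANNELMSG_VERSION_RESPONSE     = 15,
    CHANNELMSG_UNLOAD               = 16,
};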

Fig. 13. VMBus messages processed by the vmbusr.sys driver
To understand exactly which messages the root OS and the guest OS exchange, we will write a driver that replaces the handler addresses in the HvlpInterruptCallback array of the root OS with its own handlers. But more about that in the next article.
Conclusion
In the first part of the article, we analyzed the changes Microsoft made to the operating system kernel to optimize its operation in a virtual environment, as far as they affect VMBus. This part covered the theory; the practical part of the study will be published in the next one, so be patient. First published in Hacker magazine, 11/2014.


