
Testing NVIDIA GRID + VMware Horizon
I had never come across an objective head-to-head comparison of GRID with the performance of a locally installed Quadro card, though.
Server models with GRID support have been available in our configurator for quite a while, since GRID cards (formerly known as VGX) appeared on the market long ago.
The first tests did not live up to expectations; I kept waiting for the drivers to mature and gradually stopped following progress in this area.
The idea of testing the technology came back during one project, when a client needed to repurpose an existing server fleet for workstation virtualization with specialized 3D software.
The servers were in the following configuration:
- CSE-745TQ chassis
- X9DR3-F motherboard
- 2 × Intel Xeon E5-2650 CPUs
- 128GB RAM
- 8 SAS disks in RAID5
We equipped the servers with GRID K2 cards, chose VMware as the hypervisor, installed the specialized software on virtual machines and ran the tests.
While the customer's benchmark was running, I saw 3D graphics performance unprecedented for a virtual environment. The results prompted me to continue the study, and for further GRID testing I chose SPECviewperf as a reasonably objective benchmark.
I also wanted to estimate the total cost of the solution and compare it with an implementation based on personal workstations.
Fortunately, the Quadro K5000 and Quadro K420 cards were in stock.
To start, I ran the tests locally on Windows 7 and recorded the performance of the Quadro K5000 and K420. Since the GRID K2 card carries two GPUs similar to the K5000 chip, I would need this data to compare virtual machine performance across the different GPU partitioning modes.
Right away there were difficulties with cooling the card: the GRID K2 is passively cooled, and the 745 chassis cannot provide the required airflow through the card by standard means. I had to mount a 90 mm fan on the rear panel of the case, blanking off the unused expansion slots. To be on the safe side, I set the fan speed to maximum. The noise was significant, but the cooling was excellent.


There are plenty of step-by-step installation and configuration guides online, so I will only briefly cover the main points.
After installing ESXi, we install the required NVIDIA GRID drivers and modules on the host and bring up vCenter and Horizon. We create a Windows virtual machine (as an example), update the VMware tools and apply all Windows updates. Further configuration depends on which GPU virtualization mode we choose.
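For reference, the host-side driver step usually comes down to installing the NVIDIA vGPU Manager VIB on the ESXi host and rebooting. Below is a minimal sketch of that step, assuming SSH access to the host; the host name and VIB path are placeholders, not the exact build used here.

```python
import paramiko

ESXI_HOST = "esxi-grid-01.example.local"  # placeholder host name
VIB_PATH = "/vmfs/volumes/datastore1/NVIDIA-GRID-vGPU-Manager.vib"  # placeholder path

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(ESXI_HOST, username="root", password="***")

# Enter maintenance mode (all VMs powered off), install the NVIDIA VIB,
# then confirm that it is registered on the host.
for cmd in (
    "esxcli system maintenanceMode set --enable true",
    "esxcli software vib install -v " + VIB_PATH,
    "esxcli software vib list | grep -i nvidia",
):
    stdin, stdout, stderr = client.exec_command(cmd)
    print(stdout.read().decode(), stderr.read().decode())

client.close()
# The host must be rebooted after the VIB install; once it is back up,
# running nvidia-smi in the ESXi shell should list both GPUs of the K2 card.
```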
There are 3 options:
- vSGA (GPU resources shared between virtual machines through the VMware driver): this mode offers high density but extremely low performance, so it is not considered here.
- vDGA (a physical GPU passed through to a virtual machine with the native NVIDIA driver): also of limited interest, since it does not provide high density. We look at it only for comparison with the locally installed Quadro K5000 and with the vGPU K280Q profile.
- vGPU (a share of GPU resources passed to a virtual machine via a dedicated NVIDIA driver): the most interesting and long-awaited option, and the one I pinned my main hopes on, since vGPU allows an unprecedented density of virtual machines with hardware 3D acceleration.
vDGA
To use this mode, the GRID K2 card must be passed through to the virtual environment. In the vSphere Client, under the host settings, we mark the devices to be passed through to virtual machines.
After rebooting the host, we add the new PCI device in the virtual machine settings.
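The same assignment can be scripted. Here is a minimal pyVmomi sketch under stated assumptions: the vCenter address, credentials, VM name and PCI identifiers are all placeholders to be taken from the host's passthrough device list, not values from this setup.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter connection details.
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local", pwd="***",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "grid-vdga-01")  # placeholder VM name

# PCI passthrough device backed by one of the two GPUs on the K2 card.
pci_dev = vim.vm.device.VirtualPCIPassthrough(
    backing=vim.vm.device.VirtualPCIPassthrough.DeviceBackingInfo(
        id="0000:06:00.0",           # PCI address of the GPU (example value)
        deviceId="11bf",             # device id reported by the host (example value)
        systemId="<host systemId>",  # copy from the host's passthrough info
        vendorId=0x10de))            # NVIDIA vendor id

spec = vim.vm.ConfigSpec(deviceChange=[
    vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=pci_dev)])
# The VM must be powered off and needs a full memory reservation for passthrough.
vm.ReconfigVM_Task(spec=spec)
Disconnect(si)
```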
We power on the virtual machine and install the NVIDIA GRID driver and the Horizon Agent. After a couple of reboots, the physical card shows up in the guest with the native NVIDIA driver, along with a set of virtual devices (audio playback devices, a microphone, and so on).
Next we move on to the Horizon configuration. Using the web interface, we create a new pool from the prepared virtual machines.
We do not enable hardware rendering in the pool settings, since we are using passthrough mode, and then configure access rights to the pool. From this point, any device with the Horizon Client installed can use this virtual machine.
I tried connecting from an iPhone 5.
3D content was displayed with slight delays, but streaming video was terribly slow. Since everything flew when connecting over the LAN, I concluded that the slowdown was caused either by the wireless network or by the phone's processor failing to keep up with decoding the PCoIP stream.
I ran the tests on an iPhone 5s and the result was better, while the iPhone 6 performed excellently.
Performance on Android smartphones did not differ much from the iPhone 5, and working that way was not particularly comfortable. Still, the flexibility of access to the virtual machine is obvious: you can use an existing fleet of workstations and office computers, or switch to thin clients. The Horizon Client can be installed on almost any popular OS.
There is also a ready-made Linux-based build; it already includes the necessary client and costs about $40.
So in vDGA mode we can create two virtual machines per K2 card, each with exclusive use of one of its GPUs.
SPECviewperf performance is very high for a virtual environment, though still lower than in the local tests with the Quadro K5000. All results are presented at the end of the article for an objective assessment.
vGPU
To use this virtualization mode, the passthrough checkboxes in the host settings (the ones responsible for forwarding the GPU devices to virtual machines) must be left unchecked.
In the virtual machine settings, select Shared PCI Device.
There are several profiles to choose from.

K200 is a stripped-down K220Q with lower resolution and less video memory.
K220Q through K260Q are the main profiles, chosen to match specific 3D workloads.
K280Q is a debatable profile: in terms of the maximum number of virtual machines it matches vDGA (two per K2 card) but delivers lower performance. Its only advantage, in my view, is that it can be combined with another vGPU profile. Note that no more than two profile types can be assigned to a single GRID card, and vGPU and vDGA modes cannot be combined, for obvious reasons: they interact with the virtual environment in different ways.
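For automation, the Shared PCI Device with a chosen vGPU profile can also be added programmatically. A minimal pyVmomi sketch, assuming a vSphere version that exposes the VgpuBackingInfo type, an existing connection `si` as in the passthrough example, and a hypothetical VM name (the profile string follows NVIDIA's naming, e.g. grid_k220q):

```python
from pyVmomi import vim

# Reuses the vCenter connection `si` from the earlier sketch; VM name is a placeholder.
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "grid-vgpu-01")

# Shared PCI device backed by an NVIDIA vGPU profile rather than a physical GPU.
vgpu_dev = vim.vm.device.VirtualPCIPassthrough(
    backing=vim.vm.device.VirtualPCIPassthrough.VgpuBackingInfo(vgpu="grid_k220q"))

spec = vim.vm.ConfigSpec(deviceChange=[
    vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        device=vgpu_dev)])
# The VM must be powered off; a full memory reservation is required as well.
vm.ReconfigVM_Task(spec=spec)
```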
Having settled on the profiles and created the required number of virtual machines or templates, we proceed to creating the pool or pools.
This time, in the pool's 3D renderer settings, we select NVIDIA GRID VGPU.
After installing the NVIDIA drivers and the Horizon Agent in the virtual machines, they become available through the Horizon Client. In vGPU mode the video card is detected in the guest as an NVIDIA GRID device carrying the profile name.
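A quick way to confirm from inside a Windows guest which device (and hence which profile) it actually received is to query the display adapters; a small sketch using the standard wmic tool, with nothing GRID-specific assumed:

```python
import subprocess

# List the display adapters Windows sees; on a vGPU VM the output should include
# an entry such as "NVIDIA GRID K220Q" next to the VMware SVGA adapter.
out = subprocess.check_output(
    ["wmic", "path", "win32_VideoController", "get", "name"],
    text=True)
print(out)
```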
Testing SPECviewperf V12.0.2
Visually everything looked very impressive, especially in K280Q mode.
For comparison, the same test in K220Q mode.
Not as snappy, but still respectable for a virtual environment.
Below is a summary table for each SPECviewperf module across all virtualization modes, plus the results of the local Quadro K5000 and K420 tests.

Test Results Charts















Analyzing the results shows that performance does not scale linearly across modes for every 3D application. For Siemens NX, for example, there is no difference between the K240Q, K260Q and K280Q profiles (most likely the CPU became the bottleneck). The Medical module showed the same result not only in the K240Q, K260Q and K280Q modes but also in vDGA mode and even in the local Quadro K5000 tests. Maya, in turn, shows a significant jump between K240Q and K260Q (probably due to the amount of video memory), while SolidWorks delivered the same result in all full-featured profiles.
These results do not fully capture the performance of the solution, but they will help in choosing the optimal configuration and correctly positioning it for specialized 3D workloads.
Testing 3ds Max
Since the SPEC test for 3ds Max currently supports only the 2015 version and requires it to be installed, I limited myself to a manual test of the 2016 trial version.
All vGPU modes behaved respectably, with the usual limitation: the less video memory allocated, the fewer polygons can be handled.
Working in the smallest profile (K220Q, 16 users per card) was no worse than working on the entry-level Quadro. As the polygon count grew, the frame rate stayed at a comfortable 20-30 FPS.
Realistic mode (automatic rendering in the viewport) worked without delays; when the model stopped moving, textures refreshed quickly enough. Overall, I found nothing that would make the work uncomfortable.
Testing KOMPAS-3D
Out of curiosity, I also ran a benchmark with KOMPAS-3D. Graphics performance did not differ much across modes, ranging from 29 to 33 in the benchmark's arbitrary units. ASCON specialists said this is an average result for a comparable solution built on Citrix. The test ran rather quickly, with the model spinning at high speed (without the smoothness seen in SPEC); apparently that is just how the test works. So I tried rotating the model manually: it rotated smoothly and comfortably even though the model is complex.
PCoIP Hardware Acceleration
Analyzing the SPEC results, I found that in some test modes the CPU could become a bottleneck, so I ran tests while reducing the number of cores per virtual machine. In a single-core VM the results were significantly worse, even though SPEC loaded only one thread and rarely to 100%.
I realized that on top of its main workload, the CPU is also encoding the PCoIP stream sent to the client. Given that PCoIP can hardly be called a lightweight protocol, the extra load is substantial. To offload the CPU, I tried the Teradici PCoIP Hardware Accelerator APEX 2800.

After installing the driver on ESXi and in the virtual machines, I repeated several tests. The results were impressive:
Test Results Charts







In some tests, performance increased by up to two times with the APEX 2800. The card can offload up to 64 active displays.
Estimating the cost of the solution
For a final comparison of graphics workstation virtualization against personal workstations, we need to determine the cost of one seat for each implementation.
I ran several calculations for different configurations: a virtual workstation comes out 1.5 to 4 times more expensive than a physical graphics workstation, depending on the number of virtual machines. The cheapest per-seat configuration was 32 virtual machines, each with 1 core, 7 GB of RAM and a K220Q profile with 0.5 GB of video memory (roughly equivalent to a Quadro K420).
For those who want to see real numbers, links to the GRID solution configurator and the workstation configurator are provided.
Naturally, no miracle happened and one is unlikely to: higher density comes with a higher solution cost. Still, the technology has some obvious advantages:
- Security (the first benefit of a client-server architecture: all data is stored centrally and kept isolated)
- High density (up to 64 users of graphics applications in 4U of rack space; see the sizing sketch after this list)
- High utilization of computing power (far less idle hardware than with personal workstations)
- Ease of administration (everything runs on one set of hardware in one place)
- Reliability (fault tolerance at the component level plus the option to build a fault-tolerant cluster)
- Flexible resource allocation (within seconds a user can be given the resources needed for demanding tasks, without any hardware changes)
- Remote access (from anywhere in the world over the Internet)
- Cross-platform client (connection from any device running any operating system supported by the VMware Horizon Client)
- Energy efficiency (as the number of seats grows, the combined power draw of the server and thin clients is several times lower than that of the equivalent local workstations)
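For clarity, the density figure above is simply the K220Q density multiplied out. A tiny sketch of the arithmetic, assuming four K2 cards per 4U host (the card count is an assumption for illustration) and the 16 users per card noted in the 3ds Max section:

```python
# Rough sizing behind the "up to 64 users in 4U" figure.
CARDS_PER_HOST = 4        # assumed number of GRID K2 cards in a 4U host
GPUS_PER_CARD = 2         # the K2 carries two GPUs
K220Q_VMS_PER_GPU = 8     # 16 users per card in the K220Q profile -> 8 per GPU

users_per_host = CARDS_PER_HOST * GPUS_PER_CARD * K220Q_VMS_PER_GPU
print(users_per_host)     # -> 64
```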
Conclusions
As a result of the testing, I noted several potential bottlenecks in the system:
CPU frequency. As the tests showed, the low clock speed of standard Intel Xeon processors often became the system bottleneck, so high-frequency processor variants should be used.
PC-over-IP. For tasks involving streaming video or 3D animation, encoding the PCoIP stream consumes a significant share of the virtual machine's CPU resources, so a hardware PCoIP accelerator noticeably improves the performance of the compute subsystem.
Disk subsystem. It is no secret that a spinning disk is the bottleneck in most personal systems. On a server the problem is even more acute: a dozen virtual machines starting simultaneously will make any array struggle, and past a certain point adding more disks no longer helps. Hybrid disk subsystems with SSDs are therefore necessary, and with a larger number of virtual machines a dedicated storage system should be considered.
That's it for now. These were, of course, only preliminary tests to assess the potential and positioning of the solution. The next steps will be a full test cycle of the original configuration under constant and variable load on all virtual machines, building a cluster for better fault tolerance and perhaps something else: what would you suggest?
Conclusion
I had the idea of opening up access to the virtualization server so that interested readers could evaluate the performance themselves. But I ran into a question: what should be used as the evaluation tool? You have already seen everything SPEC has to show, and a synthetic test is not the best thing to try by hand anyway.
So I want to run a couple of polls that will help me gauge your opinion and prepare the best possible test platform.
Thank you for your attention.
Is GRID ready for mass use?
- 18.4% (19 votes): Yes. The price/performance ratio has reached the level needed for mass adoption.
- 45.6% (47 votes): Ready only for specialized fields with heightened access security requirements.
- 30% (31 votes): Not ready. The cost per seat is too high.
- 5.8% (6 votes): Not ready. Graphics performance is too low.
Would you like to take part in testing GRID technology?
- 57.6% (45 votes): Yes, out of curiosity.
- 6.4% (5 votes): Yes, I work in heavy graphics applications and want to compare performance. I will name the software in the comments and, if possible, link to a trial version.
- 25.6% (20 votes): No, everything is already clear.
- 10.2% (8 votes): No, I am not interested.