VMware vSphere 5.5 and 6.0 Performance - Settings, Considerations. Performance best practices


    Having studied the documents Performance Best Practices for vSphere 5.5 and Performance Best Practices for vSphere 6.0, I did not find any significant configuration differences, nor anything additional that is specific to vSphere 6.0.

    Most of what is written fits into standard recommendations of the format "use compatible and certified equipment" and "when sizing a VM, allocate no more resources (vCPU, vRAM) than it actually needs".

    Nevertheless, I decided to publish the basics as a separate post, restructuring them a bit, cutting out the filler and some references and comments that are too specific and do more harm than good for most deployments. What remains is a set of recommendations, tips and considerations that have been verified in practice and apply to 90% of VMware vSphere and standalone ESXi infrastructures, diluted with general considerations and additions.

    ESXi host


    General recommendations

    • Use compatible equipment: processors with hardware support for virtualization. Processors in different cluster hosts must at least be from the same manufacturer (Intel / AMD) and preferably of the same technology level and generation; otherwise you will have problems with vMotion and performance. Uniformity of the cluster's hardware configuration is desirable in general - it is easier to manage (Host Profiles) and to diagnose.
    • Install the latest BIOS and firmware versions on all hardware. This will not necessarily improve performance, but if a problem appears at the boundary between hardware and software, the vendor's technical support will demand a firmware update anyway. IBM and I spent nearly a year repairing a blade chassis: while we were updating the firmware on all the components involved (and not so involved), new versions were released and we had to go around a second time.


    BIOS

    • Make sure that all necessary and useful components and technologies are enabled - processor cores, Turbo Boost (if present), virtualization technologies (VT-x, EPT, AMD-V, etc.).
    • The NUMA Node Interleaving option should be disabled, or the Enable NUMA option enabled. This item often causes confusion. ESXi is a NUMA-aware OS; moreover, it can pass the NUMA architecture through to virtual machines, so enabling NUMA node recognition has a positive effect on overall performance in most cases. However, the "NUMA Node Interleaving" option, when set to "Enabled", actually merges the nodes into a single address space, that is, it disables NUMA node recognition.
      NUMA
      / NUMA - Non-Uniform Memory Access - a server architecture in which memory access time depends on the memory's location relative to the processor. Each processor has its own set of memory modules to which access is faster; this set forms a NUMA node. /
    • Disable unused devices. This frees up interrupts and slightly offloads the processor.
    • Enable Hyper-Threading (if available). vCPUs are then scheduled on logical cores, and in most cases this leads to more efficient use of processor resources.
    • Set CPU power saving to OS Controlled mode, or something similar, if supported. ESXi has a complete set of tools for managing server power, and it is better to leave this to the hypervisor if you want to be sure it will not negatively affect performance.
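
    For a quick sanity check from the vSphere side, the state of Hyper-Threading and the host power policy can be read through the API. Below is a minimal pyVmomi sketch; the vCenter address and credentials are placeholders, and the hyperThread / powerSystemInfo field names should be verified against your API version.

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        # Placeholder connection details - replace with your own vCenter and credentials.
        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        for host in hosts.view:
            ht = host.config.hyperThread            # Hyper-Threading availability / state
            power = host.config.powerSystemInfo     # current host power management policy
            print(host.name,
                  '| HT available:', ht.available, 'active:', ht.active,
                  '| power policy:', power.currentPolicy.shortName)
        Disconnect(si)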


    Hypervisor

    It should be borne in mind that each virtual machine incurs a certain CPU and memory overhead - an additional amount of both that is needed for the VM itself:

    - for the vmx process (VM eXecutable);
    - for the vmm process (Virtual Machine Monitor) - tracking the state of the virtual processor, virtual memory mapping, etc.;
    - for the operation of virtual VM devices;
    - for the operation of other subsystems - kernel, management agents.

    The overhead of each machine depends primarily on its number of vCPUs and the amount of its memory. It is not large in itself, but it is worth keeping in mind. For example, if all host memory is occupied or reserved by virtual machines, response time at the hypervisor level may increase, and technologies such as DRS may have trouble operating.
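
    This overhead is easy to see for yourself. A sketch along these lines (pyVmomi again; connection details are placeholders, and the consumedOverheadMemory quick-stat, reported in MB, is the field name to the best of my knowledge) prints the memory overhead the hypervisor is currently spending on each powered-on VM:

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
        for vm in vms.view:
            if vm.runtime.powerState == 'poweredOn':
                # Per-VM memory overhead currently consumed by the hypervisor, in MB.
                print('{:30} {:2} vCPU {:6} MB vRAM, overhead {} MB'.format(
                    vm.name,
                    vm.config.hardware.numCPU,
                    vm.config.hardware.memoryMB,
                    vm.summary.quickStats.consumedOverheadMemory))
        Disconnect(si)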

    Virtual machines


    The main recommendation is minimal sizing - allocate to a virtual machine no more memory and processors than it really needs for its job. In a virtual environment, more resources often lead to worse performance than fewer. This is difficult to understand and accept at first, but it is true. The main reasons:

    - overhead described in the previous section;
    - NUMA. If the number of vCPUs does not exceed the number of cores in a NUMA node and the memory size also stays within the node's limits, the hypervisor tries to localize the VM inside a single NUMA node, which means memory access will be faster (a sizing check sketch follows this list);
    - CPU Scheduler. If a host runs many VMs with large numbers of vCPUs (more in total than there are physical cores), the likelihood of Co-Stop grows - some vCPUs are stalled because the scheduler cannot run all of a VM's vCPUs synchronously, since there are not enough physical cores to schedule them at the same time;
    - DRS. Machines with few processors and little memory are easier and faster to move from host to host. In the event of a sudden load spike, it is easier to rebalance a cluster made up of small VMs than one made up of multi-gigabyte monsters;
    - Cache locality. Inside a VM, the guest OS may move single-threaded processes between different processors and lose the warmed-up processor cache.
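
    The NUMA point is the easiest to check mechanically. A rough pyVmomi sketch along these lines flags VMs that do not fit into one NUMA node of their current host; it approximates the node size as cores per physical socket and host RAM divided by the number of sockets, which is not exact for every platform, and the connection details are placeholders:

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
        for vm in vms.view:
            host = vm.runtime.host
            if host is None or vm.config is None:
                continue
            cpu = host.hardware.cpuInfo
            cores_per_node = cpu.numCpuCores // cpu.numCpuPackages            # rough NUMA node size
            mem_per_node_mb = host.hardware.memorySize // cpu.numCpuPackages // (1024 * 1024)
            if (vm.config.hardware.numCPU > cores_per_node
                    or vm.config.hardware.memoryMB > mem_per_node_mb):
                print('{} does not fit into one NUMA node of {} '
                      '({} cores / {} MB per node)'.format(
                          vm.name, host.name, cores_per_node, mem_per_node_mb))
        Disconnect(si)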

    Conclusions and recommendations:

    • Better one processor loaded at 80% than 4 at 20%.
    • If a server's peak load occurs once a quarter, and the rest of the time it runs at 10% of its resources, it is better to cut its resources by a factor of 8 right away and add the required amount back once a quarter.
    • Try to fit the VM in terms of the number of vCPUs and memory into the boundaries of the NUMA-node.
    • If the VM goes beyond the limits of a NUMA node (a wide VM), configure a number of processors that is a multiple of the number of cores in the NUMA node. If we have 4 cores per socket, the recommended vCPU counts for such VMs are 4, 8, 12 and so on.
    • When using multiple vCPUs, it is better to configure them as separate virtual sockets with one virtual core each, or with a number of cores that divides evenly into the number of cores of a NUMA node. If a physical socket has 4 cores, correct values for a virtual socket are 1, 2 or 4, but not 3 or 6 (a reconfiguration sketch follows this list).
    • Disable unused virtual hardware in the virtual machine (COM, LPT and USB ports, floppy drives, CD/DVD drives, network interfaces, etc.).
    • Use paravirtual devices (VMware Paravirtual for the SCSI controller and VMXNET for the network adapter). This reduces CPU load and response time, but may require a driver during OS installation.
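
    The socket/core layout from the list above can also be set through the API. A minimal pyVmomi sketch, assuming the VM name and values are placeholders and the VM is powered off (changing the vCPU count normally requires that, unless CPU hot add is enabled):

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in vms.view if v.name == 'app-server-01')   # placeholder VM name

        # 8 vCPUs presented as 8 sockets x 1 core - the safest layout for most cases.
        spec = vim.vm.ConfigSpec(numCPUs=8, numCoresPerSocket=1)
        task = vm.ReconfigVM_Task(spec=spec)
        print('Reconfigure task started:', task)
        Disconnect(si)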


    Guest OS



    • Use the latest version of VMware Tools and bring it up to date after each ESXi update.
    • And in general, make sure VMware Tools is installed at all.
    • Disable screen savers and any animation and eye candy in general. If possible, do without a graphical shell altogether. This significantly reduces processor load.
    • Avoid running intensive tasks simultaneously (such as anti-virus scanning, backups, and especially defragmentation). Defragmentation is best turned off altogether. The rest, if they cannot be avoided, should be spread across different machines and different times.
    • Synchronize guest OS time using NTP or VMware Tools, but not both at once - and use at least one of them (a sketch of the Tools setting follows this list). Keep in mind that time in a guest OS is not exact: it depends on the processor, and a VM may receive its CPU resources unevenly and not always in the required amount.
    • vNUMA. Keep in mind that for VMs with more than eight vCPUs, passing the NUMA architecture through into the VM (vNUMA) is activated. For some NUMA-aware applications, such as Exchange or MS SQL, this is useful. However, vNUMA is determined at the first boot of the OS and does not change until the number of processors changes. Therefore, if the cluster contains hosts with different numbers of cores per socket, and therefore different NUMA topologies, then when a VM moves from host to host its performance may drop because its vNUMA no longer matches the NUMA of the new host.
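
    Whether VMware Tools synchronizes time with the host is a per-VM setting and can be flipped through the API. A small pyVmomi sketch (the VM name and the chosen value are placeholders - keep either Tools sync or NTP inside the guest, not both):

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in vms.view if v.name == 'app-server-01')   # placeholder VM name

        print('Tools time sync is currently:', vm.config.tools.syncTimeWithHost)

        # Disable Tools time sync because this guest is synchronized via NTP.
        spec = vim.vm.ConfigSpec(tools=vim.vm.ToolsConfigInfo(syncTimeWithHost=False))
        vm.ReconfigVM_Task(spec=spec)
        Disconnect(si)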


    Storage and datastores


    The main thing to consider is that your storage must support the vStorage APIs for Array Integration (VAAI). In that case, the following is supported:

    - Offloading of copying, cloning and moving VMs between LUNs of one array or between arrays that support the technology. That is, the process is executed mostly by the storage array itself rather than by the host CPU and network.
    - Accelerated zeroing of blocks when creating Thick Eager Zeroed disks and during the initial filling of Thick Lazy Zeroed and Thin disks.
    - Atomic Test and Set (ATS) - locking only a single sector on disk when metadata changes, instead of locking the whole LUN. Considering that metadata changes during operations such as powering a VM on or off, migration, cloning and growing a thin disk, without ATS a LUN with a large number of VMs on it may never get out of SCSI lock.
    - Unmap - releasing blocks of a thin LUN when data is deleted or moved (applies only to the LUN, not to vmdk files).
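
    Whether a particular device is seen as VAAI-capable can be checked per host. A pyVmomi sketch (connection details are placeholders; the vStorageSupport field is how hardware acceleration support is exposed on SCSI LUNs, to the best of my knowledge - verify on your API version):

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        for host in hosts.view:
            for lun in host.config.storageDevice.scsiLun:
                # 'vStorageSupported' means the device reports VAAI (hardware acceleration) support.
                print(host.name, lun.canonicalName, lun.vStorageSupport)
        Disconnect(si)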

    Considerations and recommendations:

    • A vmdk disk in Independent Persistent mode is the most performant, since changes are written directly to the disk and are not logged. But such a disk is not included in snapshots and cannot be rolled back.
    • When using iSCSI, it is recommended to configure jumbo frames (MTU = 9000) on all interfaces and network equipment (an audit sketch follows this list).
    • Multipathing: for most cases Round Robin is fine. Fixed can give better performance, but only after careful planning and manual configuration of each host for each LUN. MRU can be used with an active-passive configuration if some paths disappear from time to time, so that the active path does not jump back and forth.
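
    The jumbo frame recommendation is easy to audit: the MTU is visible on every standard vSwitch and VMkernel interface. A pyVmomi sketch with placeholder connection details that reports anything not set to 9000:

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        for host in hosts.view:
            for vswitch in host.config.network.vswitch:          # standard vSwitches
                if vswitch.mtu != 9000:
                    print(host.name, 'vSwitch', vswitch.name, 'MTU =', vswitch.mtu)
            for vnic in host.config.network.vnic:                # VMkernel interfaces
                if vnic.spec.mtu != 9000:
                    print(host.name, 'vmknic', vnic.device, 'MTU =', vnic.spec.mtu)
        Disconnect(si)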


    Virtualization infrastructure


    DRS and clusters

    • To manage resources and control their use, rely mostly on Shares, and use Limits and Reservations to a lesser extent. Limits strictly cap VMs even when the cluster has free resources. Reservations, by contrast, eat up resources even when they are not used. In addition, when physical hardware is upgraded, Shares automatically distribute the new capacity proportionally, while forgotten limits and reservations can leave some machines short of resources even though the cluster now has more than enough (a sketch of setting Shares through the API follows this list).
    • Do not build complex multi-level hierarchies out of Resource Pools - folders exist for hierarchies. Also avoid keeping Resource Pools and virtual machines at the same level (for example, at the root), because Shares are calculated differently for these object types and unexpected performance differences may appear.
    • Once again: the closer the hosts are to each other in configuration, the better. Ideally, all hosts in a cluster are identical. Without EVC, even between hosts with processors from the same vendor but with a different set of technologies, VMs can only be moved powered off.
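
    Shares, limits and reservations are all part of the same per-VM allocation settings. A pyVmomi sketch that raises a VM's CPU Shares while removing any limit and reservation (the VM name and values are placeholders; a limit of -1 means "unlimited"):

        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        si = SmartConnect(host='vcenter.example.local', user='administrator@vsphere.local',
                          pwd='secret', sslContext=ssl._create_unverified_context())
        content = si.RetrieveContent()

        vms = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in vms.view if v.name == 'app-server-01')   # placeholder VM name

        alloc = vim.ResourceAllocationInfo(
            shares=vim.SharesInfo(level=vim.SharesInfo.Level.high),   # prioritize via Shares
            limit=-1,                                                 # no hard limit
            reservation=0)                                            # no reservation
        vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(cpuAllocation=alloc))
        Disconnect(si)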


    vMotion and Storage vMotion

    By default, every active vMotion process consumes 10% of one processor core, on both the destination and the source host. That means that if all processor resources on a host are reserved, there may be problems with vMotion (and there will definitely be problems with DRS).

    Storage vMotion actively reads from the source datastore and writes to the target one. In addition, changes made inside the VM during the migration are written synchronously to both datastores. Hence the conclusion: if we move a VM from a slow datastore to a fast one, the effect will be noticeable only at the end of the migration; if from a fast one to a slow one, performance degradation starts immediately.

    vCenter Server

    • Minimize the number of hops between the vCenter server and its database. It is desirable that both servers are on the same network.
    • Neither vCenter nor its database should lack resources.
    • The database itself generates mostly random reads, while its logs generate sequential writes. Hence it is better to place them on different LUNs. Even better if the paths to them are different (different controllers or arrays), or at least different drives.
    • tempDB is also worth placing separately.
    • It is worth regularly updating table and index statistics in the database and getting rid of fragmentation.
    • Close client sessions (both thick client and web) when they are not in use - they consume a lot of resources keeping the inventory updated.
    • If there are more than 30 hosts and / or 300 VMs in the infrastructure, it is better to keep the Update Manager database separate from the vCenter database.
    • If the infrastructure has more than 100 hosts and / or 1000 VMs, the Update Manager service itself should be moved to a separate server.
    • If the infrastructure has more than 32 hosts and / or 4000 VMs, the Web Client Server component should also be separated out.
