The nuances of combining ESXi, FlexFabric, 10 Gbit and NFS

In this article, I would like to share useful information collected during the design and implementation of a fault-tolerant virtualization environment. Particular attention is paid to the nuances of HP Virtual Connect FlexFabric and to the configuration of the VMware vSphere 5.5 hypervisor when using 10 Gbit Ethernet and NFS datastores.

Network interaction diagram


Hardware

An HP BladeSystem c7000 blade enclosure with a pair of HP VC FlexFabric 10/24 modules.
HP BL460c Gen8 servers with an integrated HP FlexFabric 10Gb 536FLB network card.
Cisco Nexus 5k network switches.
NetApp FAS8020 storage system.

Link Aggregation


Link aggregation is one of the main means of achieving high availability in a virtual environment, so it should be used at every level of the traffic path. In our case:
ESXi: vSphere vSwitch in Port ID mode.
FlexFabric: Shared Uplink Set + Auto mode (LACP).
Network: EtherChannel (LACP).
Storage: LIF.

Jumbo frames


To get the most out of 10 Gbit, it is recommended that you enable Jumbo Frames at all levels:
VMkernel port: 9000.
vSwitch: 9000.
FlexFabric: 9216.
Network: 9216.
Storage: 9000.

The MTU value in HP Virtual Connect is hardcoded to 9216 and cannot be changed.

How to configure Jumbo Frames on ESXi.
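
As a minimal sketch, the same MTU can be set from the ESXi shell; vSwitch1 and vmk2 below are placeholder names for the vSwitch and the NFS VMkernel interface in your environment:

# Set MTU 9000 on the standard vSwitch
esxcli network vswitch standard set -v vSwitch1 -m 9000
# Set MTU 9000 on the VMkernel interface used for NFS
esxcli network ip interface set -i vmk2 -m 9000
# Verify the new values
esxcli network vswitch standard list -v vSwitch1
esxcli network ip interface list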

Cisco Discovery Protocol


Unfortunately, CDP is not supported by the HP VC FlexFabric modules, so it makes no sense to enable its support on the hypervisor.
Excerpt from the documentation: “Virtual Connect does not support CDP. VC does support the industry standard protocol called Link Layer Discovery Protocol (LLDP) by default. LLDP is functionally equivalent to CDP, although the two protocols are not compatible.”
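
For reference, the CDP status of a standard vSwitch can be checked (and explicitly disabled) from the ESXi shell; vSwitch0 is a placeholder name:

# Show the current CDP status of the vSwitch
esxcli network vswitch standard list -v vSwitch0
# Explicitly turn CDP off (valid values: down, listen, advertise, both)
esxcli network vswitch standard set -v vSwitch0 --cdp-status=down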

Flow control


Regarding the use of the Flow Control mechanism, we decided to adhere to the NetApp recommendation and disable it at all “levels” (ESXi, FlexFabric, Network, Storage).

NetApp Recommendation: “Modern network equipment and protocols handle port congestion better than those in the past. NFS and iSCSI as implemented in ESXi use TCP. TCP has built-in congestion management, making Ethernet flow control unnecessary. Furthermore, Ethernet flow control can actually introduce performance issues on other servers when a slow receiver sends a pause frame to storage and stops all traffic coming out of that port until the slow receiver sends a resume frame. Although NetApp has previously recommended flow control set to send on ESXi hosts and NetApp storage controllers, the current recommendation is to disable flow control on ESXi, NetApp storage, and on the switch ports connected to ESXi and NetApp storage.”

In the configuration of the HP VC FlexFabric modules, Flow Control is enabled by default only on the downlinks (the “Auto” value) and disabled on the uplinks.

“ON” - all ports will advertise support for flow control (if autoneg), or flow control is turned on (non-autoneg).
“OFF” - all ports will advertise *no* support for flow control (if autoneg), or flow control is turned off (non-autoneg).
“Auto” - all uplink/stacking links behave like “OFF”, and all server links behave like “ON”.


The command to disable it on the FlexFabric modules: #set advanced-networking FlowControl=off
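
On the ESXi 5.5 side, flow control is usually adjusted per physical NIC with ethtool (see the VMware KB article linked below); a sketch assuming the uplinks are vmnic0 and vmnic1 (placeholder names) - note that the setting does not survive a reboot unless it is re-applied from a local startup script:

# Disable pause frame autonegotiation, receive and transmit on each uplink
ethtool --pause vmnic0 autoneg off rx off tx off
ethtool --pause vmnic1 autoneg off rx off tx off
# Check the current pause parameters
ethtool --show-pause vmnic0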

Interesting articles on this topic:
Virtual Connect FlexFabric interconnect modules and Ethernet Flow Control
NETAPP vs VMWARE FLOW CONTROL DILEMMA
Configuring Flow Control on VMware ESXi and VMware ESX

Smart link


Smart Link mode must be enabled for every vNet (Ethernet Network) in the FlexFabric configuration. This is required for the virtual switch on the hypervisor to detect an upstream uplink failure and fail over correctly.

Document excerpt: “HP's Virtual Connect supports a feature called Smart Link, a network enabled with Smart Link automatically drops link to the server ports if all uplink ports lose link. This feature is very similar to the Uplink Failure Detection (UFD) that is available on the HP GbE2, GbE2c and most ProCurve switches. I believe there is a similar feature available on Cisco switches called Link State Tracking.”
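
If the networks are managed from the Virtual Connect Manager CLI rather than the GUI, Smart Link can presumably be enabled per network along these lines (ESXi_NFS is a placeholder network name; verify the exact syntax against the VC CLI reference in the Literature section):

#set network ESXi_NFS SmartLink=Enabled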

Virtual switches


It is recommended to separate virtual machine traffic from management traffic (vMotion, Management, NFS, FT). To increase the reliability of the virtual environment, we used a standard switch for the management traffic rather than a distributed one, even though the latter has several advantages (for example, LACP support).

vSphere vSwitch Load Balancing


In such a configuration, it is recommended that the virtual switches use the “Route based on originating virtual port ID” load balancing mode.

The “Route based on IP hash” mode (the NetApp recommendation) cannot be used, because it requires the uplinks of the virtual switch to be aggregated into a trunk using 802.3ad (LACP), and HP VC FlexFabric does not provide this capability on the downlinks to the servers.

The remaining Load Balancing policy settings:
Network failure detection: Link status only.
Notify switches: Yes.
Failback: Yes.
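
A sketch of the same teaming policy applied from the ESXi shell, assuming the virtual machine traffic switch is named vSwitch0 (placeholder):

# Route based on originating virtual port ID, link-status failure detection,
# notify physical switches, failback enabled
esxcli network vswitch standard policy failover set -v vSwitch0 -l portid -f link -n true -b true
# Verify the resulting policy
esxcli network vswitch standard policy failover get -v vSwitch0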

VMkernel Port


For each service (vMotion, Management, NFS, FT), it is recommended to create a separate VMkernel port. The VMkernel port for NFS traffic (its Available Services list remains empty) must be created in the same subnet as the NFS exports. In our case:
vmk0: available services - vMotion; network label "vMotion"; VLAN ID 1; MTU 9000.
vmk1: available services - Management; network label "Management"; VLAN ID 2; MTU 9000.
vmk2: no services selected; network label "NFS"; VLAN ID 3; MTU 9000.

For the vMotion VMkernel adapter, HP recommends setting the failover order to Active/Standby:
“As this Scenario is based on an Active/Active configuration, to ensure that ALL VMotion traffic between servers within the enclosure is contained to the same module, on each server edit the VMotion vSwitch properties and move one of the Adapters to Standby. This will ensure that ALL VMotion traffic will occur on the same Virtual Connect module.”
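
A sketch of these steps from the ESXi shell: creating the NFS VMkernel port from the list above and pinning vMotion traffic to one adapter. The vSwitch name, uplink names and the IP address are placeholders; the VLAN ID and MTU follow the values above:

# Create the NFS port group on the management vSwitch and tag it with VLAN 3
esxcli network vswitch standard portgroup add --portgroup-name=NFS --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set --portgroup-name=NFS --vlan-id=3
# Create the NFS VMkernel interface with MTU 9000 and a static address in the NFS subnet
esxcli network ip interface add -i vmk2 -p NFS -m 9000
esxcli network ip interface ipv4 set -i vmk2 -t static -I 192.168.3.21 -N 255.255.255.0
# For the vMotion port group, make one uplink active and the other standby
esxcli network vswitch standard portgroup policy failover set -p vMotion -a vmnic0 -s vmnic1
# Verify the per-portgroup override
esxcli network vswitch standard portgroup policy failover get -p vMotion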

NFS Advanced Settings


It is recommended to change the default values of several settings related to NFS exports. In vSphere Advanced Settings on each host, set the following values:

NFS.HeartbeatFrequency = 12
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 10
Net.TcpipHeapSize = 32
Net.TcpipHeapMax = 512
NFS.MaxVolumes = 256
NFS.MaxQueueDepth = 64
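
The same values can be applied on each host from the ESXi shell; a sketch:

# NFS heartbeat settings
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10
# TCP/IP heap sizing (takes effect after a host reboot)
esxcli system settings advanced set -o /Net/TcpipHeapSize -i 32
esxcli system settings advanced set -o /Net/TcpipHeapMax -i 512
# Maximum number of NFS datastores and NFS queue depth
esxcli system settings advanced set -o /NFS/MaxVolumes -i 256
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64
# Verify one of the values
esxcli system settings advanced list -o /NFS/MaxVolumes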

These recommendations are described in the following documents:
Best Practices for running VMware vSphere on Network Attached Storage
Increasing the default value that defines the maximum number of NFS mounts on an ESXi / ESX host

Other nuances


  1. The network card of the blade server must be compatible with Virtual Connect modules. Compatibility can be checked at HP QuickSpecs.
  2. It is advisable to update the firmware of the Virtual Connect modules to the latest available version, but do this very carefully, checking firmware compatibility between the blade servers and the enclosure.
  3. SFP transceivers do not come with Virtual Connect modules, so plan your physical switching scheme in advance and buy the right transceivers.
  4. Virtual Connect allows you to guarantee and cap bandwidth per network (at the vNet / Ethernet Network / VLAN level). This can be used, for example, to limit the VLAN carrying ESXi Management traffic to 1 Gbit and to guarantee the NFS VLAN a minimum of 4 Gbit with a cap of 10 Gbit.

Literature


VMware vSphere 5 on NetApp Clustered Data ONTAP
Best Practices for Running VMware vSphere on Network-Attached Storage (NAS)
HP Virtual Connect FlexFabric Cookbook
FC Cookbook for HP Virtual Connect
HP Virtual Connect for the Cisco Network Administrator
HP Virtual Connect Manager Command Line Interface
HP Forum "HP BladeSystem Virtual Connect"
