The "Unreliable" Ethernet

    Following up on the previous article, "Ethernet & FC", I would like to give specific recommendations on optimizing an Ethernet network for NetApp FAS storage systems, although I believe many of the points described here may be useful for other solutions as well.

    The "Unreliable" Ethernet

    This part is for those who strongly doubt the "reliability" of Ethernet. Not that I want to convince you to switch wholesale from FC 8G to 10 GbE, but I do want to dispel a certain aura of mistrust and misunderstanding around the technology. Any technical specialist should approach the question rationally and with a cool head; statements like "Ethernet loses frames" can hardly be called unbiased or objective. Let us consider where this firm opinion about Ethernet's unreliability came from, in order to either debunk the doubts or confirm them with concrete justification.
    It all started at the birth of the standard, when Ethernet ran at 10 Mbit/s over a shared coaxial medium. Transmission was half-duplex: at any given moment a node could either transmit or receive, but not both. The more nodes in one such domain, the more collisions there were, and half duplex made matters worse; frames really were lost back then. Ethernet then moved to twisted pair, creating headroom for the future, but dumb hubs still joined all nodes into a single collision domain, so essentially nothing changed. Then smarter devices appeared, proudly named "switches": instead of blindly duplicating frames from one port to all others, they looked into each frame, remembered which addresses arrived on which ports, and forwarded frames only to the recipient's port. All would have been well, but collisions still lingered in some form even in 100 Mbit/s networks: although the collision domain had been split up and shrunk to a single node and its switch port, the two could still collide in half-duplex mode by trying to send each other a frame at the same time. The next step was full duplex (10BASE-T, IEEE 802.3i): each node could simultaneously receive and transmit frames over separate wire pairs, an RX and TX pair for the node and another for the switch. In 10 GbE, half duplex no longer exists at all. What does this mean?
    Collisions are gone forever; they simply no longer exist. Two closely related childhood diseases of Ethernet remained: switches could "forget" frames when their buffers overflowed, and such overflows usually happened because of loops (Ethernet Loop). These problems were solved in turn: 1) DCB, an add-on to the Ethernet protocol also known as Lossless Ethernet, is a set of protocols that, as in the case of FC, keeps frames from being lost. 2) Data-center-class switches simply received more buffer memory. 3) In 10 GbE networks, Cisco in particular took a step further and proposed TRILL and FabricPath in its Nexus line of data-center switches. TRILL defines a new hop-count field in the Ethernet frame, by analogy with the time-to-live field in an IP packet, to prevent loops, and borrows some other functions from IP, thus curing Ethernet of its last childhood illnesses.

    Jumbo frame

    When using the NFS, iSCSI, or CIFS protocols, it is recommended to enable jumbo frames on switches and hosts wherever possible. NetApp storage currently supports MTU 9000, which is the practical maximum for 10 GbE. The jumbo frame setting must be enabled along the entire path of the Ethernet frames, from source to destination. Unfortunately, not all switches and host network adapters support the "maximum" MTU at the moment; for example, some HP blade chassis with built-in 10 GbE switches support an MTU of at most 8000, and in such cases the most appropriate MTU value must be selected on the storage side. Since there is some confusion about what MTU means, it can be hard to decide which value to configure. For example, a NetApp storage system with MTU 9000 set on its Ethernet interface will work normally with switches whose MTU is set to 9000 (Catalyst 2970/2960/3750/3560 Series), 9198 (Catalyst 3850), or 9216 (Cisco Nexus 3000/5000/7000/9000, Catalyst 6000/6500, Cisco 7600 OSR Series); on other models this value should generally be 9252. As a rule, setting the switch MTU to its maximum allowable value (greater than or equal to 9000) will make everything work. For clarification, I recommend the corresponding article "Maximum Transmission Unit (MTU). Myths and reefs".
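    A quick way to verify that jumbo frames actually pass end to end is to send a maximum-size ping with the don't-fragment bit set. Below is a minimal sketch: the payload arithmetic assumes IPv4 ICMP, and the interface name and storage address are placeholders to adjust for your environment.

```shell
# ICMP payload that exactly fills a 9000-byte MTU:
# 9000 bytes minus 20 (IPv4 header) minus 8 (ICMP header) = 8972
MTU=9000
PAYLOAD=$((MTU - 20 - 8))
echo "ping payload: $PAYLOAD"

# On a Linux host, after raising the interface MTU, verify the whole
# path (host -> switches -> storage) passes jumbo frames without
# fragmentation (eth0 and <storage-ip> are placeholders):
#   ip link set dev eth0 mtu 9000
#   ping -M do -s $PAYLOAD <storage-ip>
# The equivalent check from an ESXi host:
#   vmkping -d -s $PAYLOAD <storage-ip>
```

    If the ping fails while a smaller payload succeeds, some device along the path is still running a smaller MTU.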

    Jumbo Frames in Cisco UCS

    Perform the configuration from the command line on each Fabric Interconnect:
    system jumbomtu 9216
    policy-map type network-qos jumbo
    class type network-qos class-default
    mtu 9216
    system qos
    service-policy type network-qos jumbo
    copy run start

    Or from the UCS Manager GUI:
    In the UCS Manager settings for Ethernet, configure the MTU in the "LAN > LAN Cloud > QoS System Class" tab, setting the MTU for one selected class.

    Then create a "QoS policy", create a vNIC template that references it, and bind the template to the server's network interface.


    Flow Control

    The flowcontrol settings must match on both ends: the storage ports and the switch ports connected to them. In other words, if flowcontrol is set to none on the storage ports, it must be set to off on the switch, and vice versa. Another example: if the storage system sends flowcontrol frames (flowcontrol send), the switch must be configured to receive them (flowcontrol receive on). Mismatched flowcontrol settings lead to drops of established protocol sessions, for example CIFS or iSCSI: connectivity will be present, but because sessions constantly break, the link will work very slowly under increased load, while under light load the problem will not appear at all.
    • As a general rule, do not enable flowcontrol if you can avoid it (TR-3428).
    • For 10 GbE networks, enabling flowcontrol is strongly discouraged.
    • For 1 GbE networks (as an exception to the rule), you can enable flowcontrol so that the storage sends flow control and the switch accepts it: set flowcontrol to send on the storage and to Desired on the switch (or send/tx off & receive/rx on).
    • For 100 Mbit networks (as an exception to the rule), you can enable flowcontrol for both receiving and transmitting on both sides: the storage and the switch send and receive flow control frames.
    • The reasoning behind these recommendations is explained here.
    • See also TR-3802.
    • Examples of storage and switch settings can be found in the relevant articles.
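    On the host side, classic 802.3x flow control is inspected and set with ethtool on Linux. This is a sketch under the assumption that the NIC is called eth0; adjust the name, and remember to mirror whatever you set here on the attached switch port, as described above.

```shell
# Show the NIC's current pause-frame (802.3x flow control) settings
ethtool -a eth0

# 1 GbE exception from the rules above: the storage sends flow control
# and the other side receives it. On a host NIC that would be:
ethtool -A eth0 autoneg off rx on tx off

# General 10 GbE recommendation: flow control off in both directions
ethtool -A eth0 autoneg off rx off tx off
```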

    Do not confuse "normal" FlowControl with PFC (IEEE 802.1Qbb) used in DCB (Lossless) Ethernet.

    Spanning tree protocol

    If you use NetApp with "classic" Ethernet (that is, Ethernet that is not of the "Datacenter" class, so to speak), it is highly recommended to enable RSTP and to configure the Ethernet ports connecting end nodes (storage and hosts) with portfast enabled (TR-3749). Ethernet networks of the "Datacenter" class do not need Spanning Tree at all; Cisco Nexus series switches with vPC technology are an example of such equipment.

    Converged network

    Given the versatility of 10 GbE, where FCoE, NFS, CIFS, and iSCSI can all run over the same physical links, together with technologies such as vPC and LACP and the relative ease of maintaining Ethernet networks, the protocol and its switches stand apart from FC, providing room to maneuver and preserving investment as business needs change.

    FC8 vs 10GBE: iSCSI, CIFS, NFS

    Internal tests of NetApp storage systems (the situation may differ for other storage vendors) show that FC 8G and 10 GbE iSCSI, CIFS, and NFS deliver almost the same performance and latency for workloads typical of OLTP and server and desktop virtualization, i.e., small-block random reads.
    I recommend that you familiarize yourself with the article describing the similarities, differences and prospects of Ethernet & FC .

    When the customer's infrastructure consists of just two switches, the configuration complexity of SAN and Ethernet networks is about the same. But for many customers a SAN does not boil down to two SAN switches where "everyone sees everyone"; as a rule, the configuration does not end there, and in this respect Ethernet is much simpler to maintain. A typical customer SAN is many switches with redundant links and connections to remote sites, which is by no means trivial to maintain. And if something goes wrong, you cannot simply sniff the traffic with Wireshark.

    Modern converged switches such as the Cisco Nexus 5500 can switch both Ethernet and FC traffic, allowing greater flexibility in the future with a two-in-one solution.


    Also, don't forget the possibility of port aggregation using EtherChannel (LACP). Understand that aggregation does not magically merge Ethernet ports; it only distributes (balances) traffic between them. In other words, two aggregated 10 GbE ports do not always add up to 20 Gbit/s. Depending on whether the storage sits in a separate IP subnet from the hosts, you need to choose the right balancing method. When the storage system is on a separate subnet from the hosts, you cannot choose balancing by destination MAC address, since that MAC will always be the same: the gateway's. When there are fewer hosts than aggregated links on the storage system, balancing works suboptimally because of the imperfections and limitations of network load-balancing algorithms. Conversely, the more network nodes use the aggregated link and the more appropriate the balancing algorithm, the closer the aggregate's maximum throughput approaches the sum of the individual links' bandwidths. For more information about LACP balancing, see the article "Link Aggregation and Balancing IP Traffic".
    Document TR-3749 describes the nuances of setting up VMware ESXi with NetApp storage systems and Cisco switches.
    LACP setup example
    on NetApp 7-Mode
    vif create lacp {vif_name} -b ip {Port list}

    on NetApp Clustered ONTAP
    ifgrp create -node {node_name} -ifgrp {ifgrp_name} -distr-func {mac | ip | sequential | port} -mode multimode_lacp
    ifgrp add-port -node {node_name} -ifgrp {ifgrp_name} -port {Port 1}
    ifgrp add-port -node {node_name} -ifgrp {ifgrp_name} -port {Port 2}
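    For comparison, the host side of the same aggregation on a Linux server can be sketched with iproute2. Interface names here are placeholders, and the corresponding switch ports must be placed in an LACP port-channel as in the switch examples that follow.

```shell
# Create an LACP (802.3ad) bond. layer3+4 hashing spreads flows across
# links by IP address and TCP/UDP port, which helps when the storage
# sits behind a gateway and MAC-based balancing would collapse onto
# the single gateway MAC, as noted above.
ip link add bond0 type bond mode 802.3ad miimon 100 xmit_hash_policy layer3+4

# Member interfaces must be down before they can be enslaved
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up
```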

    Please note that portfast (spanning-tree port type edge) must be configured before NetApp is connected!
    On a Cisco Catalyst switch:
    cat(config)#interface gi0/23
    cat(config-if)#description NetApp e0a Trunk
    cat(config-if)#switchport mode trunk
    cat(config-if)#switchport trunk allowed vlan 10,20,30
    cat(config-if)#switchport trunk native vlan 123
    cat(config-if)#flowcontrol receive on
    cat(config-if)#no cdp enable
    cat(config-if)#spanning-tree guard loop
    cat(config-if)#!portfast must be configured before netapp connection
    cat(config-if)#spanning-tree portfast trunk
    cat(config-if)#interface port-channel1
    cat(config-if)#description LACP multimode VIF for netapp1
    cat(config-if)#interface gi0/23
    cat(config-if)#channel-protocol lacp
    cat(config-if)#channel-group 1 mode active

    On the Cisco Nexus 5000 Switch:
    n5k(config)#interface EthernetX/X/X
    n5k(config-if)#switchport mode trunk
    n5k(config-if)#switchport trunk allowed vlan XXX
    n5k(config-if)#spanning-tree port type edge
    n5k(config-if)#channel-group XX mode active
    n5k(config)#interface port-channelXX
    n5k(config-if)#switchport mode trunk
    n5k(config-if)#switchport trunk allowed vlan XX
    n5k(config-if)#!portfast must be configured before netapp connection
    n5k(config-if)#spanning-tree port type edge

    I am sure that over time I will have more to add to this article on network optimization, so check back here from time to time.

    Please send notes on errors in the text and suggestions via private message.
