A Little About Cisco Networking Performance



    This year we published two articles: one comparing the functionality of Cisco routers and firewalls, and an overview of the separation of control and data planes in network equipment. In the comments to these articles, readers raised the question of network equipment performance, namely, how the performance of Cisco routers of different generations depends on which services are enabled on them. The performance of Cisco ASA firewalls was also discussed. This made us want to look at these questions from a practical perspective and back up certain points with numbers. Below I describe what worked out and what didn't.

    By performance we mean device throughput, measured in Mbps. The test bench consisted of two laptops running iPerf3. The test procedure was quite simple: iPerf3 was run in TCP mode with 5 parallel streams. I did not set out to determine the actual performance limits of the devices. That task requires more sophisticated equipment, since one would need to reproduce the traffic patterns of a real network, and it would also be necessary to measure the number of packets processed. Our main goal was to assess how enabling various services affects the device, and to compare the results across devices. For these purposes, the chosen toolkit seemed quite adequate.
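    As an illustration of this setup, here is a small Python sketch that builds an iperf3 client command with 5 parallel TCP streams and extracts the aggregate receiver-side throughput from iperf3's JSON report (`-J`). The server address 192.0.2.10 is a placeholder, and the 10-second duration is simply iperf3's default; the article does not state the actual address or test duration, and the helper names are hypothetical.

```python
import json
import shlex


def iperf3_client_cmd(server: str, streams: int = 5, seconds: int = 10) -> list[str]:
    """Build an iperf3 client command: TCP (the default), N parallel streams, JSON output."""
    return ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"]


def received_mbps(report: dict) -> float:
    """Aggregate receiver-side throughput in Mbps from an iperf3 -J report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


# 192.0.2.10 is a documentation placeholder, not the address used on the bench.
cmd = iperf3_client_cmd("192.0.2.10")
print(shlex.join(cmd))  # iperf3 -c 192.0.2.10 -P 5 -t 10 -J

# A trimmed, made-up report containing only the field we read.
sample = json.loads('{"end": {"sum_received": {"bits_per_second": 94870000.0}}}')
print(f"{received_mbps(sample):.1f} Mbps")
```

    In a real run, the JSON would come from the iperf3 process's standard output rather than from a literal string.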

    Cisco Integrated Services Router (ISR) Generation 1 and 2

    To begin with, we took two entry-level routers, the Cisco 871 and 881, out of the box. They belong to different generations (the 871 is the older G1, the 881 the newer G2) and are typically deployed in small offices, for example in a company's remote branches.

    The routers under study have similar software and hardware architectures: the operating system is Cisco IOS, and the “brain” of the device is an MPC 8272 SoC in the 871 and an MPC 8300 SoC in the 881.

    The following operating modes were checked for each router:
    • Routing using Cisco Express Forwarding (CEF) technology.
    • Routing without the use of optimizing technologies (Process Switching).
    • Routing (CEF) with an access list (ACL) applied to one of the interfaces.
    • Routing (CEF) with an ACL containing the log option applied to one of the interfaces.
    • Routing (CEF) with address translation (NAT*) enabled.
    • Routing (CEF) with firewall services enabled (CBAC on the 871, ZPF on the 881).
    • Routing (CEF) with firewall and NAT enabled.
    * During testing, both static and dynamic NAT were configured. Both showed approximately the same effect on device performance.

    The tests covered traffic routing (L3 switching) using both CEF and Process Switching. On these devices, both modes are software packet processing; the difference lies in how the router decides where to send a packet. With Process Switching, the router determines the destination of every packet individually and builds or rewrites the necessary headers in a separate process, using the routing table and L2 tables. This is so-called process-level handling. With CEF, the router uses precomputed tables, the FIB (prefix table) and the adjacency table (L2 next-hop information), which significantly reduces CPU load and increases packet processing speed inside the device.
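    The difference between the two lookup paths can be sketched in Python. This is a simplified model, not Cisco's implementation: the tables, addresses and interface names are hypothetical, and the linear longest-prefix scan stands in for the trie-based lookup a real FIB uses. Process switching resolves the route, the next hop and the MAC address for every packet, while CEF performs that resolution once when building the FIB, so the per-packet work shrinks to a single lookup.

```python
import ipaddress
from typing import NamedTuple


class Adjacency(NamedTuple):
    interface: str
    mac: str  # L2 rewrite information for the next hop


# Hypothetical tables, for illustration only.
RIB = {  # learned prefix -> next-hop IP
    ipaddress.ip_network("10.1.0.0/16"): ipaddress.ip_address("192.168.1.2"),
    ipaddress.ip_network("0.0.0.0/0"): ipaddress.ip_address("203.0.113.1"),
}
CONNECTED = {  # directly connected subnet -> outgoing interface
    ipaddress.ip_network("192.168.1.0/24"): "Vlan1",
    ipaddress.ip_network("203.0.113.0/24"): "FastEthernet4",
}
ARP = {
    ipaddress.ip_address("192.168.1.2"): "aa:bb:cc:00:00:02",
    ipaddress.ip_address("203.0.113.1"): "aa:bb:cc:00:00:01",
}


def longest_match(table, addr):
    """Value for the most specific prefix containing addr (linear scan for brevity)."""
    best = max((net for net in table if addr in net), key=lambda net: net.prefixlen)
    return table[best]


def resolve(next_hop) -> Adjacency:
    """Resolve a next hop to an outgoing interface and MAC address."""
    return Adjacency(longest_match(CONNECTED, next_hop), ARP[next_hop])


def process_switch(dst: str) -> Adjacency:
    # Per packet: routing-table lookup, next-hop resolution and ARP lookup.
    return resolve(longest_match(RIB, ipaddress.ip_address(dst)))


def build_fib() -> dict:
    # CEF: do the expensive resolution once per prefix, ahead of time.
    return {prefix: resolve(nh) for prefix, nh in RIB.items()}


FIB = build_fib()


def cef_switch(dst: str) -> Adjacency:
    # Per packet: one lookup in the precomputed FIB yields the adjacency.
    return longest_match(FIB, ipaddress.ip_address(dst))


for dst in ("10.1.2.3", "8.8.8.8"):
    print(dst, cef_switch(dst), cef_switch(dst) == process_switch(dst))
```

    Both paths return the same forwarding decision; what changes is how much work is repeated for every packet.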

    For easier comparison, the results for both devices are plotted on one graph (Figure 1).


    Note the main points:
    1. Since the devices have FastEthernet interfaces, the maximum point-to-point throughput measured with iPerf3 did not exceed 95 Mbps. At the same time, in some test modes the CPU load did not reach its peak, which means that 95 Mbps is not the limit for these routers.
    2. The 881 router performs better, as it has more capable hardware (above all, a more powerful general-purpose processor, i.e. the CPU).
    3. As expected, we see a noticeable performance degradation when services are enabled.
    4. Disabling CEF causes a significant drop in performance, since the router then processes each packet in the least efficient way.
    5. Enabling the log option in an ACL increases the load on the device (CPU utilization in this case is 99%), which hurts performance. This is because the log option forces the router to process every packet that matches the marked ACL entry in Process Switching mode, which greatly increases the processor load.
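    Point 1 can be checked with a short calculation. Assuming untagged Ethernet frames, a 1500-byte MTU and 12 bytes of TCP timestamp options (the Linux default), each full-size frame carries 1448 bytes of TCP payload but occupies 1538 bytes of wire time once the preamble, Ethernet header, FCS and inter-frame gap are counted. On a 100 Mbps FastEthernet link this caps TCP goodput at roughly 94-95 Mbps, which matches the ceiling we observed. The function name is our own.

```python
# Ethernet per-frame overhead on the wire, in bytes.
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
WIRE_OVERHEAD = PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP  # 38 bytes


def tcp_goodput_mbps(line_rate_mbps: float, mtu: int = 1500, tcp_options: int = 12) -> float:
    """Upper bound on TCP payload rate over Ethernet with full-size frames."""
    payload = mtu - 20 - 20 - tcp_options  # minus IPv4 header, TCP header, TCP options
    return line_rate_mbps * payload / (mtu + WIRE_OVERHEAD)


print(f"{tcp_goodput_mbps(100):.1f}")                 # 94.1 (with TCP timestamps)
print(f"{tcp_goodput_mbps(100, tcp_options=0):.1f}")  # 94.9 (no TCP options)
```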

    Let's take a closer look at CPU utilization when routing in CEF and Process Switching modes. CEF routing:

    Router881#sh processes cpu sorted
    CPU utilization for five seconds: 47%/42%; one minute: 40%; five minutes: 35%
     PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
      89      143724        8597      16717  1.51%  1.42%  1.43%   0 COLLECT STAT COU
       5       25792         638      40426  1.43%  0.29%  0.20%   0 Check heaps
      97       45204      180099        250  0.63%  0.57%  0.47%   0 Ethernet Msec Ti
      …
    

    The total CPU utilization is 47%, of which 42% is spent handling interrupts caused by packet forwarding. There are two types of such interrupts: receive interrupts and transmit interrupts. A receive interrupt is raised by the interface processor when a packet has arrived on a router interface and is ready for processing. On receiving such an interrupt, the CPU suspends the current processes and starts handling the packet. Since CEF is enabled, the CPU decides where to send the packet using the CEF tables (FIB and adjacency) while still inside the interrupt. That is, it does not need to hand the packet over to process-level handling, which saves a great deal of CPU capacity. As a result, only 5% of the CPU load is spent on processes in the router. A transmit interrupt is raised to the CPU when the interface processor has sent the packet onward over the link. The CPU responds to this interrupt by updating counters and freeing the memory that held the packet. In terms of its contribution to the overall device load, this interrupt is less interesting.
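    The split between interrupt and process time can be read directly from the first line of `show processes cpu`: the two figures after "five seconds" are total CPU and the interrupt share, so process time is their difference. A minimal Python sketch applying this to the header lines of the three outputs in this section; the regular expression assumes exactly the header format shown here, and `split_cpu` is a hypothetical helper.

```python
import re

HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu(line: str) -> dict[str, int]:
    """Split a 'show processes cpu' header into total, interrupt and process load."""
    m = HEADER_RE.search(line)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {"total": total, "interrupt": interrupt, "process": total - interrupt,
            "1min": one_min, "5min": five_min}


outputs = {  # header lines copied from the Router881 outputs in this section
    "CEF": "CPU utilization for five seconds: 47%/42%; one minute: 40%; five minutes: 35%",
    "Process Switching": "CPU utilization for five seconds: 99%/27%; one minute: 82%; five minutes: 48%",
    "ACL log": "CPU utilization for five seconds: 99%/37%; one minute: 80%; five minutes: 52%",
}
for mode, line in outputs.items():
    s = split_cpu(line)
    print(f"{mode:18} total={s['total']}% interrupt={s['interrupt']}% process={s['process']}%")
```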

    Process Switching Routing:

    Router881#sh processes cpu sorted
    CPU utilization for five seconds: 99%/27%; one minute: 82%; five minutes: 48%
     PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
     129       98988        6013      16462 69.91% 55.95% 19.35%   0 IP Input
      89      145568        9248      15740  1.11%  1.11%  1.33%   0 COLLECT STAT COU
      97       45480      193804        234  0.23%  0.23%  0.35%   0 Ethernet Msec Ti
      …
    

    Now the total CPU utilization is 99%, and only 27% goes to interrupts. The remaining 72% is spent executing processes, with the IP Input process taking almost 70% of CPU time. This process is responsible for process-level handling of packets, i.e. packets that cannot be handled within an interrupt (for example, when CEF is disabled or its tables lack the information needed to forward the packet, when packets are addressed to the router itself or are broadcast traffic, etc.). Since both CEF and Fast Switching are disabled in our example (I did not cover the latter because it is no longer relevant), the CPU, on receiving the receive interrupt, only hands the packet off for process-level handling. The interrupt then completes, and the CPU processes the packet later as part of one of its processes.
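    The cases above can be summarized as a rough decision function. This is a simplified model based only on the cases named in this article (including the ACL log case from point 5); the `Packet` fields and the `switching_path` function are hypothetical, and real IOS has more reasons to punt a packet than these.

```python
from dataclasses import dataclass


@dataclass
class Packet:  # hypothetical packet attributes, for illustration only
    to_router: bool = False        # destined to one of the router's own addresses
    broadcast: bool = False
    matches_acl_log: bool = False  # hits an ACL entry that has the log option


def switching_path(pkt: Packet, cef_enabled: bool, fib_can_forward: bool) -> str:
    """Return "interrupt" if the packet is forwarded during the receive interrupt,
    or "process" if it is handed off to the IP Input process."""
    if not cef_enabled:           # Process Switching mode (Fast Switching also off)
        return "process"
    if pkt.to_router or pkt.broadcast:
        return "process"
    if pkt.matches_acl_log:       # the log option forces process-level handling
        return "process"
    if not fib_can_forward:       # CEF tables lack what is needed to forward
        return "process"
    return "interrupt"


print(switching_path(Packet(), cef_enabled=True, fib_can_forward=True))   # interrupt
print(switching_path(Packet(), cef_enabled=False, fib_can_forward=True))  # process
print(switching_path(Packet(matches_acl_log=True), True, True))           # process
```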

    It is also interesting to look at CPU utilization when an ACL with the log option is applied.

    Router881#sh processes cpu sorted
    CPU utilization for five seconds: 99%/37%; one minute: 80%; five minutes: 52%
     PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
     129      297672       15360      19379 60.83% 48.79% 29.67%   0 IP Input
      89      150496       10973      13715  0.72%  0.93%  1.22%   0 COLLECT STAT COU
      97       46036      232697        197  0.16%  0.17%  0.21%   0 Ethernet Msec Ti
      …
    

    The log option in the ACL forces the router to hand every matching packet off for process-level handling; as in the previous example, the telltale sign is the high CPU utilization of the IP Input process.

    Cisco ASA 5500

    Now let's look at the Cisco ASA 5505 firewall. In terms of positioning (small offices and branches), the ASA 5505 is similar to the Cisco 881 router. The two devices are in roughly the same price segment and have fairly similar hardware characteristics: the ASA 5505 uses an AMD Geode CPU clocked at 500 MHz. The most important difference is the operating system: the ASA 5505 runs ASA OS. We discussed the functional differences between routers and the ASA in a separate article. Now let's look at ASA performance and how various services affect it.

    Since the ASA has no pure routing mode and no dedicated traffic forwarding optimization technologies, only the following operating modes were tested:
    • Firewall.
    • Firewall with address translation (NAT) enabled.
    • Firewall with an ACL containing the log option applied to one of the interfaces.

    For easier comparison, the results for the ASA 5505 and the 881 router are plotted on the same graph (Figure 2).


    The graph shows that in all operating modes the throughput of the ASA 5505 is limited only by the test bench itself. Moreover, CPU utilization is almost identical in all modes:

    cbs-asa-vpn# sh proc cpu-usage non-zero sorted
    PC         Thread       5Sec     1Min     5Min   Process
    0x082a2849   0xa86e0994    31.1%    25.4%    13.4%   Dispatch Unit
    0x09bcebdb   0xa86d094c     6.4%     5.1%     5.9%   esw_stats
    0x08e68295   0xa86ced10     0.2%     0.1%     0.2%   ci/console
    0x0919171d   0xa86c9404     0.2%     0.2%     0.2%   IP SLA Mon Event Processor
    0x08f0591c   0xa86ce68c     0.1%     0.1%     0.1%   update_cpu_usage
    

    The following conclusions can be made:
    1. With comparable pricing and hardware specifications, the ASA 5505 delivers more performance than the 881 router.
    2. ASA performance is practically independent of the services enabled (at least, no dependence could be detected on this test bench).
    3. The log option in the ACL does not degrade performance. This is due to the specific way routing is implemented on this device.

    Thus, ASA OS appears better balanced in terms of how services affect device performance.

    Cisco ISR 4000

    Moving on. Let's look at how services affect the performance of Cisco ISR 4000 routers, Cisco's newest router line for small and medium-sized deployments. As you may recall, these routers run the Cisco IOS XE operating system, which can operate in multi-threaded mode. On the hardware side, they use multi-core processors.

    So we take the entry-level model of the ISR 4000 line, the 4321, out of the box. We activate the performance license to get the advertised maximum throughput of 100 Mbps and begin testing. It is important to note that ISR 4000 routers always use a shaper that limits the maximum throughput of the device. There are two thresholds: the base one (50 Mbps for the 4321) and the advanced one (100 Mbps for the 4321, activated by the performance license). This scheme is intended to make device performance predictable and to keep the device from “choking” under heavy traffic.
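    We do not know exactly how the ISR 4000 shaper is implemented internally. As a generic illustration of how a rate cap produces predictable throughput, here is a textbook token-bucket sketch in Python; the class, the 10-packet burst size and the constant-rate traffic model are assumptions. For brevity, packets that exceed the rate are simply counted as not sent, whereas a real shaper would queue them; the long-run rate is the same.

```python
class TokenBucket:
    """Textbook token bucket: tokens (bits) accrue at rate_bps, up to burst_bits."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last = 0.0

    def conforms(self, now: float, size_bits: float) -> bool:
        # Refill for the time elapsed since the previous packet, capped at the burst size.
        self.tokens = min(self.burst_bits, self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        if self.tokens >= size_bits:
            self.tokens -= size_bits
            return True
        return False


def throughput_mbps(limit_mbps: float, offered_mbps: float,
                    seconds: float = 1.0, pkt_bytes: int = 1500) -> float:
    """Offer a constant packet stream and measure what the bucket lets through."""
    pkt_bits = pkt_bytes * 8
    bucket = TokenBucket(limit_mbps * 1e6, burst_bits=10 * pkt_bits)  # assumed burst size
    gap = pkt_bits / (offered_mbps * 1e6)  # inter-packet time at the offered rate
    sent_bits = 0
    n = 0
    while n * gap < seconds:
        if bucket.conforms(n * gap, pkt_bits):
            sent_bits += pkt_bits
        n += 1
    return sent_bits / seconds / 1e6


for limit in (50, 100):  # the 4321's base and performance-license thresholds
    print(f"limit {limit} Mbps -> {throughput_mbps(limit, offered_mbps=95):.1f} Mbps")
```

    With a 50 Mbps cap, the 95 Mbps offered load is held to about 50 Mbps; with the 100 Mbps cap, the offered load passes unchanged.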

    First, we check pure routing performance in CEF mode with no additional services. We start iPerf3 and get 95 Mbps, as expected. Here is the CPU load at that moment:

    cbs-rtr-4321#show proc cpu sorted                                                                
    CPU utilization for five seconds: 1%/0%; one minute: 1%; five minutes: 1%
     PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process 
     658     8563421   409607083         20  0.47%  0.48%  0.48%   0 IP SLAs XOS Even 
      79     1123726    12408975         90  0.15%  0.06%  0.07%   0 IOSD ipc task    
       2      120745      326115        370  0.07%  0.00%  0.00%   0 Load Meter       
     667         420        1850        227  0.07%  0.03%  0.04%   2 SSH Process      
       …
    

    Impressive: CPU utilization is just 1%! But things are not quite that rosy. To understand this figure, we need to look more closely at how IOS XE works.

    IOS XE is an operating system built on Linux and heavily refined and optimized by the vendor. The traditional Cisco IOS operating system runs as a separate Linux process (IOSd). Most interestingly, IOS XE has a separate main process that performs the data plane functions. In other words, the control plane and data plane are clearly separated at the software level. The process responsible for the control plane is called linux_iosd-imag; this is essentially the familiar IOS. The process responsible for the data plane is called qfp-ucode-utah. Does QFP sound familiar? Recall the QuantumFlow Processor network processor in ASR 1000 routers. Since IOS XE first appeared on those routers, the process responsible for forwarding packets got the abbreviation qfp in its name. Apparently nothing was changed for the ISR 4000, the only difference being that on the ISR 4000 the QFP is virtual (it runs on dedicated cores of the general-purpose processor). Besides the processes mentioned, IOS XE runs a number of other auxiliary processes.

    Therefore, to see how heavily the processor is actually loaded, we need to analyze the output of the following IOS XE-specific commands:

    cbs-rtr-4321#show platform software status control-processor brief                             
    Load Average
     Slot  Status  1-Min  5-Min 15-Min
      RP0 Healthy   1.14   1.05   1.01
    Memory (kB)
     Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
      RP0 Healthy  3950540  3888836 (98%)    61704 ( 2%)   2517892 (64%)
    CPU Utilization
     Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
      RP0    0   5.28  10.57   0.00  79.84   4.19   0.09   0.00
             1   1.80   1.60   0.00  95.99   0.50   0.10   0.00
             2  41.00   2.70   0.00  56.30   0.00   0.00   0.00
             3  23.02  76.97   0.00   0.00   0.00   0.00   0.00
    

    Our router has four cores (CPU 0, 1, 2 and 3), and this command shows the load on each of them.
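    Per-core load is simply 100% minus the Idle column. A small Python sketch that parses the CPU Utilization table from the output above; the column layout is taken from this output only (other IOS XE releases may format it differently), and `core_busy` is a hypothetical helper.

```python
CPU_TABLE = """\
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
  RP0    0   5.28  10.57   0.00  79.84   4.19   0.09   0.00
         1   1.80   1.60   0.00  95.99   0.50   0.10   0.00
         2  41.00   2.70   0.00  56.30   0.00   0.00   0.00
         3  23.02  76.97   0.00   0.00   0.00   0.00   0.00
"""


def core_busy(table: str) -> dict[int, float]:
    """Map core number to busy percentage (100 - Idle)."""
    busy = {}
    for line in table.splitlines()[1:]:  # skip the column header
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("RP"):   # the first row also carries the slot name
            fields = fields[1:]
        cpu = int(fields[0])
        idle = float(fields[4])          # columns after CPU: User System Nice Idle ...
        busy[cpu] = round(100.0 - idle, 2)
    return busy


print(core_busy(CPU_TABLE))
```

    For this output, cores 0 to 3 come out at roughly 20%, 4%, 44% and 100% busy, so core 3 is fully loaded.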

    Note

    You can see the router's hardware by viewing the standard Linux dmesg output, which is stored in a file: more flash:/tracelogs/dmesg.

    The processor used in the ISR 4321 router is:
    CPU0: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz stepping 08


    The following command shows how CPU capacity is used by individual processes:

    cbs-rtr-4321#show platform software process slot RP active monitor cycles 1 interval 1
    top - 15:03:45 up 18 days, 21:00,  0 users,  load average: 1.13, 1.05, 1.01
    Tasks: 316 total,   2 running, 314 sleeping,   0 stopped,   0 zombie
    Cpu(s):  8.8%us, 22.3%sy,  0.0%ni, 68.8%id,  0.0%wa,  0.1%hi,  0.0%si,  0.0%st
    Mem:   3950540k total,  3889372k used,    61168k free,   199752k buffers
    Swap:        0k total,        0k used,        0k free,  1608388k cached
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
     3111 root      20   0 1041m 589m 333m S  150 15.3  28747:48 qfp-ucode-utah     
     1915 root      20   0 1957m 182m 124m S   10  4.7   2216:08 fman_fp_image      
    22575 root      20   0  360m  74m  30m S    2  1.9 392:16.70 bsm                
    23130 root      20   0 46828  25m  11m S    2  0.7  23:08.43 cmand              
    26108 root      20   0 2378m 896m 374m S    2 23.2 881:05.01 linux_iosd-imag    
    27088 root      20   0  2204 1096  728 R    2  0.0   0:00.02 top                
        1 root      20   0  1820  520  440 S    0  0.0   0:10.97 init               
        2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd           
    …      
    

    In this example, IOSd consumes only 2%, while QFP takes 150% (equivalent to fully loading one core and half of another).
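    The %CPU column in `top` is expressed per core, so values above 100% are possible on a multi-core CPU. A Python sketch that converts the figures above into core-equivalents and into a share of the four-core total; the rows are copied from the output above, and `cpu_by_process` is a hypothetical helper.

```python
TOP_ROWS = """\
 3111 root      20   0 1041m 589m 333m S  150 15.3  28747:48 qfp-ucode-utah
 1915 root      20   0 1957m 182m 124m S   10  4.7   2216:08 fman_fp_image
26108 root      20   0 2378m 896m 374m S    2 23.2 881:05.01 linux_iosd-imag
"""
CORES = 4  # the ISR 4321 has four cores (CPU 0-3)


def cpu_by_process(rows: str) -> dict[str, float]:
    """Map process name to top's %CPU value (100 = one fully busy core)."""
    usage = {}
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        usage[fields[11]] = float(fields[8])
    return usage


for name, pct in cpu_by_process(TOP_ROWS).items():
    cores = pct / 100
    share = pct / (CORES * 100) * 100
    print(f"{name:16} {cores:.2f} cores  {share:.1f}% of total capacity")
```

    So QFP occupies 1.5 cores, i.e. 37.5% of the router's total CPU capacity, while IOSd uses 0.5%.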

    So what does the show processes cpu command show, then? It displays the load of the virtual CPU allocated to the IOSd process. On ISR 4000 routers, one of the CPU cores is allocated to this process.

    From all this we can conclude that the packet processing architecture in IOS XE has changed significantly compared to classic IOS. IOSd no longer handles every packet; it processes only packets that require process-level handling. Even then, IOS XE uses the newer Fastpath mechanism, which passes packets to a separate thread inside IOSd for processing rather than using interrupts. Interrupts in IOSd occur only when processing via Fastpath is not possible.

    Let's get back to our task. Check the following modes of operation:
    • Routing using CEF technology.
    • Routing (CEF) with an access list (ACL) applied to one of the interfaces.
    • Routing (CEF) with an ACL containing the log option applied to one of the interfaces.
    • Routing (CEF) with NAT address translation enabled.
    • Routing (CEF) with firewall services enabled (ZPF).
    • Routing (CEF) with firewall and NAT enabled.

    Note that CEF cannot be disabled on the 4321 (or on any router of the ISR 4000 line); it is now the basic forwarding technology.

    The test results are shown in Figure 3. For clarity, the graph combines the throughput values (which are identical in all cases) and the CPU load of the QFP process. The IOSd process is not of interest here, since in all modes the load on the virtual CPU inside IOSd is minimal (1%).


    During testing, we could not detect any dependence of ISR 4321 performance on the services enabled. CPU utilization rises slightly, but only a little. It is also worth noting that enabling the log option in the ACL no longer causes dramatic performance losses, since packets are no longer sent for process-level handling.

    Summary

    Using several devices of different generations and types as examples, we tried to see how performance depends on the services enabled. In general, the results are consistent with previously known facts; we have not discovered anything new. The brief conclusions from the testing are as follows:

    1. There is a significant degradation in the performance of ISR G1 and G2 routers when services are enabled.
    2. ASA performance is less affected by services. At a price comparable to a router, it delivers more performance.
    3. The impact of enabling services on ISR 4000 performance is minimal.

    Thank you for your attention. I hope some of the information in this article will help you in working with Cisco equipment.
