
A Little About Cisco Networking Performance

This year we published two articles: one comparing the functionality of Cisco routers and firewalls, and an overview of the separation of the control and data planes in network equipment. In the comments to these articles, readers raised the question of network equipment performance. Namely, how does the performance of Cisco routers of different generations depend on which services are enabled on them? The performance of the Cisco ASA firewall was also discussed. This made us want to look at these questions from a practical perspective and back up some points with numbers. Below I describe what worked out and what did not.
By performance we mean the throughput of the device, measured in Mbps. The test bench consisted of two laptops with iPerf3 installed. The test procedure was quite simple: iPerf3 was run in TCP mode with 5 parallel streams. I did not set out to determine the actual performance limits of the devices. That would require more sophisticated equipment, since the traffic patterns of a real network would have to be recreated, and the number of processed packets would also have to be measured. Our main goal was to assess how enabling various services affects the device and to compare the results across devices. For that purpose, the chosen toolkit seemed quite adequate at first glance.
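If you want to reproduce a similar measurement, here is a minimal Python sketch. It drives the iPerf3 client the same way as in our tests (TCP, 5 parallel streams) and extracts the aggregate throughput from iPerf3's JSON report (`-J`). The sketch makes a few assumptions: `iperf3` is on PATH, an `iperf3 -s` server is running on the second laptop, and the 10-second duration and the server address are our own choices rather than parameters stated above.

```python
import json
import subprocess


def run_iperf3(server: str, streams: int = 5, seconds: int = 10) -> str:
    """Run an iperf3 TCP client with parallel streams and return its JSON report.

    Assumes iperf3 is installed locally and `iperf3 -s` runs on `server`.
    """
    cmd = ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def received_mbps(report_json: str) -> float:
    """Aggregate receiver-side throughput across all streams, in Mbps."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


# Example (192.0.2.10 is a placeholder documentation address):
# print(f"{received_mbps(run_iperf3('192.0.2.10')):.1f} Mbps")
```

Reading the receiver-side sum gives the rate at which data actually came out of the device under test, rather than the rate at which the sender pushed it in.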
Cisco Integrated Services Router (ISR) Generation 1 and 2
To begin with, we took the two entry-level routers, the Cisco 871 and 881, out of the box. They belong to different generations (the 871 to the older G1, the 881 to the newer G2) and are typically deployed in small offices, for example, in a company's remote branches.
The routers under study have similar software and hardware architectures: both run Cisco IOS, and the “brain” of each device is a SoC, the MPC 8272 in the 871 and the MPC 8300 in the 881.
The following operating modes were checked for each router:
- Routing using Cisco Express Forwarding (CEF) technology.
- Routing without the use of optimizing technologies (Process Switching).
- Routing (CEF) and an access list (ACL) applied to one of the interfaces.
- Routing (CEF) and an ACL with the log option on one of the interfaces.
- Routing (CEF) with address translation (NAT) enabled.
- Routing (CEF) with firewall services enabled (CBAC for 871 and ZPF for 881).
- Routing (CEF), firewall and NAT.
Testing covered traffic routing (L3 switching) with CEF and with process switching. On the devices under study, both modes are software packet processing. The difference lies in how the router decides where to send a packet. With process switching, the router handles every packet in a separate process. For each packet, it determines where to send it and builds or modifies the necessary headers based on the routing table and the L2 tables. This is the so-called process-level handling. With CEF, the router uses specially prepared tables: the FIB (prefix table) and the adjacency table (next-hop L2 information). This significantly reduces the CPU load and increases the packet processing rate inside the device.
For easier comparison, the data for both devices are plotted on one graph (Figure 1).
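To make the difference concrete, here is a simplified Python model. It is not how IOS implements either path, and the route, interface and MAC values are hypothetical. The process-switching function scans the routing table for the longest match and then resolves the L2 information for every packet. The CEF-style function does that work once, ahead of traffic, so that each FIB entry already points at a ready adjacency. Real implementations use tree structures rather than the sorted list used here.

```python
import ipaddress

# Hypothetical routing table (RIB): prefix -> next-hop address
RIB = {
    "0.0.0.0/0": "192.168.0.1",
    "10.0.0.0/8": "192.168.1.1",
    "10.1.0.0/16": "192.168.2.1",
}
# Hypothetical L2 table: next hop -> (egress interface, next-hop MAC)
ARP = {
    "192.168.0.1": ("Fa4", "00:00:5e:00:53:01"),
    "192.168.1.1": ("Fa0", "00:00:5e:00:53:02"),
    "192.168.2.1": ("Fa1", "00:00:5e:00:53:03"),
}


def process_switch(dst: str) -> tuple:
    """Per packet: scan the RIB for the longest match, then resolve L2 again."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in RIB.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return ARP[best[1]]


def build_cef_tables() -> list:
    """Done once, before traffic: each FIB entry points straight at an adjacency."""
    fib = [(ipaddress.ip_network(p), ARP[nh]) for p, nh in RIB.items()]
    fib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)  # longest prefix first
    return fib


FIB = build_cef_tables()


def cef_switch(dst: str) -> tuple:
    """Per packet: the first matching entry is the longest prefix; no L2 resolution."""
    addr = ipaddress.ip_address(dst)
    for net, adjacency in FIB:
        if addr in net:
            return adjacency
    raise LookupError(f"no route to {dst}")


print(cef_switch("10.1.2.3"))  # ('Fa1', '00:00:5e:00:53:03')
```

Both functions return the same answer. What differs is how much work is repeated for every packet.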

Note the main points:
- Since the interfaces on these devices are FastEthernet, the maximum point-to-point throughput measured with iPerf3 did not exceed 95 Mbps. At the same time, in some test modes the CPU load did not reach its peak, which means that 95 Mbps is not the limit for these routers.
- The 881 performs better, as it has more advanced hardware (primarily its CPU).
- As expected, we see a noticeable degradation in performance when services are turned on.
- Disabling CEF causes a significant drop in performance, since the router then processes every packet in the least efficient way.
- Enabling the log option in the ACL increases the load on the device (CPU utilization reaches 99% in this case), which hurts performance. The log option forces the router to process every packet that matches the marked ACL entry in process-switching mode, which significantly increases the processor load.
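The 95 Mbps ceiling is about what TCP can carry over a 100 Mbps Ethernet link at all. Here is a back-of-the-envelope check in Python. It assumes a 1500-byte MTU, IPv4 and TCP headers without options (40 bytes), and 38 bytes of per-frame Ethernet overhead: a 14-byte header, 4-byte FCS, 8-byte preamble/SFD and 12-byte interframe gap.

```python
def tcp_goodput_mbps(line_rate_mbps: float, mtu: int = 1500,
                     ip_tcp_headers: int = 40, eth_overhead: int = 38) -> float:
    """Best-case TCP payload rate for full-size segments on Ethernet.

    eth_overhead = 14 (header) + 4 (FCS) + 8 (preamble/SFD) + 12 (interframe gap).
    """
    payload = mtu - ip_tcp_headers   # TCP payload carried per frame (MSS)
    on_wire = mtu + eth_overhead     # bytes of line time consumed per frame
    return line_rate_mbps * payload / on_wire


print(f"{tcp_goodput_mbps(100):.1f} Mbps")                     # 94.9 Mbps
print(f"{tcp_goodput_mbps(100, ip_tcp_headers=52):.1f} Mbps")  # 94.1 with TCP timestamps
```

So a FastEthernet test bench tops out near 95 Mbps regardless of the device in between.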
Let us look in more detail at CPU utilization when routing with CEF and with process switching. CEF routing:
Router881#sh processes cpu sorted
CPU utilization for five seconds: 47%/42%; one minute: 40%; five minutes: 35%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
89 143724 8597 16717 1.51% 1.42% 1.43% 0 COLLECT STAT COU
5 25792 638 40426 1.43% 0.29% 0.20% 0 Check heaps
97 45204 180099 250 0.63% 0.57% 0.47% 0 Ethernet Msec Ti
…
The total CPU utilization is 47%, of which 42% is spent on handling interrupts caused by packet transmission. There are two kinds of such interrupts: receive interrupts and transmit interrupts. The interface processor raises a receive interrupt when a packet has arrived on a router interface and is ready for processing. On receiving it, the CPU suspends the current processes and starts processing the packet. Since CEF is enabled, the CPU decides where to send the packet during the interrupt, based on the CEF tables (FIB and adjacency). That is, it does not need to hand the packet over to process-level handling, which saves a great deal of CPU capacity. As a result, only 5% of the CPU load is spent on processes in the router. A transmit interrupt is raised when the interface processor has sent the packet further down the link. The CPU responds to it by updating counters and freeing the memory that held the packet. In terms of its contribution to the overall load of the device, this interrupt is less interesting.
Process Switching Routing:
Router881#sh processes cpu sorted
CPU utilization for five seconds: 99%/27%; one minute: 82%; five minutes: 48%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
129 98988 6013 16462 69.91% 55.95% 19.35% 0 IP Input
89 145568 9248 15740 1.11% 1.11% 1.33% 0 COLLECT STAT COU
97 45480 193804 234 0.23% 0.23% 0.35% 0 Ethernet Msec Ti
…
Now the total CPU utilization is 99%, and only 27% goes to interrupts. The remaining 72% is spent executing processes, and the IP Input process alone takes almost 70% of CPU time. This process handles packets at process level, i.e. those packets that cannot be handled during an interrupt. Typical cases are a disabled CEF, CEF tables that lack the information needed to forward the packet, packets addressed to the router itself, and broadcast traffic. Both CEF and fast switching are disabled in our test (I did not cover fast switching because it is no longer relevant). So when the CPU takes the receive interrupt, it only queues the packet for process-level handling. The interrupt then completes, and the CPU processes the packet later as part of one of its processes.
It is also interesting to look at CPU utilization with an ACL that has the log option.
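The split between interrupt and process time can be read straight off the first line of `show processes cpu`. The first figure is total utilization, the second is the interrupt share, and the difference is time spent in processes. Here is a small Python helper, assuming the header format shown in the outputs above:

```python
import re

# Matches the first line of `show processes cpu`, e.g. "... five seconds: 47%/42%; ..."
HEADER = re.compile(r"CPU utilization for five seconds: (\d+)%/(\d+)%")


def split_cpu_load(line: str) -> dict:
    """Split the 5-second CPU figure into interrupt and process-level parts."""
    match = HEADER.search(line)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt = (int(group) for group in match.groups())
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


print(split_cpu_load("CPU utilization for five seconds: 47%/42%; one minute: 40%; five minutes: 35%"))
# {'total': 47, 'interrupt': 42, 'process': 5}   (CEF)
print(split_cpu_load("CPU utilization for five seconds: 99%/27%; one minute: 82%; five minutes: 48%"))
# {'total': 99, 'interrupt': 27, 'process': 72}  (process switching)
```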
Router881#sh processes cpu sorted
CPU utilization for five seconds: 99%/37%; one minute: 80%; five minutes: 52%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
129 297672 15360 19379 60.83% 48.79% 29.67% 0 IP Input
89 150496 10973 13715 0.72% 0.93% 1.22% 0 COLLECT STAT COU
97 46036 232697 197 0.16% 0.17% 0.21% 0 Ethernet Msec Ti
…
The log option in the ACL forces the router to send every matching packet to process-level handling. As in the previous example, the telltale sign is the high CPU utilization of the IP Input process.
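The cases described above can be summarized as a simplified decision function. This is a Python model under our own assumptions: the `Packet` fields and `switching_path` name are hypothetical, and fast switching is left out, as in the test. Real IOS considers many more cases.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    in_fib: bool               # FIB/adjacency tables hold what is needed to forward it
    to_router: bool = False    # addressed to the router itself
    broadcast: bool = False    # broadcast traffic
    acl_log_hit: bool = False  # matches an ACL entry with the log option


def switching_path(pkt: Packet, cef_enabled: bool = True) -> str:
    """Where a received packet gets handled on a classic IOS router (simplified)."""
    if not cef_enabled:
        return "IP Input"      # no fast path at all: punt every packet
    if pkt.to_router or pkt.broadcast or pkt.acl_log_hit or not pkt.in_fib:
        return "IP Input"      # exception case: punt to process level
    return "interrupt"         # forwarded by CEF during the receive interrupt


print(switching_path(Packet(in_fib=True)))                     # interrupt
print(switching_path(Packet(in_fib=True), cef_enabled=False))  # IP Input
print(switching_path(Packet(in_fib=True, acl_log_hit=True)))   # IP Input
```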
Cisco ASA 5500
Let us now look at the Cisco ASA 5505 firewall. In terms of positioning (small offices and branches), the ASA 5505 is similar to the Cisco 881 router. The two devices are in roughly the same price range and have fairly similar hardware: the ASA 5505 uses an AMD Geode CPU clocked at 500 MHz. The most important difference is the operating system: the ASA 5505 runs ASA OS. We discussed the functional differences between routers and the ASA in a separate article. Now let's look at ASA performance and how various services affect it.
Since the ASA has no pure routing mode and no dedicated routing optimization technologies, only the following operating modes were tested:
- Firewall.
- Firewall with address translation (NAT) enabled.
- Firewall and an ACL with the log option on one of the interfaces.
For easier comparison, the data for the ASA 5505 and the 881 router are plotted on the same graph (Figure 2).

The graph shows that in all operating modes, the throughput of the ASA 5505 is limited only by the test bench itself. Moreover, CPU utilization is almost identical across all modes:
cbs-asa-vpn# sh proc cpu-usage non-zero sorted
PC Thread 5Sec 1Min 5Min Process
0x082a2849 0xa86e0994 31.1% 25.4% 13.4% Dispatch Unit
0x09bcebdb 0xa86d094c 6.4% 5.1% 5.9% esw_stats
0x08e68295 0xa86ced10 0.2% 0.1% 0.2% ci/console
0x0919171d 0xa86c9404 0.2% 0.2% 0.2% IP SLA Mon Event Processor
0x08f0591c 0xa86ce68c 0.1% 0.1% 0.1% update_cpu_usage
The following conclusions can be drawn:
- With comparable pricing and hardware, the ASA 5505 delivers more performance than the 881 router.
- ASA performance is practically independent of the enabled services (at least, no dependence could be detected on this test bench).
- The log option in the ACL does not degrade performance. This is due to the specific way the routing function is implemented on this device.
Overall, ASA OS appears more balanced in terms of how services affect device performance.
Cisco ISR 4000
Moving on. Let's look at how services affect the performance of Cisco ISR 4000 routers, the newest line of Cisco routers for small and medium-sized deployments. As we recall, these routers run the Cisco IOS XE operating system, which can operate in multi-threaded mode. In terms of hardware, they are built around multi-core processors.
So we take the entry-level ISR 4000 model, the 4321, out of the box, activate the performance license on it to get the declared maximum throughput of 100 Mbps, and start testing. Note that ISR 4000 routers always apply a shaper that caps the maximum throughput of the device. There are two thresholds: the base one (50 Mbps for the 4321) and the advanced one (100 Mbps for the 4321, activated by the performance license). This approach aims to keep device performance predictable and to prevent the device from “choking” on a large volume of traffic.
First, we check pure CEF routing without additional services. We start iPerf3 and get 95 Mbps, as expected. Here is the CPU load at that moment:
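The article does not describe the internals of this shaper. As a generic illustration only: rate limiters of this kind are commonly built on a token bucket. The Python sketch below assumes the 100 Mbps advanced threshold and a hypothetical 15 KB burst, and shows how such a limiter caps an offered load of about 200 Mbps. Note that it makes a pass/fail decision per packet, which is policer behavior; a real shaper would queue non-conforming packets instead of rejecting them.

```python
class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative, not Cisco's implementation)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8             # refill rate, bytes per second
        self.capacity = burst_bytes          # maximum burst, bytes
        self.tokens = float(burst_bytes)     # start with a full bucket
        self.last = 0.0                      # time of the previous update, seconds

    def allow(self, size_bytes: int, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False  # a shaper would queue this packet; a policer would drop it


# Offer ~200 Mbps of 1500-byte packets for one second to a 100 Mbps limit.
bucket = TokenBucket(rate_bps=100e6, burst_bytes=15_000)   # burst size is hypothetical
sent = 0
for i in range(16_667):                       # one packet every 60 microseconds
    if bucket.allow(1500, now=i * 60e-6):
        sent += 1500
print(f"{sent * 8 / 1e6:.1f} Mbps passed")    # close to the 100 Mbps limit
```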
cbs-rtr-4321#show proc cpu sorted
CPU utilization for five seconds: 1%/0%; one minute: 1%; five minutes: 1%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
658 8563421 409607083 20 0.47% 0.48% 0.48% 0 IP SLAs XOS Even
79 1123726 12408975 90 0.15% 0.06% 0.07% 0 IOSD ipc task
2 120745 326115 370 0.07% 0.00% 0.00% 0 Load Meter
667 420 1850 227 0.07% 0.03% 0.04% 2 SSH Process
…
Now that's a result! CPU utilization is 1%. Great! But things are not quite that perfect. The explanation becomes clear after a closer look at the specifics of IOS XE.
IOS XE is a Linux-based operating system, carefully refined and optimized by the vendor. The traditional Cisco IOS operating system runs in it as a separate Linux process (IOSd). Most interestingly, IOS XE has a separate main process that performs the data plane functions. That is, the control and data planes are cleanly separated at the software level. The process responsible for the control plane is called linux_iosd-imag; this is essentially the familiar IOS. The process responsible for the data plane is called qfp-ucode-utah. QFP, sound familiar? Recall the QuantumFlow Processor network processor in ASR 1000 routers. Since IOS XE first appeared on those routers, the process responsible for forwarding packets got the abbreviation qfp in its name. Apparently nothing was changed for the ISR 4000, the only difference being that the ISR 4000 QFP is virtual (it runs on dedicated cores of a general-purpose processor). Besides these processes, IOS XE runs a number of other auxiliary processes.
Thus, to see how heavily the processor is actually loaded, we analyze the output of the following IOS XE-specific commands:
cbs-rtr-4321#show platform software status control-processor brief
Load Average
Slot Status 1-Min 5-Min 15-Min
RP0 Healthy 1.14 1.05 1.01
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
RP0 Healthy 3950540 3888836 (98%) 61704 ( 2%) 2517892 (64%)
CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
RP0 0 5.28 10.57 0.00 79.84 4.19 0.09 0.00
1 1.80 1.60 0.00 95.99 0.50 0.10 0.00
2 41.00 2.70 0.00 56.30 0.00 0.00 0.00
3 23.02 76.97 0.00 0.00 0.00 0.00 0.00
Our router has four cores (CPU 0, 1, 2, and 3), and the command shows the load on each of them.
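Busy time per core is simply 100 minus the Idle column. The Python helper below, assuming the column layout shown above, makes the picture explicit: core 3 is fully loaded, core 2 is a little under half loaded, and cores 0 and 1 are lightly used. The parsing approach is our own, not a Cisco tool.

```python
TABLE = """
Slot CPU User System Nice Idle IRQ SIRQ IOwait
RP0 0 5.28 10.57 0.00 79.84 4.19 0.09 0.00
1 1.80 1.60 0.00 95.99 0.50 0.10 0.00
2 41.00 2.70 0.00 56.30 0.00 0.00 0.00
3 23.02 76.97 0.00 0.00 0.00 0.00 0.00
"""


def core_busy(table: str) -> dict:
    """Map CPU core number -> busy percentage (100 - Idle)."""
    busy = {}
    for line in table.strip().splitlines():
        fields = line.split()
        if fields and fields[0].startswith("RP"):
            fields = fields[1:]          # the first core row also carries the slot name
        if not fields or not fields[0].isdigit():
            continue                     # skip the header row
        idle = float(fields[4])          # columns: CPU User System Nice Idle IRQ SIRQ IOwait
        busy[int(fields[0])] = round(100.0 - idle, 2)
    return busy


print(core_busy(TABLE))   # {0: 20.16, 1: 4.01, 2: 43.7, 3: 100.0}
```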
Note
You can identify the router's hardware from the standard Linux dmesg information, which is stored in a file: more flash:/tracelogs/dmesg.
The processor used in the ISR 4321 router is:
CPU0: Intel® Atom (TM) CPU C2558 @ 2.40GHz stepping 08
The following command shows how CPU capacity is used by individual processes:
cbs-rtr-4321#show platform software process slot RP active monitor cycles 1 interval 1
top - 15:03:45 up 18 days, 21:00, 0 users, load average: 1.13, 1.05, 1.01
Tasks: 316 total, 2 running, 314 sleeping, 0 stopped, 0 zombie
Cpu(s): 8.8%us, 22.3%sy, 0.0%ni, 68.8%id, 0.0%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 3950540k total, 3889372k used, 61168k free, 199752k buffers
Swap: 0k total, 0k used, 0k free, 1608388k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3111 root 20 0 1041m 589m 333m S 150 15.3 28747:48 qfp-ucode-utah
1915 root 20 0 1957m 182m 124m S 10 4.7 2216:08 fman_fp_image
22575 root 20 0 360m 74m 30m S 2 1.9 392:16.70 bsm
23130 root 20 0 46828 25m 11m S 2 0.7 23:08.43 cmand
26108 root 20 0 2378m 896m 374m S 2 23.2 881:05.01 linux_iosd-imag
27088 root 20 0 2204 1096 728 R 2 0.0 0:00.02 top
1 root 20 0 1820 520 440 S 0 0.0 0:10.97 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
…
In this example, IOSd uses only 2%, while QFP uses 150% (equivalent to fully loading one core plus half of another).
So what does the show processes cpu command show, then? It displays the load of the virtual CPU allocated to the IOSd process. On ISR 4000 routers, one of the CPU cores is allocated to this process.
From all this we can conclude that the packet processing architecture in IOS XE has changed significantly compared to classic IOS. IOS no longer handles every packet: only packets that require process-level handling reach the IOSd process. Even then, IOS XE uses the newer Fastpath mechanism, which passes packets for processing to a separate thread inside IOSd rather than through interrupts. Interrupts in IOSd occur only when Fastpath processing is not possible.
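In its default Irix mode, top reports each process's %CPU relative to a single core, which is why qfp-ucode-utah can show 150%. The short Python helper below converts these figures into the number of cores in use and the share of total capacity. It assumes that mode and the four cores reported earlier.

```python
TOP_ROWS = """
3111 root 20 0 1041m 589m 333m S 150 15.3 28747:48 qfp-ucode-utah
1915 root 20 0 1957m 182m 124m S 10 4.7 2216:08 fman_fp_image
26108 root 20 0 2378m 896m 374m S 2 23.2 881:05.01 linux_iosd-imag
"""


def cpu_by_process(rows: str, total_cores: int = 4) -> dict:
    """Map command -> (cores in use, percent of total CPU capacity)."""
    usage = {}
    for row in rows.strip().splitlines():
        fields = row.split()
        cpu = float(fields[8])     # %CPU column; 100 means one fully busy core
        usage[fields[-1]] = (cpu / 100, cpu / total_cores)
    return usage


print(cpu_by_process(TOP_ROWS)["qfp-ucode-utah"])   # (1.5, 37.5)
```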
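The hand-off described above is essentially a producer/consumer pattern. The following Python sketch is a generic illustration only: the names are hypothetical, and this is not how IOS XE is implemented internally. The data plane forwards most packets itself and puts the few that need control-plane attention on a queue. A dedicated thread drains that queue, so the control plane is not interrupted for each packet.

```python
import queue
import threading

punt_queue = queue.Queue()   # packets handed from the data plane to the control plane
handled = []                 # IDs of packets processed by the control-plane thread


def control_plane_worker() -> None:
    """Dedicated thread that drains punted packets (no interrupts involved)."""
    while True:
        pkt = punt_queue.get()
        if pkt is None:      # sentinel: shut the worker down
            break
        handled.append(pkt["id"])


def data_plane(packets) -> int:
    """Forward what it can locally; punt the rest into the queue."""
    forwarded = 0
    for pkt in packets:
        if pkt["punt"]:      # e.g. addressed to the router itself
            punt_queue.put(pkt)
        else:
            forwarded += 1
    return forwarded


worker = threading.Thread(target=control_plane_worker)
worker.start()
forwarded = data_plane([{"id": i, "punt": i % 10 == 0} for i in range(100)])
punt_queue.put(None)
worker.join()
print(forwarded, len(handled))   # 90 10
```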
Back to our task. We checked the following operating modes:
- Routing using CEF technology.
- Routing (CEF) and an access list (ACL) applied to one of the interfaces.
- Routing (CEF) and an ACL with the log option on one of the interfaces.
- Routing (CEF) with NAT enabled.
- Routing (CEF) with firewall services enabled (ZPF).
- Routing (CEF), firewall and NAT.
Note that CEF cannot be disabled on the 4321 (or on any ISR 4000 router). It is now the basic forwarding technology.
The test results are shown in Figure 3. For clarity, the graph plots the throughput (identical in all cases) together with the CPU load of the QFP process. The IOSd process is not of interest here, since in all modes the load of its virtual CPU is minimal (1%).

During testing, we could not detect any dependence of ISR 4321 performance on the enabled services. CPU utilization increases, but only slightly. It is also worth noting that enabling the log option in the ACL no longer causes dramatic performance losses, since packets are no longer sent for process-level handling.
Summary
Using several devices of different generations and types as examples, we looked at how performance depends on the services enabled. Overall, the results are consistent with previously known facts; we did not discover anything new. The brief conclusions from the testing are as follows:
- There is a significant degradation in the performance of ISR G1 and G2 routers when services are enabled.
- ASA performance is less affected by services, and at a price comparable to a router, it delivers more performance.
- The impact of enabling services on ISR 4000 performance is minimal.
Thank you for your attention. I hope some of the information in this article will help you in working with Cisco equipment.