Cisco revealed the features of the 400-gigabit NPU

    At the Hot Chips conference, held in August this year, Cisco chief engineer Jamie Markevitch described a network processor with a capacity of 400 Gb/s that is already shipping to customers.

    The chip is manufactured on a 22-nanometer process and has 672 cores, each of which handles up to four threads. The network processor (NPU) contains 9.2 billion transistors and 353 MB of SRAM. The SRAM serves as the L0 cache, which stores instructions and data for each thread. There is also an L1 cache for each cluster of 16 cores.






    The NPU has 42 core clusters, which are connected to an on-chip interconnect through the L2 instruction cache. The interconnect also ties caches of different levels, packet storage, accelerators, on-chip memory, and DRAM into a single "network". This network runs at 1 GHz and has a bandwidth of more than 9 Tb/s.
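    For a sense of scale, the two figures above imply how much data the on-chip network moves per clock cycle. This is only a back-of-the-envelope calculation: it assumes the 9 Tb/s figure is an aggregate over the whole network at the 1 GHz clock, which the talk summary does not spell out, and the variable names are illustrative.

```python
# Figures quoted in the article; names are illustrative only.
BANDWIDTH_BPS = 9 * 10**12  # "more than 9 Tb/s", so treat this as a lower bound
CLOCK_HZ = 10**9            # 1 GHz

# Assumption: the bandwidth is an aggregate over the whole on-chip network.
bits_per_cycle = BANDWIDTH_BPS // CLOCK_HZ
bytes_per_cycle = bits_per_cycle // 8

print(bits_per_cycle)   # 9000
print(bytes_per_cycle)  # 1125
```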


    Block diagram of the chip

    Cisco did not disclose the instruction set used in the NPU. However, experts assume that it is a custom set designed specifically for networking, rather than ARM, MIPS, Power, or x86.

    Each NPU thread processes a packet for the packet's entire “life” in the chip. This eliminates idle time and the “juggling” of packets between cores. As a result, 2688 packets can be processed simultaneously. Packets are stored off-chip in DRAM but processed in real time in SRAM. Moreover, accelerators can access the DRAM copy independently of the cores working on the SRAM original.
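    The figure of 2688 follows directly from the core and thread counts quoted earlier, as does the number of clusters. A minimal sketch of the arithmetic, with illustrative variable names:

```python
# Figures quoted in the article; names are illustrative only.
CORES = 672
THREADS_PER_CORE = 4
CORES_PER_CLUSTER = 16

# One packet per hardware thread, held for its whole lifetime in the chip.
packets_in_flight = CORES * THREADS_PER_CORE
clusters = CORES // CORES_PER_CLUSTER

print(packets_in_flight)  # 2688
print(clusters)           # 42
```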

    Since different packets require different features, the cores deliver variable performance in order to maximize efficiency. At the same time, the Cisco NPU supports conventional programming in C or assembly.

    The network processor handles packets at 800 Gb/s, or 400 Gb/s in full-duplex mode. The SERDES interface, in turn, has a throughput of 6.5 Tb/s. Most of these links connect to DRAM and TCAM; the latter stores access control lists (ACLs). It is also used to buffer packets, so its capacity is sometimes insufficient, in which case part of the data is stored in DRAM.
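    A TCAM suits ACLs because each entry is a value plus a mask of "care" bits, and the first matching entry wins. The sketch below is a rough software model of that idea, not Cisco's design: real TCAM hardware compares all entries in parallel, while this loop checks them one by one. The rule table and all names are hypothetical.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class TcamEntry(NamedTuple):
    value: int   # bits to compare against the key
    mask: int    # 1 = bit must match, 0 = "don't care"
    action: str


def entry(cidr: str, action: str) -> TcamEntry:
    """Build a ternary entry from an IPv4 prefix (hypothetical helper)."""
    net = ipaddress.ip_network(cidr)
    return TcamEntry(int(net.network_address), int(net.netmask), action)


def lookup(table: List[TcamEntry], addr: str) -> Optional[str]:
    """Return the action of the first entry whose masked bits equal the key's."""
    key = int(ipaddress.ip_address(addr))
    for e in table:  # table order is the priority order
        if (key & e.mask) == (e.value & e.mask):
            return e.action
    return None


# Hypothetical ACL: most specific rule first, catch-all last.
ACL = [
    entry("10.1.2.0/24", "deny"),
    entry("10.0.0.0/8", "permit"),
    entry("0.0.0.0/0", "deny"),
]

print(lookup(ACL, "10.1.2.7"))   # deny
print(lookup(ACL, "10.9.9.9"))   # permit
print(lookup(ACL, "192.0.2.1"))  # deny
```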

    Most of the NPU logic runs at 760 MHz or 1 GHz. The MAC interfaces support ports at speeds from 10 to 100 Gb/s.

    The network processor has an integrated traffic manager that handles 256 thousand requests at a time and can sustain a load of half a trillion objects. Accelerators handle IPv4 and IPv6 prefix processing, compression and hashing of IP ranges, packet delivery, and statistics collection.
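    The talk summary does not say how the accelerators process prefixes. A typical prefix operation in a router is longest-prefix match, so the sketch below illustrates that general technique only. It uses a plain linear scan over a hypothetical route table; a hardware accelerator would use a very different structure.

```python
import ipaddress
from typing import Optional

# Hypothetical route table: prefix -> next hop.
ROUTES = {
    "0.0.0.0/0": "port0",
    "10.0.0.0/8": "port1",
    "10.1.0.0/16": "port2",
    "2001:db8::/32": "port3",
}
TABLE = [(ipaddress.ip_network(p), hop) for p, hop in ROUTES.items()]


def longest_prefix_match(addr: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, hop in TABLE:
        # Only compare addresses and prefixes of the same IP version.
        if ip.version == net.version and ip in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, hop)
    return best[1] if best else None


print(longest_prefix_match("10.1.2.3"))     # port2
print(longest_prefix_match("10.200.0.1"))   # port1
print(longest_prefix_match("8.8.8.8"))      # port0
print(longest_prefix_match("2001:db8::1"))  # port3
```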

    The external DRAM is attached over 28 SERDES lanes running at 12.5 Gb/s each. The SERDES links use a proprietary serial protocol for memory access that can perform up to a billion random accesses per second and supports data transfer at up to 300 Gb/s.
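    These numbers can be cross-checked with simple arithmetic. The sketch below assumes that 12.5 Gb/s is a per-lane rate and that the peak access rate and peak data rate can be reached at the same time; neither is confirmed by the source, and the gap between raw and usable bandwidth is not explained there either.

```python
# Figures quoted in the article; names are illustrative only.
LANES = 28
LANE_RATE_GBPS = 12.5          # assumed to be a per-lane rate
USABLE_GBPS = 300              # quoted peak data transfer rate
ACCESSES_PER_SEC = 10**9       # "up to a billion random accesses per second"

raw_gbps = LANES * LANE_RATE_GBPS                # raw line rate of all lanes
efficiency = USABLE_GBPS / raw_gbps              # share of raw rate carrying data
bits_per_access = USABLE_GBPS * 10**9 // ACCESSES_PER_SEC

print(raw_gbps)               # 350.0
print(round(efficiency, 3))   # 0.857
print(bits_per_access)        # 300
```

    Under these assumptions, each random access would move about 300 bits (37.5 bytes) on average.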

    The logic is connected to the DRAM through a parallel I/O interface with a maximum speed of 1250 Mb/s. Interestingly, only the processor itself is built on the 22-nanometer process. The DRAM is manufactured on a 30 nm process, while the SERDES and BIST use 28 nm.

    “We determined which operations are typically performed on such devices and optimized the chip for random operations at high speed. It can be used as a buffer, where the number of reads equals the number of writes, as well as for database lookups, where the number of updates is relatively small,” said Jamie Markevitch, Cisco chief engineer.

    A look at the "insides" of a network processor is not unprecedented, but it is rare. Manufacturers usually do not disclose such information, although exceptions do occur: Barefoot Networks described its Tofino chip in January, Innovium described Teralynx in March, and Mellanox Technologies described Spectrum-2 in July.

    About Hot Chips

    Hot Chips is a symposium on high-performance processors, first held in 1989. This year, in addition to Cisco, many major manufacturers took part. In particular, Microsoft presented its work in augmented reality and talked about the processor for the Xbox One X Scorpio. The talk by the Chinese company Baidu was devoted to augmented reality, and a Google representative spoke about optimizing hardware for neural networks.
