Network on a chip: a mini-Internet inside the processor

    We have already come to terms with the fact that processor clock speeds have stopped growing and that manufacturers have turned to parallel computing instead. However, the number of cores in a typical general-purpose processor, after quickly passing the 2 and 4 marks, stalled at around 8. Some were even ready to bury Moore's law.

    This stagnation has an objective cause. The difference between 2, 4 and 8 cores is largely quantitative, but a 16-core processor already runs into the fundamental limitations of the traditional architecture. For the past few decades, the bus has served as the basis for communication between the individual IP blocks of a chip. While there were few blocks, it coped, but once cores began to multiply, this architecture exhausted itself. A bus is a shared transmission medium to which several processor blocks are connected. At any given moment only one block can transmit data, while all the others can only receive. If several blocks need to transmit at the same time, they contend for the bus and all but one must wait. Once the number of cores exceeds eight, these delays become unacceptably large and almost completely cancel out the advantages of running several cores in parallel.
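
    The saturation effect described above can be illustrated with a deliberately crude model. The sketch below assumes that every core needs the shared bus for a fixed fraction of its run time; the function name `bus_speedup` and the 12.5% figure are illustrative assumptions, not measurements from any real processor.

```python
def bus_speedup(cores: int, bus_share: float) -> float:
    """Upper bound on speedup over one core when all cores share one bus.

    bus_share is the (assumed) fraction of its run time that a single core
    spends transmitting on the bus. The bus carries one transfer at a time,
    so it can keep at most 1 / bus_share cores busy.
    """
    if cores < 1 or not 0.0 < bus_share <= 1.0:
        raise ValueError("need cores >= 1 and 0 < bus_share <= 1")
    return float(min(cores, 1.0 / bus_share))


if __name__ == "__main__":
    # 0.125 is a hypothetical value chosen so that the bus saturates at 8 cores.
    for n in (2, 4, 8, 16, 32):
        print(n, "cores ->", bus_speedup(n, 0.125))
```

    In this toy model, going from 8 to 16 or 32 cores gains nothing, because the bus is already busy all the time. Real buses degrade less abruptly, but they hit the same kind of ceiling.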

    The number of cores can be pushed a little higher by dividing the bus into several segments connected by bridges, but this is more of a “crutch” that scales poorly and does not solve the underlying problem. The real solution, one that makes it possible to combine hundreds of blocks on a single chip, is the familiar packet-switched network, here called a Network on Chip (NoC).

    The transition from bus to network is quite natural. Telecommunication networks developed the same way: radio is a typical “bus”, telephone networks use circuit switching built on crossbar switches, and the Internet uses packet switching. Computer peripherals followed the same path: the modern PCI Express bus is not really a bus at all, but a network with a star topology. Processors are evolving in the same direction: first direct connections between blocks, then buses and crossbar switches, and finally networks.

    In the NoC architecture, each core or block of the processor is connected to a router, through which it communicates with the other blocks. The routers themselves are linked into a network, and data packets travel across it from one block to another, just like packets on an ordinary computer network. This greatly simplifies the chip layout and removes the limits on scaling: unlike on a bus, many blocks can communicate simultaneously without interfering with each other. Computer simulations and prototypes of multi-core processors show that, with a large number of cores, this architecture outperforms the traditional one in many respects.
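
    As an illustration of how packets can move between routers without interfering, here is a minimal sketch that assumes the routers form a 2D mesh and use dimension-order (XY) routing. The article does not say which topology or routing algorithm a given NoC uses, so both are assumptions, as are the names `xy_route` and `share_a_link`.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the assumed mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Routers visited from src to dst, moving along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def share_a_link(path_a: List[Coord], path_b: List[Coord]) -> bool:
    """True if both paths use the same directed router-to-router link."""
    links_a = set(zip(path_a, path_a[1:]))
    links_b = set(zip(path_b, path_b[1:]))
    return bool(links_a & links_b)


if __name__ == "__main__":
    first = xy_route((0, 0), (2, 1))
    second = xy_route((0, 3), (3, 3))
    print(first)
    print(second)
    # Paths with no common link can carry packets in the same cycle.
    print("conflict:", share_a_link(first, second))
```

    On a bus these two transfers would have to take turns; in the mesh they use disjoint links and can proceed at the same time.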

    Naturally, transplanting the logic and protocols of the Internet directly onto a chip would be unreasonable and inefficient. The technological constraints and the tasks here are completely different:

    1. Very stringent latency and power requirements. Switches must operate with nanosecond delays and consume very little power. The energy spent on moving data between blocks makes up a significant part of the total consumption of modern chips.
    2. Simplicity and minimalism. On-chip switches must take up little area, which means they cannot have complex logic or large buffers.
    3. Parallel rather than serial links. At the physical level inside a chip, it is more advantageous to transmit bits not one after another over a single conductor, but over 32 or 64 parallel wires.
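
    The benefit of wide links in point 3 is simple arithmetic. The sketch below assumes one bit per wire per clock cycle and a 512-bit packet (the size of a 64-byte cache line, chosen only as an example); `cycles_to_send` is a hypothetical helper.

```python
def cycles_to_send(packet_bits: int, link_width_bits: int) -> int:
    """Clock cycles needed to push a packet through a link of the given width.

    Assumes every wire carries one bit per cycle; a partly filled last
    transfer still costs a whole cycle, hence the ceiling division.
    """
    if packet_bits < 0 or link_width_bits < 1:
        raise ValueError("need packet_bits >= 0 and link_width_bits >= 1")
    return -(-packet_bits // link_width_bits)


if __name__ == "__main__":
    PACKET_BITS = 512  # illustrative packet size, not taken from the article
    for width in (1, 32, 64):
        print(width, "wires ->", cycles_to_send(PACKET_BITS, width), "cycles")
```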


    NoC research is being carried out by leading companies and universities around the world. In 2007, Intel developed an experimental 80-core processor delivering 1 teraflops while consuming only 62 watts. In 2010, the 48-core “Single-chip Cloud Computer” was introduced.

    In April of this year, a group of MIT researchers published work on a prototype 16-core processor that applies two NoC-specific optimizations: virtual bypassing and low-swing signaling. These techniques made it possible to approach the theoretical limits of bandwidth and latency and to significantly reduce power consumption.

    How do they work? An ordinary router stores a received packet in a buffer, analyzes its header and decides where to send it next. Virtual bypassing lets a packet pass through almost without delay: the header information is sent ahead, so the switch has time to set up the required connection by the time the packet body arrives. The packet thus travels non-stop, bypassing the buffer. Low-swing signaling reduces the difference between the voltages that represent 0 and 1 on a wire, which cuts energy consumption further. Together, these improvements increase throughput and efficiency by more than one and a half times.
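
    The effect of bypassing on latency can be sketched with a per-hop cycle count. The numbers below are hypothetical and are not taken from the MIT prototype: a buffered router is assumed to spend 3 cycles on each packet, a bypassed router is assumed to add none, and every link takes 1 cycle. The name `path_latency` is likewise only illustrative.

```python
def path_latency(hops: int, router_cycles: int, link_cycles: int = 1) -> int:
    """Cycles for a packet to cross `hops` routers and the link after each one."""
    if hops < 0 or router_cycles < 0 or link_cycles < 0:
        raise ValueError("all arguments must be non-negative")
    return hops * (router_cycles + link_cycles)


# Hypothetical cost of buffering a packet, reading its header and
# setting up the switch inside an ordinary router.
BUFFERED_ROUTER_CYCLES = 3
# Hypothetical cost when the switch was configured ahead of the packet,
# so the packet passes straight through to the next link.
BYPASSED_ROUTER_CYCLES = 0

if __name__ == "__main__":
    hops = 6
    print("buffered:", path_latency(hops, BUFFERED_ROUTER_CYCLES), "cycles")
    print("bypassed:", path_latency(hops, BYPASSED_ROUTER_CYCLES), "cycles")
```

    With these assumed numbers a six-hop trip drops from 24 cycles to 6. Actual gains depend on the router design and on how often a packet can really take the bypass path.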

    Besides improving characteristics such as power consumption and speed, the NoC architecture offers another important advantage: it makes it easy to combine not only homogeneous cores but practically any blocks on a single chip. As in computer networks, the physical and transport layers work the same way for all types of data and protocols. One or more of the general-purpose cores can be replaced without much difficulty by any other IP block, for example a graphics core, a specialized signal processor, or a device controller. And, just as in networks, Quality of Service support can be implemented at the chip level, which can be useful for real-time systems and virtualization.

    NoCs for connecting processor cores are still experimental. For combining heterogeneous blocks in systems on a chip (SoCs), however, they have been developed and used for quite some time: solutions from companies such as Sonics and Arteris are used in chips from Samsung, Qualcomm, and even Intel. Perhaps network architectures will soon begin to displace buses from the “holy of holies” of microelectronics, the multi-core central processor. And then the number of cores will start to grow rapidly again. So it is too early to bury Moore's law.




