Life in the era of "dark" silicon. Part 3



    Other parts: Part 1, Part 2.

    This post continues the series "Life in the era of 'dark' silicon". The previous part covered the use of general-purpose logic in the dark regions of silicon. This part looks at specialized logic.

    “The Specialized Horseman” or the use of specialized solutions.


    “We will use all of that dark silicon
    area to build specialized
    cores, each of them tuned for
    the task at hand (10-100x more
    energy efficient), and only turn
    on the ones we need ...”


    As more and more of a microprocessor's transistors become “dark”, the area they occupy becomes an exponentially cheaper resource relative to heat dissipation and power consumption. One way to spend this area on energy efficiency, parallelization, was described in the previous part. That approach has several limitations, however. Even under ideal conditions it cuts energy consumption by only a factor of 2–2.5 while increasing the occupied area by a factor of 2–3. In addition, not every unit can be parallelized in principle, and not every program contains data parallelism.
    One approach that trades area for energy efficiency more effectively is to use dark silicon to implement specialized units (coprocessors), each of which is, on its specific task, either much faster or much more energy efficient (about 100–1000 times) than a general-purpose processor [1]. The work can then be distributed between the coprocessors and the general-purpose cores in whichever way is most favorable. Coprocessors that are not in use at a given moment can be switched off completely to save energy.
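    The dispatch idea in the paragraph above can be sketched as a toy model. Everything here is an assumption made for illustration: the `Chip` and `Coprocessor` names, the task kinds, and the energy numbers (arbitrary units, with a 100x gap taken from the low end of the range quoted above). It is not a description of any real chip.

```python
from dataclasses import dataclass

# Hypothetical cost of one operation on the general-purpose core (arbitrary units).
GENERAL_CORE_ENERGY_PER_OP = 100.0


@dataclass
class Coprocessor:
    task_kind: str            # the single task this unit is specialized for
    energy_per_op: float      # energy per operation while powered on
    powered_on: bool = False  # "dark" (switched off) by default


class Chip:
    def __init__(self, coprocessors):
        self.units = {c.task_kind: c for c in coprocessors}
        self.energy_spent = 0.0

    def run(self, task_kind, ops):
        """Run `ops` operations of `task_kind` and report which unit executed them."""
        unit = self.units.get(task_kind)
        if unit is None:
            # No specialized unit for this task: fall back to the general-purpose core.
            self.energy_spent += ops * GENERAL_CORE_ENERGY_PER_OP
            return "general-purpose core"
        unit.powered_on = True    # wake only the unit that is needed
        self.energy_spent += ops * unit.energy_per_op
        unit.powered_on = False   # switch it off again once the work is done
        return f"{task_kind} coprocessor"


chip = Chip([Coprocessor("video", 1.0), Coprocessor("vision", 1.0)])
print(chip.run("video", 1000))    # handled by the video coprocessor
print(chip.run("parsing", 1000))  # no matching unit, so the general core runs it
print(chip.energy_spent)
```

    In this model the task without a matching coprocessor dominates the energy bill, which is the point the later discussion of Amdahl's law makes more precisely.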
    The prospects for specialized units are visible to the naked eye: specialized accelerators for tasks such as processing, graphics, computer vision, video encoding and others are already widespread today. These accelerators can improve performance and energy efficiency by orders of magnitude, especially for highly parallel computations.
    Researchers [2] have extrapolated this trend and expect that in the near future we will see systems that consist mostly of specialized units rather than general-purpose cores. In the literature such systems are called Coprocessor Dominated Architectures, or CoDAs.
    If you look at the diagram of the Intel Medfield microprocessor shown below, you can see that this “near future” is much closer than it seems :). This is the same processor that is used in Intel Mint, the first smartphone based on the x86 architecture. As you can see, in addition to the processor core itself, the die includes many different specialized blocks.


    Intel Medfield Platform

    However, relying more heavily on specialized units to deal with dark silicon confronts developers (and not only them) with a set of problems collectively called the “Tower of Babel crisis” (the tower-of-babel problem). The name refers to the biblical story of the confusion of tongues at Babel, when people stopped understanding one another because their languages were mixed and could not go on building. As accelerators spread, our notion of general-purpose computing becomes more fragmented, and the traditionally clear boundaries between software developers, the software, and the hardware that ultimately performs the computation blur more and more. Here are some examples.
    We can already see specialized languages such as CUDA that are effectively controlled by a single company, are designed for specific hardware, and cannot be ported even to similar architectures (AMD). (Alternatives to CUDA exist, but that is beside the point.)
    There are problems of over-specialization, which make accelerators inapplicable even to tasks closely related to their main purpose. For example, there are cases where double-precision calculations performed for scientific purposes give incorrect results on GPUs whose floating-point units are specialized for graphics.
    There are also known adoption problems caused by the excessive effort needed to program heterogeneous hardware. For example, the Sony PlayStation 3 gained popularity slowly because of the difficulty of porting games to it and of exploiting the programming capabilities of the Cell processor architecture.
    Finally, specialized hardware units are at risk of obsolescence: standards are sometimes revised (an update of the JPEG standard, for example), and a hardware implementation cannot be changed other than by replacing the device.

    Isolate people from system complexity. All of the factors listed above point to a potentially exponential increase in the human effort required to develop CoDAs and to program them. Overcoming the Tower of Babel crisis requires new approaches to how specialization is expressed and how it is used in future computing systems. We need new scalable architectural solutions that use specialized units pervasively to minimize power consumption and maximize performance.

    Overcoming the limits that Amdahl's law imposes on specialization. Amdahl's law describes the limit on the performance gain of a computing system as the number of processors grows. Applied to specialization and energy consumption, it means that if, for example, only half of the computation can be moved to accelerators, then accelerators can cut energy consumption by no more than a factor of two. This is an additional obstacle to specialization and forces us to look for approaches that save energy on most of the computation: not only the regular, parallel and predictable parts but also the irregular ones.
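    The factor-of-two bound can be checked with a few lines of arithmetic. This is a minimal sketch under a simplifying assumption that is not in the original text: energy on the general-purpose core is proportional to the amount of computation, so moving a fraction f of the work to an accelerator that is g times more energy efficient leaves (1 - f) + f / g of the original energy. The function name `energy_reduction` is made up for this example.

```python
def energy_reduction(offloaded_fraction, accelerator_gain):
    """Overall energy reduction factor, in the style of Amdahl's law.

    offloaded_fraction: share of the computation moved to accelerators (0..1)
    accelerator_gain:   how many times less energy the accelerator spends
                        on that share than the general-purpose core would
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if accelerator_gain <= 0:
        raise ValueError("accelerator_gain must be positive")
    # Energy that remains, as a share of the original energy.
    remaining = (1.0 - offloaded_fraction) + offloaded_fraction / accelerator_gain
    return 1.0 / remaining


# With half of the work offloaded, even a huge accelerator gain stays below 2x.
for gain in (10, 100, 1000):
    print(gain, round(energy_reduction(0.5, gain), 3))
```

    Raising the offloaded fraction moves the result far more than raising the accelerator gain does, which is why covering irregular code matters.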
    Research is now under way on automatically generating accelerators from fragments of program code. The goal is to detect the frequently executed, slow or energy-hungry sections of a program, its “hot spots”, and then to synthesize a description of a specialized core that performs the same actions but much faster or with less energy. Such specialized cores are called conservation cores, or c-cores.
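    The hot-spot detection step can be illustrated roughly as follows. This is a sketch only, not the toolchain from the cited papers: the greedy coverage threshold, the function name `select_hot_spots`, and the profile numbers are all assumptions, and the synthesis of the specialized core itself is not modeled.

```python
def select_hot_spots(profile, coverage=0.9):
    """Pick the hottest code regions until they cover `coverage` of the total cost.

    profile maps a region name to its measured cost (time or energy, any unit).
    Regions are taken greedily from most to least expensive.
    """
    total = sum(profile.values())
    if total <= 0:
        return []
    chosen, covered = [], 0
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    for name, cost in ranked:
        if covered / total >= coverage:
            break  # the regions chosen so far already cover enough of the cost
        chosen.append(name)
        covered += cost
    return chosen


# Hypothetical profile of an application (cost in arbitrary units).
profile = {
    "decode_frame": 620,
    "render_glyphs": 250,
    "gc_sweep": 60,
    "parse_config": 40,
    "log_event": 30,
}
print(select_hot_spots(profile, coverage=0.85))
```

    Each selected region would then be a candidate for a c-core, while the remaining code stays on the general-purpose core.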



    An example of this approach to building CoDA systems that targets both regular and irregular computation is the UCSD GreenDroid processor [3]. It is based on detecting “hot spots” in the Android mobile environment and on hundreds of conservation cores, to which the compiler hands over the execution of the “hot” code sections. This yields an 8–10-fold gain in energy efficiency with no additional effort from the programmer. (Although, of course, this topic deserves a separate post :))


    The conservation cores life cycle in GreenDroid

    Unlike near-threshold voltage (NTV) processors, this approach does not require finding additional parallelism to make up for lost serial performance. As a result, c-cores are likely to be applicable to a wider range of tasks, including sequential ones. For highly parallel workloads, however, NTV processors may have the advantage.

    This concludes the story of what “dark” silicon is and what the main approaches to using it are. Although silicon gets darker with each generation of Moore's law scaling, for researchers in this field the future looks bright and exciting. Over time, dark silicon will change the entire computing stack, and these changes will bring many opportunities for further research and improvement.
    * There is also a “non-classical” topic related to “dark” silicon and to energy and temperature issues: system-level optimization. If it is of interest, a follow-up post will cover it.
    UPD: here is the sequel.

    Sources


    1. G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor. “Conservation cores: Reducing the energy of mature computations.” In ASPLOS, 2010.
    2. N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. “Toward dark silicon in servers.” IEEE Micro, 2011.
    3. N. Goulding-Hotta et al. “The GreenDroid mobile application processor: An architecture for silicon's dark future.” IEEE Micro, March 2011.
