Why Intel processors draw more power than expected: TDP and turbo mode explained


Recently, the topic of power consumption has spread through the PC-building enthusiast community. Intel's newest eight-core processors have a TDP of 95 W, yet users see them drawing 150-180 W, which seems to make no sense at all. In this article, we explain why this happens and why it causes so many problems for hardware reviewers.

What is TDP (Thermal Design Power)

For each processor, Intel guarantees a certain operating frequency at a certain power, often with a particular cooler in mind. Most people equate TDP with maximum power consumption, since for practical purposes the heat a processor must dissipate equals the power it consumes. And usually TDP is taken as the measure of that power.

Strictly speaking, however, TDP refers to the cooler's ability to dissipate heat. TDP is the minimum cooling capacity that guarantees the stated performance. Some of the energy is dissipated through the socket and the motherboard, which means the cooler's rating could be lower than TDP, but in most discussions TDP and power consumption have been treated as the same thing: how much power the processor draws under load.

Within a system, the TDP can be set in the firmware. If the processor treated TDP as its maximum power limit, the same benchmark would produce similar power graphs across high-power, multi-core processors.

In recent years, Intel has used the following definition of TDP: for any given processor, Intel guarantees an operating frequency (the base frequency) at a specific power, the TDP. This means that a processor such as the 65 W Core i7-8700, with a base frequency of 3.2 GHz and a turbo frequency of 4.7 GHz, is guaranteed to consume no more than 65 W only when running at 3.2 GHz. Intel makes no guarantees about operation above 3.2 GHz and 65 W.

In addition to the base frequency, Intel also specifies a turbo mode. A part like the Core i7-8700 can reach 4.7 GHz in turbo mode and consume much more power than it does at 3.2 GHz. The all-core turbo on the Core i7-8700 runs at 4.3 GHz, well above the guaranteed 3.2 GHz. The situation gets complicated when the processor never drops from turbo back to the base frequency. If the processor constantly runs above TDP, the 65 W cooler you bought (or the one that came in the box) becomes a bottleneck. If you want more performance, that cooler has to go and be replaced with something better.

However, the manufacturer does not tell you this. If the cooling is not enough for turbo mode and the processor hits its temperature ceiling, most modern processors will switch to a power-limited mode, reducing their speed to stay within the specified power consumption. As a result, a fast processor never reaches the limits of its capabilities.

So TDP means nothing? Why did this become a problem just now?

Over the past decade, the way the term TDP is used has not changed, but processors have started to use their power budget differently. The recent arrival of six- and eight-core consumer processors with frequencies in the 4 GHz range means that new processors under heavy load exceed the stated TDP. In the past, we saw quad-core processors rated at 95 W draw only 50 W, even under full load in turbo mode. If cores are added and the TDP on the box stays the same, something has to give.

Secret numbers that are not on the package

Inside each processor, Intel defines several power levels based on capabilities and expected operating modes. However, all of these power levels and capabilities can be adjusted at the firmware level, so OEMs decide how the processors will behave in their systems. As a result, the power consumption of a processor in a given system becomes a very vague figure.

For simplicity, there are three important values to follow. Intel calls them PL1 (power level 1), PL2 (power level 2) and T (Tau).

PL1 is the effective long-term, steady-state expected power consumption. In essence, PL1 is usually defined as the processor's TDP. That is, if the TDP is 80 W, then PL1 is 80 W.

PL2 is the short-term maximum power consumption of the processor. This value is higher than PL1, and the processor enters this state under load, which allows it to use its turbo modes up to the PL2 ceiling. This means that if Intel has defined several turbo modes for a processor, they apply only while PL2 is the limit governing maximum power consumption. In PL1 mode, turbo does not operate.

Tau is a time variable. It determines how long the processor should remain in PL2 mode before falling back to PL1. Tau does not depend on the processor's power or temperature (when the temperature limit is reached, a different set of very low voltage and frequency values is expected to be used, and the PL1 / PL2 scheme no longer applies).

Here are the official definitions from Intel:

Let's look at the situation of a heavy load on the processor.

First, it starts out in PL2 mode. If the load is single-threaded, we should reach the top turbo frequency listed in the specification. Typically, the power consumption of a single core comes nowhere near the PL2 value for the whole chip. As more cores are loaded, the processor responds by lowering the turbo frequency according to the per-core turbo values defined by Intel. If the processor's power consumption reaches PL2, its frequency is adjusted so that it does not exceed PL2.

When the system has been under heavy load for an extended period, “Tau” seconds, the firmware should switch to PL1 as the new power limit. The turbo tables are no longer used, as they apply only in PL2 mode.

If consumption would exceed PL1, the frequency and voltage are adjusted so that power consumption stays within that limit. That is, the processor as a whole drops in frequency from the PL2 state to the PL1 state for the duration of the load. This means the processor's temperature should fall, which should extend the processor's lifetime.

PL1 mode stays in effect until the load goes away and the core has been idle for a certain amount of time (usually up to 5 seconds). After that, PL2 mode can be enabled again when another heavy load appears.
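The sequence described above can be sketched as a small state machine. This is a simplified model of the description in this article only, not Intel's actual algorithm; the function name `power_limit_trace`, the one-second time step and the `idle_reset` parameter are assumptions introduced for illustration.

```python
def power_limit_trace(load, pl1, pl2, tau, idle_reset=5):
    """Return the active power limit (in watts) for each one-second step.

    load       -- list of booleans, True when the CPU is under heavy load
    pl1, pl2   -- long-term and short-term power limits in watts
    tau        -- seconds of load allowed at PL2 before falling back to PL1
    idle_reset -- idle seconds needed before PL2 is available again (assumed)
    """
    limits = []
    busy = 0  # seconds of load accumulated since the last reset
    idle = 0  # consecutive idle seconds
    for under_load in load:
        if under_load:
            idle = 0
            busy += 1
        else:
            idle += 1
            if idle >= idle_reset:
                busy = 0  # idle long enough: PL2 may be used again
        # PL2 applies for the first `tau` seconds of load, PL1 afterwards.
        limits.append(pl2 if busy <= tau else pl1)
    return limits


# 12 s of load, 5 s idle, then 2 s of load, using the 95 W / 119 W / 8 s example.
trace = power_limit_trace([True] * 12 + [False] * 5 + [True] * 2,
                          pl1=95, pl2=119, tau=8)
print(trace)
```

In this run the limit is 119 W for the first eight seconds, 95 W until the idle period has elapsed, and 119 W again for the new load.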

Here are some example values; Intel lists several variations in the specifications of its various processors. Take the Core i7-8700K. For this example, the following holds:

 PL1 = TDP = 95 W

 PL2 = TDP * 1.25 = 118.75 W

 Tau = 8 s

In this case, the system should be able to boost to 119 W for eight seconds and then fall back to 95 W. This has been the case for several generations of Intel processors, and for the most part it did not matter much, since the processor's overall power consumption was often well below PL1 even under full load.
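The arithmetic in this example can be written out as follows. The 1.25 multiplier and the 8-second Tau are the values quoted above for the Core i7-8700K; the helper name `default_limits` and the "extra energy" figure are illustrative additions that treat the turbo window as a simple rectangle.

```python
def default_limits(tdp_watts, pl2_ratio=1.25, tau_seconds=8):
    """Derive PL1, PL2 and Tau from TDP using the ratios quoted in the text."""
    pl1 = tdp_watts
    pl2 = tdp_watts * pl2_ratio
    # Extra energy (joules) spent above PL1 if the whole Tau window runs at PL2.
    extra_energy = (pl2 - pl1) * tau_seconds
    return {"PL1": pl1, "PL2": pl2, "Tau": tau_seconds, "extra_J": extra_energy}


limits = default_limits(95)
print(limits)  # PL2 comes out at 118.75 W, with 190 J of extra energy over 8 s
```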

However, things get messy once motherboard manufacturers come into play, since PL1, PL2 and Tau can all be configured in the firmware. For example, in the graph above, the limit on PL2 can be removed and PL1 set to 165 W or to 95 W.

World of random numbers

I will mostly be talking about consumer hardware. PL1, PL2 and Tau are often carefully controlled in cooling-limited designs such as laptops and small PCs. I have seen several powerful yet stylish PC designs in which PL2 was also set equal to TDP, so that the processor could still boost a little, but not so much that a load on one or two cores pushed it beyond TDP.

However, in our CPU reviews, ever since six-core processors became common, we have often seen power figures much higher than PL1 or PL2, and this consumption continues indefinitely as long as thermal limits are not exceeded. Why does this happen?

Any modern BIOS, especially from the major motherboard manufacturers, has settings for the power limits (short-term and long-term) and the duration. In most cases the user does not know what values are set by default, because the field simply reads Auto, which is code for “we know which value to assign, do not worry”. Manufacturers store the values in memory and use them, but the user sees only Auto. As a result, PL2 can be set to 4096 W and Tau made very large, for example 65535, or -1 (infinity, depending on the BIOS version). This means that the CPU will run in turbo mode without interruption until it exceeds its temperature limits.
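On Linux, one way to see what Auto resolved to is the kernel's powercap interface, which exposes package power limits as files. The sketch below assumes an Intel system with the `intel_rapl` driver loaded and the package zone at `/sys/class/powercap/intel-rapl:0`. The path, the availability of each file and read permissions vary between systems, and the function name `read_power_limits` is hypothetical. The `long_term` and `short_term` constraints usually correspond to PL1 and PL2.

```python
from pathlib import Path


def read_power_limits(zone="/sys/class/powercap/intel-rapl:0"):
    """Read the long_term and short_term constraints from a powercap zone.

    Returns a dict such as {"long_term": {"watts": ..., "window_s": ...}, ...}.
    Constraints that are missing on a given system are skipped.
    """
    zone = Path(zone)
    limits = {}
    for idx in range(2):  # constraint_0 and constraint_1, if present
        name_file = zone / f"constraint_{idx}_name"
        if not name_file.exists():
            continue
        name = name_file.read_text().strip()
        # The kernel reports these values in microwatts and microseconds.
        power_uw = int((zone / f"constraint_{idx}_power_limit_uw").read_text())
        window_us = int((zone / f"constraint_{idx}_time_window_us").read_text())
        limits[name] = {"watts": power_uw / 1e6, "window_s": window_us / 1e6}
    return limits
```

Calling `read_power_limits()` on such a system returns the limits the firmware actually programmed; on a system without the zone it returns an empty dict.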

Why do manufacturers do this? There may be many reasons, and the specifics can differ from one manufacturer to another.

First, it means that the user can sustain turbo mode all the time, with every core running at turbo speed every second. Benchmark results go through the roof, and everything looks great, whether in reviews or when users run benchmarks themselves.

Second, the products are designed for it. With each launch, Intel usually defines a specification for a baseline motherboard (it even used to sell its own motherboards at retail), with a certain number of power phases and an expected lifetime. Manufacturers are of course free to implement their own designs: more power phases, more powerful phases, special power delivery to improve efficiency, and so on. If their board can sustain all-core turbo continuously, then why not?

Third, manufacturers of more expensive boards know that enthusiasts will pair them with better cooling. If the processor draws more than 160 W and the user has a decent cooling system, all-core turbo improves the experience of the product. Intel's specifications are defined with Intel's recommended coolers in mind.

So which is right, whom should you trust, and what difference does it make?

Intel sets standards for its parts. PL1, PL2, Tau, motherboard layout, firmware settings: there are Intel-recommended defaults for all of them. Some are public, such as those Intel lists in its documents, and some are confidential (and Intel will not tell us about them, no matter how much we ask). However, these are still only recommended values. In the end, motherboard manufacturers can do whatever they like. And they do.

One consequence is that it becomes harder for me to test hardware. Different users will want our settings to be:
1. As recommended by Intel,
2. As they come out of the box,
3. Pushed to the maximum.

And, of course, Intel's recommendations will give much lower results than out-of-the-box settings, and the “pushed to the maximum” option speaks for itself.

It is worth noting that so far, in every test in every one of our CPU reviews, the hardware has run with “out of the box” settings rather than “recommended by Intel” settings.

To give some context on the measurement values, we used a powerful CPU and
obtained the following results in a 25-30 second test with full load:

Configuration	PL2	Tau	PL1	Result
Unlimited	4096W	999s	4096W	100%
Intel Spec, 165W	207W	8s	165W	98%
Constant 165W	165W	1s	165W	94%
Intel Spec, 95W	118W	8s	95W	84%
Constant 95W	95W	1s	95W	71%
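One way to read the table: in a 25-30 second run, an 8-second Tau covers only part of the test, so the average power ceiling sits between PL1 and PL2. The rough sketch below assumes the simplified "PL2 for Tau seconds, then PL1" model and a 30-second run. It estimates the power ceiling only and says nothing about how performance scales with power; the function name `average_power_limit` is illustrative.

```python
def average_power_limit(pl1, pl2, tau, duration):
    """Time-weighted average of the power limit over a run of `duration` seconds."""
    turbo_time = min(tau, duration)      # seconds spent under PL2
    steady_time = duration - turbo_time  # remaining seconds under PL1
    return (pl2 * turbo_time + pl1 * steady_time) / duration


# Rows from the table above as (PL2 in W, Tau in s, PL1 in W), for a 30 s run.
configs = {
    "Intel Spec, 165W": (207, 8, 165),
    "Constant 165W": (165, 1, 165),
    "Intel Spec, 95W": (118, 8, 95),
    "Constant 95W": (95, 1, 95),
}
for name, (pl2, tau, pl1) in configs.items():
    print(f"{name}: {average_power_limit(pl1, pl2, tau, 30):.1f} W")
```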

Recently, it has been noticed that some motherboard manufacturers are changing their PL1 / PL2 / Tau strategy and cutting Tau to something sensible, like 30 seconds. When running benchmarks on such motherboards, users get lower results than usual, although these results are closer to Intel's specifications.

The trouble is that when the motherboard shows Auto, the manufacturer usually does not disclose the actual value behind it. As a result, it is very difficult to describe how such hardware behaves. And these values may vary depending on the installed processor.

We usually test with out-of-the-box settings, with the exception of memory, for which we use the values recommended by the manufacturer. We believe this is the most honest way to tell readers what performance they can expect when virtually no settings have been changed. In practice, this usually means that PL2 is set to some very large value and Tau to a very long time. The processor stays in turbo mode continuously, as long as the temperature remains within the set limits.

Today's situation, and what we can do about it

I have wanted to write an article like this for a long time, at least since the launch of Kaby Lake. Most processors in consumer motherboards run with an unlimited PL2, and this has been considered normal for years. Only while testing the Core i9-9900K did we begin to notice something strange. Our article last week on the new Xeon E notes that our Supermicro motherboard follows Intel's recommendations to the letter. It may seem obvious that a more commercial / server-oriented board would follow Intel's specifications, but this was the first time I had personally seen it in practice. It is clear that consumer boards do not follow these specifications, and never have.

So what do we do about it? I would say that Intel needs to put two power figures on the box:

TDP peak for PL2
TDP long term for PL1.

In this way, Intel and others can explain peak consumption and base frequency.

If users want consumer motherboards to change, that will be harder to achieve. Every manufacturer wants to get ahead of the others, which is why we run into things like the Multi-Core Turbo option being enabled by default. Manufacturers prefer the "unlimited PL2" route because it lets them climb to the top of the performance charts. In laptops with limited cooling, however, manufacturers often set their own PL1, PL2 and Tau values and adhere to them strictly.

The question is, how important are Intel's specifications for Intel desktop processors? If we must follow these recommendations to the letter, should we perhaps go one step further and use only stock coolers?
