Updated: HWInfo Application beta Introduces Power Reporting Deviation Sensor (Cheats)

Motherboard manufacturers routinely tweak anything processor-related to gain a small performance edge over the competition and to be able to claim the best-performing motherboard. To that end, some of them apply a trick. The HWiNFO tool now introduces a Power Reporting Deviation sensor that exposes it.



Power Reporting Deviation is a new feature available on AMD Ryzen CPUs that indicates how much the power telemetry seen by the CPU differs from the expected real-world data. The value is meaningful only under full CPU load, where values around 100% (95 - 105%) mean the telemetry is working correctly. A larger deviation under full load means the CPU thinks it is working at lower or higher power than expected for the given SKU, and is therefore running out of specification. This is usually caused by the mainboard vendor (often intentionally) providing wrong calibration data in the BIOS (AGESA) to fool the CPU into running at a higher power than the limit for the SKU.

The Stilt (overclocker) wrote a piece on understanding all this, which we'll show below:

Ryzen CPUs for the AM4 platform rely on external, motherboard-sourced telemetry to determine their power consumption. The voltage, current and power telemetry is provided to the processor by the motherboard VRM controller through the AMD SVI2 interface. This information is consumed by the processor's power management co-processor, which is responsible for adjusting the operating parameters of the CPU and ensuring that none of the CPU SKU, platform or infrastructure specific limits are violated.

The weakness of this method is that the telemetry essentially uses an undefined scale for the current (and hence power) measurements. This means that the motherboard VRM controller will send an integer between 0 and 255 to the CPU, and based on the reference value known by the co-processor firmware, this integer is converted to a figure that represents the physical current drawn by the CPU. Based on the accurately known current flow and the voltage, it is possible to calculate the CPU power draw in watts (V * I).
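
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the linear mapping (255 corresponding to the declared reference current), the function names and the example voltage and readings are assumptions, not the actual firmware logic.

```python
def telemetry_to_current(raw: int, reference_current_a: float) -> float:
    """Convert an 8-bit telemetry reading to amperes.

    Assumption: a linear scale where 255 maps to the declared reference current.
    """
    if not 0 <= raw <= 255:
        raise ValueError("telemetry value must be between 0 and 255")
    return raw * reference_current_a / 255


def cpu_power_w(voltage_v: float, current_a: float) -> float:
    """Power in watts as voltage times current (V * I)."""
    return voltage_v * current_a


# The same raw reading interpreted with two different declared reference
# values (all numbers here are hypothetical).
raw_reading = 102
voltage = 1.25
for declared_reference in (300.0, 150.0):
    current = telemetry_to_current(raw_reading, declared_reference)
    print(declared_reference, current, cpu_power_w(voltage, current))
```

With the same raw reading, halving the declared reference value halves the current and power that the CPU believes it is drawing.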

The reference value mentioned earlier is generally different for each motherboard make and model, unless boards share identical power circuitry. Because of that, it is the motherboard manufacturer's responsibility to find the correct value for its motherboard design through calibration, and then to declare it properly in AGESA at BIOS compile time. If the correct, design-specific value differs greatly from the declared value, there will be a bias in the power consumption seen by the CPU. If the declared value is greater than the actual value, the power consumption seen by the CPU is greater than it actually is. Likewise, if the declared value is an understatement, the CPU will think it consumes less power than it actually does.

Since at least two of the largest motherboard manufacturers still insist on using this exploit to gain an advantage over their competitors, despite being constantly asked and told not to, we thought it would be only fair to allow consumers to see if their boards are doing something they're not supposed to do. The issue with this exploit is that it messes up the power management of the CPU and potentially also decreases its lifespan, because it runs the CPU outside the spec, in some cases by a vast margin. It can also cause issues when the exploit goes undetected by a hardware reviewer, since both the performance and the software-based power consumption figures will be affected by it.

For example, if we take a Ryzen 7 3700X CPU that has a 65W TDP and an 88W default power limit (PPT), and use it on a board which has declared only 60% of its actual telemetry reference current, we'll end up with an effective power limit of ~ 147W (88 / 0.6) despite running at stock settings (i.e. without enabling manual overclocking or AMD PBO). While the 3700X SKU used in this example typically cannot even reach this kind of power draw before running into the other limiters and limitations, the fact remains that the CPU is running far outside the spec without the user even being aware of it. This exploit can also cause additional cost and work for the consumer, who starts wondering about the abnormally high CPU temperatures and begins troubleshooting, initially by remounting the cooler and, eventually, usually by purchasing a better CPU cooler.
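
The arithmetic in this example can be written out as a short Python sketch. The function name is hypothetical; the only inputs are the 88W PPT and the 60% declared fraction quoted above.

```python
def effective_power_limit_w(ppt_w: float, declared_fraction: float) -> float:
    """Real power the CPU can draw before it believes it has reached its PPT.

    declared_fraction is the declared reference value divided by the correct
    one (0.6 means the board declared only 60% of the actual value).
    """
    if declared_fraction <= 0:
        raise ValueError("declared_fraction must be positive")
    return ppt_w / declared_fraction


limit = effective_power_limit_w(88.0, 0.6)
print(f"{limit:.1f} W")  # prints "146.7 W"
```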

HWiNFO will display the "Power Reporting Deviation" metric under the CPU's enhanced sensors. The displayed figure is a percentage, with 100.0% being the completely unbiased baseline. When the motherboard manufacturer has both properly calibrated and declared the reference value, the reported figure should be pretty close to 100% under a stable, near-full-load scenario. A ballpark threshold at which the readings become suspicious is around ±5%. So, if you see an average value that is significantly lower than ~ 95%, there is most likely intentional biasing going on. Obviously, the figure can be greater than 100%, but for obvious reasons it rarely is ;)
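
As a rough illustration of that rule of thumb, the check below classifies an averaged reading against the ±5% ballpark. The function name and the result labels are made up for this sketch and are not part of HWiNFO; the input is assumed to be an average taken under a stable, near-full load.

```python
def classify_deviation(avg_deviation_pct: float, tolerance_pct: float = 5.0) -> str:
    """Classify an averaged Power Reporting Deviation reading.

    Only meaningful for an average recorded under stable, near-full load.
    """
    if avg_deviation_pct < 100.0 - tolerance_pct:
        return "under-reporting (CPU may exceed its power limit)"
    if avg_deviation_pct > 100.0 + tolerance_pct:
        return "over-reporting (CPU may be held below its power limit)"
    return "within tolerance"


for reading in (99.2, 75.3, 108.0):
    print(reading, "->", classify_deviation(reading))
```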

As stated before, this metric is only valid during a relatively stable near-full-load condition. That is due to the typical measurement accuracy of the VRM controller telemetry, and also due to the highly advanced and fast power management on Ryzen CPUs, which results not only in extremely low idle power but also in extremely rapidly changing power consumption. A suggested workload for getting a stable and reproducible deviation metric is Cinebench R20 NT, with the HWiNFO sample rate set to 1000 ms or less.

As of now, outside of certain MSI motherboards, the biasing isn't end-user controllable. If there is clear evidence of biasing taking place on certain motherboards or their BIOS versions, please contact the manufacturer and ask them to remove the telemetry biasing from the BIOS. The biasing can be implemented in different ways: it can be tied to one or more specific settings in the BIOS (known as an "auto-rule"), or be fixed in a certain BIOS version or in all available BIOS versions.

Here is a practical example recorded on an MSI X570 Godlike motherboard, using the most recent 1.93 beta BIOS version.
For this BIOS version MSI has declared a 280A reference current, whereas the correct value, which produces a near 100% result (i.e. no deviation) and a power draw matching other boards (same CPU and workload), is 300A. This means that the board allows 7.14% (300/280 ≈ 1.0714) higher power draw for the CPU than the AMD specifications state. Compared to the worst violators (up to 50%) this is a minor infraction, so MSI deserves the benefit of the doubt as to whether this is intentional or an honest error.

With the proper 300A setting, the average HWiNFO "CPU Power Reporting Deviation" during Cinebench R20 NT is 99.2%.
With this setting, the average CPU core frequency is 4027.4MHz, power consumption seen by the CPU 140.964W (of 142W limit) and peak CPU temperature of 73°C.

With 225A setting (75% of the actual), the average HWiNFO "Power Reporting Deviation" during Cinebench R20 NT is 75.3%.
With this setting, the average CPU core frequency is 4103.5MHz, power consumption seen by the CPU 125.241W (of 142W limit) and peak CPU temperature of 80°C.

With 150A setting (50% of the actual), the average HWiNFO "Power Reporting Deviation" during Cinebench R20 NT is 50.2%. With this setting, the average CPU core frequency is 4106.6MHz, power consumption seen by the CPU 91.553W (of 142W limit) and peak CPU temperature of 79°C. This setting is already limited by the maximum voltage allowed by the silicon fitness (FIT), so there were pretty much no additional performance gains, or ill effects for that matter, to be had.
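
The three measurements above can be cross-checked with a short Python sketch. It compares each measured deviation with the ratio of declared to correct reference current, and derives a rough estimate of the real power draw. That estimate rests on an assumption that is not stated in the text: that the deviation figure is simply reported power divided by actual power. The variable names are hypothetical.

```python
CORRECT_REFERENCE_A = 300.0  # value reported above as giving ~100% deviation

# (declared reference in A, measured deviation in %, power seen by the CPU in W)
measurements = [
    (300.0, 99.2, 140.964),
    (225.0, 75.3, 125.241),
    (150.0, 50.2, 91.553),
]

results = []
for declared_a, measured_pct, seen_w in measurements:
    # Deviation expected purely from the declared/correct ratio.
    expected_pct = declared_a / CORRECT_REFERENCE_A * 100.0
    # Rough estimate only, assuming deviation = reported power / actual power.
    estimated_actual_w = seen_w / (measured_pct / 100.0)
    results.append((expected_pct, estimated_actual_w))
    print(f"{declared_a:.0f} A: expected {expected_pct:.1f}%, "
          f"measured {measured_pct:.1f}%, "
          f"estimated real draw {estimated_actual_w:.1f} W")
```

Under that assumption, the estimated real draw rises well past the 142W limit as the declared value shrinks, even though the power seen by the CPU falls.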

I'd like to stress that although this exploit is essentially made possible by something AMD has included in the specification, its use is not something AMD condones, let alone promotes.
Instead, they have rather actively put pressure on the motherboard manufacturers who have been caught using this exploit.

In short: Some motherboard manufacturers intentionally declare an incorrect (too small) motherboard-specific reference value in AGESA. Since AM4 Ryzen CPUs rely on telemetry sourced from the motherboard VRM to determine their power consumption, declaring an incorrect reference value will affect the power consumption seen by the CPU. For instance, if the motherboard manufacturer declares 50% of the correct value, the CPU will think it consumes half the power that it actually does. In this case, the CPU would allow itself to consume twice the power of its set power limits, even at stock. This allows the CPU to clock higher due to the effectively lifted power limits; however, it also makes the CPU run hotter and potentially shortens its lifespan, the same way overclocking does. The difference compared to overclocking or using AMD PBO is that this is done completely clandestinely and that, in the past, there has been no way for most end-users to detect it or react to it.

Update: AMD posted a response.

"We are aware of the reports claiming that select motherboards may be under-reporting certain power telemetry data that could alter the performance and/or behavior of AMD Ryzen processors under certain conditions. We are looking into the accuracy of these reports. 

"We want to be clear with our customers: AMD Ryzen processors contain a diverse array of internal safeguards that operate independently of external data sources. These safeguards enforce the safety and reliability of the processor during stock operation. Based on our initial assessment, we do not believe that altering external telemetry in the manner described by those public reports would have a material impact on the longevity or safety of a user's processor."
