FMA4 instruction set hidden, but working on AMD Zen processors
In an interesting find, it has been discovered that AMD processors based on the Zen architecture actually support FMA4, the four-operand variant of the fused multiply-add instruction set. FMA4 was assumed to have been disabled for unknown reasons; however, it appears to be at least partially working and active.
FMA is short for fused multiply-add: a floating-point multiply followed by an add, performed in a single step with a single rounding. AMD's FX series processors introduced FMA4 (Bulldozer, 2011) and later added FMA3 (Piledriver, 2012). FMA builds on the AVX SIMD instruction set, and because the multiply and add are fused into one instruction, it is both faster and more accurate than issuing them separately. FMA3 overwrites one of its three source operands with the result, while FMA4 writes the result to a separate, fourth register. It has been claimed that FMA4 is 33% faster than FMA3. However, FMA4 is not officially supported on Zen, likely because it was left disabled due to bugs or stability issues; presumably there is a reason for it to remain disabled.
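The single-rounding property is easiest to see with numbers. The sketch below emulates a fused multiply-add in Python using exact rational arithmetic; it does not execute FMA3 or FMA4 hardware instructions, and the input values are chosen only for illustration.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded to a double, then the sum is rounded again."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly as a rational number, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


a, b, c = 0.1, 10.0, -1.0
print(mul_then_add(a, b, c))        # 0.0
print(fused_multiply_add(a, b, c))  # 5.551115123125783e-17 (exactly 2**-54)
```

With separate operations, 0.1 * 10.0 rounds to exactly 1.0 and the tiny representation error of 0.1 is lost; the fused version keeps it because nothing is rounded until the end.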
As it now seems, Level1Techs tested this on Zen processors by running an adapted script that sends FMA4 instructions to the processor. Surprisingly, the FMA4 instructions were not rejected and executed successfully. Meanwhile, CPUID still reports FMA4 as unsupported.
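Software normally checks CPUID feature flags before choosing an FMA code path, which is why a clear FMA4 bit keeps the feature effectively hidden. The sketch below only decodes raw ECX values: reading them requires executing CPUID natively (for example through a compiler intrinsic), which is outside this example, and the sample register values are hypothetical rather than captured from a real Zen chip. The bit positions are FMA3 at leaf 0x00000001 ECX bit 12 and FMA4 at leaf 0x80000001 ECX bit 16.

```python
# CPUID feature bit positions (from the vendor CPUID documentation).
FMA3_BIT = 12  # leaf 0x00000001, register ECX
FMA4_BIT = 16  # leaf 0x80000001, register ECX


def bit_set(register: int, bit: int) -> bool:
    """Return True if the given bit of a 32-bit register value is 1."""
    return (register >> bit) & 1 == 1


def has_fma3(leaf1_ecx: int) -> bool:
    return bit_set(leaf1_ecx, FMA3_BIT)


def has_fma4(ext_leaf1_ecx: int) -> bool:
    return bit_set(ext_leaf1_ecx, FMA4_BIT)


# Hypothetical register values illustrating the reported Zen behaviour:
zen_leaf1_ecx = 1 << FMA3_BIT  # FMA3 advertised
zen_ext_ecx = 0                # FMA4 bit clear, even though the instructions execute
print(has_fma3(zen_leaf1_ecx), has_fma4(zen_ext_ecx))  # True False
```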
Senior Member
Posts: 2549
Joined: 2012-04-16
Here's the full in-depth video on the subject. I just love the guys over at Level1Techs.
They're very open and knowledgeable. Also, their weekly news episodes make me giggle like a girl.
Don Vito Corleone
Posts: 45899
Joined: 2000-02-22
Quote: In real-world code there really is no huge performance difference between FMA3 and FMA4, certainly not anything on the scale of 33%. Not sure where that number even comes from. 4 = 3 + 33%? :p
The comparisons in Wendell's video are of AVX vs FMA4, not accounting for FMA3.
No, FMA4 has 33% higher throughput, because it processes four operands per instruction instead of three.
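For reference, the two operand forms can be modelled in a few lines. The sketch below is a hypothetical Python model of the data flow only: the function names are made up, rounding is not modelled, and it mirrors the FMA3 `vfmadd231` form (dst = src2 * src3 + dst) and the FMA4 `vfmaddps` form (dst = src1 * src2 + src3). In both, each element receives one multiply and one add; the fourth FMA4 operand changes where the result goes, not how much arithmetic an instruction does.

```python
from typing import List

Vector = List[float]


def fma3_231(dst: Vector, src2: Vector, src3: Vector) -> None:
    """FMA3-style (vfmadd231): dst = src2 * src3 + dst; the accumulator is overwritten."""
    for i in range(len(dst)):
        dst[i] = src2[i] * src3[i] + dst[i]  # one multiply + one add per element


def fma4(src1: Vector, src2: Vector, src3: Vector) -> Vector:
    """FMA4-style (vfmaddps): a separate destination = src1 * src2 + src3; inputs untouched."""
    return [x * y + z for x, y, z in zip(src1, src2, src3)]  # same work per element


a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
c = [10.0, 10.0, 10.0, 10.0]

acc = list(c)         # FMA3: copy c first if its value is still needed afterwards
fma3_231(acc, a, b)
result = fma4(a, b, c)  # FMA4: c survives because the result lands in a fourth register

print(acc == result == [10.5, 11.0, 11.5, 12.0], c)  # True [10.0, 10.0, 10.0, 10.0]
```

Counting operands gives 4/3 ≈ 1.33, which may be where a 33% figure comes from; what the extra operand saves in practice is the register copy needed to preserve an accumulator under FMA3.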
Senior Member
Posts: 6357
Joined: 2010-10-17
So Zen could potentially be even faster than it already is???
Senior Member
Posts: 1315
Joined: 2010-05-12
In very narrow and particular scenarios where you need tons of multiply-adds and the software has been written to use this feature, yes.