Quick test: DirectX 12 API Overhead Benchmark results
As you guys know, DirectX 12 is going to greatly reduce API overhead on the processor, and thus your games can make more draw calls with the same CPU. More efficient usage of CPUs with multiple cores is central to the design. We can now test a thing or two thanks to the 3DMark update released just hours ago.
Right now with DirectX 11, no matter how many cores your CPU has, the first core does the majority of the hard work for the API while the rest of the cores do very little. With DirectX 12 (and Mantle) that load is distributed far better across the available cores, a much more efficient design that removes certain CPU bottlenecks. DX12 is said to be capable of using as many as eight cores, and hey, AMD has very nicely priced 8-core processors, right? But how can we test what is up and coming?
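To make that multi-threading point a bit more concrete, here is a minimal sketch (ours, not 3DMark's) of how a DX12 title can record draw commands on several worker threads and then submit them in one go. It assumes the device, command queue and pipeline state were created elsewhere, and it leaves out root signatures, barriers and render targets for brevity:

```cpp
// Sketch only: the DX12 idea of recording draw commands on several threads at
// once, something the DX11 immediate context cannot do. Device, queue and PSO
// are assumed to exist; resource barriers, render targets etc. are omitted.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

void RecordDrawsInParallel(ID3D12Device* device, ID3D12CommandQueue* queue,
                           ID3D12PipelineState* pso,
                           unsigned threadCount, unsigned drawsPerThread)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(threadCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(threadCount);
    std::vector<std::thread>                       workers;

    // One allocator + command list per worker thread.
    for (unsigned t = 0; t < threadCount; ++t)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[t]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[t].Get(), pso,
                                  IID_PPV_ARGS(&lists[t]));
    }

    // Each thread records its own share of the draw calls.
    for (unsigned t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&, t] {
            for (unsigned i = 0; i < drawsPerThread; ++i)
                lists[t]->DrawInstanced(3, 1, 0, 0);   // one tiny triangle per call
            lists[t]->Close();
        });
    }
    for (auto& w : workers) w.join();

    // Submission itself is one cheap call on a single thread.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}
```

Under DX11 the equivalent draw calls all have to funnel through the single immediate context on one thread, which is exactly the overhead this benchmark measures.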
Well, you can check that out at home right now. If you own a 3DMark (2013) license, run Windows 10 preview build 10041 (free to download and install, as it is a public preview) and have a compatible graphics card, you can now test and see just how large that overhead has always been. Just make sure you update Windows 10 to build 10041, which has DX12 support. For the graphics card drivers, wait for Windows Update: the drivers that install through it are DX12-class compatible for the respective Nvidia and AMD graphics cards.
The test works by tasking the GPU to draw something on screen; that is literally what a draw call is, a request from the game engine running on the processor to draw and render an object. That instruction goes through the API, which would be DX11, DX12 or AMD's Mantle.
The less efficiently the API handles these draw calls from the CPU to the GPU, the fewer objects can be drawn on your monitor. 3DMark now has a test for exactly this: it keeps increasing the number of draw calls and objects until the frame rate drops below 30 frames per second (fps), and that is its equilibrium point. The result is the difference in the number of draw calls each API can sustain. Our test was performed at 1920x1080 (the default is 1280x720, if we recall correctly), just to make things a little heavier and more representative. Check out these numbers, specifically the jump from DX11 to Mantle and then DirectX 12. Mantle actually is a notch faster than DX12, but still ... what a difference.
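To illustrate the procedure (this is a toy model we wrote, not 3DMark's code), the search for that equilibrium point can be sketched like this; RenderFrame() is a made-up stand-in that simply burns a bit of CPU time per draw call:

```cpp
// Toy sketch of the overhead test's search: raise the draw calls per frame
// until the (simulated) frame rate drops below 30 fps, then report the total.
#include <chrono>
#include <cstdio>

// Stand-in for a real renderer: pretend every draw call costs some CPU time.
static void RenderFrame(unsigned drawCalls)
{
    volatile unsigned long long sink = 0;
    for (unsigned long long i = 0, n = 50ull * drawCalls; i < n; ++i) sink += i;
}

static double MeasureFps(unsigned drawCallsPerFrame, unsigned frames = 30)
{
    auto t0 = std::chrono::steady_clock::now();
    for (unsigned f = 0; f < frames; ++f) RenderFrame(drawCallsPerFrame);
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return frames / dt.count();
}

int main()
{
    unsigned calls = 1000;
    double fps = MeasureFps(calls);
    while (fps >= 30.0 && calls < 100000000u)   // cap so the toy loop always ends
    {
        calls += calls / 4;                      // ramp up ~25% per step
        fps = MeasureFps(calls);
    }
    std::printf("equilibrium: %u draw calls/frame at %.1f fps (~%.0f calls/s)\n",
                calls, fps, calls * fps);
    return 0;
}
```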
Intel Core i7-5960X (16 threads) / Radeon R9 290X
Our test system is based on the following hardware:
- Intel Core i7-5960X (16 threads)
- AMD Radeon R9 290X
- MSI X99S XPower AC
- 16GB GSKILL DDR4-2133 Quad channel
- Corsair Force SSD
The increase in draw calls is extraordinarily positive. Obviously the results are synthetic and relative to actual complex rendered frames and frame rates, but any old or new processor will be utilized that much better. In the end the Windows 10 / DX12 combo is going to make a difference alright, and yeah, we are excited. Scene complexity with many more objects is going to rock hard. Overall we are looking at a 15 to 20x draw call increase on any mainstream to enthusiast class processor, and what that can do for scene complexity will be very impressive. This is also very good news for AMD with its APUs and FX processors, as well as for the entry-level Intel CPU SKUs.
Yeah, DirectX 12 could be a game-changer (literally) once it is widely adopted by the software houses, so we feel this release is going to be EPIC. Go try it out yourself and let us know your results in the forums. You can download the updated 3DMark right here.
Senior Member
Posts: 989
Joined: 2010-08-24
The FPU has nothing to do with whether a "core" is really a "core". The ALU is the processor core. The 8080 and 8086 were nothing but ALUs.
FryRender relies on multimedia instructions.
To test "raw performance", you need a benchmark limited strictly to x86 and 86x64 (AMD64/EM64T) instructions.
Sure, they are cores, but they are lacking in one major area. Over time our PC processors have grown to be as general-purpose as possible, and AMD oddly broke that rule with their octa-cores. That being said, the term 'module', which AMD itself used as well, is more appropriate.
It doesn't really matter that x years ago CPUs didn't have an FPU. Technology has advanced and we are constantly referring to the present, not the past.
About raw power, instead of comparing a dozen results it's just easier to say that the only places where AMD's octa-cores are stronger are hashing and zlib, out of many other tests. AMD has generally been good at those two, so I believe it's safe to assume they have certain optimizations in place.
It's indeed difficult to measure raw performance, since there are a lot of areas which have the potential to skew the results. And I admit I don't have any benchmark for this (maybe I should write a basic one).
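For what it's worth, a "basic one" along those lines could start as simple as this: a single-threaded, integer-only loop (no SIMD, no FPU, no memory pressure) timed with the steady clock. The constants are arbitrary and this is only a rough probe of ALU throughput, not a proper benchmark:

```cpp
// Rough single-thread integer (ALU-only) throughput probe.
#include <chrono>
#include <cstdint>
#include <cstdio>

int main()
{
    constexpr std::uint64_t kIters = 500'000'000;
    std::uint64_t x = 0x12345678u;
    auto t0 = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < kIters; ++i)
    {
        x = x * 6364136223846793005ull + 1442695040888963407ull;  // integer mul + add
        x ^= x >> 29;                                             // shift + xor
    }
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    std::printf("checksum %llu, ~%.0f M iterations/s\n",
                (unsigned long long)x, kIters / dt.count() / 1e6);
    return 0;
}
```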
That's why I said it depends on the workload: the 4 extra cores typically go unused, but they can now be put to work in properly threaded games, whereas Intel's chips are already optimized for the typical 4 main threads games use, so their cores are not sitting idle.
If you look here at Thief with Mantle, the minimum frame rates increased by 99%, so almost double the performance. Whether DirectX 12 can do the same, who knows. And although there is an increase on the FX-4100 as well, it is not as big as on the FX-8350, most likely due to the number of threads the game is using. Either way, fewer idle cores means more of them can work.
Also, comparing a 4770K to an 8350 is hardly fair since the 4770K costs twice as much; you might as well compare an i3 to an i7. The 8350's target is really the Core i7-3770K, and for the 4770K it's the FX-9590, which is also quite a bit cheaper than the 4770K.
http://www.*******.com/testing-amds-mantle-battlefield-4-thief-and-pvz-garden-warfare/5/
Also, like I was saying, it depends on the workload, whether it is FP or ALU heavy, which is why its multi-core performance is all over the map. But if DX12 acts like Mantle it will let these chips flex their muscles quite a bit and make them more competitive with Intel's offerings. At the moment data crunching and encoding see massive improvements while other workloads gain only a little.
http://www.anandtech.com/show/6396/the-vishera-review-amd-fx8350-fx8320-fx6300-and-fx4300-tested/4
I understand now, thanks for the clarification. I mostly agree.
About the comparison, I am just taking the higher ends of both. And I exclude the i7x because of the stupid price point and the FX-9590 because of its ridiculousness.
Senior Member
Posts: 11808
Joined: 2012-07-20
The debate here about DX12 on Intel vs AMD CPUs is really about the compiler used. There are benchmarks (read: types of instructions used) where the FX-8350 wins over an i5-2600k or higher.
If DX12 were on Linux and people could compile it with optimizations for their own CPU, we might see different results.
But here, m$ as usual delivers one final library, compiled the way they want. If they are nice, they will split it in two, with one code path compiled with AMD's strengths in mind and the other with Intel's.
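For illustration, that kind of split usually ends up as runtime dispatch: check what the CPU supports and pick a code path. A tiny sketch using the GCC/Clang builtins (the render_path_* functions are made up for the example):

```cpp
// Runtime CPU-feature dispatch: ship several tuned code paths in one binary
// and choose at startup, instead of compiling for a single vendor.
#include <cstdio>

static void render_path_generic() { std::puts("generic x86-64 path"); }
static void render_path_avx()     { std::puts("AVX-tuned path"); }
static void render_path_fma()     { std::puts("FMA-tuned path (Haswell / Piledriver and up)"); }

int main()
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("fma"))      render_path_fma();
    else if (__builtin_cpu_supports("avx")) render_path_avx();
    else                                    render_path_generic();
    return 0;
}
```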
I wonder how many system libraries, or parts of them, would be able to execute via OpenCL kernels on APUs.
Because when we take workloads fully optimized for the architecture of a given CPU/APU, the iGP in an A10-7800 APU, even without any help from the CPU part, beats 6-core i7s.
When we look at raw performance, raw performance per watt and raw performance per dollar, APUs are the clear winner today.
We are simply not leveraging that power. I can imagine a supercomputer made of APUs; it would be cheaper and do more while eating less power.
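As a footnote on the OpenCL idea above, offloading a piece of work to the APU's iGP boils down to something like this minimal (and deliberately unoptimized) vector add; it grabs the first GPU device OpenCL reports and does almost no error checking:

```cpp
// Minimal OpenCL host sketch: run a trivial kernel on the first GPU device.
#include <CL/cl.h>
#include <cstdio>

static const char* kSrc =
    "__kernel void vadd(__global const float* a, __global const float* b,"
    "                   __global float* c) {"
    "    size_t i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main()
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, NULL, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", &err);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, &err);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, &err);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, &err);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    std::printf("c[10] = %.1f (expected 30.0)\n", c[10]);
    return 0;
}
```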