Hitman 2016: PC graphics performance benchmark review


Comments posted for Hitman 2016: PC graphics performance benchmark review on the Guru3D message forum
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
Quote: "By the way, can someone explain why suddenly Nvidia's cards are being spanked silly by AMD???"
Combination of memory limitations and GCN previously being underutilized; at least in my opinion that's what it is. Go back and look at older matchups, for example the 680 vs the 7970. The 680 should never have been able to compete with the 7970 hardware-wise: the 680 is a 2.5 TFLOP, 2GB card being matched against the 7970's 3.7 TFLOPS and 3GB, and yet in most benchmarks the 680 won. Why? In my opinion it's because GCN was a completely new architecture for AMD, while Kepler was essentially just a refined Fermi.

The first Hitman was actually an early precursor to what we are seeing happen now. Most people wrote it off as an "AMD optimized" game when the 7970 came out almost 30% faster than a 680, but nearly every game being launched today is technically an "AMD optimized" game, because nearly every game/engine is being ported to or from consoles. As time went on, developers became more familiar with GCN's strengths from the consoles and played into them. Why use geometry-heavy tools in a modern engine when AMD's architecture is bad at it? As developers cater their engine design/effects/etc. towards AMD's strengths, it basically takes away from Nvidia's.

For the most part, if you look at modern games being benchmarked, the list follows their rated TFLOP output almost identically. There are a couple of exceptions. The Fury X is one of them: it's nearly 2 TFLOPS more than a 980 Ti, but it's clearly being bottlenecked by something. I think it's memory performance in terms of ROP count; looking at the Fury X's true bandwidth, it doesn't come close to fully utilizing its 512GB/s. Fox made a post a while back about it being memory bottlenecked.

Another exception is 2GB cards, different variants of them. Look at the 950 vs the 770: the 770 is 3.2 TFLOPS while the 950 is like 1.57 IIRC, yet the 950 wins in various titles. Why? I think it's the available memory. Both are 2GB cards, technically, but Maxwell has advanced memory compression over Kepler, roughly 25-30% worth, so technically it's a 2.6GB card, not a 2.0GB one. Cards like the 380X are the same thing: GCN 1.2 gains nearly 40% more memory via compression, so potentially the 380 can store 2.8GB. This is my theory anyway.

I'm having trouble finding useful 2GB vs 4GB comparisons. Shadow of Mordor is one of the better ones: the 680's performance completely tanks when it hits the memory cap, dropping to almost a third of a 7970. The Division benchmarks also kind of show this, although I don't have a direct comparison. Hilbert's 4GB 370 outperforms a 770 in his review, yet over on TechSpot the 770 is almost 25% faster than a 370 (2GB). The problem is TechSpot's review is on High and Hilbert's is on Ultra, plus other differences in the test system and whatnot.

I don't know, I get that Occam's razor is easy and all, but I just don't accept that Nvidia is continuously, purposely downgrading their cards in the face of public outcry. I think it's far more complex than that, and I think the real answer is probably far more interesting as well, mostly because AMD had an extremely long-term strategy, and getting the console market and Mantle were both huge parts of it.
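A minimal sketch of the "effective capacity" arithmetic above. The 25-40% gains are the poster's estimates for delta color compression, and, as later replies point out, the compression mainly targets bandwidth, so treating it as extra capacity is a simplification:

```python
# Toy illustration of the poster's "effective VRAM" argument.
# The compression ratios are the poster's estimates, not vendor-confirmed figures.

def effective_gb(physical_gb: float, compression_gain: float) -> float:
    """Capacity a card would appear to have if compression applied to everything in VRAM."""
    return physical_gb * (1.0 + compression_gain)

print(effective_gb(2.0, 0.30))  # GTX 950 (Maxwell, ~25-30% gain)        -> ~2.6 GB
print(effective_gb(2.0, 0.40))  # R9 380 (GCN 1.2, ~40% gain)            -> ~2.8 GB
print(effective_gb(2.0, 0.00))  # GTX 770 (Kepler, no delta compression) ->  2.0 GB
```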
https://forums.guru3d.com/data/avatars/m/259/259654.jpg
Quote: "Well, that was expected from a first-crop title on a new API. I have a sense that async compute in DX12 will be abused for the same nefarious reasons as tessellation was abused with DX11, but this time on the opposite side of the fence."
This is nothing like tessellation. This is an integral part of both DX12 and Vulkan. NVIDIA cards don't lose performance if async is used, while AMD cards do when GameWorks games crank up tessellation.
Nice review HH, thanks for including a Tahiti card. The 280X today is as fast as the original Titan/GTX 780. That's something to admire. Just like in The Division, the 950 is faster than the 770. The trend continues.
Does the 4GB version get the same framerate as the 2GB one?
The 380 has delta memory compression though, which by AMD's own estimate is a 40% increase. I'd actually be curious to see a 770 (4GB) vs 950 vs 770 (2GB) benchmark.
Unless I am completely wrong, delta compression is for saving bandwidth, not for compressing the data within memory itself. I'm fairly sure that neither the Tonga cards nor the Maxwell 2GB cards compress within memory, which would make the 2GB argument invalid, since the 770 doesn't have a memory bandwidth problem.
GCN was underutilized. The 280X is a 4.1 TFLOP, 3GB card; the 770 is a 3.2 TFLOP, 2GB card. They should never have been equal in the first place. I prefer to say GCN aged well, but I'm a glass-half-full kind of guy.
That's my take on it too. Everybody except NVIDIA is working on improving gaming performance on GCN because of the consoles. By keeping the main architecture stable and gearing all engines for it, you get this. I believe that Maxwell will meet the same fate, for the exact same reasons. It already has in a way: Hawaii/GCN cards that were almost ridiculed when they launched are now considered better options than all Maxwell equivalents, except (maybe) the 980 Ti.
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
Quote: "Unless I am completely wrong, delta compression is for saving bandwidth, not for compressing the data within memory itself. [...]"
Quote (from NVIDIA): "To reduce DRAM bandwidth demands, NVIDIA GPUs make use of lossless compression techniques as data is written out to memory. The bandwidth savings from this compression is realized a second time when clients such as the Texture Unit later read the data. As illustrated in the preceding figure, our compression engine has multiple layers of compression algorithms. Any block going out to memory will first be examined to see if 4x2 pixel regions within the block are constant, in which case the data will be compressed 8:1 (i.e., from 256B to 32B of data, for 32b color). If that fails, but 2x2 pixel regions are constant, we will compress the data 4:1."
The data is stored compressed and is only uncompressed when it hits the texture units. How else would you save bandwidth anyway? It's not like you can compress and uncompress it once it's in memory, not without sending additional traffic over the bus again. Think of it like any other kind of compression; it's no different. I compress a 1GB file to 500MB and save it to Google Drive. It halves the bandwidth going up, and it also halves the space taken up on the drive itself. When I want it back, it also halves the bandwidth coming down, and I then uncompress it. In the case of Google's cloud you can uncompress it while it sits on Google's servers; on a video card, though, it can only be uncompressed on the GPU.
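A minimal sketch of the constant-region check described in the quoted text, assuming an 8x8 block of 32-bit color (256 bytes); the block size and region layout here are illustrative assumptions, not the actual hardware scheme:

```python
# Toy model of the 8:1 / 4:1 constant-region compression described above.

def regions_constant(block, rw, rh):
    """True if every rw x rh region of the 8x8 block holds a single color."""
    for y in range(0, 8, rh):
        for x in range(0, 8, rw):
            first = block[y][x]
            if any(block[y + dy][x + dx] != first
                   for dy in range(rh) for dx in range(rw)):
                return False
    return True

def compressed_size(block):
    """Stored size in bytes for one 256-byte block of 32-bit color."""
    if regions_constant(block, 4, 2):   # every 4x2 region is one color -> 8:1
        return 256 // 8                 # 32 bytes
    if regions_constant(block, 2, 2):   # every 2x2 region is one color -> 4:1
        return 256 // 4                 # 64 bytes
    return 256                          # stored uncompressed

# A flat-color block compresses 8:1; a noisy block does not compress at all.
flat  = [[0xFF336699] * 8 for _ in range(8)]
noisy = [[(x * 31 + y * 17) & 0xFFFFFFFF for x in range(8)] for y in range(8)]
print(compressed_size(flat), compressed_size(noisy))   # 32 256
```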
data/avatar/default/avatar32.webp
Any word on the new DRM?
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
Quote: "It already has in a way: Hawaii/GCN cards that were almost ridiculed when they launched are now considered better options than all Maxwell equivalents, except (maybe) the 980 Ti."
Asynchronous compute is an optional DX12 feature intended to improve performance. If enabling it is to the detriment of performance, it should be turned off. If enabling it has no effect on performance, it should be turned off. In the ONE scenario in which AMD gains a significant boost to performance from having it on, NVIDIA loses performance with it.

As for Hawaii/Tonga being ridiculed at launch and now performing better than all the Maxwell parts: it simply isn't true. I just made a thread about this. If you compare a 390X at 1175 (highest OC Hilbert got) and a 970 at 1500 (not the highest OC Hilbert got), the 970 is within 10% of a 390X in most games, while costing 23% more, at least here in Italy. (I focused on 1440p; here's the link to the thread if you want to see my logic and calculations: forums.guru3d.com/showthread.php?p=5245872)

As Denial said, when Tonga/Hawaii were released nobody denied that their specs on paper were impressive; they just didn't perform. Whether you choose to attribute the performance increases AMD has achieved recently to driver improvements (the onus is on AMD to optimize drivers) or to games featuring GCN optimizations, those people who owned Kepler cards in Kepler's time had a great experience, better than that of the AMD equivalents. You only want to see the positive aspect of this. If you're going to make a trend out of it and claim Maxwell will meet Kepler's fate, then in all fairness you also have to concede that early Polaris adopters might as well bend over and prepare to get :banana: for the first two/three years.

I just ran the Ashes benchmark on my overclocked Ti. I got 74fps at 1440p High: http://images.anandtech.com/graphs/graph10067/80321.png Running this game on the CRAZY preset, I barely match a 390X using the same overclocked Ti. This is like the difference between Hairworks on Low and Hairworks on High for AMD cards in The Witcher; it is no different. The only real difference is that criticizing NVIDIA for being evil is fashionable nowadays.
https://forums.guru3d.com/data/avatars/m/260/260317.jpg
Quote: "Gaming is much smoother on Win10 than on 7. Note I said smoother, not faster, and I've yet to come across any 'gamebreaking' or other annoying issues."
Depends on your tastes. I prefer Win 7; I can't see myself getting Windows 10 for 2 or 3 years, if ever, because I don't like the mobile-phone feel of it with apps and stores. And for gaming I think Win 7 is still better than 10, and it has Vulkan support, so if a lot of games end up with Vulkan there's no need for Win 10, DX12 and the Microsoft Store.
https://forums.guru3d.com/data/avatars/m/245/245409.jpg
Quote: "Depends on your tastes. I prefer Win 7; I can't see myself getting Windows 10 for 2 or 3 years, if ever [...]"
It can be customized to have the look and feel of 7. It's definitely faster in normal operations, and the Windows Store is a choice, believe it or not. But, like you said, it depends on your tastes. :)
https://forums.guru3d.com/data/avatars/m/259/259654.jpg
Quote: "Asynchronous compute is an optional DX12 feature intended to improve performance. [...] The only real difference is that criticizing NVIDIA for being evil is fashionable nowadays."
Async is a basic feature of all the new APIs. It's not really "optional"; it's one of the selling points of the thing. It's not the same as Hairworks at all. Also, if it reduces performance, that says a lot about the implementation of NVIDIA's DX12 driver; it should have been at least equal to the performance with it OFF. There is a good point in that you get better performance initially with NVIDIA cards. I believe that if you want to keep a card for more than 12 months, AMD has proven to have better solutions. Also, the way things are going, it seems that people will now be getting that performance from AMD hardware at the time of purchase. We'll see how it turns out with Polaris vs Pascal.
https://forums.guru3d.com/data/avatars/m/250/250868.jpg
I feel sad for Kepler. I had a 760 and it was good at the time, even better than the 280X in some games; look at it now, the 280X is far better than the 770. I bought a Maxwell GTX 970 and I think the same thing is going to happen. Well, I learned my lesson: AMD next time for sure.
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
It is a selling point, but it's not a requirement for any feature level AFAIK. That's beside the point though: from what I can tell, there's nothing that indicates Maxwell has problems with async in general; the other games that use it, namely Tomb Raider and Fable Legends (won't count GoW), run better with it on than off on both vendors' cards.

Again, if you're basing all this on Ashes of the Singularity then there isn't much to say; you know I'll point out the engine was initially made to exploit Mantle. AMD has a history of releasing products that are great on paper but don't live up to it in practice. Remember the 2900XT? That was advertised as being the G80 killer. Lol.

On Beyond3D someone wrote a simple test using D3D12 to test async: he ran 1 graphics task + 31 compute tasks, and execution times scaled linearly with every set of 1+31. Fiji scaled with every 64, but had higher execution times. You should check those results out. Fiji does a lot more concurrently than Maxwell does, but async works, so we go back to saying async isn't even relevant in the discussion (the computing concept, that is).
https://forums.guru3d.com/data/avatars/m/259/259067.jpg
The ATI 2900XT is from prehistoric, ice-age times. Now we are talking about the old R9 280X and R9 290, which beat Maxwell. Maxwell is still top selling, but when Pascal comes there will be another letdown. Old story, sadly.
https://forums.guru3d.com/data/avatars/m/164/164033.jpg
Quote: "It is a selling point, but it's not a requirement for any feature level AFAIK. [...] the other games that use it, namely Tomb Raider and Fable Legends (won't count GoW), run better with it on than off on both vendors' cards. [...]"
Fable Legends was put into the trash bin. GoW is just poor coding, and there is no DX12 for TR yet, so no async on PC...
https://forums.guru3d.com/data/avatars/m/259/259654.jpg
Quote: "[...] from what I can tell, there's nothing that indicates Maxwell has problems with async in general; the other games that use it [...] run better with it on than off on both vendors' cards."
It's no "requirement". It's the capability for preemption that the hardware has, and NVIDIA hardware is simply bad at it. I bet that Pascal won't be bad at all on this, and that Maxwell will probably go the way of Kepler: great cards for graphics tasks/DX11 games, not so great with newer engines that need low latency and compute. NVIDIA's preemption performance has been described as "catastrophic", and everyone believes that Pascal will have hardware scheduling once more for that exact reason. It was a good decision for Maxwell, but not very good for people who want to keep those $700 cards longer. I won't even speak of the Titan X :P As David Kanter has said about it (in the context of the Oculus Rift):
NVIDIA is very – to their credit – open and honest about this and how you tune for Oculus Rift is that you have to be super careful because you can miss a frame boundary because the preemption is not particularly well latency. And again, this is, it’s not like it’s a bad decision on the part of NVIDIA. It’s, you know, that’s just what made sense. Preemption was not something that was super important when the chip was designed and the API support was… there wasn’t much bang for your buck.
Quote: "AMD has a history of releasing products that are great on paper but don't live up to it in practice, remember the 2900XT? That was advertised as being the G80 killer. Lol."
Both manufacturers have a history of that. Remember the GeForce FX? The chip of the future? The G80 was an awesome chip; Fermi was an awesome chip.
Quote: "On Beyond3D someone wrote a simple test using D3D12 to test async [...] Fiji does a lot more concurrently than Maxwell does, but async works [...]"
Do you have any link on that? This is quite informative.
https://forums.guru3d.com/data/avatars/m/164/164033.jpg
Quote: "[...] The G80 was an awesome chip; Fermi was an awesome chip."
I would say Kepler was awesome. Fermi was hot as hell 😀 It had pretty good performance, but it was as bad as a 290X with the stock cooler and it used a lot of power.
https://forums.guru3d.com/data/avatars/m/160/160436.jpg
Wow... maybe Vulkan has a chance of gaining traction with such a poor showing from DX12. I'm surprised; you would think Microsoft would be getting in there and trying to make sure the first few implementations properly highlight the benefits of the API... I mean, you frequently hear about developers working with Microsoft and the GPU vendor to optimize a game.
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
Quote: "Wow... maybe Vulkan has a chance of gaining traction with such a poor showing from DX12. [...]"
Vulkan won't be any better. DX12 is nice for AMD because it improves the efficiency of their cards, but it's not going to bring massive performance improvements overall, not unless the game is CPU bottlenecked. The best of DX12 won't come until games are written for DX12 from the ground up; Oxide talked about this in an interview with AnandTech. By that time there won't be DX11 comparisons, so the performance increases, if any, will get lost in diminishing-returns graphics that people will write off as no improvement.
https://forums.guru3d.com/data/avatars/m/63/63215.jpg
I haven't seen anything to suggest Vulkan can beat DX12 performance, which is what's needed for wider adoption.
https://forums.guru3d.com/data/avatars/m/152/152580.jpg
It seems that Hitman simply takes full advantage of the power of the GPU; the ordering below looks similar to the measured results. Maybe that driver engineer hired a few months ago did a good job for AMD?
Processing Power (GFLOPS):
Radeon R9 Fury X - 8601.6
Radeon R9 Nano - 8192
Radeon R9 Fury - 7168
GeForce GTX Titan X - 6144
Radeon R9 390X - 5913.6
GeForce GTX 980 Ti - 5632
Radeon R9 390 - 5120
GeForce GTX 980 - 4612
Radeon R9 380X - 3973.1
GeForce GTX 970 - 3494
GeForce GTX 960 - 2308
GeForce GTX 950 - 1573
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
Quote: "It's no 'requirement'. It's the capability for preemption that the hardware has, and NVIDIA hardware is simply bad at it. [...] Do you have any link on that? This is quite informative."
It's easy to point at something and say it's a trend, that's my point, and of course both AMD and NVIDIA have had their ups and downs; that's implicit in the discussion.

With GCN an ACE can "steal" an execution unit from another task and execute compute commands at low latency, concurrently with the graphics loads. NVIDIA needs a full flush to switch contexts. GCN benefits from steady streams of commands on the compute queue because it hides latency that way, as detailed by the link you posted. The downside is that the execution time of any one task is higher: GCN achieves high throughput with many concurrent commands in flight, but their individual execution times are longer, while Maxwell does almost the opposite. Batching compute commands and running them through the compute queues in DX12 will provide a performance benefit to Maxwell if the context switching delay is << the execution time of the task. GCN is undoubtedly better at this, and the implementation is more intuitive, but there is a trade-off. At the end of the day, if your DX11 version runs better than your DX12 version, you're a ****ty DX12 developer; it's clear Ashes was designed for GCN.

As for the Kepler argument: you realize games today, running at the settings at which we benchmark, are a lot more power hungry than those of two years ago, right? All that's happened is that AMD improved their drivers, and games have started using a lot more raw compute power, which Hawaii and Tonga had loads of. AMD cards got better; Kepler hasn't gotten worse. It's a crock of **** is what it is. But then again, we were talking about DX12 and async, and how you thought that was consistent with the trend set by Kepler; now you obviously need to bring VR latency into the mix because that was a dead end. VR is another issue, AMD clearly has lower latency, so that's that. VR is still far away though; it's going to be very niche for a long time.

There's no getting round it: with DX12 there have to be different vendor-specific paths. https://forum.beyond3d.com/threads/dx12-performance-discussion-and-analysis-thread.57188/page-12

Some stuff from GDC: http://www.dualshockers.com/2016/03/14/directx12-requires-different-optimization-on-nvidia-and-amd-cards-lots-of-details-shared/
http://cdn3.dualshockers.com/wp-content/uploads/2016/03/Direct-X12-Panel-Slides-59.jpg
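A toy model of the "context switching delay << execution time" trade-off described above, with made-up numbers rather than measurements of any real hardware; it only illustrates how batching compute work per context switch amortises a fixed switch cost:

```python
# Toy model: serial execution with one context switch charged per batch of compute tasks.
# All numbers below are arbitrary illustrations, not measurements of any GPU.

def total_time_ms(num_tasks: int, task_ms: float, switch_ms: float, batch_size: int) -> float:
    """Total time = work time + one fixed switch cost per batch."""
    batches = -(-num_tasks // batch_size)          # ceiling division
    return num_tasks * task_ms + batches * switch_ms

# Here the switch cost (0.5 ms) is larger than a single task (0.2 ms), so batching matters.
tasks, task_ms, switch_ms = 128, 0.20, 0.50
for batch in (1, 8, 32):
    print(batch, total_time_ms(tasks, task_ms, switch_ms, batch))
# batch=1  -> 89.6 ms (switch cost dominates)
# batch=8  -> 33.6 ms
# batch=32 -> 27.6 ms (switch cost mostly amortised away)
```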
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
Quote: "It seems that Hitman simply takes full advantage of the power of the GPU. [...] Processing Power (GFLOPS): Radeon R9 Fury X - 8601.6 [...] GeForce GTX 950 - 1573"
That's interesting, although it's flawed. Each core can do 2 floating-point operations per clock, so 3072 x 2 = 6144 per clock; multiply by 1GHz and you get 6144 GFLOP/s. The problem is that the average 980 Ti clocks at ~1380MHz out of the box; doing the math (2816 cores x 2 x 1.38GHz), that's 7,772 GFLOP/s. A Titan X at 1400MHz is 8.6 TFLOPS. My 980 Ti at 1510MHz is 8.5 TFLOPS.
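A minimal sketch of that arithmetic: peak single-precision throughput = shader cores x 2 FLOPs per clock x clock in GHz. The core counts are the public specs, while the boost clocks are the poster's observed figures rather than official ratings:

```python
# Theoretical peak throughput, assuming 2 FLOPs (one fused multiply-add) per core per clock.

def gflops(cores: int, clock_ghz: float) -> float:
    """Peak single-precision throughput in GFLOP/s."""
    return cores * 2 * clock_ghz

cards = {
    "GTX Titan X @ 1.00 GHz (reference list)": (3072, 1.000),
    "GTX 980 Ti  @ 1.38 GHz (typical boost)":  (2816, 1.380),
    "GTX Titan X @ 1.40 GHz (overclocked)":    (3072, 1.400),
    "GTX 980 Ti  @ 1.51 GHz (overclocked)":    (2816, 1.510),
}

for name, (cores, clock) in cards.items():
    print(f"{name}: {gflops(cores, clock):.0f} GFLOP/s")
# -> 6144, 7772, 8602, 8504 GFLOP/s, closely matching the numbers in the post above
#    (8602 ~ 8.6 TFLOPS, 8504 ~ 8.5 TFLOPS).
```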