For a week or two now there have been allegations in our forums that users of the GeForce GTX 970 have a darn hard time addressing and filling the last chunk of their graphics memory. The 4 GB card seems to run into issues addressing the last 400 to 600 MB of memory, which is significant.
Two weeks ago, when I tested this myself to try and replicate it, some games halted at 3.5 GB while others like COD filled the 4 GB completely. These reports have been ongoing for a while now and were initially dismissed. However, a new small tool helps us verify a thing or two, and there really is something going on with that last chunk of memory on the GeForce GTX 970. We have to concur with the findings: there is a problem that the 970 shows and the 980 doesn't.
Meanwhile, an Nvidia representative here on the Guru3D forums has already stated that "they are looking into it". The tool we are talking about was made by a German programmer going by the name Nai; he wrote a small program that benchmarks VRAM performance, and with it we can see the 970's memory performance drop off at around 3.3 GB, while the GTX 980 does not show such behavior:
You can download the test to try it yourself; we placed it here (local Guru3D mirror). This is a customized version based on Nai's original program, reworked by a Guru3D member. With this version you can now also specify the allocation block size and the maximum amount of memory used, as follows:
vRamBandwidthTest.exe [BlockSizeMB] [MaxAllocationMB]
- BlockSizeMB: any of 16, 32, 64, 128, 256, 512 or 1024
- MaxAllocationMB: any number greater than or equal to BlockSizeMB
If no arguments are given, the test runs with a 128 MB block size and no memory limit by default, which corresponds exactly to the original program. Please disable Aero and preferably disconnect the monitor during the test. We are interested in hearing Nvidia's response to the new findings.
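For those curious how such a tool works, here is a minimal Python sketch of the scanning approach: allocate memory in BlockSizeMB chunks up to MaxAllocationMB and time a read pass over each chunk. This is an illustration only, not Nai's actual CUDA code; the real tool allocates VRAM with CUDA and times GPU kernels, while this sketch (function name `scan_blocks` is made up) uses host RAM just to show the loop structure.

```python
# Sketch of the benchmark's scan loop (illustration, not Nai's CUDA code).
# The real tool allocates VRAM blocks with cudaMalloc and times a GPU read
# pass per block; here we stand in with host RAM to show the structure.
import time

MB = 1024 * 1024

def scan_blocks(block_mb=128, max_alloc_mb=512):
    blocks, bandwidths = [], []
    allocated = 0
    while allocated + block_mb <= max_alloc_mb:
        buf = bytearray(block_mb * MB)   # stands in for cudaMalloc
        t0 = time.perf_counter()
        _ = bytes(buf)                   # read pass over the whole block
        dt = time.perf_counter() - t0
        blocks.append(buf)               # keep it allocated, like the tool
        bandwidths.append(block_mb / 1024 / dt)  # rough GB/s per block
        allocated += block_mb
    return allocated, bandwidths

total, bw = scan_blocks(block_mb=128, max_alloc_mb=512)
print(f"allocated {total} MB in {len(bw)} blocks")
```

On an affected GTX 970, the per-block bandwidth reported by the real tool collapses once the scan crosses into the upper region of VRAM.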
You can further discuss your findings here in our forums. Please do share your GTX 970 and GTX 980 results with us.
Meanwhile at Nvidia (a chat from a forum user):
[10:11:39 PM] NV Chat: We have our entire team working on this issue with a high priority. This will soon be fixed for sure.
[10:11:54 PM] Me: So, what is the issue?
[10:12:07 PM] Me: What needs to be fixed?
[10:12:46 PM] NV Chat: We are not sure on that. We are still yet to find the cause of this issue.
[10:12:50 PM] NV Chat: Our team is working on it.
Update #1 - Nvidia responds
Nvidia has now responded to the findings:
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd-party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.
We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.
Here’s an example of some performance data:
Shadow of Mordor
  <3.5GB setting (2688x1512 Very High):                     GTX 980: 72 fps          GTX 970: 60 fps
  >3.5GB setting (3456x1944):                               GTX 980: 55 fps (-24%)   GTX 970: 45 fps (-25%)
Battlefield 4
  <3.5GB setting (3840x2160 2xMSAA):                        GTX 980: 36 fps          GTX 970: 30 fps
  >3.5GB setting (3840x2160 135% res):                      GTX 980: 19 fps (-47%)   GTX 970: 15 fps (-50%)
Call of Duty: Advanced Warfare
  <3.5GB setting (3840x2160 FSMAA T2x, Supersampling off):  GTX 980: 82 fps          GTX 970: 71 fps
  >3.5GB setting (3840x2160 FSMAA T2x, Supersampling on):   GTX 980: 48 fps (-41%)   GTX 970: 40 fps (-44%)
Shadow of Mordor drops about 24% on the GTX 980 and 25% on the GTX 970, a 1% difference. On Battlefield 4 the drop is 47% on the GTX 980 and 50% on the GTX 970, a 3% difference. On CoD: AW the drop is 41% on the GTX 980 and 44% on the GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to the GTX 980 on these games when it is using the 0.5GB segment.
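Nvidia's relative-drop arithmetic is easy to check yourself; here is a quick Python sanity check against the fps figures in the table above:

```python
# Recomputing the relative performance drops from Nvidia's fps numbers.
# Tuples: (980 fps below 3.5GB, 980 fps above, 970 fps below, 970 fps above).
games = {
    "Shadow of Mordor":      (72, 55, 60, 45),
    "Battlefield 4":         (36, 19, 30, 15),
    "CoD: Advanced Warfare": (82, 48, 71, 40),
}

def drop_pct(before, after):
    """Percentage drop going from the <3.5GB to the >3.5GB setting."""
    return round(100 * (before - after) / before)

for name, (g980_lo, g980_hi, g970_lo, g970_hi) in games.items():
    d980 = drop_pct(g980_lo, g980_hi)
    d970 = drop_pct(g970_lo, g970_hi)
    print(f"{name}: 980 -{d980}%, 970 -{d970}%, delta {d970 - d980}%")
# deltas come out to 1%, 3% and 3%, matching Nvidia's statement
```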
So cutting down the GPU to position the GTX 970 as a lower-spec product below the GTX 980 is at the heart of the issue here; 512 MB is 1/8th of the 4 GB total memory capacity, and it is that last 1/8th slice that gets the slower treatment. So the answer really is: the primary usable memory for the GTX 970 is a 3.5 GB partition.
Nvidia's results seem to suggest this is a non-issue; actual user results, however, contradict them. I'm not quite certain how well this info will sit with GTX 970 owners, as this isn't a bug that can be fixed; it is designed to function that way due to the cut-down chip.
Update #2 - A little bit of testing
As a general note, I've been using and comparing games on both a 970 and a 980 today, and quite honestly I cannot really reproduce stutters or weird issues other than the normal stuff once you run out of graphics memory. Once you exhaust the ~3.5 GB on the 970 or the ~4 GB on the GTX 980, slowdowns or weird behavior can occur, but that goes for any graphics card that runs out of video memory. I've seen 4 GB of graphics memory usage in COD and 3.6 GB in Shadow of Mordor with widely varying settings, and I simply cannot reproduce significant enough anomalies. If you really do run out of graphics memory, perhaps flick the AA mode down a notch, from 8x to 4x or so. I do have to state this though: the primary 3.5 GB partition on the GTX 970 with a slow 500 MB secondary partition is a big miss from Nvidia, mostly for not honestly communicating it. I find the problem to be more of a marketing miss, with a lot of aftermath due to not mentioning it.
Had Nvidia disclosed this information alongside the launch, then you could have made a more informed decision. For most of you, the primary 3.5 GB of graphics memory will be more than plenty at 1920x1080 (Full HD) up to 2560x1440 (WQHD).
Update #3 - The issue that is behind the issue
New info has surfaced: Nvidia messed up quite a bit when they sent out specs to press and media like ourselves. As we now know, the GeForce GTX 970 has 56 ROPs, not 64 as listed in their reviewer's guides. Having fewer ROPs is not a massive thing in itself, but it exposes a thing or two about the memory subsystem and L2 cache. Combined with some new features in the Maxwell architecture, herein we can find the answer to why the card's memory is split up into the 3.5GB/0.5GB partitions noted above.
Look at the diagram above (and I am truly sorry to make this so complicated, as it really is just that: complicated). You'll notice that, compared to the GTX 980, the GTX 970 has three disabled SMs, leaving 13 active SMs (clusters containing things like shader processors). The SMs shown at the top connect to 256KB L2 cache blocks, each paired with a 32-bit memory controller located at the bottom. The crossbar is responsible for communication between the SMs, the caches and the memory controllers.
You will notice the greyed-out right-hand L2 block for this GPU, right? That is a disabled L2 block, and since each L2 block is tied to ROPs, the GTX 970 does not have 2,048KB of L2 cache but 1,792KB. Disabling ROPs and L2 like this is actually new and Maxwell-exclusive; on Kepler, disabling an L2/ROP segment would disable the entire section, including a memory controller. So while that L2/ROP unit is disabled, the 8th memory controller on the right is still active and in use.
Now that we know Maxwell can disable smaller segments while keeping the rest active, we have just learned that the 8th 32-bit memory controller and its associated DRAM remain usable, but the final 1/8th of the L2 cache is missing/disabled. That 8th DRAM controller therefore has to buddy up with the 7th L2 unit, and that is the root cause of a big performance issue. The GeForce GTX 970 has a 256-bit bus over a 4GB framebuffer and all memory controllers are active and in use, but with the L2 segment tied to the 8th memory controller disabled, driving all eight controllers through the remaining L2 slices would cut overall L2 performance to half of its normal level.
Nvidia needed to tackle that problem and did so by splitting the 4GB of memory into a primary 3.5GB partition (196 GB/sec) that uses the first seven memory controllers and their associated DRAM, and a 0.5GB partition (28 GB/sec) tied to the 8th memory controller. Nvidia could have, and probably should have, marketed the card as a 3.5GB product; alternatively they could even have deactivated an entire right-side quad and gone for a 192-bit memory interface tied to just 3GB of memory, but they did not pursue that alternative as the chosen solution offers better performance. Nvidia claims that games hardly suffer from this design workaround.
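The 196 GB/sec and 28 GB/sec figures follow directly from the GTX 970's memory configuration, assuming the stock 7 Gbps effective GDDR5 data rate. A quick back-of-the-envelope check:

```python
# Deriving the partition bandwidths from the GTX 970's memory configuration.
# Assumption: stock 7 Gbps effective GDDR5 data rate per pin.
data_rate_gbps = 7           # effective GDDR5 data rate per pin
controller_bits = 32         # width of one memory controller
controllers_total = 8        # 8 x 32-bit = 256-bit bus

per_controller = data_rate_gbps * controller_bits / 8   # GB/s per controller
fast_partition = 7 * per_controller   # 3.5GB segment, 7 controllers
slow_partition = 1 * per_controller   # 0.5GB segment, 1 controller
full_bus = controllers_total * per_controller

print(per_controller, fast_partition, slow_partition, full_bus)
# 28.0 GB/s per controller -> 196.0 GB/s fast, 28.0 GB/s slow, 224.0 total
```

So the primary partition delivers 7/8ths of the card's 224 GB/sec total, and anything spilling into the last 0.5GB is limited to the single remaining controller's 28 GB/sec.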
In a rough, simplified explanation: the disabled L2 unit causes a challenge, a performance hit tied to one of the memory controllers. To divert that performance hit, the memory is split up into two segments, bypassing the issue at hand; a tweak to get the most out of a lesser situation. Both memory partitions are active and in use; the primary 3.5 GB partition is very fast, the 512MB secondary partition is much slower.
The thing is, nobody really seems to have massive issues; dozens and dozens of media outlets have tested the card in in-depth reviews like the ones here on my site. As for replicating the stutters and such that you see in some of the videos, to date I have not been able to reproduce them unless I do crazy stuff, and I've been on this all weekend. Overall scores are good, and sure, if you run out of memory at some point you will see performance drops. But then drop from 8x to, say, 4x AA, right?
Nvidia messed up badly here, no doubt about it. The ROP/L2 cache count was goofed up, slipped through the cracks and ended up in their reviewer's guides and spec sheets, and really, they should have called this a 3.5 GB card with an extra layer of L3 cache memory or something. Right now Nvidia is in full damage control mode; however, I will stick to my recommendation: the GeForce GTX 970 is still a card we like very much in the up-to 2560x1440 (WQHD) domain, but it probably should have been called a 3.5 GB product with an added 512MB L3 cache.
To answer my own title question: does Nvidia have a memory allocation bug? Nope, this was all done by design. Nvidia, however, failed to communicate this properly to the tech media and thus, in the end, to the people that buy the product.
Let us know your thoughts in the forums.