
Does the GeForce GTX 970 have a memory allocation bug ? (update 3)

by Hilbert Hagedoorn on: 01/27/2015 02:05 AM | 1945 comment(s)

For a week or two now there have been allegations in our forums that owners of the GeForce GTX 970 have a darn hard time addressing and filling the last 10% of their graphics memory. The 4 GB card seems to run into issues addressing the last 400 to 600 MB of memory, which is significant.

Two weeks ago, when I tested this myself to try and replicate it, some games halted at 3.5 GB while others like COD filled the 4 GB completely. These reports have been ongoing for a while now and initially got dismissed. However, a new small tool helps us to indicate and verify a thing or two, and there really is something going on with that last chunk of memory on the GeForce GTX 970. We have to concur with the findings: there is a problem that the 970 shows, and the 980 doesn't.

Meanwhile an Nvidia representative here at the Guru3D forums has already stated that "they are looking into it". The tool we are talking about was made by a German programmer going by the name Nai; he wrote a small program that benchmarks vram performance, and with it we can see the GTX 970's performance drop once memory usage passes roughly 3.3 GB, while the GTX 980 does not show such behavior.
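
For those wondering what such a test does under the hood, below is a heavily simplified sketch of the idea; this is not Nai's actual code, and the block count, the use of cudaMemset as the write workload and the output format are our own assumptions. The principle is simple: keep allocating fixed-size blocks of VRAM until the driver refuses, then time a write to each block and report the resulting bandwidth per block.

// Minimal sketch (not Nai's actual code): allocate VRAM in fixed-size blocks
// and measure how fast each block can be written, to see whether the blocks
// near the top of the 4GB range are slower than the rest.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t blockBytes = 128ull * 1024 * 1024;  // 128 MB blocks, like the tool's default
    std::vector<void*> blocks;

    // Keep allocating 128 MB blocks until the driver refuses.
    for (;;) {
        void* p = nullptr;
        if (cudaMalloc(&p, blockBytes) != cudaSuccess) break;
        blocks.push_back(p);
    }
    std::printf("Allocated %zu blocks (%zu MB)\n", blocks.size(), blocks.size() * 128);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a device-side fill of every block; bandwidth = bytes written / time taken.
    for (size_t i = 0; i < blocks.size(); ++i) {
        cudaEventRecord(start);
        cudaMemset(blocks[i], 0, blockBytes);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("Block %3zu: %6.1f GB/s\n", i, (blockBytes / 1e9) / (ms / 1e3));
    }

    for (void* p : blocks) cudaFree(p);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

Compile it with nvcc and run it on both a GTX 970 and a GTX 980 to see the difference for yourself.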

You can download the test to try it yourself; we placed it here (local guru3d mirror). It is a customized version based on the original programming by Nai, reworked by a Guru3D forum member. With this version you can also specify the allocation block size and the maximum amount of memory that is used, as follows:

vRamBandwidthTest.exe [BlockSizeMB] [MaxAllocationMB]

  • BlockSizeMB: one of 16, 32, 64, 128, 256, 512 or 1024
  • MaxAllocationMB: any number greater than or equal to BlockSizeMB

If no arguments are given, the test runs with the 128MB block size by default and no memory limit, which corresponds exactly to the original program. Please disable AERO and preferably disconnect the monitor during the test. We are interested in hearing Nvidia's response to the new findings.
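
For example (a purely hypothetical invocation based on the parameters documented above), sweeping the card in smaller 32MB blocks up to the full 4GB would look like this:

vRamBandwidthTest.exe 32 4096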

You can further discuss your findings here in our forums. Please do share your GTX 970 and GTX 980 results with us.

Meanwhile at Nvidia (a chat from a forum user):

[10:11:39 PM] NV Chat: We have our entire team working on this issue with a high priority. This will soon be fixed for sure.
[10:11:54 PM] Me: So, what is the issue?
[10:12:07 PM] Me: What needs to be fixed?
[10:12:46 PM] NV Chat: We are not sure on that. We are still yet to find the cause of this issue.
[10:12:50 PM] NV Chat: Our team is working on it.

Update #1 - Nvidia responds

NVIDIA has now responded to the findings:

The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd-party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.

We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment.  The best way to test that is to look at game performance.  Compare a GTX 980 to a 970 on a game that uses less than 3.5GB.  Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.
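
In other words (our wording, not Nvidia's): compute the relative drop, (fps at the <3.5GB setting minus fps at the >3.5GB setting) divided by fps at the <3.5GB setting, separately for each card. If both cards lose roughly the same percentage, the slow 0.5GB segment is not hurting the GTX 970 disproportionately.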

Here’s an example of some performance data:

                                                            GeForce GTX 980   GeForce GTX 970
Shadow of Mordor
  <3.5GB setting = 2688x1512 Very High                      72 fps            60 fps
  >3.5GB setting = 3456x1944                                55 fps (-24%)     45 fps (-25%)
Battlefield 4
  <3.5GB setting = 3840x2160 2xMSAA                         36 fps            30 fps
  >3.5GB setting = 3840x2160 135% res                       19 fps (-47%)     15 fps (-50%)
Call of Duty: Advanced Warfare
  <3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off   82 fps            71 fps
  >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on    48 fps (-41%)     40 fps (-44%)

On Shadow of Mordor, the drop is about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to the GTX 980 on these games when it is using the 0.5GB segment.

So removing SMMs to make the GTX 970 a lower-spec product than the GTX 980 is the main issue here; the 500MB slow segment is 1/8th of the 4GB total memory capacity. So the answer really is: the primary usable memory of the GTX 970 is a 3.5 GB partition.

Nvidia's results seem to suggest this is a non-issue, however actual user results contradict them. I'm not quite certain how well this info will sit with GTX 970 owners, as this isn't a bug that can be fixed; the card is designed to function that way due to the cut-down SMMs.

Update #2 - A little bit of testing

On a generic note: I've been using and comparing games with both a 970 and a 980 today, and quite honestly I can not really reproduce stutters or weird issues other than the normal stuff once you run out of graphics memory. Once you run out of the ~3.5 GB on the 970, or the ~4GB on the GTX 980, slowdowns or weird behavior can occur, but that goes for any graphics card that runs out of video memory. I've seen 4GB graphics memory usage with COD and 3.6 GB with Shadow of Mordor at widely varying settings, and simply can not reproduce significant enough anomalies. Once you really run out of graphics memory, perhaps flick the AA mode down a notch, from 8x to 4x or something. I have to state this though: the primary 3.5 GB partition on the GTX 970 with a slow 500MB secondary partition is a big miss from Nvidia, but mostly for not honestly communicating it. I find the problem to be more of a marketing miss, with a lot of aftermath due to not mentioning it.

Had Nvidia disclosed this information alongside the launch, you could have made a more informed decision. For most of you the primary 3.5 GB of graphics memory will be more than plenty at 1920x1080 (Full HD) up to 2560x1440 (WQHD).

Update #3 - The issue that is behind the issue

New info has surfaced: Nvidia messed up quite a bit when they sent out specs to press and media like ourselves. As we now know, the GeForce GTX 970 has 56 ROPs, not 64 as listed in their reviewer's guides. Having fewer ROPs is not a massive thing in itself, but it exposes a thing or two about the memory subsystem and L2 cache. Combined with some new features in the Maxwell architecture, herein we can find the answer as to why the card's memory is split up into 3.5GB/0.5GB partitions as noted above.

(Diagram: GM204 block diagram of the GTX 970: SM clusters at the top, L2 cache slices and 32-bit memory controllers at the bottom, linked by the crossbar.)

Look at the diagram above (and I am truly sorry to make this so complicated, as it really is just that... complicated). You'll notice that, compared to the GTX 980, the GTX 970 has three disabled SMs, leaving 13 active SMs (clusters holding things like shader processors). The SMs shown at the top are followed by 256KB L2 caches, which pair up with the 32-bit memory controllers located at the bottom. The crossbar is responsible for communication between the SMs, caches and memory controllers.

You will notice that greyed-out right-hand L2 block for this GPU, right? That is a disabled L2 block, and each L2 block is tied to ROPs; as a result the GTX 970 does not have 2,048KB but instead 1,792KB of L2 cache (seven of the eight 256KB slices). Disabling ROPs and thus L2 like that is actually new and Maxwell-exclusive; on Kepler, disabling an L2/ROP segment would disable the entire section including a memory controller. So while the L2/ROP unit is disabled, that 8th memory controller to the right is still active and in use.

Now that we know Maxwell can disable smaller segments and keep the rest active, we have just learned that all memory controllers and their associated DRAM remain usable, but the final 1/8th of the L2 cache is missing/disabled. As a result, the 8th DRAM controller has to buddy up with the 7th L2 unit, and that is the root cause of a big performance issue. The GeForce GTX 970 has a 256-bit bus over a 4GB framebuffer and the memory controllers are all active and in use, but with the L2 segment tied to the 8th memory controller disabled, overall L2 performance would operate at half of its normal rate.

Nvidia needed to tackle that problem and did so by splitting the total 4GB of memory into a primary (196 GB/sec) 3.5GB partition that makes use of the first seven memory controllers and associated DRAM, and a (28 GB/sec) 0.5GB partition tied to that last, 8th memory controller. Nvidia could have, and probably should have, marketed the card as 3.5GB; they could even have deactivated an entire right-side quad and gone for a 192-bit memory interface tied to just 3GB of memory, but did not pursue that alternative as the chosen solution offers better performance. Nvidia claims that games hardly suffer from this design / workaround.
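
As a rough sanity check on those numbers (our math, assuming 7 Gbps effective GDDR5 on the GTX 970): each 32-bit memory controller is good for about 7 Gbps x 32 bit / 8 = 28 GB/sec, so the seven controllers behind the fast partition add up to 7 x 28 = 196 GB/sec, while the lone 8th controller behind the 0.5GB segment tops out at 28 GB/sec.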

In a rough, simplified explanation: the disabled L2 unit causes a challenge, an offset performance hit tied to one of the memory controllers. To divert that performance hit, the memory is split up into two segments, bypassing the issue at hand; a tweak to get the most out of a lesser situation. Both memory partitions are active and in use; the primary 3.5 GB partition is very fast, the 512MB secondary partition is much slower.

Thing is, the quantifying fact is that nobody really has massive issues; dozens and dozens of media outlets have tested the card with in-depth reviews like the ones here on my site. As for replicating the stutters and such that you see in some of the videos, to date I have not been able to reproduce them unless you do crazy stuff, and I've been on this all weekend. Overall scores are good, and sure, if you run out of memory at some point you will see performance drops. But then you drop from 8x to, like, 4x AA, right?

Nvidia messed up badly here, no doubt about it. The ROP/L2 cache count was goofed up, slipped through the cracks and ended up in their reviewer's guides and spec sheets, and really... they should have called this a 3.5 GB card with an extra layer of L3 cache memory or something. Right now Nvidia is in full damage control. However, I will stick to my recommendations: the GeForce GTX 970 is still a card we like very much in the up-to 2560x1440 (WQHD) domain, but it probably should have been called a 3.5 GB product with an added 512MB L3 cache.

To answer my own title question: does the GeForce GTX 970 have a memory allocation bug? Nope, this was all done per design. Nvidia, however, failed to communicate this properly to the tech media and thus, in the end, to the people that buy the product.

Let us know your thoughts in the forums.










skacikpl
Senior Member



Posts: 624
Joined: 2014-07-08

#4997960 Posted on: 01/23/2015 04:52 PM
As you are testing, I would ask you to do exactly the opposite thing.
Do not try to have minimal allocation before you start the test.

Do allocate even 1GB of vram before the test, and terminate the game after the test allocates the whole remaining 4GB.
(if it allocates it only during the bench itself and not in the earlier part where it pauses, then kill the game as soon as the chunks start to get tested).

This way you clear additional space which should not be allocated by the benchmark.

And if the test does not show a drop in performance after getting the additional vram, the issue is due to overhead.

I'll try that.
//
Initial test - Lords of the Fallen, maxed out (3.5GB VRAM used): the benchmark crashes right away.
BF4 (1.1GB VRAM used) ran alongside the benchmark.
BF4 (1.1GB VRAM used), killed ASAP after the bench allocates memory.

Fox2232
Senior Member



Posts: 11809
Joined: 2012-07-20

#4997961 Posted on: 01/23/2015 04:53 PM
Why should people have to alter their configuration just to test a theory? Especially a theory that can not be definitively proven?

While I trust Fox's evaluation of the source code, even he can't guarantee that there is not a flaw somewhere causing "unusual" results. If there was actually a "flaw", then all the results would fall within a respectable margin of error.

Using CUDA to demonstrate a flaw is already creating a flaw in the testing to begin with. CUDA is an NVidia IP so it should not have been used and any results derived from a CUDA based test should be disregarded.

You are right, I do not know the inner CUDA workings. That is why I want to preallocate 1GB and let the test allocate the remaining 3GB.
Then kill the 1st GB of allocation, leaving free space for the bench itself. If it indeed has CUDA-based memory overhead, 1GB of vram should be enough to accommodate it and the test would show only 22/23 chunks, but they would all perform well.
If even with a free 1GB block the test shows that the end blocks are performing badly, then it is not caused by the code.

Fox2232
Senior Member



Posts: 11809
Joined: 2012-07-20

#4997966 Posted on: 01/23/2015 04:56 PM
I'll try that.
//
Initial test - Lords of the Fallen, maxed out (3.5GB VRAM used): the benchmark crashes right away.

It likely crashed because it could not allocate even the 0th block and then tried to run the bench on it, because there is no protection that checks whether even one block got allocated.

JohnLai
Senior Member



Posts: 136
Joined: 2006-04-25

#4997967 Posted on: 01/23/2015 04:58 PM
Just to clarify, what that OCN guy said regarding fresh users wasn't directed at you specifically, nor did I mean it that way. :)

Instead it was illuminating a theme which seems to be constantly recurring in this situation. Similar comments have also been made on AnandTech.....


And I don't know anything about coding so I can't help you there sorry. Coding is one of the things I know least about tbh.

Thanks, I really appreciate it.

sykozis
Senior Member



Posts: 22107
Joined: 2008-07-14

#4997970 Posted on: 01/23/2015 05:04 PM
You are right, I do not know the inner CUDA workings. That is why I want to preallocate 1GB and let the test allocate the remaining 3GB.
Then kill the 1st GB of allocation, leaving free space for the bench itself. If it indeed has CUDA-based memory overhead, 1GB of vram should be enough to accommodate it and the test would show only 22/23 chunks, but they would all perform well.
If even with a free 1GB block the test shows that the end blocks are performing badly, then it is not caused by the code.

The graphics card needs memory to perform operations, since that's where the data the GPU needs to perform the requested operations is stored. The more data you force into memory, the less is available to continue performing operations. The memory bandwidth is going to drop. Even an OpenCL-based application would show this occurring. Being able to run from the CPU gives OpenCL the advantage of less overhead, which would mean less drop in measured bandwidth.

When you execute this "test", the instructions required for the operations are loaded into graphics memory. From there, the instructions are processed. For each function, more instructions have to be loaded into memory. When you flood the memory, there's no place to store the next set of instructions so the necessary space has to be flushed, which negatively affects memory bandwidth.

The only way to avoid this would be for NVidia to partition the ram to give CUDA its own dedicated memory partition, which just isn't feasible on a consumer graphics card. For the Quadro line it might be, but it would provide limited benefit to consumers compared to the cost associated with doing so.




