NVIDIA Working on Tile-based Multi-GPU Rendering Technique Called CFR - Checkered Frame Rendering

Sounds like it could help SLI work with engines that use information from previous frames, like temporal effects.
With GPU improvements becoming smaller and smaller, we need SLI more than ever... so this makes me very happy!
I think this is all heading towards chiplet-design GPUs. In several years, most likely all of us will be using SLI or a similar technology in one way or another.
I read that picture as each GPU rendering half the pixels per frame... I'm not sure that's an improvement. Otherwise, why label them frame N and N+1? I disagree that we need SLI more than ever. We need a way to get back to playing games for a decent amount of money... and if SLI is only for top cards, that's not going to happen.
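For anyone else puzzling over the diagram: a minimal sketch of what a checkered split could look like, assuming (as the name suggests) that CFR carves one frame into small square tiles and hands alternate tiles to each of two GPUs. The 32-pixel tile size and the parity rule below are illustrative guesses, not NVIDIA's actual parameters.

#include <cstdio>

// Illustrative checkered (CFR-style) tile split across two GPUs.
// Assumption: both GPUs contribute tiles to the SAME frame (an SFR variant),
// rather than taking turns on whole frames as in AFR.
constexpr int kTileSize = 32;   // made-up tile size
constexpr int kNumGpus  = 2;

int TileOwner(int tileX, int tileY) {
    // Checkerboard parity: neighbouring tiles land on different GPUs.
    return (tileX + tileY) % kNumGpus;
}

int main() {
    const int width = 256, height = 128;   // tiny example frame
    for (int ty = 0; ty < height / kTileSize; ++ty) {
        for (int tx = 0; tx < width / kTileSize; ++tx)
            std::printf("%d ", TileOwner(tx, ty));   // 0 or 1 = which GPU renders this tile
        std::printf("\n");
    }
    return 0;
}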
I wonder how this gets around the problem that most modern game engines aren't fully compatible with multi-GPU rendering because of the way they work. That was the reason SLI support died off over the last few years: it was a nightmare for developers and Nvidia to shoehorn in support in a hacky way, and it ended up being no better and causing more trouble than it was worth.

As far as I can tell, the only way this will work in future with real multi-GPU "chiplet" type designs is either a game engine designed from the beginning to work with multiple GPUs that each have their own RAM, or more likely some form of multi-GPU with *shared* RAM, which would make the engine-side problems easier. I think the main problem is that generating the current frame needs access to previous frames, but that information sits in a different GPU's VRAM, so it requires totally inefficient copying back and forth all the time.

So yes, I think a multi-GPU chiplet design would have to have VRAM and cache shared among all the GPUs - which is exactly what SLI on separate cards does not have today.
Interesting that people are only noticing this now... it's been in the NVAPI reference since August. The reports are wrong as usual, because it has a full OpenGL implementation, including NVAPI reference values, and the preference is exposed to Inspector (it's not grouped with the rest of the SLI settings):

enum EValues_OGL_SLI_CFR_MODE
  OGL_SLI_CFR_MODE_DISABLE
  OGL_SLI_CFR_MODE_ENABLE
  OGL_SLI_CFR_MODE_CLASSIC_SFR
  OGL_SLI_CFR_MODE_NUM_VALUES
  OGL_SLI_CFR_MODE_DEFAULT

https://cdn.discordapp.com/attachments/395665775077359626/647026061053263882/unknown.png
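For anyone who would rather poke at this programmatically than through Inspector, here is a minimal, hedged sketch using NVAPI's driver-settings (DRS) interface. The OGL_SLI_CFR_MODE_ID constant is my assumption - the excerpt above only shows the value enum - so check NvApiDriverSettings.h for the real setting ID before trusting this.

// Sketch only: set the (assumed) OGL_SLI_CFR_MODE value on the driver's base profile.
// Requires nvapi.h and NvApiDriverSettings.h and linking against the NVAPI library.
#include "nvapi.h"
#include "NvApiDriverSettings.h"

bool EnableCfrInOpenGl() {
    if (NvAPI_Initialize() != NVAPI_OK) return false;

    NvDRSSessionHandle session = nullptr;
    if (NvAPI_DRS_CreateSession(&session) != NVAPI_OK) return false;
    NvAPI_DRS_LoadSettings(session);

    NvDRSProfileHandle profile = nullptr;
    NvAPI_DRS_GetBaseProfile(session, &profile);

    NVDRS_SETTING setting = {};
    setting.version         = NVDRS_SETTING_VER;
    setting.settingId       = OGL_SLI_CFR_MODE_ID;       // assumed ID constant, not shown in the quote above
    setting.settingType     = NVDRS_DWORD_TYPE;
    setting.u32CurrentValue = OGL_SLI_CFR_MODE_ENABLE;    // value taken from the quoted enum

    const bool ok = NvAPI_DRS_SetSetting(session, profile, &setting) == NVAPI_OK
                 && NvAPI_DRS_SaveSettings(session) == NVAPI_OK;

    NvAPI_DRS_DestroySession(session);
    return ok;
}

Whether the driver actually honors the flag for a given game profile is another question entirely, of course.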
Maybe I'm writing something really stupid because I have no expertise in this area, but wouldn't it be better to divide the workload differently between the GPUs? In Crysis, for example, the GPU had to render everything, including large portions of the island in the distance. Would it be possible to have one GPU render just the background and the physics, and the other one run the rest of the scene? That way the workload would be divided and performance would increase. Of course this is a very simplistic approach and there are problems to solve, like keeping everything rendered in sync, but I still wonder if it would work in real life. If my suggestion is really stupid, don't be afraid to say so, guys!
geogan:

I wonder how this gets around the problem that most modern game engines aren't fully compatible with multi-GPU rendering because of the way they work. ....
I really beg to differ... only lazy development from lazy developers made it look like it wasn't worth it. Anything from DICE is a scalable dream, Valve games are amazing, and a Crytek title is a dream with more than one card. Lazy development is what got multi-card setups treated as a hack. I can still game at 6880x2440 maxed out in some games on my quad-SLI Titans, but SLI has been dying because of other garbage. A brand new system with the most powerful single card cannot run what I run at 120+ fps at 6880x2440 - I know, because I just built one. A five-year-old computer can run "supported" games faster than a system built today. Facts.
H83:

Maybe I'm writing something really stupid because I have no expertise in this area, but wouldn't it be better to divide the workload differently between the GPUs? ....
Kind of makes sense to me. It's the same thing they're already doing between the CPU and the GPU. In theory...
I welcome this with open arms. Back in the day you bought one card and then another one when the next generation hit the shelves. Both cards combined were more powerful than the new generation flagship card and cheaper too.
H83:

Maybe I'm writing something really stupid because I have no expertise in this area, but wouldn't it be better to divide the workload differently between the GPUs? ....
This is basically what Lucid's Hydra Engine was (https://en.wikipedia.org/wiki/Hydra_Engine). There are a bunch of issues with it. For starters, a ton of modern shaders in games use interframe data to improve performance; if that data is sitting on another graphics card, then either the optimization can't be used or there is a massive performance penalty in getting it from one GPU to the other. Similarly, managing all these different elements as the scene shifts, and recombining them into a single frame buffer for output, takes time and thus affects performance. Managing the CPU threads that feed both GPUs is a nightmare too, because you're essentially spending time before the scene even starts rendering figuring out how to divide it to avoid stalls on either GPU. And if the GPUs have different feature sets it becomes even more complicated.

And it's all for what? So that the 25 people with SLI/Crossfire can benefit slightly, at the expense of everyone else, because all the interframe optimization is now gone? I think most devs, the ones even capable of doing this kind of low-level hardware work, look at it and go "it's not even close to being worth it," and that's it.
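To put a rough number on that interframe penalty: a back-of-envelope sketch with my own assumptions (a single 4K RGBA16F history buffer, an effective 13 GB/s over PCIe 3.0 x16, a 60 fps target), none of which come from the thread or from NVIDIA.

#include <cstdio>

int main() {
    // Assumptions, illustrative only.
    const double width = 3840, height = 2160;         // 4K frame
    const double bytes_per_pixel = 8.0;                // RGBA16F = 4 channels x 2 bytes
    const double pcie_mib_per_s  = 13.0 * 1024.0;      // effective PCIe 3.0 x16 throughput
    const double frame_budget_ms = 1000.0 / 60.0;      // ~16.7 ms at 60 fps

    const double buffer_mib = width * height * bytes_per_pixel / (1024.0 * 1024.0);
    const double copy_ms    = buffer_mib / pcie_mib_per_s * 1000.0;

    std::printf("history buffer : %.1f MiB\n", buffer_mib);
    std::printf("copy time      : %.2f ms across PCIe\n", copy_ms);
    std::printf("frame budget   : %.2f ms, so one copy alone costs ~%.0f%%\n",
                frame_budget_ms, 100.0 * copy_ms / frame_budget_ms);
    return 0;
}

That works out to roughly 63 MiB and almost 5 ms, close to 30% of the frame budget - and that is just one buffer; add depth, motion vectors and exposure history and the copying starts eating a large share of whatever the second GPU was supposed to gain you.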
As far as I'm aware (and maybe I'm wrong), one of the main problems is that GPUs can only render 1 frame at a time. Each "stage" throughout the rendering process doesn't take up an equal amount of resources. So, although I think Nvidia's CFR idea is a good one, what if things were taken a step further, where any idle cores were used to calculate the next frame in parallel?

Although I don't know how GPUs work at the driver level, here's my very crude estimate of how each frame is rendered. Each "stage" may take up dozens of clock cycles:

1. The GPU receives, compiles, and parses new frame data to calculate
2. Calculate physics (if necessary)
3. Set up the mesh geometry to fit the viewport
4. Apply textures
5. Apply lighting effects
6. Perform ray tracing or calculate reflections
7. Run post-processing effects
8. Return any data back to the program that it may be expecting

Obviously, not all of these stages need the same amount of compute power. Some need fewer GPU cores than others. Some could get by with half-precision floats. So, what if idle cores were always used to render the next frame? It's a similar idea to AFR, except instead of splitting up the entire frame rendering process per die, you split up the individual stages of frame rendering between individual cores. This should help reduce latency and maximize untapped resources.

EDIT: And this is different from H83's idea, which, to my understanding, designates certain regions of the frame to be rendered on separate cores. Of course, there must be some fundamental flaw in this idea (or my assumption that only 1 frame is rendered at a time is wrong), or else it'd have been done already.
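For what it's worth, that is essentially pipeline parallelism, and the potential gain is easy to sketch: with made-up stage timings (mine, purely illustrative), the latency of a single frame stays the sum of its stages, but the steady-state frame rate would be limited only by the slowest stage, assuming the stages really could overlap cleanly.

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    // Made-up costs in ms for the eight stages listed above (illustrative only).
    const std::vector<double> stage_ms = {0.5, 1.0, 3.0, 2.0, 4.0, 5.0, 1.5, 0.5};

    // Serial rendering: every frame costs the sum of all stages.
    const double serial_ms = std::accumulate(stage_ms.begin(), stage_ms.end(), 0.0);

    // Ideal pipelining: once the pipe is full, a frame completes every max(stage) ms,
    // because each stage is busy with a different frame in flight.
    const double pipelined_ms = *std::max_element(stage_ms.begin(), stage_ms.end());

    std::printf("serial    : %.1f ms/frame (%.1f fps)\n", serial_ms, 1000.0 / serial_ms);
    std::printf("pipelined : %.1f ms/frame (%.1f fps), per-frame latency still %.1f ms\n",
                pipelined_ms, 1000.0 / pipelined_ms, serial_ms);
    return 0;
}

The catch, as Denial's post above hints, is that real stages fight over the same caches, bandwidth and interframe data, so the overlap is never that clean.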
Denial:

This is basically what Lucid's Hydra Engine was (https://en.wikipedia.org/wiki/Hydra_Engine). ....
Well, that's it then. Thanks for explaining it!
@Netherwind I don't remember what market you're in or which cards you were referring to, but during the time I worked in shops in Germany and the US, I never saw two cards that were faster than the top card, or if they were (e.g. two of the second-biggest chip), they weren't cheaper - and most games still wouldn't run as smoothly as on a single card (micro stutter). You also needed a bigger CPU and usually a bigger PSU compared to the biggest chip. And when a new generation dropped, virtually all chips of the previous gen dropped in price, not just the smaller ones, so that's not an argument.
I wonder which software does tile-based rendering best?... Something like Cinema 4D, or Blender... Apply the same concept to multiple GPUs and RTX will get a LOT faster! One GPU is pretty damn fast these days for classic raster, but it's down on its knees with RTX ON... Answer: RTX ON x2 (and of course $$$ x2, because why not).
fry178:

@Netherwind I don't remember what market you're in or which cards you were referring to, but during the time I worked in shops in Germany and the US, I never saw two cards that were faster than the top card, or if they were, they weren't cheaper. ....
As always my memory doesn't serve me well, so I don't remember exactly which cards I had, but I think I rocked two GTX 970s, which with perfect scaling were faster than the 1080 (not sure about the 1080 Ti). Then something similar with older Nvidia cards and even ATI cards.
cryohellinc:

I think this is all heading towards chiplet-design GPUs. In several years, most likely all of us will be using SLI or a similar technology in one way or another.
Agreed. On the surface, the move to a chiplet design seems to be what they're going for. Games have been moving away from fully supporting traditional multi-GPU setups (SLI and CrossFire) for years, so I can't imagine SLI as we know it today being their end goal. It will probably benefit from this regardless, though. 🙂
fry178:

@Netherwind I don't remember what market you're in or which cards you were referring to, but during the time I worked in shops in Germany and the US, I never saw two cards that were faster than the top card, or if they were, they weren't cheaper. ....
There are plenty of games with near-perfect SLI scaling - Frostbite titles have 97% SLI scaling - and G-Sync completely eliminates the micro stutter associated with SLI.
Correct me if I am wrong here, but didn't ATI use a similar method in CrossFire mode in the past? I thought they did this once before.
DeskStar:

I really beg to differ... only lazy development from lazy developers made it look like it wasn't worth it. ....
The basic problem is that it's now down to the game devs. DX11 multi-GPU was mostly done in the drivers, so it was down to the GPU driver writers (i.e. AMD/Nvidia). The low-level nature of DX12/Vulkan moves most of that work over to the game devs, and for a lot of them it's too hard and just not worth the effort. Hence no SLI/Crossfire. In one sense they are lazy.

Alternatively, you could argue that the shift to low-level APIs has made the driver writers lazy: their job is now much simpler because a lot of the work they used to do has been pushed over to the game devs. IMO that's part of the reason AMD pushed low-level APIs so hard. DX11 and other high-level APIs mean GPU drivers are complex, and because Nvidia had more people they would do a better job. Mantle, and now Vulkan/DX12, pushes that job over to the game devs and makes driver writing simple, which effectively evens the driver-writing field - and that suits AMD.
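To make the "it's the game dev's problem now" point concrete, here is a minimal sketch of the very first step a D3D12 engine has to take if it wants to use more than one GPU: enumerating the adapters and creating a device per GPU itself. Everything after that (queues, heaps, cross-adapter copies) is also the engine's responsibility, which under DX11 the driver largely hid. Windows-only, error handling trimmed.

// Sketch: under D3D12 the *application* discovers and owns each GPU explicitly.
// Build on Windows; link d3d12.lib and dxgi.lib.
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory)))) return 1;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc = {};
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;   // skip software adapters like WARP

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device)))) {
            // From here on, multi-GPU is entirely the engine's job: one set of
            // command queues, descriptor heaps, and resources per device, plus
            // explicit cross-adapter copies for anything shared between frames.
            std::wprintf(L"GPU %u: %s\n", i, desc.Description);
        }
    }
    return 0;
}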