New Technologies Introduced With Pascal
With the announcement of the GeForce GTX 1070 and 1080, Nvidia also launched a few new technologies: a new way of making screenshots, VR audio, improved multi-monitor support and changes to SLI. These are items I wanted to run through.
Nvidia Ansel: 360 Degree Screenshots
Named after a famous photographer, Nvidia intros Ansel, a new way of making in-game screenshots. Capturing stills from games is something we pretty much do on a daily basis here at Guru3D. With the Ansel announcement Nvidia steps it up a notch; Nvidia even called it an art form (I wouldn't go that far though). Screenshots are typically a 2D image taken from a 3D rendered scene. Nvidia figures (with VR in mind): why not grab a 360-degree screenshot in-game (if the game supports Ansel technology), so that you can save a still and later on use your VR headset to look at the screenshot in 3D? It can also be used to create super-resolution screenshots or just "regular" screenshots to which you can then apply EXR effects and filters.
Ansel offers the ability to grab screenshots in 3D at incredible resolutions, up to 61,440 x 34,560 pixels, with silly-sized screengrabs that can be 1.5 GB for a single grab. Funky, however, is that Nvidia "borrowed" some SweetFX ideas. After you've captured the screenshot you can alter the RAW data to make the image darker or lighter, set the color tone and apply filters to that screenshot (think Instagram effects). While VR is not "necessary", Ansel was designed with it in mind so that you can grab a still, alter it and then watch it in 3D with your Oculus Rift or HTC Vive. Ansel will also become available for previous generation products and is not a Pascal specific thing.
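To put that maximum resolution in perspective, here is a bit of back-of-the-envelope arithmetic. The resolution comes from the text above; the 3 bytes per pixel (uncompressed 8-bit RGB) is purely my assumption for illustration and says nothing about how Ansel actually stores its captures.

```python
# Maximum Ansel capture size quoted in the text.
WIDTH, HEIGHT = 61_440, 34_560

# Assumption (not from the article): uncompressed 8-bit RGB, 3 bytes per pixel.
BYTES_PER_PIXEL = 3


def capture_stats(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return (pixel count, uncompressed size in bytes) for one capture."""
    pixels = width * height
    return pixels, pixels * bytes_per_pixel


pixels, raw_bytes = capture_stats(WIDTH, HEIGHT)
print(f"{pixels:,} pixels (~{pixels / 1e9:.2f} gigapixels)")
print(f"~{raw_bytes / 1e9:.2f} GB uncompressed at {BYTES_PER_PIXEL} bytes per pixel")
print(f"{WIDTH // 3840} x {HEIGHT // 2160} times an Ultra HD frame (width x height)")
```

Under that assumption the raw data would be well above the 1.5 GB mentioned, which suggests the saved file is compressed in some way.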
Ansel does need game support. Tom Clancy's The Division, The Witness, LawBreakers, The Witcher 3: Wild Hunt, Paragon, No Man's Sky and Unreal Tournament will be the first games to offer support for this new way of grabbing screenshots.
Nvidia FAST SYNC
Nvidia is adding a new sync mode. This mode works especially well with high FPS games, for the folks that like to play Counter-Strike at 100 FPS. With this feature, Nvidia is basically decoupling the render engine and display by using a third buffer. This method was specifically designed for that high FPS demographic: the new sync mode will eliminate stuttering and screen tearing with high FPS games and thus offers low latency across the board. It can be combined with G-SYNC (which works great with the lower spectrum of refresh rates).
Fast Sync is a latency-conscious alternative to traditional Vertical Sync (V-SYNC) that eliminates tearing, while allowing the GPU to render unrestrained by the refresh rate to reduce input latency. If you use V-SYNC ON, the pipeline gets back-pressured all the way to the game engine, and the entire pipeline slows down to the refresh rate of the display. With V-SYNC ON, the display is essentially telling the game engine to slow down, because only one frame can be effectively generated for every display refresh interval. The upside of V-SYNC ON is the elimination of frame tearing, but the downside is high input latency.
When using V-SYNC OFF, the pipeline is told to ignore the display refresh rate, and to deliver game frames as fast as possible. The upside of V-SYNC OFF is low input latency (as there is no back-pressure), but the downside is frame tearing. These are the choices that gamers face today, and the vast majority of eSports gamers are playing with V-SYNC OFF to leverage its lower input latencies, lending them a competitive edge. Unfortunately, tearing at high FPS causes a vast amount of jittering, which can hamper their gameplay.
NVIDIA has decoupled the front end of the render pipeline from the backend display hardware. This allows different ways to manipulate the display that can deliver new benefits to gamers. Fast Sync is one of the first applications of this new approach. With Fast Sync, there is no flow control. The game engine works as if V-SYNC is OFF. And because there is no back-pressure, input latency is almost as low as with V-SYNC OFF. Best of all, there is no tearing because FAST SYNC chooses which of the rendered frames to scan to the display. FAST SYNC allows the front of the pipeline to run as fast as it can, and it determines which frames to scan out to the display, while simultaneously preserving entire frames so they are displayed without tearing. The experience that FAST SYNC delivers, depending on frame rate, is roughly equal to the clarity of V-SYNC ON combined with the low latency of V-SYNC OFF.
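The frame selection described above can be sketched as a tiny simulation. This is a toy model under my own assumptions (a perfectly constant render rate, and a display that always scans out the most recently completed whole frame); the function name `fast_sync_schedule` is made up for illustration and is not Nvidia's implementation.

```python
from fractions import Fraction


def fast_sync_schedule(render_fps, refresh_hz, refreshes):
    """For each display refresh, return the index of the frame scanned out.

    Toy model: frame i (0-based) finishes rendering at (i + 1) / render_fps
    seconds, and refresh k happens at k / refresh_hz seconds. The renderer is
    never throttled; the display simply picks the newest finished frame and
    shows it whole, so nothing tears. None means no frame was ready yet.
    """
    frame_time = Fraction(1, render_fps)
    refresh_time = Fraction(1, refresh_hz)
    shown = []
    for k in range(1, refreshes + 1):
        now = k * refresh_time
        completed = now // frame_time  # frames finished by this refresh
        shown.append(completed - 1 if completed > 0 else None)
    return shown


# 200 FPS game on a 60 Hz display: frames in between are simply skipped.
print(fast_sync_schedule(200, 60, 3))
# 30 FPS game on a 60 Hz display: the newest frame gets repeated.
print(fast_sync_schedule(30, 60, 4))
```

In the 200 FPS case only every third or fourth frame reaches the screen, but each one is at most one render interval old, which is where the low latency comes from in this model.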
VRWorks Audio
Another technology that was introduced is again VR related. Nvidia offers VRWorks, a set of developer tools that allows devs to improve their games with VR. One of the new additions is VRWorks Audio. This technology makes it possible to simulate, on the GPU, the reflection and absorption of audio waves within a virtual 3D space. Basically the GPU will calculate and predict how certain audio would sound if it bounces off a hard floor or a soft one, combined with the other objects it bounces off. For example, if you talk in a room with concrete walls it would sound different compared to that same room with carpets hanging on the walls. The last time a GPU manufacturer added 3D audio over the GPU, it failed to become a success; that would be AMD TrueAudio.
The question arises: will VRWorks Audio create enough interest and momentum so that developers will actually implement it? To demo all the possibilities of VRWorks, Nvidia will soon release a new demo application called Nvidia VR Funhouse. The application will not just demo VRWorks Audio but also physics simulations in a VR environment.
SMP - Simultaneous Multi-Projection
One of the more interesting and bigger new technologies demoed at the GeForce GTX Series 1000 Pascal launch was Simultaneous Multi-Projection, which is pretty brilliant for people using three monitors. You know the problem: when you place your left and right monitors at an angle, the game image gets warped and looks as if the angle does not match. See the screengrab below:
Look at the angle related bends when you place your surround monitors at an angle.
New with Pascal is simultaneous multi-projection, a technology that allows the GPU to calculate a maximum of sixteen camera view points, at nearly no performance loss. Previous gen GPU-architectures can only do the math on one cam viewing angle / point of view. This feature is not software based, it is located in hardware in the GPU pipeline. So why would you want to be able to do this you might wonder? Well, I spilled the beans already a bit in the opening paragraph. Currently when you game on multiple screens you are looking at a "Projection" of one 2D image. Now, if you have one 2D Image on three monitors, then it's only going to look good if the monitors are standing in straight lines next to each other. When you angle the "curve" of the monitors around you, the angles will distort the image, e.g. a straight line would have a bend. Some games have fixes for this, but nothing solid. Well, with Pascal this is a thing of the past as this is exactly where Simultaneous Multi-Projection comes in. With the help of your driver control panel you can alter the angle of your monitors so that it matches how you have set up the monitors. The 3D imagery will now be calculated for each screen on the GPU, based on the angle of your monitors. So if you were to surround yourself with three monitors, the rendered images displayed will not be warped, but will be displayed correctly.
Have a look at the screengrab again, now with Simultaneous Multi-Projection enabled.
The beauty here is that due to the added core-logic on the GPU, this angle correction does not come at a performance loss, or at least a very little one. SMP obviously also helps out in VR environments where typically you need to do all kinds of tricks for the normally two rendered images versus lenses and warping. To be able to do this in one pass in hardware on the GPU will create huge performance increases for the upcoming GeForce GTX 1070 and 1080 on VR. Again, this is hardware based and thus cannot be added to Fermi and/or Maxwell models with a driver update.
The Simultaneous Multi-Projection Engine is capable of processing geometry through up to 16 preconfigured projections, sharing the center of projection (the viewpoint), and with up to 2 different projection centers, offset along the X axis. Projections can be independently tilted or rotated around an axis. Since each primitive may show up in multiple projections simultaneously, the SMP engine provides multi-cast functionality, allowing the application to instruct the GPU to replicate geometry up to 32 times (16 projections x 2 projection centers) without additional application overhead as the geometry flows through the pipe.
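To illustrate why a per-monitor projection fixes the warping, here is a minimal sketch of the geometry. It is a simplified model of my own (a pinhole camera at the origin, monitors rotated only around the vertical axis, a hypothetical 45-degree surround setup); the names `project` and `MONITOR_YAWS` are made up and this does not represent how the SMP hardware works internally.

```python
import math

# Limits quoted in the text.
MAX_PROJECTIONS = 16
PROJECTION_CENTERS = 2
MAX_REPLICATION = MAX_PROJECTIONS * PROJECTION_CENTERS  # 32 copies of the geometry

# Hypothetical surround setup: yaw of each monitor in degrees (positive = turned right).
MONITOR_YAWS = {"left": -45.0, "center": 0.0, "right": 45.0}


def project(point, yaw_deg=0.0):
    """Perspective-project a 3D point for a view rotated yaw_deg around the Y axis.

    The viewer sits at the origin; yaw 0 looks along +z. Returns (x, y) on an
    image plane at distance 1, or None if the point is behind that view.
    """
    x, y, z = point
    a = math.radians(yaw_deg)
    # Rotate the world-space point into the camera space of the yawed view.
    xc = x * math.cos(a) - z * math.sin(a)
    zc = x * math.sin(a) + z * math.cos(a)
    if zc <= 0:
        return None
    return (xc / zc, y / zc)


# A point straight ahead of the right-hand monitor (45 degrees off the center axis).
point = (1.0, 0.0, 1.0)
flat = project(point, MONITOR_YAWS["center"])
angled = project(point, MONITOR_YAWS["right"])
print("single flat projection:     ", flat)
print("right monitor's own view:   ", angled)
print("max geometry replication:   ", MAX_REPLICATION)
```

With the single flat projection the point lands far off-center, stretched along the image plane; with the right monitor's own rotated view it lands (within floating-point rounding) in the middle of that monitor, which is where the viewer actually sees it.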
GPU Boost 3.0
A few years ago Nvidia introduced boost modes for its graphics cards. Until then a GPU clock frequency was typically fixed at a certain MHz; Nvidia changed that to a base frequency plus a Boost frequency. The Boost frequency allows the GPU to reach higher clocks if, say, the temperature of the GPU is low enough or the GPU load is low enough. Ever since it was introduced, dynamic clock frequencies and voltages have become a popular thing. Nvidia calls this GPU Boost, and it has now reached revision three. A fundamental change has been made, as the GPU is now even more adaptive and allows for per voltage point frequency offsets. This means that each point on the voltage/frequency curve has its own tolerance and can get its own frequency offset. The advantage here is that each point can get an optimal voltage for your boost and thus overclocking frequency. It is highly complex, but it does offer a new way to make these cards run faster at even higher clock frequencies.
So basically the new additions are:
- Adjust per point clock frequency offset (controlled by the end user).
- Overvoltage setting mode changes (exact voltage -> set range 0%~100%); voltages are now based on percentage.
- Add new limit option -> No load limit (=utilization limit)
- This new NVAPI only supports Pascal GPUs, meaning the new features that we discuss and show today will only work on the GeForce GTX 1070 and 1080 (and other TBA products).
Basically with future updates in overclocking software you will see multiple stages of control:
- Regular voltage control in percentage (no longer can fixed/exact Voltage offsets be used). Maximum voltage will vary based on temperature.
Then on the GPU core frequency:
- Basic mode - a single clock frequency offset applied to all V/F points.
- Linear mode control - You can specify a frequency offset for the maximum clock and for the minimum clock. In Afterburner this linearly interpolates to fill a curve.
- Manual mode - per point frequency offset control through the V/F editor in Afterburner.
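The three frequency modes above can be sketched as operations on a voltage/frequency curve. Mind you, this is an illustration under my own assumptions: the curve values are made up, the function names are hypothetical, and I assume linear mode interpolates by voltage between the lowest and highest point. It is not the NVAPI or the actual Afterburner logic.

```python
# Hypothetical V/F curve as (voltage in mV, frequency in MHz). Made-up numbers.
CURVE = [(800, 1500), (900, 1700), (1000, 1850)]


def basic_offsets(curve, offset_mhz):
    """Basic mode: one frequency offset applied to every V/F point."""
    return [offset_mhz] * len(curve)


def linear_offsets(curve, offset_at_min, offset_at_max):
    """Linear mode: interpolate the offset between the first and last point.

    Assumption: interpolation is done over voltage.
    """
    v_lo, v_hi = curve[0][0], curve[-1][0]
    span = v_hi - v_lo
    return [
        offset_at_min + (offset_at_max - offset_at_min) * (v - v_lo) / span
        for v, _ in curve
    ]


def apply_offsets(curve, offsets):
    """Return a new curve with a per-point frequency offset added."""
    if len(offsets) != len(curve):
        raise ValueError("need exactly one offset per V/F point")
    return [(v, f + o) for (v, f), o in zip(curve, offsets)]


print("basic :", apply_offsets(CURVE, basic_offsets(CURVE, 50)))
print("linear:", apply_offsets(CURVE, linear_offsets(CURVE, 0, 100)))
# Manual mode: the user supplies one offset per point.
print("manual:", apply_offsets(CURVE, [10, 40, 25]))
```

Basic and linear mode are just convenient ways of generating the per-point offsets; manual mode hands that list over to the user entirely.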
An Update To SLI
With Pascal, SLI changes as well. One critical ingredient to NVIDIA’s SLI technology is the SLI Bridge, which is a digital interface that transfers display data between GeForce graphics cards in a system. Two of these interfaces have historically been used to enable communications between three or more GPUs (i.e., 3-Way and 4-Way SLI configurations). The second SLI interface is required for these scenarios because all other GPUs need to transfer their rendered frames to the display connected to the master GPU, and up to this point each interface has been independent.
Beginning with NVIDIA Pascal GPUs, the two interfaces are now linked together to improve bandwidth between GPUs. This new dual-link SLI mode allows both SLI interfaces to be used in tandem to feed one Hi-res display or multiple displays for NVIDIA Surround. Dual-link SLI mode is supported with a new SLI Bridge called SLI HB. The bridge facilitates high-speed data transfer between GPUs, connecting both SLI interfaces, and is the best way to achieve full SLI clock speeds with GeForce GTX 1080 GPUs running in SLI. The GeForce GTX 1080 is also compatible with legacy SLI bridges; however, the GPU will be limited to the maximum speed of the bridge being used. Using this new SLI HB Bridge, GeForce GTX 1080’s new SLI interface runs at 650 MHz, compared to 400 MHz in previous GeForce GPUs using legacy SLI bridges. Where possible though, older SLI Bridges will also get a speed boost when used with Pascal. Specifically, custom bridges that include LED lighting will now operate at up to 650 MHz when used with GTX 1080, taking advantage of Pascal’s higher speed IO.
No More 3 & 4-Way SLI
NVIDIA no longer recommends 3 or 4 way systems for SLI and places its focus on 2-way SLI only. Only a handful of applications like Futuremark synthetic testing can work with extreme setups. But for the consumer and end-users 2-way is the way to go and also where it needs to end.
New Multi GPU Modes
Compared to prior DirectX APIs, Microsoft has made a number of changes that impact multi-GPU functionality in DirectX 12. At the highest level, there are two basic options for developers to use multi-GPU on NVIDIA hardware in DX12: Multi Display Adapter (MDA) Mode, and Linked Display Adapter (LDA) mode. LDA Mode has two forms: Implicit LDA Mode which NVIDIA uses for SLI, and Explicit LDA Mode where game developers handle much of the responsibility needed for multi-GPU operation to work successfully. MDA and LDA Explicit Mode were developed to give game developers more control. The following table summarizes the three modes supported on NVIDIA GPUs:
In LDA Mode, each GPU’s memory can be linked together to appear as one large pool of memory to the developer (although there are certain corner case exceptions regarding peer-to-peer memory); however, there is a performance penalty if the data needed resides in the other GPU’s memory, since the memory is accessed through inter-GPU peer-to-peer communication (like PCIe). In MDA Mode, each GPU’s memory is allocated independently of the other GPU: each GPU cannot directly access the other’s memory. LDA is intended for multi-GPU systems that have GPUs that are similar to each other, while MDA Mode has fewer restrictions (discrete GPUs can be paired with integrated GPUs, or with discrete GPUs from another manufacturer), but MDA Mode requires the developer to more carefully manage all of the operations that are needed for the GPUs to communicate with each other.
By default, GeForce GTX 1070/1080 SLI supports up to two GPUs. 3-Way and 4-Way SLI modes are no longer recommended. As games have evolved, it is becoming increasingly difficult for these SLI modes to provide beneficial performance scaling for end users. For instance, many games become bottlenecked by the CPU when running 3-Way and 4-Way SLI, and games are increasingly using techniques that make it very difficult to extract frame-to-frame parallelism. Of course, systems will still be built targeting other Multi-GPU software models including:
- MDA or LDA Explicit targeted
- 2 Way SLI + dedicated PhysX GPU