BFG GeForce 8800 GTX review

The Unified state of DirectX 10.

GeForce 8800 GTX & GTS review - Copyright 2006 Guru3D.com

What you need to understand is that although the microarchitecture of the DX10 GPU (Graphics Processing Unit) has changed significantly, the generic elements are all still there.

Despite the fact that graphics cards are all about programmability, and thus shaders, these days, you'll notice that with today's product we won't be talking about pixel and vertex shaders much anymore. With the move to DirectX 10 we now have a new technology called unified shader technology, and graphics hardware will adapt to that model. It's very promising. DirectX 10 is scheduled to ship at the beginning of next year with the first public release version of Windows Vista. It will definitely change the way software developers make games for Windows and will very likely benefit us gamers in terms of better gaming visuals and better overall performance.

The thing is, with DirectX 10 Microsoft has completely removed what we call the fixed function pipeline (what you guys know as 16 pixel pipelines, for example), allowing everything to be made programmable. How does that relate to the new architecture? Have a look.

The new architecture is all about programmability and thus shaders. But the term Shader might be a little Shady for you.

What is a shader?

To understand what is so important today, first allow me to explain what a shader is and how it relates to rendering all that gaming goodness on your screen (the short version).

What do we need to render a three-dimensional object in 2D on your monitor? We start off by building some sort of structure that has a surface, and that surface is built from triangles. And why triangles? They are quick and easy to compute. How is each triangle processed? Each triangle has to be transformed according to its relative position and orientation to the viewer. Each of the three vertices the triangle is made up of is transformed to its proper view space position. The next step is to light the triangle by taking the transformed vertices and applying a lighting calculation for every light defined in the scene. Finally, the triangle needs to be projected to the screen in order to rasterize it. During rasterization the triangle will be shaded and textured.
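To make those steps a little more tangible, here is a minimal Python sketch of transform, light and project for a single triangle. It is only an illustration under simplifying assumptions: one rotation around the Y axis, one directional light with plain diffuse (Lambert) lighting, and a simple perspective divide. All function names are made up for this example and have nothing to do with any real graphics API.

```python
import math

def rotate_y(v, angle):
    """Transform step: rotate a 3D point around the Y axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def translate(v, offset):
    """Transform step: move a point by an offset (object -> view space)."""
    return tuple(a + b for a, b in zip(v, offset))

def normal(a, b, c):
    """Unit normal of a non-degenerate triangle via the cross product."""
    u = tuple(q - p for p, q in zip(a, b))
    w = tuple(q - p for p, q in zip(a, c))
    n = (u[1] * w[2] - u[2] * w[1],
         u[2] * w[0] - u[0] * w[2],
         u[0] * w[1] - u[1] * w[0])
    length = math.sqrt(sum(k * k for k in n))
    return tuple(k / length for k in n)

def diffuse(n, light_dir):
    """Lighting step: brightness = max(0, N . L) for a unit light vector."""
    return max(0.0, sum(a * b for a, b in zip(n, light_dir)))

def project(v, focal=1.0):
    """Projection step: perspective divide onto the plane z = focal."""
    x, y, z = v
    return (focal * x / z, focal * y / z)

def process_triangle(tri, angle, offset, light_dir):
    """Run one triangle through transform -> light -> project."""
    view = [translate(rotate_y(v, angle), offset) for v in tri]
    brightness = diffuse(normal(*view), light_dir)
    screen = [project(v) for v in view]
    return screen, brightness
```

Calling `process_triangle(((0, 0, 0), (1, 0, 0), (0, 1, 0)), 0.0, (0, 0, 2), (0, 0, 1))` returns the three 2D screen positions plus one brightness value for the triangle. Rasterization, the part that actually fills in the pixels, is left out here.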

Graphics processors like the GeForce series are able to perform a large number of these tasks. Actually, the first generation was able to draw shaded and textured triangles in hardware, which was a revolution. The CPU still had the burden of feeding the graphics processor with transformed and lit vertices, triangle gradients for shading and texturing, etc. Integrating the triangle setup into the chip logic was the next step, and finally even transformation and lighting (TnL) was possible in hardware, reducing the CPU load considerably (surely everyone remembers the GeForce 256). The big disadvantage was that a game programmer had no direct (i.e. program driven) control over transformation, lighting and pixel rendering because all the calculation models were fixed on the chip, and that is the point where shaders come in. Now we finally get to the stage where we can explain shaders. Vertex and pixel shaders allow software and game developers to program tailored transformation and lighting calculations as well as pixel coloring functionality.

And here's the answer to the initial question, which also brings us to the new G80 micro-architecture: each shader is basically nothing more than a relatively small program (programming code) executed on the graphics processor to control either vertex or pixel processing. So a pixel or vertex unit in fact is/was a small pixel or vertex shader processor within your last generation GPU, and such a processor is basically a floating point processor.

What is a shader core then? In the past graphics processors have had dedicated units for diverse types of operations in the rendering pipeline, such as vertex processing and pixel shading as we just explained. A good way to understand that is to have a look at the image below.

GeForce 8800 GTX & GTS review - Copyright 2006 Guru3D.com

Each of these units above the L1 cache (memory) is a shader core. For the unified architecture, NVIDIA engineered a single floating point shader core with multiple independent processors. Each of these independent processors is capable of handling any type of shading operation, including pixel shading, vertex shading, geometry shading, and yes, physics shading.

Pixel Shaders .. Vertex Shaders and now  .. Geometry Shaders

With DirectX 8 & 9, the traditional way, when you executed a shader instruction you had to send it to either the pixel or the vertex processor. And when you think about that a little more, it seems somewhat inefficient, as you could have the pixel shader units 100% utilized while the vertex units were only 60% utilized. And that's a waste of resources, efficiency and power, as a lot of transistors are sitting idle. The new approach in DirectX 10 (DX10), and thus the new generation of graphics processors, is simple. Any shader, pixel or vertex, is sent to a unified shader processor and executed. This way you can utilize the architecture 100% and have as little performance loss as possible, as you can use ALL shader processors on the GPU. That's pretty cool from an efficiency point of view, as maximum utilization means more computing power, which means either more eye candy on your screen or a better rendering frame rate. So this entire story has one word written all over it: efficiency.
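The utilization argument can be put into numbers with a small back-of-the-envelope model. This is a rough sketch, not a description of how any real GPU schedules work: it assumes shader work can be split perfectly across units, that every unit is equally fast, and the workload figures are invented purely for illustration.

```python
def fixed_time(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: the frame is done when the slower group finishes."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)

def unified_time(vertex_work, pixel_work, total_units):
    """Unified units: all work is spread over every processor."""
    return (vertex_work + pixel_work) / total_units

def fixed_utilization(vertex_work, pixel_work, vertex_units, pixel_units):
    """Fraction of the dedicated units' capacity that is actually used."""
    busy = vertex_work + pixel_work
    t = fixed_time(vertex_work, pixel_work, vertex_units, pixel_units)
    return busy / (t * (vertex_units + pixel_units))

# Hypothetical frame: 48 units of vertex work and 240 units of pixel work
# on a chip with 8 vertex and 24 pixel processors (32 in total).
print(fixed_time(48, 240, 8, 24))         # pixel units are the bottleneck
print(unified_time(48, 240, 32))          # same silicon, shared pool
print(fixed_utilization(48, 240, 8, 24))  # share of capacity doing work
```

In this made-up frame the pixel units are busy the whole time while the vertex units finish after 60% of it, which is exactly the sort of imbalance a unified pool of processors is meant to soak up.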

You will notice that NVIDIA will call the unified shader processors the stream processors. And the stream processors will manage pixel, vertex and Geometry shaders. That's right, geometry shaders! We have a new third shader. DX10 and thus Shader Model 4 is exciting. Like the three musketeers, all for one and one for all.

Geometry Shaders

Again we need to get a little techie here I'm afraid. You might want to skip this part unless you are a true geek.

We already discussed Pixel and Vertex Shaders, but with DX10 comes a new shader: Geometry Shaders (GS). Geometry Shaders do some quite specific things that make little sense for a PS/VS program, and that is why this new shader was introduced.

Geometry shaders are an innovative new set of shaders present in next generation graphics hardware like the GeForce 8 Series and Radeon R600. Geometry shaders do per-primitive operations on vertices grouped into primitives like triangles, lines, strips and points output by vertex shaders. Geometry shaders can make copies of the input primitives; so unlike a vertex shader, they can actually create new vertices. Examples of use include shadow volumes on the GPU, render-to-cubemap and procedural generation. A geometry shader works at a larger level of granularity than vertices (which in turn are at a larger granularity than pixels): triangles, objects, lines, strips, points. Primitives.

So after the vertices are processed by the vertex shader, the geometry shader can be utilized to push further work on them. And that's exactly where the money shot is to be found, as a limitation of the traditional vertex shader is that it really can't create new vertices. This is where the geometry shader surfaces, as it can be used to work on the edges of a triangle to create a different figure.
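The key idea, one primitive in and any number of primitives out, can be sketched in a few lines of Python. This is an assumption-laden toy and not real shader code: the `subdivide` "shader" and the `run_geometry_stage` driver are hypothetical names, and real geometry shaders limit how much a single invocation may emit.

```python
def midpoint(a, b):
    """Point halfway between two vertices of any dimension."""
    return tuple((p + q) / 2 for p, q in zip(a, b))

def subdivide(tri):
    """Toy 'geometry shader': takes ONE triangle, emits FOUR smaller ones.

    The three edge midpoints are brand new vertices, something a vertex
    shader (one vertex in, one vertex out) cannot produce.
    """
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def run_geometry_stage(triangles, shader):
    """Apply the shader per primitive and collect everything it emits."""
    out = []
    for tri in triangles:
        out.extend(shader(tri))
    return out

mesh = [((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))]
once = run_geometry_stage(mesh, subdivide)
twice = run_geometry_stage(once, subdivide)
print(len(mesh), len(once), len(twice))
```

Each pass multiplies the triangle count by four, so the amount of geometry grows on the "GPU" side without the application supplying a single extra vertex.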

GeForce 8800 GTX & GTS review - Copyright 2006 Guru3D.com

The surface (rock vertices) of this demo is created at random and in real time with a geometry shader. When the camera moves up, the GPU will calculate new surface constantly and endlessly, at random if you wish. This is a very good example of the usage of a geometry shader. Small interesting side-note: the water follows the path of the surface and thus reacts to it, which is a physics model.

Now let me try to explain this in simpler wording, as most of you likely did not understand a word of what I just tried to explain. Example: imagine a landscape, usually precomputed static data. By firing off geometry shader code to the unified shader processors you could very well have all the landscape generated randomly, or simply changed and calculated in real time in the graphics processor. And that is something very cool. There are of course thousands of applications in which you can use a geometry shader. Think of stuff like automatic stencil shadow polygon generation, skinning operations, physical simulations (hair for example), environment map creation, better and more complex point sprites and adaptive subdivision, all while offloading work from the CPU. Guru3D believes that the impact of geometry shaders will be bigger than expected.

Dynamic creation of geometry on the GPU, that's what you need to remember. Now at this stage in the DX10 pipeline the GPU can also do something I already mentioned: physical simulations. A couple of examples that NVIDIA gave us:

  • Litter and debris
  • Smoke and fog that moves
  • Cloth and fluid that flow with objects and characters
  • Large amounts of rubble (collapsing buildings, explosions, avalanches)
  • Rampaging tornados full of debris
  • Swarms of insects
  • So many more possibilities!

As you can see this is a very hot topic in game rendering, as in the end we can push gaming to a new dimension. Close to this stage in the DX10 pipeline we can do a lot of other cool stuff. A function called stream out, for example.

Stream output is a very important and useful new DirectX 10 feature, as it enables data generated from geometry shaders (or vertex shaders if geometry shaders are not used) to be sent to memory buffers and subsequently forwarded back into the top of the GPU pipeline to be processed again. Basically what I'm saying here is that such dataflow permits even more complex geometry processing, advanced lighting calculations and GPU-based physical simulations with little CPU involvement. You simply keep the data to be altered in the pipeline.
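The feedback loop described here can be mimicked with two buffers that swap roles every pass. The sketch below is plain Python standing in for the GPU, so treat it as an analogy only: `step_particle` plays the role of a hypothetical shader, the lists play the role of memory buffers, and gravity is rounded to -10 to keep the numbers tidy.

```python
def step_particle(p, dt, gravity=-10.0):
    """Toy 'shader': advance one particle (x, y, vx, vy) by one time step."""
    x, y, vx, vy = p
    vy = vy + gravity * dt
    return (x + vx * dt, y + vy * dt, vx, vy)

def simulate(particles, passes, dt):
    """Run several passes, feeding each pass's output into the next."""
    source = list(particles)
    for _ in range(passes):
        # "Stream out": results are written to a second buffer...
        target = [step_particle(p, dt) for p in source]
        # ...which becomes the input at the top of the pipeline next pass.
        source = target
    return source

print(simulate([(0.0, 10.0, 1.0, 0.0)], passes=2, dt=0.5))
```

Note that the application code only kicks off the passes; the particle data itself is never inspected or rebuilt outside the loop, which is the "little CPU involvement" part of the story.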

A good example: the new stream architecture allows us to do instancing. Dozens if not hundreds of the same objects, each with a slightly different movement, can fill your screen without a huge impact on performance, as it hardly requires CPU calculations.
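Instancing boils down to storing an object once and drawing it many times with a small amount of per-copy data. Here is a minimal Python sketch of that idea, assuming the only per-instance data is a position offset. `draw_instanced` is a hypothetical helper for this example, not a real API call.

```python
def draw_instanced(mesh, instance_offsets):
    """Place one shared mesh at every per-instance offset.

    The mesh is stored once; each instance only adds a single offset.
    """
    return [
        [tuple(v + o for v, o in zip(vertex, offset)) for vertex in mesh]
        for offset in instance_offsets
    ]

# One triangle, three copies, each shifted a bit further to the right.
mesh = [(0, 0), (1, 0), (0, 1)]
offsets = [(0, 0), (10, 0), (20, 0)]
for instance in draw_instanced(mesh, offsets):
    print(instance)
```

The application hands over one mesh and a short list of offsets; the per-vertex work of placing every copy is what gets pushed onto the graphics processor.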

GeForce 8800 GTX & GTS review - Copyright 2006 Guru3D.com

So DirectX 10 and its related new hardware products offer a good number of improvements. So many actually that it would require an article of its own. And since we are here to focus on NVIDIA's two new products we'll take a shortcut at this stage in the article. In our Guru3D forums I have often seen the presumption that DX10 is only a small improvement over DX9 Shader Model 3.0. Ehm, yes and no. I say it's a huge step, as a lot of constraints are removed for the software programmers. The new model is simpler, easy to adapt to and allows heaps of programmability, which in the end means a stack of new features and eye candy in your games.

Whilst I will not go into detail about the big differences I simply would like to ask you to look at the chart below and draw your own conclusions. DX10 definitely is a large improvement, yet look at it as a good step up.

GeForce 8800 GTX & GTS review - Copyright 2006 Guru3D.com

Here you can see how DirectX's Shader Models have evolved ever since DX8 Shader Model 1.

So I think what you need to understand is that DirectX 10 doesn't bring a colossal fundamental change in capabilities; yet it brings expanded and new features into DirectX that will enable game developers to optimize games more thoroughly and thus deliver incrementally better visuals and better frame rates, which obviously is great. How fast will it be adopted? Well, Microsoft is highlighting the DX10 API as God's gift to the gaming universe, yet what they forget to mention is that all developers who support DX10 will have to continue supporting DirectX 9 as well, and thus maintain two versions of the rendering code in their engine, as DX10 is only available on Windows Vista and not XP, which is such a drama.

You heard the rumors and they're false: DirectX 9.0L will NOT make Windows XP DX10 compatible. It is the other way around: if you have DirectX 9 hardware, you will be using DirectX 9.0L as your API in Windows Vista. With that statement you also need to realize that a DX10 card like the G80 is fully DX9 compatible!

However, you can understand that from a game developer point of view it brings a considerable amount of additional workload and cost to PC game development until Vista finally becomes mainstream.

Regardless of the immense marketing hype, DirectX 10 just is not extraordinarily different from DirectX 9. You'll mainly see good performance benefits on DirectX 10 rather than vastly prominent visual differences, obviously with a good number of exceptions here and there; but DX is evolving into something better and faster.

GeForce 8800 GTX & GTS review - Copyright 2006 Guru3D.com
Stretchy skin, geometry shaders as you alter surface vertices on the fly. Poor Froggy.

With the introduction of unified shader technology the industry will also make you believe that GPUs no longer have a pixel pipeline. That's true, but not entirely; we obviously are still dealing with a pixel pipeline, yet the dynamics simply have changed.

Stating that this product has 24 pixel pipelines does not apply anymore and that by itself will force a shift on how we need to look at new GPU microarchitecture. So I'm afraid that from now on, we can't say ooh this product has 24 pixel pipelines. The new method of making you guys understand what we are talking about and relate that to performance will simply be the cumulative number of shader processors.

Just remember this: we have moved from a fixed pipeline based architecture to a processor based architecture.

With that in mind, that "number" of processors will be our new and easier to comprehend method of relating how fast a product "can" be. I know this is shady to explain.

Prepare for the impact now, the GeForce 8800 GTS has 96 shader processors / stream processors and the GeForce 8800 GTX has 128 of these unified processor units. Think for a moment about the GeForce 7900 GTX and relate that to its 8 vertex and 24 pixel processors. See the parallel already?

The internal GPU clocks have changed quite a bit also. A year or two ago our own Alexey (RivaTuner programmer) made a discovery. NVIDIA's architecture all of a sudden was showing registers for multiple clocks coming from the graphics processor. So at that time it became clear that, for example, a GeForce 7900 GTX has three (and likely more) different internal clocks. This is something you need to get used to, as the G80 series has many clock domains within the graphics processor and everything seems to be asynchronous, which is quite interesting as everything in the history of graphics cards has been developed in a very parallel manner.

So as contradictory as it may sound, the GeForce 8800 GTX has a "generic" 575 MHz core clock, yet its memory is running at 900 MHz (x2) and, get this, the shader processors are clocked at 1350 MHz. And I'm pretty confident that we can find a few other clocks in there as well. Memory is a tad weird also. The GTX, for example, has no less than 768 MB of memory and it's 384 bits wide. Now this is where things can get a little tricky to understand, but there is no 384-bit wide memory on the GTS, which sits at 320 bits.
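To get a feel for what those memory numbers add up to, peak bandwidth can be estimated from the bus width and the memory clock. The little Python helper below is a sketch under one assumption: "900 MHz (x2)" is read as two transfers per clock. The function name is made up for this example and the result is a theoretical peak, not a measurement.

```python
def memory_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9

# GeForce 8800 GTX figures quoted above: 384-bit bus, 900 MHz (x2).
gtx = memory_bandwidth_gb_s(384, 900)
print(f"8800 GTX peak bandwidth: {gtx:.1f} GB/s")
```

The same function works for the GTS by passing a 320-bit bus together with its memory clock, which is not quoted on this page.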
