"Previously, when a large texture needed to be read, one instruction would be issued, and one shader circuit would need to make several passes while other circuits sat idle," NVIDIA spokesman Hector Marinez said. "But [NVIDIA] patent authors Emmett Kilgariff and Rui Bastos figured out a way to allow for a partial texture load. By breaking the texture load into smaller pieces – able to be completed in one pass each – all circuits can keep firing."
Previously, textures wider than 32 bits (64-bit or 128-bit) required more than one pass. Texture loads were monolithic and took multiple cycles to execute, during which all other shader units had to remain idle. With Kilgariff and Bastos's invention, a larger texture load can be broken into smaller ones that each complete more quickly, so the other GPU segments no longer sit idle.
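As a rough illustration of the idea, the toy model below compares a monolithic load with a split one. It is a sketch under stated assumptions, not NVIDIA's implementation: the 32-bit pass width is inferred from the figures above, and the names `PASS_WIDTH_BITS`, `split_texture_load`, and `passes_needed` are hypothetical.

```python
from math import ceil

PASS_WIDTH_BITS = 32  # assumption: one pass handles 32 bits of texture data


def split_texture_load(texel_bits, pass_width=PASS_WIDTH_BITS):
    """Split one texture load into partial loads that each fit in one pass."""
    count = ceil(texel_bits / pass_width)
    return [
        {
            "part": i,
            "bit_offset": i * pass_width,
            "bits": min(pass_width, texel_bits - i * pass_width),
        }
        for i in range(count)
    ]


def passes_needed(instruction_costs, num_units):
    """Passes to finish all instructions across num_units shader units.

    instruction_costs lists how many passes each instruction occupies a unit.
    Each instruction goes to whichever unit becomes free first (toy scheduler).
    """
    busy_until = [0] * num_units
    for cost in instruction_costs:
        idx = busy_until.index(min(busy_until))
        busy_until[idx] += cost
    return max(busy_until)


monolithic = [4]                              # one 128-bit load, four passes on one unit
partial = [1] * len(split_texture_load(128))  # four single-pass partial loads

print(passes_needed(monolithic, num_units=4))  # 4
print(passes_needed(partial, num_units=4))     # 1
```

In this model, with four units available, the monolithic load ties up one unit for four passes while the other three do nothing; the split version spreads the same work across all four units and finishes in one pass.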
"But to do that, the monolithic texture-load instructions had to be split into chunks. Break a 128-bit texture into four pieces – each of which can be completed in one pass – and that lets one cycle-hungry instruction be broken into four instructions. Doing this means that other circuits keep processing instructions – no more waiting," Marinez added.
"Kilgariff and Bastos [also] discovered they could reorder instructions for greater efficiency. For instance, if a texture for instruction 1 is not immediately available, the shader circuit could get to work on instruction 2. Instructions don't back up in a queue [and] textures render faster, [providing] more seamless game play."
These techniques are described in patent no. 7,609,272. Elements of the patent have already been used in NVIDIA graphics processing units, including the GeForce 6 family of graphics adapters (launched in 2004) and the Reality Synthesizer (RSX) GPU co-developed by NVIDIA for the Sony PlayStation 3.
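The instruction reordering Marinez describes can be sketched in a similar toy form. The example assumes the instructions are independent of one another and that one instruction issues per cycle; `issue_order`, the instruction names, and the texture names are hypothetical and are not taken from the patent.

```python
def issue_order(instructions, ready_at, reorder=True):
    """Return the instructions issued on each cycle (None marks a stall).

    instructions: list of (name, texture) pairs in program order.
    ready_at: maps each texture to the first cycle at which it is available.
    With reorder=True, the oldest instruction whose texture is ready is issued.
    With reorder=False, only the instruction at the head of the queue may issue.
    """
    pending = list(instructions)
    order = []
    cycle = 0
    while pending:
        candidates = pending if reorder else pending[:1]
        ready = next(
            (ins for ins in candidates if ready_at[ins[1]] <= cycle), None
        )
        if ready is not None:
            pending.remove(ready)
        order.append(ready[0] if ready is not None else None)
        cycle += 1
    return order


program = [("instr1", "texA"), ("instr2", "texB")]
ready_at = {"texA": 1, "texB": 0}  # texA is not available until cycle 1

print(issue_order(program, ready_at, reorder=False))  # [None, 'instr1', 'instr2']
print(issue_order(program, ready_at, reorder=True))   # ['instr2', 'instr1']
```

Without reordering, the queue stalls for a cycle waiting on the first texture; with reordering, the second instruction fills that slot and both finish a cycle sooner.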