Undervolting the Yeston GeForce RTX 4070 Ti


GPU Architecture


NVIDIA's Ada design, built on a custom TSMC 4N process, delivers more raster, ray-tracing, and AI-accelerated compute performance than the previous-generation Ampere. The AD102 GPU has 76.3 billion transistors on a die area of 608.4 mm². That works out to a transistor density of about 125.5 million per mm², 2.78x higher than the Samsung-fabbed GA102 Ampere GPU built on the 8N node. NVIDIA Ada (named after the mathematician Ada Lovelace) introduces Shader Execution Reordering (SER), which is said to speed up shading work and provide up to 25% better gaming performance. Ada is also fitted with next-generation RT Cores (Gen3) and faster Tensor Cores (Gen4). The latter can reach up to 1400 Tensor TFLOPS, roughly 4.4 times (4.375x) the throughput of Ampere's third-generation cores.
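As a quick sanity check, the density comparison can be reproduced with a few lines of arithmetic. This is only a sketch: the AD102 figures come from the paragraph above, while the GA102 transistor count and die area (28.3 billion, 628.4 mm²) are assumed values that the article does not state. The tensor factor is read here as the decimal 4.375.

```python
# Figures quoted in the article for AD102.
ad102_transistors = 76.3e9
ad102_area_mm2 = 608.4

# ASSUMPTION (not stated in the article): GA102 has 28.3 billion
# transistors on a 628.4 mm² die.
ga102_transistors = 28.3e9
ga102_area_mm2 = 628.4


def density_m_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per mm²."""
    return transistors / area_mm2 / 1e6


ad102_density = density_m_per_mm2(ad102_transistors, ad102_area_mm2)
ga102_density = density_m_per_mm2(ga102_transistors, ga102_area_mm2)
density_ratio = ad102_density / ga102_density

# Ampere tensor throughput implied by the article's own numbers:
# 1400 TFLOPS divided by the quoted 4.375x factor.
implied_ampere_tflops = 1400 / 4.375

print(f"AD102 density: {ad102_density:.1f} M/mm²")
print(f"GA102 density: {ga102_density:.1f} M/mm²")
print(f"Density ratio: {density_ratio:.2f}x")
print(f"Implied Ampere tensor throughput: {implied_ampere_tflops:.0f} TFLOPS")
```

With these inputs the AD102 density lands slightly below the quoted 125.5 million per mm², which suggests the published figure was rounded from more precise inputs.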



| | GeForce RTX 4090 | GeForce RTX 4080 | GeForce RTX 4070 Ti | |
|---|---|---|---|---|
| Architecture | Ada (TSMC 4N) | Ada (TSMC 4N) | Ada (TSMC 4N) | |
| GPU | AD102-300 | AD103-300 | AD104-400 | AD104-250 |
| SMs | 128 | 76 | 60 | 56 |
| Shader cores | 16384 | 9728 | 7680 | 7168 |
| Boost clock | 2520 MHz | 2510 MHz | 2610 MHz | TBC |
| Ray-tracing cores | 128 Gen3 | 76 Gen3 | 60 Gen3 | TBC |
| Tensor cores | 512 Gen4 | 304 Gen4 | 240 Gen4 | |
| Memory | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 10 GB GDDR6X |
| Memory bus | 384-bit | 256-bit | 192-bit | 160-bit |
| Memory speed | 21 Gbps | 22.4 Gbps | 21 Gbps | 21 Gbps |
| Bandwidth | 1008 GB/s | 717 GB/s | 504 GB/s | 420 GB/s |
| Power connector | 12VHPWR | 12VHPWR | 12VHPWR | |
| PCIe | Gen4 x16 | Gen4 x16 | Gen4 x16 | |
| TGP | 450 W | 320 W | 285 W | |
| Launch date | October 12th, 2022 | November 2022 | January 2023 | |
| Price | $1599 / 1959 EUR | $1199 / 1469 EUR | $899 / 1099 EUR | |
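The bandwidth figures in the specification table follow directly from the memory bus width and the per-pin memory speed. A minimal sketch of that relationship is shown here; the function name is made up for illustration, and the inputs are the bus widths and speeds listed in the table.

```python
def bandwidth_gb_s(bus_width_bits, speed_gbps):
    """Peak memory bandwidth in GB/s.

    Each pin of the bus moves `speed_gbps` gigabits per second,
    and dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * speed_gbps / 8


# (bus width in bits, memory speed in Gbps) pairs from the table.
configs = [(384, 21.0), (256, 22.4), (192, 21.0), (160, 21.0)]

for bus, speed in configs:
    print(f"{bus}-bit @ {speed} Gbps -> {bandwidth_gb_s(bus, speed):.0f} GB/s")
```

The narrower 192-bit bus is why the AD104-based card ends up at half the bandwidth of the 384-bit flagship despite using memory of the same speed.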


NVIDIA kept the RTX 4000 series' specifications tightly under wraps ahead of launch, though a few details leaked. Early speculation held that the flagship AD102 die would be used in both the RTX 4090 and RTX 4080, bringing a ~70% increase in CUDA/shader cores over the RTX 3000 series equivalent. A fully enabled AD102 GPU has well over 18K shader cores. The RTX 4000 series is built on TSMC's 4/5 nm production node, promising improved performance over the RTX 3000 series' 8 nm GPUs: a more compact process node lets NVIDIA pack more transistors onto the GPU and raise its processing speed. Since ray tracing and DLSS remain crucial technologies for GeForce graphics cards, NVIDIA is also working to improve their efficiency. The Ada Lovelace architecture updates the streaming multiprocessors, each of which is said to deliver up to twice the performance. NVIDIA is also adding a new reordering option for shader execution. It should speed up shading in the GPU pipeline by rescheduling shading work on the fly so that it completes as efficiently as possible. According to NVIDIA, this improves overall gaming performance by up to 25% and is two to three times faster for ray tracing.
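To give a feel for why reordering shader work can help, the toy model below counts how many different shaders each group of 32 threads has to run before and after sorting the work by shader. It is a conceptual sketch only, not NVIDIA's implementation: the group size of 32, the workload of 1024 ray hits, and the eight material shaders are all hypothetical values chosen for illustration.

```python
import random

GROUP_SIZE = 32  # ASSUMPTION: threads that execute together in lockstep


def distinct_shaders_per_group(shader_ids):
    """Count how many different shaders each group of threads must run."""
    return [
        len(set(shader_ids[i:i + GROUP_SIZE]))
        for i in range(0, len(shader_ids), GROUP_SIZE)
    ]


random.seed(0)
# Hypothetical workload: 1024 ray hits, each needing one of 8 shaders.
hits = [random.randrange(8) for _ in range(1024)]

before = distinct_shaders_per_group(hits)
# Reordering: place hits that need the same shader next to each other.
after = distinct_shaders_per_group(sorted(hits))

avg_before = sum(before) / len(before)
avg_after = sum(after) / len(after)

print(f"Average shaders per group before reordering: {avg_before:.2f}")
print(f"Average shaders per group after reordering:  {avg_after:.2f}")
```

Fewer distinct shaders per group means less divergent execution, which is the general idea behind rescheduling shading work for efficiency.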

Ray-tracing and Tensor Cores

Lovelace also introduces the third generation of ray-tracing cores, which increases the throughput of ray-triangle intersections. The fourth generation of Tensor Cores is said to offer up to four times the throughput of its predecessor. Additionally, RTX 40 series GPUs support AV1 encoding. The DLSS upscaling technique gets a new 3.0 version that, for the first time, generates frames of its own to reach higher frame rates. DLSS 3.0 is only available on RTX 40 cards and does not work on GPUs from previous generations. NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:

  • First, Ada’s third-generation RT Core features 2x faster ray-triangle intersection throughput relative to Ampere; this enables developers to add more detail to their virtual worlds.
  • Second, Ada’s RT Core has 2x faster alpha traversal; the RT Core features a new Opacity Micromap Engine that alpha-tests geometry directly and significantly reduces shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and trace them more efficiently with the Ada RT Core.
  • Third, the new Ada RT Core supports 10x faster BVH builds in 20x less BVH space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on demand. The micro-mesh is a new primitive representing a structured mesh of micro-triangles that the Ada RT Core processes natively, saving storage and processing compared to what is usually required when describing complex geometry using only basic triangles.
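For readers unfamiliar with the ray-triangle intersection test that RT Cores accelerate in hardware, the sketch below shows the widely used Möller-Trumbore algorithm in software. This is a textbook method chosen for illustration; the article does not say which algorithm the Ada RT Core uses internally, and the function and helper names here are made up.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test.

    Returns the distance t along the ray to the hit point,
    or None if the ray misses the triangle.
    """
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin


# A ray fired straight down at a triangle lying in the z = 0 plane.
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
print("hit distance:", hit)
```

A renderer runs a test like this for enormous numbers of ray and triangle pairs per frame, which is why doubling its throughput in dedicated hardware matters.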

Tensor Cores are specialized high-performance compute cores for the matrix multiply-and-accumulate operations used in AI and HPC applications. They deliver high throughput for matrix calculations, which is crucial for deep-learning neural network training and for inference at the edge. Ada outperforms Ampere in FP16, BF16, TF32, INT8, and INT4 tensor throughput and also incorporates the Hopper FP8 Transformer Engine, which yields over 1.3 petaFLOPS of tensor processing in the RTX 4090. Of course, for us consumers, these will be applied to DLSS.
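The matrix multiply-and-accumulate operation mentioned above can be written as D = A x B + C. The plain-Python sketch below spells it out on a small tile to show what the hardware computes; the 4x4 tile size and the function name are illustrative assumptions, and real Tensor Cores work on reduced-precision inputs in hardware rather than Python floats.

```python
def matrix_multiply_accumulate(a, b, c):
    """Return D = A x B + C for square tiles given as lists of rows."""
    n = len(a)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 4x4 tile: A is the identity, so D should equal B + C.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
c = [[0.5] * 4 for _ in range(4)]

d = matrix_multiply_accumulate(identity, b, c)
for row in d:
    print(row)
```

Neural network layers reduce largely to this one operation repeated over many tiles, which is why dedicated units for it pay off in workloads such as DLSS.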

Video engine

Ada GPUs advance streaming and video content by adding AV1 video encoding support to the Ada eighth-generation dedicated hardware encoder (NVENC). Previous-generation Ampere GPUs supported AV1 decoding but not encoding. Ada’s AV1 encoder is 40% more efficient than the GeForce RTX 30 Series GPUs’ H.264 encoder. AV1 will allow users already broadcasting at 1080p to boost their resolution to 1440p while maintaining the same bitrate and quality. For viewers with 1080p displays, streams will appear similar to 1440p, resulting in improved quality.

Dual NVENC encoders are included on Ada GeForce RTX 40 Series GPUs with at least 12 GB of memory to improve encoding performance. This supports video encoding at 8K/60 or 4K/60 for professional video editing. (Game streaming services can also use this to enable more concurrent sessions, for example.) DaVinci Resolve by Blackmagic Design, the Voukoder plugin for Adobe Premiere Pro, and Jianying, China’s leading video editing tool, are enabling AV1 support and dual-encoder use via encode presets, with availability for these applications planned for October. NVIDIA is also collaborating with the popular video effects application Notch to enable AV1, and with Topaz to offer support for AV1 and dual encoders.

In addition to NVENC, Ada GPUs feature the fifth-generation hardware decoder (NVDEC) that was introduced with Ampere. NVDEC supports hardware-accelerated MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1 video decoding. 8K/60 decoding is also fully supported.
