Ada (Lovelace) GPU Architecture; 76.3B transistors
This design, built on a custom TSMC 4N process, delivers more raster, raytracing, and AI-accelerated compute performance than the previous-generation Ampere. The AD102 GPU has 76.3 billion transistors and a die area of 608.4 mm². The resulting transistor density of 125.5 million per mm² is 2.78x higher than that of the Samsung-fabbed GA102 Ampere GPU built on the 8N node. NVIDIA Ada (named after the mathematician Ada Lovelace) introduces Shader Execution Reordering (SER), which is said to speed up shading work and provide up to 25% better gaming performance. Ada is also fitted with next-generation RT cores (Gen3) and faster Tensor cores (Gen4). The latter can achieve up to 1400 TFLOPS, which is 4.375 times the throughput of Ampere's third-generation cores.
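As a quick sanity check, the density and Tensor figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the GA102 transistor count and die area are not given in this article and are assumed here from commonly cited values, and the Tensor factor is read as 4.375.

```python
# Figures quoted in the paragraph above.
AD102_TRANSISTORS = 76.3e9
AD102_AREA_MM2 = 608.4
ADA_TENSOR_TFLOPS = 1400
TENSOR_SPEEDUP = 4.375

# Assumption: GA102 figures are not stated in this article; these are the
# commonly cited values for the Ampere flagship die.
GA102_TRANSISTORS = 28.3e9
GA102_AREA_MM2 = 628.4


def density_millions_per_mm2(transistors, area_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6


ad102_density = density_millions_per_mm2(AD102_TRANSISTORS, AD102_AREA_MM2)
ga102_density = density_millions_per_mm2(GA102_TRANSISTORS, GA102_AREA_MM2)
density_ratio = ad102_density / ga102_density

# Ampere Tensor throughput implied by the quoted Ada figure and speedup.
implied_ampere_tflops = ADA_TENSOR_TFLOPS / TENSOR_SPEEDUP

print(f"AD102 density: {ad102_density:.1f} M/mm^2")
print(f"GA102 density: {ga102_density:.1f} M/mm^2")
print(f"Density ratio: {density_ratio:.2f}x")
print(f"Implied Ampere Tensor throughput: {implied_ampere_tflops:.0f} TFLOPS")
```

With these inputs the density lands at roughly 125 million transistors per mm² and the ratio at a little under 2.8x, in line with the figures quoted above.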
The Ada GPU
Team Green has shown its most powerful Lovelace GPU. The biggest chip, AD102, has up to 76 billion transistors and, like Hopper, is built on TSMC's 4N node. The regular shaders, as well as the raytracing and Tensor cores, have all been improved. At its initial price of USD 1,499, the GeForce RTX 3090 was $1,000 less than the Nvidia Titan RTX. Unfortunately, we don't see this trend continuing; the RTX 4090 will likely be priced between $1,499 and $1,999 depending on AIB designs, making it competitive in price with the RTX 3090 Ti, the current king of the RTX hill. We now turn our focus to the RTX 4080 and the RTX 4070, which may be announced later; we had hoped that the initial retail pricing of $699 and $499, respectively, would be maintained. However, the recent increase in the cost of silicon wafers may cause a 10% increase in the MSRP of RTX 4000 GPUs. The CUDA core (shader/stream processor) count is going to rise on all Nvidia hardware; the RTX 4090 graphics card will contain 16,384 shader cores. Below is an overview of what we think are the specs; these will be updated once more official information arrives.
|Speculated specs||GeForce RTX 4090||GeForce RTX 4080||GeForce RTX 4070||TBC||TBC|
|Architecture||Ada (TSMC 4N)||Ada (TSMC 4N)||Ada (TSMC 4N)||TBC||TBC|
|Base Clock||2235 MHz||TBC||TBC||TBC||TBC|
|Boost Clock||2520 MHz||2510 MHz||2610 MHz||TBC||TBC|
|Raytracing cores||128 Gen3||76 Gen3||60 Gen3||TBC||TBC|
|Tensor Cores||512 Gen4||304 Gen4||240 Gen4||TBC||TBC|
|Memory||24 GB G6X||16 GB G6X||12 GB G6X||12 GB G6X||10 GB G6X|
|Memory Speed||21 Gbps||22.4 Gbps||21 Gbps||21 Gbps||21 Gbps|
|Bandwidth||1008 GB/s||717 GB/s||504 GB/s||504 GB/s||420 GB/s|
|PCIe||Gen4 x16||Gen4 x16||Gen4 x16||TBC||TBC|
|Launch Date||October 12th, 2022||November 2022||TBA||TBA||TBA|
|Price||$1599 / 1959 EUR||$1199 / 1469 EUR||$899 / 1099 EUR||TBC||TBC|
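The Memory Speed and Bandwidth rows are tied together by the memory bus width, which the table does not list. As a rough sketch, assuming the usual relation bandwidth (GB/s) = bus width (bits) × speed (Gbps per pin) / 8, the bus width implied by each column can be backed out as follows. The helper name and the rounding to a whole number of bits are illustrative choices, not something stated in the table.

```python
# (memory speed in Gbps per pin, bandwidth in GB/s), in table column order.
TABLE_COLUMNS = [
    (21.0, 1008),
    (22.4, 717),
    (21.0, 504),
    (21.0, 504),
    (21.0, 420),
]


def implied_bus_width_bits(speed_gbps, bandwidth_gbs):
    """Bus width implied by bandwidth = width * speed / 8, rounded to whole bits."""
    return round(bandwidth_gbs * 8 / speed_gbps)


bus_widths = [implied_bus_width_bits(s, b) for s, b in TABLE_COLUMNS]

for (speed, bandwidth), width in zip(TABLE_COLUMNS, bus_widths):
    print(f"{speed} Gbps, {bandwidth} GB/s -> about {width}-bit bus")
```

The implied widths step down from 384-bit on the first column to 160-bit on the last, which is why bandwidth drops much faster across the table than the memory speed does.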
Although Nvidia kept the RTX 4000 series' specifications tightly under wraps, a few tidbits leaked over time. It was speculated that the flagship AD102 GPU die would be used in the upcoming RTX 4090 and RTX 4080 graphics cards, resulting in a ~70% increase in CUDA/shader cores compared to the RTX 3000 series equivalents. A fully enabled version of the AD102 GPU would have well over 18K shader cores. Nvidia's RTX 4000 series graphics cards are built on TSMC's 4N (5 nm-class) production node, promising improved performance over the RTX 3000 series' 8 nm GPUs. A denser process node lets Nvidia pack more transistors onto the GPU, which increases its processing capability. Since ray tracing and DLSS are still crucial technologies for GeForce graphics cards, Nvidia will undoubtedly work to improve their efficiency. The Ada Lovelace architecture also updates the streaming multiprocessors, each of which is said to deliver up to twice the performance. Nvidia is also adding Shader Execution Reordering (SER). It should speed up shading in the GPU pipeline by rescheduling shading work on the fly so that it completes as efficiently as possible. According to Nvidia, this improves overall gaming performance by up to 25% and is two to three times faster for ray tracing.
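The idea behind reordering shading work can be illustrated with a toy model. The sketch below is not Nvidia's hardware scheduler; it assumes threads run in groups of 32, that a group has to run once for every distinct shader its threads need, and that reordering simply sorts pending hits by shader ID. All names and numbers are hypothetical.

```python
import random

WARP_SIZE = 32  # assumed group size for this toy model


def shader_passes(shader_ids):
    """Each group of WARP_SIZE threads runs once per distinct shader it contains."""
    return sum(
        len(set(shader_ids[i:i + WARP_SIZE]))
        for i in range(0, len(shader_ids), WARP_SIZE)
    )


random.seed(0)
# 1024 ray hits, each needing one of 8 hypothetical material shaders.
hits = [random.randrange(8) for _ in range(1024)]

unsorted_passes = shader_passes(hits)
reordered_passes = shader_passes(sorted(hits))

print("passes without reordering:", unsorted_passes)
print("passes with reordering:   ", reordered_passes)
```

In this model, incoherent hits force almost every group to run several different shaders, while the sorted order lets most groups run just one. That is the intuition for why the gains are claimed to be largest in ray tracing, where neighbouring rays often hit different materials.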
Raytracing and Tensor Cores
The third generation of ray tracing cores is also introduced in Lovelace, increasing the throughput of ray-triangle intersections. The fourth generation of Tensor cores is also said to offer up to four times the throughput of its predecessor. Additionally, AV1 encoding will be supported by RTX 40 series GPUs. The DLSS upscaling technique is getting a new 3.0 version that, for the first time, can generate frames of its own for higher frame rates. DLSS 3.0 is only available on RTX 40 cards and does not work on GPUs from previous generations. NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:
- First, Ada’s Third-Generation RT Core features 2x Faster Ray-Triangle Intersection Throughput relative to Ampere; this enables developers to add more detail into their virtual worlds.
- Second, Ada’s RT Core has 2x Faster Alpha Traversal; the RT Core features a new Opacity Micromap Engine to directly alpha-test geometry and significantly reduce shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them with the Ada RT Core.
- Third, the new Ada RT Core supports 10x Faster BVH Build in 20x Less BVH Space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on demand. The micro-mesh is a new primitive that represents a structured mesh of micro-triangles that the Ada RT Core processes natively, saving storage and processing compared to what is normally required when describing complex geometries using only basic triangles.
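To make the second point more concrete, here is a minimal sketch of how precomputed opacity information can cut down on alpha-test shader calls. It is a simplified model, not the actual Opacity Micromap format: it assumes each micro-triangle is pre-classified as opaque, transparent, or unknown, and that only the unknown case falls back to a shader. The enum, function, and sample data are all hypothetical.

```python
from enum import Enum


class Opacity(Enum):
    OPAQUE = "opaque"            # hit can be accepted without a shader
    TRANSPARENT = "transparent"  # hit can be skipped without a shader
    UNKNOWN = "unknown"          # mixed coverage, shader must decide


def resolve_hit(state, alpha_shader):
    """Return (hit_accepted, shader_called) for one micro-triangle hit."""
    if state is Opacity.OPAQUE:
        return True, False
    if state is Opacity.TRANSPARENT:
        return False, False
    return alpha_shader(), True


# Hypothetical hits on a fence texture: mostly solid or empty, one edge case.
hits = [Opacity.OPAQUE] * 6 + [Opacity.TRANSPARENT] * 3 + [Opacity.UNKNOWN]

results = [resolve_hit(state, alpha_shader=lambda: True) for state in hits]
accepted = sum(hit for hit, _ in results)
shader_calls = sum(called for _, called in results)

print(f"{accepted} hits accepted, {shader_calls} shader call(s) for {len(hits)} hits")
```

Without the pre-classification, all ten hits in this example would have needed an alpha-test shader call; with it, only the single ambiguous one does.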
Tensor Cores are specialized high-performance compute cores designed for the matrix multiply-and-accumulate operations used in AI and HPC applications. They deliver very high throughput for matrix calculations, which are crucial for deep learning neural network training and for inference at the edge. Ada outperforms Ampere in FP16, BF16, TF32, INT8, and INT4 Tensor throughput, and also incorporates the Hopper FP8 Transformer Engine, which yields over 1.3 PetaFLOPS of tensor processing in the RTX 4090. Of course, for us consumers this means these cores will be put to work for DLSS.
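The matrix multiply-and-accumulate operation mentioned above has the form D = A × B + C. The plain-Python sketch below only shows what is being computed; real Tensor Cores perform it on small tiles in hardware at reduced precision, and the function name and sample matrices here are illustrative.

```python
def matmul_accumulate(a, b, c):
    """Return D = A x B + C for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) + c[i][j] for j in range(cols)]
        for i in range(rows)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 0], [0, 1]]

d = matmul_accumulate(a, b, c)
print(d)  # [[20, 22], [43, 51]]
```

Neural network layers reduce largely to this pattern repeated over many tiles, which is why dedicated hardware for it pays off in both training and inference.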
A new video engine
Ada GPUs advance streaming and video content by adding AV1 video encoding support to the eighth-generation dedicated hardware encoder (NVENC). Previous-generation Ampere GPUs supported AV1 decoding but not encoding. Ada's AV1 encoder is 40% more efficient than the H.264 encoder of the GeForce RTX 30 Series GPUs. AV1 will allow users who are already broadcasting at 1080p to raise their resolution to 1440p while maintaining the same bitrate and quality. Viewers with 1080p displays will receive a stream closer to 1440p in detail, resulting in improved quality. Dual NVENC encoders are included on Ada GeForce RTX 40 Series GPUs with at least 12 GB of memory to improve encoding performance. This supports video encoding at 8K/60 or four 4K/60 streams for professional video editing. (Game streaming services can also utilise this to enable more concurrent sessions, for example.) DaVinci Resolve by Blackmagic Design, the Voukoder plugin for Adobe Premiere Pro, and Jianying, the leading video editing tool in China, are all adding AV1 and dual-encoder support via encode presets; both will be available for these applications in October. NVIDIA is also collaborating with the popular video effects application Notch to enable AV1, and with Topaz to offer support for AV1 and dual encoders. In addition to NVENC, Ada GPUs feature the fifth-generation hardware decoder (NVDEC), which was introduced with Ampere. NVDEC supports hardware-accelerated MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1 video decoding. 8K/60 decoding is also fully supported.
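The "8K/60 or four 4K/60" figure and the 1080p-to-1440p step can be put in perspective with some pixel arithmetic. This sketch assumes the standard frame sizes (7680×4320 for 8K, 3840×2160 for 4K, 2560×1440 and 1920×1080 for the streaming resolutions), which the article does not spell out.

```python
def pixel_rate(width, height, fps):
    """Pixels processed per second for a given frame size and frame rate."""
    return width * height * fps


# Assumed standard frame sizes; not stated in the article.
rate_8k60 = pixel_rate(7680, 4320, 60)
rate_four_4k60 = 4 * pixel_rate(3840, 2160, 60)

# How many more pixels per frame a 1440p stream carries than a 1080p one.
ratio_1440p_to_1080p = (2560 * 1440) / (1920 * 1080)

print(f"8K/60:       {rate_8k60:,} pixels/s")
print(f"4 x 4K/60:   {rate_four_4k60:,} pixels/s")
print(f"1440p/1080p: {ratio_1440p_to_1080p:.2f}x pixels per frame")
```

One 8K/60 stream and four 4K/60 streams amount to the same pixel throughput, so the two figures describe the same encoder capacity. A 1440p frame carries about 1.78 times the pixels of a 1080p frame, which gives a sense of how much extra data the AV1 encoder has to fit into an unchanged bitrate.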