ASUS ROG GeForce RTX 4090 MATRIX review (Page 4)

Published 2023-09-19 15:00 by Hilbert Hagedoorn

GPU Architecture

Ada (Lovelace) GPU Architecture; 76.3B transistors

This design, built on a custom TSMC 4N process, provides more raster, raytracing, and AI-accelerated compute performance than the previous-generation Ampere. The AD102 GPU has 76.3 billion transistors and a surface area of 608.4 mm². This indicates that the transistor density of 125.5 million per mm² is 2.78x higher than that of the Samsung-fabbed GA102 Ampere GPU built on the 8N node. NVIDIA Ada (named after the mathematician Ada Lovelace) introduces something new called Shader Execution Reordering (SER), which reschedules shading work on the fly and is said to provide up to 25% improved gaming performance. Ada is also fitted with next-generation RT cores (Gen3) and faster Tensor cores (Gen4). The latter can achieve up to 1,400 Tensor TFLOPS, which is 4.375 times more than Ampere's third-generation cores.
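The density comparison above is simple arithmetic, and a short sketch makes it easy to re-check. The AD102 figures come from the text; the GA102 transistor count and die area are an assumption taken from NVIDIA's public Ampere specifications, and the variable names are purely illustrative.

```python
# Transistor counts in millions, die areas in mm^2.
# AD102 values are quoted in the article; GA102 values are an assumption
# based on NVIDIA's published Ampere specifications.
AD102_TRANSISTORS_M = 76_300
AD102_AREA_MM2 = 608.4
GA102_TRANSISTORS_M = 28_300
GA102_AREA_MM2 = 628.4


def density(transistors_m: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm^2."""
    return transistors_m / area_mm2


ad102_density = density(AD102_TRANSISTORS_M, AD102_AREA_MM2)
ga102_density = density(GA102_TRANSISTORS_M, GA102_AREA_MM2)
density_ratio = ad102_density / ga102_density

print(f"AD102: {ad102_density:.1f} M/mm^2")
print(f"GA102: {ga102_density:.1f} M/mm^2")
print(f"Ratio: {density_ratio:.2f}x")
```

With these inputs the result lands close to the quoted density and at a ratio of roughly 2.78x; small differences come down to rounding of the die area and transistor count.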


GPU	Shader Cores	L2 Cache
AD102	18,432	96MB
AD103	10,752	64MB
AD104	7,680	48MB
AD106	4,608	32MB
AD107	3,072	32MB

The AD102 GPU

Team Green has shown the most powerful Lovelace GPU, which has up to 76 billion transistors and, like Hopper, is built on TSMC's 4N node. Regular shaders, as well as the raytracing and Tensor cores, have all been improved. At its initial price of USD 1,499, the GeForce RTX 3090 was $1,000 less than the Nvidia Titan RTX. Unfortunately, we don't see this trend continuing: the RTX 4090 will likely be priced between $1,499 and $1,999 depending on AIB designs, making it competitive with the RTX 3090 Ti, the current king of the RTX hill. Turning to the RTX 4080 and the RTX 4070, which may be announced later, we had hoped that the initial retail pricing of $699 and $499, respectively, would be maintained. However, the recent increase in the cost of silicon wafers may cause a 10% increase in the MSRP of RTX 4000 GPUs, and USD 899 is the cheapest version listed. The CUDA core (shader) count is going to rise on all Nvidia hardware; the RTX 4090 graphics card will contain 16,384 shader cores. Below is an overview of what we think are the specs; these will be updated once more official information arrives.


Speculated specs	GeForce RTX 4090	GeForce RTX 4080 (16 GB)	GeForce RTX 4080 (12 GB)	GeForce RTX 4070 (12 GB)	GeForce RTX 4070 (10 GB)
Architecture	Ada (TSMC 4N)	Ada (TSMC 4N)	Ada (TSMC 4N)	Ada (TSMC 4N)	Ada (TSMC 4N)
GPU	AD102-300	AD103-300	AD104	AD104-400	AD104-250
SMs	128	76	60	60	56
CUDA Cores	16384	9728	7680	7680	7168
Base Clock	2235 MHz	TBC	TBC	TBC	TBC
Boost Clock	2520 MHz	2510 MHz	2610 MHz	TBC	TBC
Raytracing Cores	128 Gen3	76 Gen3	60 Gen3	TBC	TBC
Tensor Cores	512 Gen4	304 Gen4	240 Gen4	-	-
Memory	24 GB G6X	16 GB G6X	12 GB G6X	12 GB G6X	10 GB G6X
Memory Bus	384-bit	256-bit	192-bit	192-bit	160-bit
Memory Speed	21 Gbps	23 Gbps	21 Gbps	21 Gbps	21 Gbps
Bandwidth	1008 GB/s	736 GB/s	504 GB/s	504 GB/s	420 GB/s
Socket Power	12VHPWR	12VHPWR	12VHPWR	-	-
PCIe	Gen4 x16	Gen4 x16	Gen4 x16	-	-
TGP	450W	340W	320W	-	-
Launch Date	October 12th, 2022	November 2022	November 2022	TBA	TBA
Price	$1599 / 1959 EUR	$1199 / 1469 EUR	$899 / 1099 EUR	-	-
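The Bandwidth row in the table follows directly from the Memory Bus and Memory Speed rows: bus width in bits, divided by eight, times the per-pin data rate. A minimal sketch of that calculation, with a hypothetical helper name, looks like this:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) x (Gbps per pin)."""
    return bus_width_bits / 8 * data_rate_gbps


# (bus width in bits, memory speed in Gbps) pairs taken from the table above.
table_configs = [(384, 21), (256, 23), (192, 21), (160, 21)]

for bus_bits, gbps in table_configs:
    print(f"{bus_bits}-bit @ {gbps} Gbps -> {memory_bandwidth_gbs(bus_bits, gbps):.0f} GB/s")
```

These are theoretical peak figures; real-world throughput depends on the workload and on how well the large L2 cache absorbs memory traffic.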

Although Nvidia kept the RTX 4000 series' specifications tightly under wraps, a few tidbits leaked over time. It had been speculated that the flagship AD102 GPU die would be used in the upcoming RTX 4090 and RTX 4080 graphics cards, resulting in a ~70% increase in available CUDA/shader cores compared to the RTX 3000 series equivalent. A fully enabled version of the AD102 GPU would see well over 18K shader cores. Nvidia's RTX 4000 series graphics cards are built on TSMC's 4N (5nm-class) production node, promising improved performance over the RTX 3000 series' 8nm GPUs: a more compact process node lets Nvidia pack more transistors onto the GPU, increasing its processing speed. Since ray tracing and DLSS are still crucial technologies for GeForce graphics cards, Nvidia will undoubtedly work to improve their efficiency. The Ada Lovelace architecture brings updated streaming multiprocessors, each of which is said to deliver up to twice the performance. Nvidia is also adding Shader Execution Reordering, which should speed up shading in the GPU pipeline by rescheduling work in real time so that it completes as efficiently as possible. According to Nvidia, this improves overall gaming performance by up to 25% and makes ray tracing shading two to three times faster.
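The core counts in the spec table all follow from the SM count. As a rough sketch, assuming the per-SM ratios implied by the table itself (128 CUDA cores and 4 Tensor cores per SM; the constant and function names are illustrative), the numbers can be cross-checked like this:

```python
# Per-SM ratios inferred from the spec table (16384 cores / 128 SMs,
# 512 Tensor cores / 128 SMs). Treat them as an assumption of this sketch.
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4

FULL_AD102_CUDA_CORES = 18_432  # from the shader core table earlier on this page


def cuda_cores(sm_count: int) -> int:
    """CUDA (shader) cores for a given number of enabled SMs."""
    return sm_count * CUDA_CORES_PER_SM


def tensor_cores(sm_count: int) -> int:
    """Tensor cores for a given number of enabled SMs."""
    return sm_count * TENSOR_CORES_PER_SM


full_ad102_sms = FULL_AD102_CUDA_CORES // CUDA_CORES_PER_SM
rtx4090_enabled_fraction = cuda_cores(128) / FULL_AD102_CUDA_CORES

print(f"Full AD102: {full_ad102_sms} SMs")
print(f"RTX 4090 enables {rtx4090_enabled_fraction:.1%} of the full die's shader cores")
```

In other words, the 16,384-core RTX 4090 configuration leaves a portion of the full AD102 die disabled, which is why a fully enabled part would exceed 18K shader cores.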

Raytracing and Tensor Cores

The third generation of ray tracing cores is also introduced with Lovelace, increasing the throughput of ray-triangle intersections. The fourth generation of Tensor cores is also said to offer up to four times the throughput of its predecessor. Additionally, AV1 encoding will be supported by RTX 40 series GPUs. For the first time, the upscaling technique DLSS is getting a new 3.0 version that can generate its own frames for higher frame rates. DLSS 3.0 frame generation is only available on RTX 40 cards and does not work on GPUs from previous generations. NVIDIA engineers have developed three new features in the Ada RT Core to enable high-performance ray tracing of highly complex geometry:

First, Ada’s Third-Generation RT Core features 2x Faster Ray-Triangle Intersection Throughput relative to Ampere; this enables developers to add more detail into their virtual worlds.
Second, Ada’s RT Core has 2x Faster Alpha Traversal; the RT Core features a new Opacity Micromap Engine to directly alpha-test geometry and significantly reduce shader-based alpha computations. With this new functionality, developers can very compactly describe irregularly shaped or translucent objects, like ferns or fences, and directly and more efficiently ray trace them with the Ada RT Core.
Third, the new Ada RT Core supports 10x Faster BVH Build in 20x Less BVH Space when using its new Displaced Micro-Mesh Engine to generate micro-triangles from micro-meshes on demand. The micro-mesh is a new primitive that represents a structured mesh of micro-triangles that the Ada RT Core processes natively, saving storage and processing compared to what is normally required when describing complex geometry using only basic triangles.
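To give a feel for what a ray-triangle intersection test involves, here is a minimal software sketch of the classic Möller-Trumbore algorithm. This is only an illustration of the kind of test an RT core accelerates; it is not a description of how NVIDIA implements it in hardware, and the function name is our own.

```python
def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test. Returns the distance t along the ray to the
    hit point, or None if the ray misses the triangle (v0, v1, v2)."""

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None

    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None

    q = cross(t_vec, edge1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None

    t = dot(edge2, q) * inv_det
    return t if t > eps else None    # ignore hits behind the ray origin


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle)
miss = ray_triangle_intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle)
print(hit, miss)
```

A game scene runs tests like this for a very large number of rays and triangles per frame, which is why doing it in fixed-function hardware, and doubling its throughput, matters.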

Tensor Cores are specialised high-performance compute cores designed for the matrix multiply-and-accumulate operations used in AI and HPC applications. Tensor Cores deliver very high performance for matrix calculations, which are crucial for deep learning neural network training and for inference at the edge. Ada outperforms Ampere in FP16, BF16, TF32, INT8, and INT4 Tensor throughput, and also incorporates the Hopper FP8 Transformer Engine, which yields over 1.3 PetaFLOPS of tensor processing in the RTX 4090. Of course, for us consumers, this means these cores will mostly be put to work for DLSS.
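The matrix multiply-and-accumulate operation mentioned above boils down to D = A x B + C. The plain Python sketch below only shows that arithmetic on a tiny example; real Tensor Cores do this on small tiles in hardware with reduced-precision inputs, and the function name here is purely illustrative.

```python
def matrix_multiply_accumulate(a, b, c):
    """Return D = A x B + C for small matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [
            c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
d = matrix_multiply_accumulate(a, b, c)
print(d)
```

Neural network layers are largely made of operations of this shape, which is why hardware built around it speeds up workloads such as DLSS.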

A new video engine

Ada GPUs advance streaming and video content by adding AV1 video encoding support to Ada's eighth-generation dedicated hardware encoder (NVENC). The previous-generation Ampere GPUs supported AV1 decoding but not encoding. Ada's AV1 encoder is 40% more efficient than the H.264 encoder in GeForce RTX 30 Series GPUs. AV1 will allow users who are currently broadcasting at 1080p to boost their resolution to 1440p while maintaining the same bitrate and quality. For viewers with 1080p displays, such a stream is downscaled from 1440p, resulting in improved quality.

Dual NVENC encoders are included on Ada GeForce RTX 40 Series GPUs with at least 12 GB of memory to improve encoding performance. This supports video encoding at 8K/60 or four 4K/60 streams for professional video editing. (Game streaming services can also utilise this to enable more concurrent sessions, for example.) DaVinci Resolve by Blackmagic Design, the Voukoder plugin for Adobe Premiere Pro, and Jianying, the leading video editing tool in China, are all adding AV1 support and dual-encoder support via encode presets. In October, dual encoder and AV1 support will be available for these applications. NVIDIA is also collaborating with the popular video effects application Notch to enable AV1, and with Topaz to offer support for AV1 and dual encoders.

In addition to NVENC, Ada GPUs feature the fifth-generation hardware decoder (NVDEC), which was introduced with Ampere. NVDEC supports hardware-accelerated MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9, and AV1 video decoding. 8K/60 decoding is also fully supported.
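The "8K/60 or four 4K/60" figure makes sense once you count pixels: one 8K frame holds exactly as many pixels as four 4K frames. The sketch below works that out, and also shows how many more pixels a 1440p stream carries than a 1080p one. The resolutions used (7680x4320, 3840x2160, 2560x1440, 1920x1080) are the standard definitions and are assumed here rather than taken from the article.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second that an encoder has to process."""
    return width * height * fps


rate_8k60 = pixel_rate(7680, 4320, 60)
rate_4k60 = pixel_rate(3840, 2160, 60)
streams_4k_per_8k = rate_8k60 // rate_4k60

# Extra pixels per frame when moving a stream from 1080p to 1440p.
pixels_1440p_vs_1080p = (2560 * 1440) / (1920 * 1080)

print(f"One 8K/60 stream equals {streams_4k_per_8k} x 4K/60 in pixel throughput")
print(f"1440p has {pixels_1440p_vs_1080p:.2f}x the pixels of 1080p")
```

Pixel throughput is only a rough proxy for encoder load, but it shows why the two quoted workloads are interchangeable, and how much extra detail AV1 has to fit into the same bitrate at 1440p.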
