





Colorful GeForce RTX 3060 Bilibili 12G review - GPU Architecture

by Hilbert Hagedoorn on: 10/05/2021 02:27 PM


Ampere GPU Architecture 

Ampere, of course, is the base unit of electric current in the International System of Units. The GPU architecture, however, is named after André-Marie Ampère, a French mathematician and physicist who is considered the father of electrodynamics. NVIDIA has a track record of naming its GPU architectures after mathematicians, physicists, or prominent figures from closely related fields; to name a few: Pascal, Fermi, Kepler, Maxwell and, more recently, Turing. While it was no secret that the new GPUs would be based on Ampere, we've seen much discussion about fabrication nodes, architecture, and specifications. Still, everybody seems to have forgotten that Ampere had already launched earlier for the HPC market.

NVIDIA initially announced three Ampere-based graphics cards. Shortly before the announcement, specifications of the GeForce RTX 3070, 3080 and 3090 had leaked onto the web; the launch ended with a twist, however, as the Shader core counts turned out to be double what everybody expected. These GPUs are fabricated on Samsung's 8nm node. This process is a further development of Samsung's 10nm process, which means that no EUV is applied in production just yet.

GeForce RTX 3060 

The GeForce RTX 3060 is built around the GA106 GPU holding 3584 shader cores running at 1.78 GHz. It gets 12GB of last-gen GDDR6 (not X) memory that runs at 15 Gbps on a 192-bit bus. This card is introduced at a price of $329 USD for the Founders Edition cards (NVIDIA's own in-house model).

GeForce RTX 3060 Ti

The GeForce RTX 3060 Ti is built around the same chip that the RTX 3070 has, a revised GA104-200 GPU holding 4864 shader cores running at 1.67 GHz. It gets 8GB of last-gen GDDR6 (not X) memory that runs at 14 Gbps on a 256-bit bus. This card is introduced at a price of $399 USD for the Founders Edition cards (NVIDIA's own in-house model). Further down on this page we tabled up all the sexy details and geeky specs.

GeForce RTX 3070

As opposed to the 3080 and 3090, the GeForce RTX 3070 is built around a GA104-300 GPU; it still holds a substantial chunk o' transistors though, as there are a proper 5888 Shader cores running at 1.73 GHz. It gets 8GB of last-gen GDDR6 (not X) memory that runs at 14 Gbps on a 256-bit bus. As we mentioned earlier, GA104 is based on 8nm node fabrication; NVIDIA, however, has closed a deal with TSMC to move at least some production to TSMC at 7nm, as they cannot produce these puppies fast enough to meet demand. This card is introduced at a price of $499 USD for the Founders Edition cards (NVIDIA's own in-house model). Further down on this page we tabled up all the sexy details and geeky specs.

GeForce RTX 3080

The GeForce RTX 3080 is based on the GA102-200 GPU and gets 8704 Shader cores clocking in at 1710 MHz. This card sees 10GB of GDDR6X memory fitted, running at 19 Gbps. Combined with a 320-bit bus, that is still a gnarly, whopping 760 GB/s of memory bandwidth. Rated at a TGP of 320W, the card is connected with a 12-pin connector. With just these specifications, it should be as fast as the GeForce RTX 2080 Ti. The Founders Edition cards are released at a price of $699 USD.

GeForce RTX 3090

The GeForce RTX 3090 comes with 24 GB of GDDR6X memory running on a 384-bit bus at 19.5 Gbps, and that boils down to a frightful 936 GB/s of effective memory bandwidth. The GPU enabling it is the GA102-300 GPU, and it holds a comprehensive 10496 Shader cores. The clock frequency for the Shader cores will tick at 1695 MHz alongside a 350W rating on energy consumption; note that this is a TGP, not a TDP. The TGP (Total Graphics Power) describes the maximum amount of graphics board power that the system power supply should be able to provide to the graphics card. The Founders Edition cards are released at a price of $1499 USD.
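The bandwidth numbers quoted for these cards follow directly from the bus width and the per-pin data rate. As a minimal sketch (the function name `memory_bandwidth_gbs` is ours, purely for illustration), the arithmetic looks like this:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each pin of the bus moves `data_rate_gbps` gigabits per second, so the
    whole bus moves bus_width_bits * data_rate_gbps gigabits per second.
    Dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, data rate in Gbps) as listed in this article
cards = {
    "RTX 2080 Ti": (352, 14.0),
    "RTX 3090": (384, 19.5),
    "RTX 3080": (320, 19.0),
    "RTX 3070": (256, 14.0),
    "RTX 3060 Ti": (256, 14.0),
    "RTX 3060": (192, 15.0),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, rate):.0f} GB/s")
```

For the RTX 3090 that is 384 x 19.5 / 8 = 936 GB/s, and for the RTX 3080 it is 320 x 19 / 8 = 760 GB/s, matching the figures in the text.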

   

| | RTX 2080 Ti | RTX 3090 | RTX 3080 | RTX 3070 | RTX 3060 Ti | RTX 3060 |
|---|---|---|---|---|---|---|
| GPU | 12nm TU102 | 8nm GA102-300 | 8nm GA102-200 | 8nm GA104-300 | 8nm GA104-200 | 8nm GA106-300 |
| Transistors | 18.6 Billion | 28 Billion | 28 Billion | 17 Billion | 17 Billion | 13 Billion |
| Shader Cores | 4352 | 10496 | 8704 | 5888 | 4864 | 3584 |
| Raytracing Cores | 68 | 82 | 68 | 46 | 38 | 28 |
| Tensor Cores | 544 | 328 | 272 | 184 | 152 | 112 |
| ROPs | 96 | 96 | 96 | 96 | 80 | 48 |
| Texture Units | 272 | 328 | 272 | 184 | 152 | 112 |
| Base Clock | 1350 MHz | 1400 MHz | 1440 MHz | 1500 MHz | 1410 MHz | 1320 MHz |
| Boost Clock | 1635 MHz | 1695 MHz | 1710 MHz | 1730 MHz | 1665 MHz | 1780 MHz |
| Memory | 11GB G6 | 24GB G6X | 10GB G6X | 8GB G6 | 8GB G6 | 12GB G6 |
| Memory Clock | 14 Gbps | 19.5 Gbps | 19 Gbps | 14 Gbps | 14 Gbps | 15 Gbps |
| Memory Freq | 7000 MHz | 9750 MHz | 9500 MHz | 7000 MHz | 7000 MHz | 7500 MHz |
| Memory Bus | 352-bit | 384-bit | 320-bit | 256-bit | 256-bit | 192-bit |
| Bandwidth | 616 GB/s | 936 GB/s | 760 GB/s | 448 GB/s | 448 GB/s | 360 GB/s |
| Shader Perf (TFLOP) | 13.4 | 35.6 | 29.8 | 20.3 | 16.2 | 13 |
| PCIe Gen | 3.0 x16 | 4.0 x16 | 4.0 x16 | 4.0 x16 | 4.0 x16 | 4.0 x16 |
| TGP | 320W | 350W | 320W | 220W | 200W | 170W |
| Price | $1199 | $1499 | $699 | $499 | $399 | $329 |
| Released | Out | September 24 | September 17 | October 27th | December 1st | February 25th |
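The Shader Perf row can be approximated from two other rows in the table: the Shader core count and the boost clock. The sketch below assumes the usual convention of counting two floating-point operations per core per clock (one fused multiply-add); the function name `fp32_tflops` is ours and only serves the illustration.

```python
def fp32_tflops(shader_cores: int, boost_clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each shader core retires one fused multiply-add per clock,
    which is counted as 2 floating-point operations.
    """
    flops_per_second = 2 * shader_cores * boost_clock_mhz * 1e6
    return flops_per_second / 1e12


# (shader cores, boost clock in MHz) taken from the table
specs = {
    "RTX 3090": (10496, 1695),
    "RTX 3080": (8704, 1710),
    "RTX 3060 Ti": (4864, 1665),
}

for name, (cores, boost) in specs.items():
    print(f"{name}: {fp32_tflops(cores, boost):.1f} TFLOPS")
```

This reproduces the 35.6, 29.8 and 16.2 TFLOP entries. The other columns come out slightly different with this formula, presumably because the published figures were rounded differently or based on a slightly different boost clock.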

  

Ampere architecture

Ampere has an updated architecture with a new SM (Streaming Multiprocessor) design. One SM is a cluster that holds your Shader processors. As most of you have noticed, the Shader processor count was a bit of an enigma; the Shader count has doubled from what was initially expected. GA102/GA104 contains three different types of compute cores:

  • Programmable Shading Cores, which consist of Shader/Stream/CUDA Cores; Ampere received double the Shading capabilities.
  • RT Cores, which accelerate Bounding Volume Hierarchy (BVH) traversal and intersection of scene geometry during ray-tracing; a second-generation unit that is now twice as fast.
  • Tensor Cores, which provide enormous speedups for AI neural network training and inferencing.

The GPC is the dominant high-level hardware block, with all of the key graphics processing units dwelling inside the GPC. Each GPC includes a dedicated Raster Engine, and now also includes two ROP partitions (each partition containing eight ROP units), which is a new feature for NVIDIA Ampere Architecture GA10x GPUs. The GPC includes six TPCs that each include two SMs and one PolyMorph Engine. Each SM in GA10x GPUs then contains 128 Shading Cores, four third-generation Tensor Cores, a 256 KB Register File, four Texture Units, one second-generation ray-tracing core, and 128 KB of L1/Shared Memory, which can be configured for differing capacities depending on the needs of the compute or graphics workloads.
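Since every SM carries the same set of units, the headline numbers of each card can be derived from its SM count alone. Here is a small sketch of that bookkeeping; the names `SMConfig` and `unit_totals` are made up for this example, and the SM counts are read off the Raytracing Cores row of the spec table (one RT core per SM).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMConfig:
    """Units inside one GA10x SM, as described in the text."""
    shader_cores: int = 128
    tensor_cores: int = 4
    texture_units: int = 4
    rt_cores: int = 1


# One GPC holds six TPCs with two SMs each.
SMS_PER_GPC = 6 * 2


def unit_totals(sm_count: int, sm: SMConfig = SMConfig()) -> dict:
    """Multiply the per-SM unit counts by the number of enabled SMs."""
    return {
        "shader_cores": sm_count * sm.shader_cores,
        "tensor_cores": sm_count * sm.tensor_cores,
        "texture_units": sm_count * sm.texture_units,
        "rt_cores": sm_count * sm.rt_cores,
    }


# Enabled SMs per card (equal to the RT core count in the spec table)
enabled_sms = {"RTX 3090": 82, "RTX 3080": 68, "RTX 3070": 46,
               "RTX 3060 Ti": 38, "RTX 3060": 28}

for card, sms in enabled_sms.items():
    print(card, unit_totals(sms))
```

For the RTX 3060 this gives 28 x 128 = 3584 Shader cores, 112 Tensor cores and 112 Texture Units, in line with the table.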

There are further nuances to explain. Changes have been made to the Streaming Multiprocessor design that holds the Shading cores. The SMs in the RTX 3000 series GPUs hold FP32 compute units, and the Ampere architecture supports parallel execution of FP32 and INT32 operations with independent thread scheduling, also described as concurrent execution of FP32 and INT32 operations. New compared to Turing is a combined INT32/FP32 cluster of Shader processors that effectively doubles the Shader count. We'll show by example: 

 

Ampere SM - Look to the left side cluster, INT32+FP32 is a significant change

 

The RTX 3000 series GPUs hold SMs whose core blocks contain FP32 compute units, just as in the previous generation (Turing). However, look closer: the cluster that used to hold only INT32 units is now INT32 + FP32. So to reiterate, the Ampere SM has a new datapath design for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 shader cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 shader cores and 16 INT32 cores. Therein lies the secret sauce, as that doubles the Shading throughput. The result of this change (compared to Turing) is that each partition is capable of executing 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. One SM in its entirety can now execute 128 FP32 operations per clock, and that is double the FP32 rate of a Turing SM (which does 64 FP32 and 64 INT32 operations per clock).

Performance gains will vary at the Shader and application level depending on the mix of instructions. According to NVIDIA, ray-tracing denoising shaders are good examples that should benefit greatly from doubling FP32 throughput. Twice the Shading performance can, of course, create bottlenecks of its own at an earlier stage in the pipeline. Therefore the SM also gets twice the shared memory and L1 cache bandwidth: 128 bytes/clock per Ampere SM versus 64 bytes/clock in Turing. Total L1 (128KB) bandwidth for GeForce RTX 3080 is 219 GB/sec versus 116 GB/sec (96KB) for GeForce RTX 2080 Super (Turing). Each partition then leads to one Tensor core, and the SM holds an RT core; both are renewed as well.
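To get a feel for why the gain depends on the instruction mix, here is a deliberately simplified model of one SM. It assumes perfect scheduling and ignores everything except the FP32 and INT32 lane counts given above; the function names are ours and this is not how NVIDIA's scheduler is implemented. The second helper checks the L1 figures, which appear to be per-SM numbers (bytes per clock times boost clock); the 1.815 GHz boost clock for the RTX 2080 Super is our assumption, as it is not listed in this article.

```python
def turing_cycles(fp32_ops: float, int32_ops: float) -> float:
    """Turing SM: 64 FP32 lanes and 64 INT32 lanes running side by side."""
    return max(fp32_ops / 64, int32_ops / 64)


def ampere_cycles(fp32_ops: float, int32_ops: float) -> float:
    """Ampere SM: 64 FP32-only lanes plus 64 lanes that do FP32 or INT32.

    INT32 work is limited to the 64 shared lanes, while the total work
    can be spread over all 128 lanes.
    """
    return max(int32_ops / 64, (fp32_ops + int32_ops) / 128)


def ampere_speedup(fp32_ops: float, int32_ops: float) -> float:
    """Per-SM, per-clock speedup of Ampere over Turing in this toy model."""
    return turing_cycles(fp32_ops, int32_ops) / ampere_cycles(fp32_ops, int32_ops)


def l1_bandwidth_gbs(bytes_per_clock: int, clock_ghz: float) -> float:
    """L1/shared memory bandwidth of a single SM in GB/s."""
    return bytes_per_clock * clock_ghz


print(ampere_speedup(1280, 0))     # pure FP32 workload
print(ampere_speedup(640, 640))    # even FP32/INT32 mix
print(l1_bandwidth_gbs(128, 1.71))   # RTX 3080, boost clock from the spec table
print(l1_bandwidth_gbs(64, 1.815))   # RTX 2080 Super, assumed boost clock
```

In this model a pure FP32 workload runs twice as fast per SM, while an even FP32/INT32 mix gains nothing, which is the "depends on the mix of instructions" caveat in a nutshell. The L1 helper lands on roughly 219 GB/s and 116 GB/s.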

GA106 Block diagram

Ampere is built from Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPs), and memory controllers. The GPC is the dominant high-level hardware block, with all of the key graphics processing units residing inside it. Each GPC includes a dedicated Raster Engine. Ampere has one more change here: each GPC now carries two ROP partitions (each partition containing eight ROP units), which is a new feature for NVIDIA Ampere Architecture GA10x GPUs. 

PCI Express Gen 4.0

New on the spec list is support for PCI Express 4.0. Competitor AMD made big bets with its 2019 NAVI products and had already moved to PCIe Gen 4.0, as did its chipsets and processors. But what does PCIe Gen 4.0 bring to the table? Well, simply put, more bandwidth for data to pass through. 

   

| PCIe Gen | Line Code | Transfer Rate | x1 Bandwidth | x4 | x8 | x16 |
|---|---|---|---|---|---|---|
| 1.0 | 8b/10b | 2.5 GT/s | 250 MB/s | 1 GB/s | 2 GB/s | 4 GB/s |
| 2.0 | 8b/10b | 5 GT/s | 500 MB/s | 2 GB/s | 4 GB/s | 8 GB/s |
| 3.0 | 128b/130b | 8 GT/s | 1 GB/s | 4 GB/s | 8 GB/s | 16 GB/s |
| 4.0 | 128b/130b | 16 GT/s | 2 GB/s | 8 GB/s | 16 GB/s | 32 GB/s |
| 5.0 | 128b/130b | 32 GT/s | 4 GB/s | 16 GB/s | 32 GB/s | 64 GB/s |

 

On the 4.0 interface, you'll be hard-pressed to run out of bandwidth, as the bandwidth per lane doubles compared to Gen 3.0. Of course, there has been a recent PCI Express Gen 5.0 announcement as well; for completeness I already inserted it into the table. What benefits will you have at PCIe Gen 4.0 with a graphics card? If we are to believe NVIDIA's performance claims, the high-end RTX 3000 cards will probably benefit more from the new standard to make the most of the graphics memory buffers. 
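The per-lane figures in the PCIe table can be worked out from the transfer rate and the line code. A short sketch (the helper name `pcie_bandwidth_gbs` is ours): the line code tells you how many of the transferred bits are payload, and dividing by eight turns bits into bytes.

```python
# Fraction of transferred bits that carry payload for each line code
ENCODING_EFFICIENCY = {
    "8b/10b": 8 / 10,
    "128b/130b": 128 / 130,
}


def pcie_bandwidth_gbs(transfer_rate_gt: float, line_code: str, lanes: int = 1) -> float:
    """Usable bandwidth per direction in GB/s for a PCIe link."""
    payload_gbit_per_lane = transfer_rate_gt * ENCODING_EFFICIENCY[line_code]
    return payload_gbit_per_lane / 8 * lanes


generations = {
    "1.0": (2.5, "8b/10b"),
    "2.0": (5.0, "8b/10b"),
    "3.0": (8.0, "128b/130b"),
    "4.0": (16.0, "128b/130b"),
    "5.0": (32.0, "128b/130b"),
}

for gen, (rate, code) in generations.items():
    x1 = pcie_bandwidth_gbs(rate, code)
    x16 = pcie_bandwidth_gbs(rate, code, lanes=16)
    print(f"PCIe {gen}: x1 = {x1:.3f} GB/s, x16 = {x16:.2f} GB/s")
```

Note that with 128b/130b encoding the exact results sit slightly below the round numbers in the table, for example about 31.5 GB/s rather than 32 GB/s for Gen 4.0 x16; the table rounds up.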

8K - AV1 - HDMI 2.1

New is hardware-accelerated support for the new AV1 video compression standard. Ampere-based graphics cards can play back AV1 content through their video decoder engine without taxing your processor. NVIDIA specifically mentions decoding, not encoding; the NVENC API will not support AV1 encoding at this time. It has been talked about for a long time, but HDMI 2.1 is now listed as a specification. The advantages of HDMI 2.1 are significant, as the signal bandwidth can carry a lot more over that HDMI cable (with a compatible display). HDMI 2.1 brings 144Hz support at a 4K resolution, as well as 60Hz at 8K. 
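To put those resolutions and refresh rates into perspective, the rough sketch below estimates the raw pixel data rate of a video mode. It ignores blanking intervals and link encoding overhead, so real requirements are higher; the 18 Gbit/s and 48 Gbit/s maximum link rates for HDMI 2.0 and HDMI 2.1 are our additions and not taken from this article.

```python
HDMI_2_0_GBPS = 18.0  # assumption: maximum HDMI 2.0 link rate
HDMI_2_1_GBPS = 48.0  # assumption: maximum HDMI 2.1 link rate


def raw_video_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int = 24) -> float:
    """Raw pixel data rate in Gbit/s (active pixels only, no blanking)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


modes = {
    "4K at 144Hz": (3840, 2160, 144),
    "8K at 60Hz": (7680, 4320, 60),
}

for name, (w, h, hz) in modes.items():
    rate = raw_video_gbps(w, h, hz)
    print(f"{name}: {rate:.2f} Gbit/s raw, "
          f"exceeds HDMI 2.0 link rate: {rate > HDMI_2_0_GBPS}")
```

Both modes come out well above what an HDMI 2.0 link can carry, and 8K at 60Hz with 8 bits per color channel already sits close to the HDMI 2.1 maximum before any overhead is counted.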






