Guru3D.com

Nvidia GP100 GPU architecture recap - full GPU has 3840 Shader processors

by Hilbert Hagedoorn on: 04/12/2016 09:19 AM | source: | 47 comment(s)

Last week Nvidia announced the GP100 GPU powering the Tesla P100 HPC module. More rumors are surfacing about GP104 as well. Since the full block diagrams for GP100 are now available, we can also tell what a fully enabled GP100 looks like. In this post, a little recap of the GP100 architecture and its positioning.

This year Nvidia is going to release several GPUs, all based on its new Pascal architecture, across a wide variety of market segments. For consumers, the first wave of graphics cards will use the GP104 GPU, which will power high-end products in the 'GTX 980' class. The current rumor is that the new GTX 1070 and 1080, albeit with slightly odd Full HD-like naming, will use that chip. These should be announced around Computex in June, with availability in the summer (likely July).

Then there is big Pascal, the big daddy Nvidia GPU developed under the codename GP100. On the consumer side, this is the GPU that will power enthusiast-class products such as the Titan. Make no mistake, this product will not launch anytime soon for consumers. Expect at the very best a launch late this year, closer to the Christmas season, and likely even later in Q1/Q2 2017 (we think).

All Pascal products are based on a 16nm FinFET design, and the GP100 in particular comes with stacked HBM2 (16GB in four stacks). The Pascal-based GPU driving the unit holds 15.3 billion transistors, which is roughly double that of the current biggest Maxwell chip. GP100 is huge at 610mm². The projected performance of the Tesla P100 (according to Nvidia) is 5.3 TFLOPS using 64-bit floating-point numbers, 10.6 TFLOPS using 32-bit and 21.2 TFLOPS using 16-bit. P100 has 4MB of L2 cache and 14MB of register file capacity. The following table provides a high-level comparison of Tesla P100 specifications with previous-generation Tesla GPU accelerators; I added a fully enabled GP100 as the last column:

| Tesla Products | Tesla K40 | Tesla M40 | Tesla P100 | GP100 (full) |
| --- | --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 | 60 |
| TPCs | 15 | 24 | 28 | 30 |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | ~1328 MHz |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | ~1480 MHz |
| Texture Units | 240 | 192 | 224 | 240 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | 16 GB |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 Watts | 250 Watts | 300 Watts | ~300 Watts |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² |
| Manufacturing Process | 28-nm | 28-nm | 16-nm | 16-nm |
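
The Tesla P100 throughput figures quoted above can be roughly reproduced from the core counts and boost clock in the table. The Python sketch below is only a back-of-the-envelope check, not Nvidia's own method. It assumes each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock and that FP16 runs at twice the FP32 rate; the function name `peak_tflops` is made up for this example.

```python
def peak_tflops(cores: int, boost_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPS.

    Assumption: one fused multiply-add per core per clock, counted as 2 FLOPs.
    """
    return cores * flops_per_core_per_clock * boost_mhz * 1e6 / 1e12


# Tesla P100 values taken from the table: 3584 FP32 cores, 1792 FP64 cores, 1480 MHz boost.
p100_fp32 = peak_tflops(3584, 1480)
p100_fp64 = peak_tflops(1792, 1480)
p100_fp16 = 2 * p100_fp32  # assumption: FP16 executes at twice the FP32 rate

# Hypothetical fully enabled GP100 at the same boost clock (3840 FP32 cores).
full_fp32 = peak_tflops(3840, 1480)

print(f"P100 FP64: {p100_fp64:.1f} TFLOPS")
print(f"P100 FP32: {p100_fp32:.1f} TFLOPS")
print(f"P100 FP16: {p100_fp16:.1f} TFLOPS")
print(f"Full GP100 FP32 at the same clock: {full_fp32:.1f} TFLOPS")
```

Under these assumptions the results line up with the 5.3 / 10.6 / 21.2 TFLOPS figures, and a fully enabled chip at the same clock would land a little above 11 TFLOPS FP32.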

As the block diagram now shows, the GP100 features six graphics processing clusters (GPCs). Just look at the diagram and count along with me: each GPC holds 10 streaming multiprocessors (SMs), and each SM has 64 CUDA cores and four texture units. Do the math and you'll reach 640 shader processors per GPC, and 3840 shader cores with 240 texture units in total.

  • 6 (GPC) x (10x64) = 3840 Shader processor units in total.

This means the GP100 used on the Tesla P100 is not fully enabled. Nvidia is known to release GPUs with disabled segments, which helps it sell different SKUs. The Tesla P100 has a shader count of 3584 and thus has 56 of the 60 SMs enabled.
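
The counting above is easy to put into a few lines of Python. This is just a sketch of the arithmetic using the per-SM figures from the article and the table (64 FP32 cores, 32 FP64 cores and 4 texture units per SM); the function and constant names are made up for the example.

```python
GPCS = 6               # graphics processing clusters on a full GP100
SMS_PER_GPC = 10       # streaming multiprocessors per GPC
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32
TEXTURE_UNITS_PER_SM = 4


def gpu_totals(enabled_sms: int) -> dict:
    """Unit totals for a GP100 variant with the given number of enabled SMs."""
    return {
        "sms": enabled_sms,
        "fp32_cores": enabled_sms * FP32_CORES_PER_SM,
        "fp64_cores": enabled_sms * FP64_CORES_PER_SM,
        "texture_units": enabled_sms * TEXTURE_UNITS_PER_SM,
    }


full_gp100 = gpu_totals(GPCS * SMS_PER_GPC)  # all 60 SMs enabled
tesla_p100 = gpu_totals(56)                  # 4 SMs disabled

print("Full GP100:", full_gp100)
print("Tesla P100:", tesla_p100)
print("Disabled SMs on Tesla P100:", full_gp100["sms"] - tesla_p100["sms"])
```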

GP100’s SM incorporates 64 single-precision (FP32) CUDA Cores. In contrast, the Maxwell and Kepler SMs had 128 and 192 FP32 CUDA Cores, respectively. The GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, and two dispatch units. While a GP100 SM has half the total number of CUDA Cores of a Maxwell SM, it maintains the same register file size and supports similar occupancy of warps and thread blocks.

GP100’s SM has the same number of registers as Maxwell GM200 and Kepler GK110 SMs, but the entire GP100 GPU has far more SMs, and thus many more registers overall. This means threads across the GPU have access to more registers, and GP100 supports more threads, warps, and thread blocks in flight compared to prior GPU generations.
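
To put numbers on the register file argument, here is a small Python sketch based on the table values (256 KB of register file per SM for all three generations). It assumes 32-bit (4-byte) registers, which is not stated in the article, and the names used are made up for the example.

```python
REGISTER_FILE_KB_PER_SM = 256
BYTES_PER_REGISTER = 4  # assumption: 32-bit registers

# SM counts and FP32 cores per SM as listed in the comparison table.
GPUS = {
    "Tesla K40 (GK110)": {"sms": 15, "fp32_cores_per_sm": 192},
    "Tesla M40 (GM200)": {"sms": 24, "fp32_cores_per_sm": 128},
    "Tesla P100 (GP100)": {"sms": 56, "fp32_cores_per_sm": 64},
}


def register_stats(sms: int, fp32_cores_per_sm: int) -> tuple:
    """Return (total register file in KB, registers per SM, register KB per FP32 core)."""
    total_kb = sms * REGISTER_FILE_KB_PER_SM
    registers_per_sm = REGISTER_FILE_KB_PER_SM * 1024 // BYTES_PER_REGISTER
    kb_per_core = REGISTER_FILE_KB_PER_SM / fp32_cores_per_sm
    return total_kb, registers_per_sm, kb_per_core


for name, cfg in GPUS.items():
    total_kb, regs_per_sm, kb_per_core = register_stats(**cfg)
    print(f"{name}: {total_kb} KB total, {regs_per_sm} registers/SM, "
          f"{kb_per_core:.2f} KB per FP32 core")
```

The per-SM register count is identical across the three chips, but since a GP100 SM has fewer cores, each core has more register space behind it, and the larger SM count pushes the GPU-wide total up.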

Since the graphics memory is on-package HBM2, the VRAM amount is fixed. That means that ALL GP100 products will get 16GB of memory. HBM2 uses a wide 4096-bit memory interface (1024 bits per stack), with an effective bandwidth of up to a full 1 TB/s.
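
The "up to 1 TB/s" figure follows from the bus width once a per-pin data rate is picked. The Python sketch below assumes a peak rate of 2 Gbit/s per pin, which is a value not given in the article; shipping products may run the memory slower. The function name is made up for the example.

```python
STACKS = 4
BITS_PER_STACK = 1024


def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s: every pin moves gbit_per_pin gigabits per second."""
    return bus_width_bits * gbit_per_pin / 8  # 8 bits per byte


bus_width = STACKS * BITS_PER_STACK  # 4096-bit interface

# Assumption: 2 Gbit/s per pin as the upper bound for HBM2.
per_stack = peak_bandwidth_gb_s(BITS_PER_STACK, 2.0)
total = peak_bandwidth_gb_s(bus_width, 2.0)

print(f"Bus width: {bus_width} bits")
print(f"Per stack: {per_stack:.0f} GB/s")
print(f"Total:     {total:.0f} GB/s (about 1 TB/s)")
```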

This is a big chip, very big at 610mm², hence it is interesting to see that 16nm can offer a lot in terms of clock frequency. The Tesla P100 is an enterprise part that ends up in servers, yet it is already clocked at 1328 MHz, with boost capabilities towards 1480 MHz. Even so, the TDP stays at 300W.





CPC_RedDawn
#5256965 Posted on: 04/12/2016 01:44 PM
> Nope GP104 as described in the news-item. GP100 will be used for a titan like equivalent.

Ah my bad. I thought these were for 1070 and 1080 specs not TITAN or 1080Ti specs.


> Even the number of special function units are enough:
>
> 3840 fp32 + 1920 fp64 + 960 sfu = 6720 cores computing at the same time.

Is that really how it works? So can fp64 cores still be used on fp32 calculations?

fp64 is double precision right? and fp32 is single precision...? so what is sfu??

fantaskarsef
#5256981 Posted on: 04/12/2016 02:14 PM
If the article really holds true, and there will be only ONE way to work with GP100, which is with 16GB HBM2, we won't see much of a Pascal Ti, will we? How would they handle it compared to a Pascal Titan, which is definitely coming out? Gimped chips? But with gimped chips, can they still connect 4x4GB HBM2? (Think of the 970's architectural layout and how the last 0.5GB of its 4GB was connected.)

Also, I hope they forget about DP in the Pascal Tis, if there will be any. Or is it going to be useful with DX12 in any way to a gamer?

xIcarus
#5257020 Posted on: 04/12/2016 03:18 PM
> Is that really how it works? So can fp64 cores still be used on fp32 calculations?
>
> fp64 is double precision right? and fp32 is single precision...? so what is sfu??

Of course, if you have an FP64 register you can natively do FP32 operations on it. It's all about the size that can be worked with.

Let me show you an example. Suppose you have a 16-bit and an 8-bit register.
So
0000 0000 0000 0000
and
0000 0000

You can easily see how the 16-bit register is twice as large, and can thus easily accommodate any operation that would otherwise fit in the 8-bit register.
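
The same "smaller fits inside larger" idea can be shown for floating-point values with a short Python sketch (not part of the original post). It assumes IEEE 754 single and double precision as exposed by Python's `struct` module, and the helper name `as_fp32` is made up for the example. It only illustrates the storage formats, not how GPU execution units are actually wired.

```python
import struct


def as_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value and return it as FP64."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Widening: an FP32 value stored in an FP64 variable survives a round trip unchanged.
value32 = as_fp32(0.1)
widening_is_lossless = as_fp32(value32) == value32

# Narrowing: squeezing an FP64 value into FP32 generally drops precision.
narrowing_loses_bits = as_fp32(0.1) != 0.1

# Integer analogue of the register example: any 8-bit value fits in 16 bits.
eight_bit_fits = all((v & 0xFFFF) == v for v in range(256))

print("FP32 -> FP64 lossless:", widening_is_lossless)
print("FP64 -> FP32 loses precision for 0.1:", narrowing_loses_bits)
print("Every 8-bit value fits in 16 bits:", eight_bit_fits)
```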

All that remains are the physical computational units to have the necessary instructions (or know-how) in order to do the operations. FP units intended to work with different precisions should have the exact same instructions and hardware except the register size difference. On the lowest level they work the same.

The design of an ALU and FPU is made such that it's the most efficient when working with one type of data. Think of it as a number of steps which need to be executed in order to get a result.
An ALU for example is general-purpose enough so that it can enter a state called 'compatibility mode' which allows them to work with data that they were not designed for (data intended for the FPU to work with). But some of those steps hard-coded into the ALU will need to be repeated in order to get a valid output, thus you can do the FPU's job but slower.

However I do not know if the FPU can do the ALU's job, but it should be able to. I do not know because I've never tried it and it's a bit harder to find stuff like this on google.
In fact by what I know the FPU should be able to do the ALU's job without a performance penalty since it's able to calculate the integer part of a floating point just fine. But I could be horribly wrong, don't quote me on it.

Overly simplified but I hope you understand.

PrMinisterGR
#5257023 Posted on: 04/12/2016 03:20 PM
I get the feeling that we're being teased with the monster that will really come a year from now, because the stuff they will bring now are not going to be anything exciting.

Denial
#5257036 Posted on: 04/12/2016 03:47 PM
> I get the feeling that we're being teased with the monster that will really come a year from now, because the stuff they will bring now are not going to be anything exciting.


It's not like Nvidia is marketing GP100 towards gaming. Everything they said about it was related to HPC and they did say that the Tesla variant won't hit OEMs till Q4, so yeah, basically a year for a consumer one.

It depends on your definition of excitement. It's becoming pretty obvious that both AMD's Polaris 10 and Nvidia's GP104 aren't going to be that much faster than what we have already. Just far more efficient.
