Nvidia GP100 GPU architecture recap - full GPU has 3840 Shader processors

Turanis

2016-04-12 17:00

Come on, Guru3D, show us real news, not a recap of a recap of Nvidia's PR marketing. They put this PR out to make people lose interest in competing products. Some of this PR claimed that GP10x would come in January 2016, then in March, then again in April. But guess what: it's only PR to keep fanboys hooked on this beloved new chip. From Blizzard with love: "It's done when it's done", not sooner or later.

#5257075

southamptonfc

2016-04-12 17:10

300W fail... unless watercooled.

That's not true; there are plenty of third-party air coolers that can easily cope with 300W. Having said that, I am quite surprised that Nvidia would produce a 300W consumer card. That is not the direction we need to be heading, and the EU are always hovering, ready to take action.

#5257076

Denial

2016-04-12 17:10

Come on, Guru3D, show us real news, not a recap of a recap of Nvidia's PR marketing. They put this PR out to make people lose interest in competing products. Some of this PR claimed that GP10x would come in January 2016, then in March, then again in April. But guess what: it's only PR to keep fanboys hooked on this beloved new chip. From Blizzard with love: "It's done when it's done", not sooner or later.

It's an architecture overview: the full chip and its specifications. If you really don't think that's technical news, you should probably find another forum. Also, what you're saying applies directly to AMD's Polaris demonstrations, and I don't recall you posting similar stuff in all those threads.

#5257080

Turanis

2016-04-12 17:20

Nobody knows what the new chip's specs are, only rumors. Nvidia doesn't want to "cooperate". 😉 On the AMD side I see a few news items, not many, but they're pretty well documented, like the last one about Polaris.

#5257086

Denial

2016-04-12 17:26

What are you talking about? Nvidia released the block diagram; it's published on their own site. The GP100 specs for the Tesla P100 are also available on Nvidia's website. This isn't a rumor, this is the chip. The last AMD leak had a chart made by videocardz.net or some nonsense. Edit: Yeah, http://www.guru3d.com/news_story/amd_polaris_11_in_shows_compubench_has_1024_shader_processors.html Videocards.com. That's some serious documentation.
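As a rough cross-check of the published numbers, the sketch below derives unit counts and peak throughput from per-SM figures. It assumes Nvidia's published GP100 layout (60 SMs in the full chip, 56 enabled on Tesla P100, 64 FP32 cores and 32 FP64 units per SM, 1480 MHz boost clock) and counts one FMA as two FLOPs; the function names are illustrative, not from any Nvidia tool.

```python
# Assumed GP100 figures from Nvidia's published Pascal/Tesla P100 material.
FP32_PER_SM = 64
FP64_PER_SM = 32
FULL_GP100_SMS = 60
TESLA_P100_SMS = 56
P100_BOOST_GHZ = 1.48


def unit_counts(sms: int) -> tuple:
    """Return (FP32 cores, FP64 units) for a GP100 configuration with `sms` SMs."""
    return sms * FP32_PER_SM, sms * FP64_PER_SM


def peak_tflops(units: int, clock_ghz: float) -> float:
    """Peak throughput in TFLOPS, counting one FMA (2 FLOPs) per unit per clock."""
    return units * 2 * clock_ghz / 1000.0


full_fp32, full_fp64 = unit_counts(FULL_GP100_SMS)
p100_fp32, p100_fp64 = unit_counts(TESLA_P100_SMS)

print(f"Full GP100: {full_fp32} FP32 cores, {full_fp64} FP64 units")
print(f"Tesla P100: {p100_fp32} FP32 cores, {p100_fp64} FP64 units")
print(f"P100 peak FP32: {peak_tflops(p100_fp32, P100_BOOST_GHZ):.1f} TFLOPS")
print(f"P100 peak FP64: {peak_tflops(p100_fp64, P100_BOOST_GHZ):.1f} TFLOPS")
# GP100 processes paired FP16 values, doubling the FP32 rate.
print(f"P100 peak FP16: {2 * peak_tflops(p100_fp32, P100_BOOST_GHZ):.1f} TFLOPS")
```

The 3840 figure in the thread title is simply 60 SMs times 64 FP32 cores; the shipping P100 enables 56 of the 60 SMs.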

#5257094

xIcarus

2016-04-12 17:54

300W fail... unless watercooled.

Wut?

That's not true; there are plenty of third-party air coolers that can easily cope with 300W. Having said that, I am quite surprised that Nvidia would produce a 300W consumer card. That is not the direction we need to be heading, and the EU are always hovering, ready to take action.

I'm glad they are shooting up towards 300W. That should mean more performance. They've been capping their TDP in a very weird way these past years: the 980 Ti and 780 Ti both have a TDP of 250W, while the 680 is rated at 195W. Compare that to AMD's 300W for the past three generations. To me this suggests that Nvidia has been releasing cards just powerful enough to compete with AMD. And Nvidia shooting for 300W means that AMD have finally caught up, which is perfect.

#5257125

Tugrul_512bit

2016-04-12 19:23

Ah, my bad. I thought these were the 1070 and 1080 specs, not TITAN or 1080 Ti specs. Is that really how it works? So can FP64 cores still be used for FP32 calculations? FP64 is double precision, right? And FP32 is single precision? So what is an SFU?

As xIcarus wrote, logic units can be used beyond their intended purpose, such as doing math on address units. If you spawn 256 threads computing pure 32-bit numbers and another 256 threads computing pure 64-bit numbers, they run on the same cluster. When a 64-bit core is free and a thread on it needs FP64, the operation is issued to it. So 512 threads work at the same time on the same core cluster. Of course, the 32-bit calculations finish sooner, so there is no 100% scaling. At the same time, there can be tens of threads per core in flight, just like Intel's Hyper-Threading but 10x better. This keeps every transistor busy.
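On the FP32/FP64 question quoted above: FP32 is IEEE 754 single precision (24-bit significand) and FP64 is double precision (53-bit significand). A minimal Python sketch, assuming only the standard `struct` module, rounds a double to single precision to show the difference; `to_fp32` is a hypothetical helper name.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Significand bits: FP32 keeps 24, FP64 keeps 53, so their machine epsilons differ a lot.
print("FP32 epsilon:", 2.0 ** -23)
print("FP64 epsilon:", 2.0 ** -52)

# 0.1 has no exact binary form; FP32 stores a visibly coarser approximation.
print(f"FP64 0.1 -> {0.1:.20f}")
print(f"FP32 0.1 -> {to_fp32(0.1):.20f}")

# A 1e-8 increment survives in FP64 but is rounded away in FP32.
print(1.0 + 1e-8 == 1.0)           # False
print(to_fp32(1.0 + 1e-8) == 1.0)  # True
```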

#5257163

TheDeeGee

2016-04-12 21:22

Jesus... X80 delayed till July :/ Maybe I'll just get a 980 Ti then.

#5257164

Ieldra

2016-04-12 21:22

As xIcarus wrote, logic units can be used beyond their intended purpose, such as doing math on address units. If you spawn 256 threads computing pure 32-bit numbers and another 256 threads computing pure 64-bit numbers, they run on the same cluster. When a 64-bit core is free and a thread on it needs FP64, the operation is issued to it. So 512 threads work at the same time on the same core cluster. Of course, the 32-bit calculations finish sooner, so there is no 100% scaling. At the same time, there can be tens of threads per core in flight, just like Intel's Hyper-Threading but 10x better. This keeps every transistor busy.

I think someone in this thread was asking if two FP32 ops can be run on a single FP64 unit, like FP16 on Pascal. What you're describing is performing FP32 on the FP64 units in addition to the FP32 units. I believe you would be cache- and register-limited, possibly dispatch-limited as well.
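The "FP16 on Pascal" comparison refers to packing two half-precision values into one 32-bit register and operating on both lanes at once. A software illustration, assuming IEEE 754 half precision via `struct`'s `e` format (Python 3.6+); `pack_half2`, `unpack_half2` and `add_half2` are hypothetical names, and the hardware handles both lanes in a single instruction rather than by unpacking like this.

```python
import struct


def pack_half2(a: float, b: float) -> int:
    """Store a and b as IEEE 754 half precision in one 32-bit word (a in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<ee", a, b))[0]


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two half-precision lanes (low, high)."""
    return struct.unpack("<ee", struct.pack("<I", word))


def add_half2(x: int, y: int) -> int:
    """Lane-wise add of two packed words; each sum is rounded back to FP16 when repacked."""
    xa, xb = unpack_half2(x)
    ya, yb = unpack_half2(y)
    return pack_half2(xa + ya, xb + yb)


w = add_half2(pack_half2(1.5, 2.25), pack_half2(0.5, 0.75))
print(hex(w), unpack_half2(w))  # 0x42004000 (2.0, 3.0)
```

Two FP16 results fit in the same 32 bits one FP32 result would use, which is why the FP16 rate can be twice the FP32 rate.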

#5257167

Tugrul_512bit

2016-04-12 21:37

I meant 2x FP32 operations: one on an FP32 core and one on an FP64 core. I think it's impossible to compute 2x FP32 on a single FP64 unit in one cycle. Of course it will have limited bandwidth, but it's likely a gain with so many FP64 cores, like having a titan-x-vanta on top of your GPU. As for the dispatcher: if it can feed FP32 multiplication, then it can feed FP32 division and FP64 multiplication; both divisions could be even easier, I suppose. Of course, FP64 addition and FP32 addition at the same time could be bottlenecked by the dispatcher.
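The concurrency-versus-dispatch argument can be made concrete with a toy per-SM issue model. This is a deliberately simplified sketch, not how Pascal's scheduler actually works: it assumes GP100's 64 FP32 and 32 FP64 units per SM, one operation per unit per cycle, FP32 issued first, and an optional cap on total operations dispatched per cycle (`dispatch_limit`, a hypothetical parameter); latency, warps, registers and caches are ignored.

```python
from typing import Optional


def simulate(fp32_ops: int, fp64_ops: int, fp32_units: int = 64, fp64_units: int = 32,
             dispatch_limit: Optional[int] = None, concurrent: bool = True) -> int:
    """Count cycles to drain the given FP32/FP64 work under a toy issue model."""
    if fp32_units <= 0 or fp64_units <= 0:
        raise ValueError("unit counts must be positive")
    if dispatch_limit is not None and dispatch_limit <= 0:
        raise ValueError("dispatch_limit must be positive")
    cycles = 0
    while fp32_ops > 0 or fp64_ops > 0:
        # Without a cap, every unit can be fed each cycle.
        budget = fp32_units + fp64_units if dispatch_limit is None else dispatch_limit
        issue32 = min(fp32_ops, fp32_units, budget)
        fp32_ops -= issue32
        budget -= issue32
        # Serial mode: FP64 issues only in cycles with no FP32 issue,
        # i.e. after the FP32 work has drained.
        if concurrent or issue32 == 0:
            issue64 = min(fp64_ops, fp64_units, budget)
            fp64_ops -= issue64
        cycles += 1
    return cycles


serial = simulate(256, 256, concurrent=False)
overlap = simulate(256, 256)
capped = simulate(256, 256, dispatch_limit=64)
print(serial, overlap, capped)  # 12 8 12
```

In this model, overlapping the work gives 12 / 8 = 1.5x rather than 2x because the FP64 side has half as many units, and a dispatch cap of 64 per cycle removes the gain entirely, which matches the "no 100% scaling" and "dispatch limited" points above.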

#5257215

Tugrul_512bit

2016-04-12 23:42

The Quake 3 engine had an old trick using integer operations for fast inverse square root FP calculations. I mis-clicked Quote instead of Edit; I didn't even see it was a submit button.

#5257218

Ieldra

2016-04-12 23:47

Yeah, I remember fast inverse sqrt! Wasn't Quake 3! Way older, or maybe it was... Wasn't Quake 3 released in like 2006? That wasn't from id though; they stole the idea from someone else.

#5257222

alanm

2016-04-13 00:07

I thought Quake 3 was so well known to any gamer or GPU enthusiast that no one could mistake it for a 2006 game. It was released in 1999 and was the prime benchmark in GPU reviews for years afterward.

#5257224

Ieldra

2016-04-13 00:13

Couldn't remember if it was 3 or 4 that was released in ~2006; time flies.

#5257232

Tugrul_512bit

2016-04-13 00:31

I've read somewhere that fast inverse sqrt was first used before 1990 in chemistry modeling software (the algorithms look similar, but the magic number differs). Was that even before the 80286? Math operations were so slow that people were getting results from lookup tables in memory instead of calculating on the CPU. The SFU cores in a GPU must be using a very fast lookup table to calculate sqrt faster than that Quake trick. I tested on my HD 7870: the hardware function is faster than the Quake version.
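Whether a GPU's SFU actually uses a lookup table is the poster's guess, not something stated here. One textbook way to build a fast reciprocal square root is a small table that supplies a seed, refined by Newton-Raphson iterations. The sketch below assumes a 64-entry table over the reduced range [1, 4); `rsqrt_lut` and `TABLE_BITS` are hypothetical names, not any GPU's real design.

```python
import math

TABLE_BITS = 6  # hypothetical: 2**6 = 64 table entries

# Seed values of 1/sqrt(m) at the midpoint of each bucket, for m in [1, 4).
# Using [1, 4) lets any positive x be written as m * 4**k, and 1/sqrt(4**k) = 2**-k.
_TABLE = [
    1.0 / math.sqrt(1.0 + 3.0 * (i + 0.5) / (1 << TABLE_BITS))
    for i in range(1 << TABLE_BITS)
]


def rsqrt_lut(x: float, newton_steps: int = 1) -> float:
    """Approximate 1/sqrt(x): table lookup for a seed, then Newton-Raphson refinement."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)       # x = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1      # now 1 <= m < 2
    if e % 2:                  # make the exponent even so x = m * 4**(e // 2)
        m, e = m * 2.0, e - 1  # now 2 <= m < 4
    idx = min(int((m - 1.0) / 3.0 * (1 << TABLE_BITS)), len(_TABLE) - 1)
    y = _TABLE[idx] * 2.0 ** (-(e // 2))
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)  # each step roughly squares the relative error
    return y


print(rsqrt_lut(10.0), 1.0 / math.sqrt(10.0))
```

The table only has to cover one small interval because the exponent is handled separately; a bigger table means fewer Newton steps, which is the usual hardware trade-off between area and latency.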

#5257254

ScoobyDooby

2016-04-13 01:32

Like with the 980 Ti, once the 1080 Ti releases I will grab the first one I can get, stock, and throw a waterblock on it. Done and done.

#5257257

-Tj-

2016-04-13 01:43

I meant 2x FP32 operations: one on an FP32 core and one on an FP64 core. I think it's impossible to compute 2x FP32 on a single FP64 unit in one cycle. Of course it will have limited bandwidth, but it's likely a gain with so many FP64 cores, like having a titan-x-vanta on top of your GPU. As for the dispatcher: if it can feed FP32 multiplication, then it can feed FP32 division and FP64 multiplication; both divisions could be even easier, I suppose. Of course, FP64 addition and FP32 addition at the same time could be bottlenecked by the dispatcher.

You can't really mix FP32 and FP64, split an FP64 unit into 2x FP32, or use FP64 alongside FP32 at the same time. Each is a separate unit and does its own job. This is not Intel/AMD CPU AVX/SSE, which can do 256-bit operations, combine 2x 128-bit, or split 128-bit into 2x 64-bit.
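The CPU-side contrast can be illustrated by treating a SIMD register as raw bytes that can be viewed as lanes of different widths, and by doing a 256-bit operation as two 128-bit halves. This is a Python model of the idea, not real AVX/SSE code; the function names loosely echo SSE2's `addpd` but are hypothetical.

```python
import struct


def split_256(reg: bytes) -> tuple:
    """Split a 32-byte (256-bit) register image into its low and high 128-bit halves."""
    if len(reg) != 32:
        raise ValueError("expected 32 bytes")
    return reg[:16], reg[16:]


def addpd_128(a: bytes, b: bytes) -> bytes:
    """Lane-wise add of two 128-bit registers, each holding 2 x FP64."""
    x = struct.unpack("<2d", a)
    y = struct.unpack("<2d", b)
    return struct.pack("<2d", *(p + q for p, q in zip(x, y)))


def addpd_256(a: bytes, b: bytes) -> bytes:
    """4 x FP64 add performed as two independent 128-bit halves."""
    a_lo, a_hi = split_256(a)
    b_lo, b_hi = split_256(b)
    return addpd_128(a_lo, b_lo) + addpd_128(a_hi, b_hi)


a = struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)
b = struct.pack("<4d", 10.0, 20.0, 30.0, 40.0)
print(struct.unpack("<4d", addpd_256(a, b)))  # (11.0, 22.0, 33.0, 44.0)

# The same 128 bits can be read as 2 x 64-bit or 4 x 32-bit integer lanes.
reg = struct.pack("<2Q", 1, 2)
print(struct.unpack("<4I", reg))  # (1, 0, 2, 0)
```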

#5257264

Ieldra

2016-04-13 02:07

Like with the 980 Ti, once the 1080 Ti releases I will grab the first one I can get, stock, and throw a waterblock on it. Done and done.

Really? Stock? No custom PCB? Those go a long way, man.

#5257289

PrMinisterGR

2016-04-13 03:14

Yeah, I remember fast inverse sqrt! Wasn't Quake 3! Way older, or maybe it was... Wasn't Quake 3 released in like 2006? That wasn't from id though; they stole the idea from someone else.

I've read somewhere that fast inverse sqrt was first used before 1990 in chemistry modeling software (the algorithms look similar, but the magic number differs). Was that even before the 80286? Math operations were so slow that people were getting results from lookup tables in memory instead of calculating on the CPU. The SFU cores in a GPU must be using a very fast lookup table to calculate sqrt faster than that Quake trick. I tested on my HD 7870: the hardware function is faster than the Quake version.

Here it is:

Fast inverse square root (sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5f3759df) is a method of calculating x^(-1/2), the reciprocal (or multiplicative inverse) of a square root for a 32-bit floating point number in IEEE 754 floating point format. The algorithm was probably developed at Silicon Graphics in the early 1990s, and an implementation appeared in 1999 in the Quake III Arena source code, but the method did not appear on public forums such as Usenet until 2002 or 2003.[1] (There is a discussion on the Chinese developer forum CSDN back in 2000.[2]) At the time, the primary advantage of the algorithm came from avoiding computationally expensive floating point operations in favor of integer operations. Inverse square roots are used to compute angles of incidence and reflection for lighting and shading in computer graphics.
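A Python transcription of the method described above, assuming `struct` to reinterpret the FP32 bit pattern as a 32-bit integer. It mirrors the widely published Quake III version (magic constant, shift, subtract, one Newton-Raphson step) but is not the original C source, and it performs the Newton step in double precision rather than FP32.

```python
import math
import struct

MAGIC = 0x5F3759DF


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x in the normal FP32 range."""
    # Reinterpret the FP32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bits roughly halves the exponent; subtracting from the magic
    # constant negates it and yields a first guess for x ** -0.5.
    i = MAGIC - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


for value in (0.25, 2.0, 4.0, 100.0):
    approx = fast_inv_sqrt(value)
    exact = 1.0 / math.sqrt(value)
    print(f"x={value}: approx={approx:.6f} exact={exact:.6f} rel_err={abs(approx / exact - 1):.5f}")
```

After the single Newton step the relative error stays below about 0.2%, which was good enough for lighting math; a dedicated hardware instruction can be both faster and more accurate, consistent with the HD 7870 test mentioned earlier in the thread.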

#5257308

alxtorrentazos

2016-04-13 06:18

All this power in a piece of silicon just to play bad console ports... sigh! Let's see how it improves VR before jumping the gun.