Nvidia Shows and Demos Tesla Volta V100 has 5120 Shader processors


https://forums.guru3d.com/data/avatars/m/237/237771.jpg
Yep. Even when they shave the die size down for commercial products, it's still going to be huge. For example, if they shrink V100 down by the ratio that the 1080 was to P100, it's still larger than GP102 in the 1080Ti.
Wait what? 1080 was GP104.
https://forums.guru3d.com/data/avatars/m/198/198862.jpg
I believe prices will go up by at most $200 for the Titan, $150 for the Ti, $100 for the x80, $75 for the x70 and $50 for the x60. Meaningless differences, I suppose.
How can you say that? Have you forgotten how much Nvidia asked for the Pascal FE cards when they came out? You will be paying $500 for the 2070.
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
Wait what? 1080 was GP104.
He's saying that if you take the size cut from GP100 to GP104 and apply it to GV100, it would still be bigger than GP102. The problem is GV100 is a completely new architecture, so that comparison is not going to be accurate at all.
https://forums.guru3d.com/data/avatars/m/250/250418.jpg
Yep. Even when they shave the die size down for commercial products, it's still going to be huge. For example, if they shrink V100 down by the ratio that the 1080 was to P100, it's still larger than GP102 in the 1080Ti.
That's why Polaris is great at just 232mm^2. The smaller the chip the more they can fit in a single wafer and make it more affordable. Having high performance is great but it would be awesome if everyone could afford it!
I believe prices will go up by at most $200 for the Titan, $150 for the Ti, $100 for the x80, $75 for the x70 and $50 for the x60. Meaningless differences, I suppose.
They are already expensive af.
800mm^2, that's crazy. They must have very high confidence in the fab or be willing to ask a kidney from consumers.
Even if they can use 90% of the dies, there will be fewer dies per wafer, so each die will be more expensive. 800mm^2 is completely nuts.
How can you say that? Have you forgotten how much Nvidia asked for the Pascal FE cards when they came out? You will be paying $500 for the 2070.
I'm sure the full Volta GPU will cost about $1499. 2080 will cost about $999. 2070 $500? nah... That die size could be bad, if they had any competition. AMD will take 1 or 2 years to make something after Vega.
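To put the "fewer dies per wafer" point above in numbers, here is a minimal back-of-the-envelope sketch in Python. Only the die areas come from the thread; the wafer cost and defect density are purely illustrative assumptions, not real TSMC figures.

```python
# Back-of-the-envelope: how die area affects dies per wafer and cost per good die.
# All dollar and defect-density inputs are illustrative assumptions.
import math

WAFER_DIAMETER_MM = 300          # standard 12" wafer
WAFER_COST_USD = 8000            # assumed wafer cost, purely illustrative
DEFECT_DENSITY_PER_CM2 = 0.1     # assumed defect density (Poisson yield model)

def dies_per_wafer(die_area_mm2, wafer_d=WAFER_DIAMETER_MM):
    """Classic dies-per-wafer approximation, accounting for edge loss."""
    r = wafer_d / 2
    return math.floor(math.pi * r**2 / die_area_mm2
                      - math.pi * wafer_d / math.sqrt(2 * die_area_mm2))

def yield_rate(die_area_mm2, d0=DEFECT_DENSITY_PER_CM2):
    """Simple Poisson yield model: Y = exp(-D0 * A), with A in cm^2."""
    return math.exp(-d0 * die_area_mm2 / 100)

for name, area in [("GV100-class (~815 mm^2)", 815), ("GP102-class (~471 mm^2)", 471)]:
    dpw = dies_per_wafer(area)
    good = dpw * yield_rate(area)
    print(f"{name}: {dpw} candidates/wafer, ~{good:.0f} good dies, "
          f"~${WAFER_COST_USD / good:.0f} per good die")
```

With these assumed inputs the big die yields roughly a third as many good dies per wafer as a GP102-sized one, which is the rough shape of the cost argument being made here, not a real price estimate.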
https://forums.guru3d.com/data/avatars/m/245/245459.jpg
It's 15 TFLOPS for Gaming
Thanks Hilbert, I missed the fact that the 15 TFLOPS are included/written in the slide! It's a bit disappointing that this massive 800mm^2 die 'only' has about 50% more processing power than the GTX 1080 Ti. Other people have mentioned how expensive a gaming version of this product would be due to the large die size. Now, a 50% increase in performance wouldn't be bad, but it might be unlikely we'll get such a large chip given the large price that would go with it, so maybe Volta won't be that much of an improvement over Pascal. Although I'm sure I saw Nvidia slides showing twice the performance of Volta over Pascal on their roadmap, this large Tesla chip only has about 50% more processing performance. Hmmm....
Just for random thought:
1170 (Volta): 11 TFLOPS
1180 (Volta): 13 TFLOPS
1180 Ti/Titan: 15 TFLOPS
Maybe, but as some people have been saying, the 1180 Ti is gonna be very, very expensive, so it might not happen this way!
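For anyone wanting to sanity-check the TFLOPS figures being thrown around here: FP32 throughput is conventionally quoted as 2 floating-point operations (one FMA) per shader per clock, so it falls straight out of core count times boost clock. A minimal sketch, using the published spec boost clocks as assumptions (V100 ~1455 MHz, GTX 1080 Ti ~1582 MHz), plus a hypothetical Pascal-like ~1.85 GHz clock like the one speculated about later in the thread:

```python
# FP32 TFLOPS = 2 ops (one FMA) per shader per clock * shader count * clock (GHz) / 1000.
def fp32_tflops(shaders, clock_ghz):
    return 2 * shaders * clock_ghz / 1000.0

v100 = fp32_tflops(5120, 1.455)        # Tesla V100 at its quoted ~1455 MHz boost
gtx_1080_ti = fp32_tflops(3584, 1.582) # GTX 1080 Ti at its ~1582 MHz spec boost

print(f"V100:    {v100:.1f} TFLOPS")          # ~14.9 TFLOPS, matching the slide
print(f"1080 Ti: {gtx_1080_ti:.1f} TFLOPS")   # ~11.3 TFLOPS
print(f"Ratio:   {v100 / gtx_1080_ti:.2f}x")

# If a 5120-shader part clocked like a consumer Pascal card (~1.85 GHz, an assumption):
print(f"5120 shaders @ 1.85 GHz: {fp32_tflops(5120, 1.85):.1f} TFLOPS")  # ~18.9
```

This is raw peak-throughput arithmetic only; it says nothing about how the architecture actually performs in games.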
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
That's why Polaris is great at just 232mm^2. The smaller the chip the more they can fit in a single wafer and make it more affordable. Having high performance is great but it would be awesome if everyone could afford it!

They are already expensive af.

Even if they can use 90% of the dies, there will be fewer dies per wafer, so each die will be more expensive. 800mm^2 is completely nuts.

I'm sure the full Volta GPU will cost about $1499. 2080 will cost about $999. 2070 $500? nah... That die size could be bad, if they had any competition. AMD will take 1 or 2 years to make something after Vega.
The die size is for a datacenter oriented product. GP100 is a completely different design than GP102/104. Volta is clearly taking this to a new level. Cut the FP64, the new Tensor cores, etc and the chip is going to be like a 1/3rd the size. Trying to extrapolate price from a die size comparison of a product that's completely separated from the gaming market is like really stretching it.
https://forums.guru3d.com/data/avatars/m/237/237771.jpg
Question: does the 800mm^2 include HBM? As Denial says, the GTX die will be much smaller. GP100 is 610mm^2 and GP102 is 471mm^2. A full-size GTX Volta may be more around 560mm^2, which is not a huge jump from GP102. Vega is rumored to be around 520mm^2 for reference, and the users in here talking about how expensive the Volta cards will be based on die size think Vega will be retailing for $300.
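For what it's worth, the ratio method used in this post can be written out explicitly. A minimal sketch: the GP100 and GP102 areas are the ones cited above, while the ~725 mm^2 "GPU-only" input is a pure assumption chosen to show how one would land near the ~560 mm^2 guess, since whether the quoted ~815 mm^2 includes the HBM/interposer is exactly what's in question here.

```python
# Ratio-based guess at a hypothetical "GV102"-style gaming die, mirroring the method
# in the post above: scale a compute-class die by the Pascal GP102/GP100 area ratio.
GP100_MM2 = 610
GP102_MM2 = 471

def scaled_gaming_die(compute_die_mm2):
    """Apply the Pascal GP102/GP100 shrink ratio to a compute-class die area."""
    return compute_die_mm2 * GP102_MM2 / GP100_MM2

# Two inputs, both assumptions (it is unclear in the thread what the ~815 mm^2 covers):
for label, area in [("quoted 815 mm^2 taken as-is", 815),
                    ("hypothetical ~725 mm^2 GPU-only figure", 725)]:
    print(f"{label}: ~{scaled_gaming_die(area):.0f} mm^2 estimated gaming die")
```

The first input gives roughly 630 mm^2, the second roughly 560 mm^2, so the estimate swings heavily on what the headline area actually measures.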
https://forums.guru3d.com/data/avatars/m/56/56686.jpg
show me the next xx60 card!
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2017/05/image3.png
Apparently the quoted mm^2 includes the HBM2/interposer, so the actual die is smaller. The tensor units are also why FP32 is low given the die size. The tensor units most likely won't be on the gaming cards - or will be severely cut down, along with the FP64. Edit: I guess I could have appended this to my previous post. Sorry.
https://forums.guru3d.com/data/avatars/m/258/258664.jpg
Nerdgasmic. I want one with hacked drivers 🤓
https://forums.guru3d.com/data/avatars/m/242/242471.jpg
So ~15Tflop for GV100, midget high-end GV104 ~ 12Tflop?
https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2017/05/image3.png
Apparently the quoted mm^2 includes the HBM2/interposer, so the actual die is smaller. The tensor units are also why FP32 is low given the die size. The tensor units most likely won't be on the gaming cards - or will be severely cut down, along with the FP64. Edit: I guess I could have appended this to my previous post. Sorry.
Yeah, I was about to say it's probably with the HBM interposer... I'm sure the normal GTX variant will remain in the same price range, $699. The 1080 Ti settled at ~700-750€, just like the 980 Ti after 1-2 months. https://geizhals.eu/?cat=gra16_512&xf=9810_7+10609+-+GTX+1080+Ti&sort=p
https://forums.guru3d.com/data/avatars/m/259/259654.jpg
The big news I'm getting out of this is that Volta has had a third process switch and it's now again at 16/12nm. Which means it will most likely come for consumer products earlier than the end of Q1 2018, as initially predicted. The gaming teraflop numbers aren't exactly crazy though, an overclocked 1080Ti will get close to these. That's pretty much a 2GHz 1080Ti. It's not as huge a leap as I imagined it to be, so I'm curious what different things the actual architecture holds. They seem to have gone very wide this time, I wonder if other architectural details are different or this is "just" a wider Pascal. Interesting times ahead. And yes, that chip is massive.

About the die size, isn't Vega 530-560mm2 and 12.5 TFLOPS? What am I reading wrong here? Are the tensor parts that much of the die size? It seems like they aren't, considering that the increase in SM count from 56 to 80 seems to be almost proportional to the increase in die size.
https://forums.guru3d.com/data/avatars/m/237/237771.jpg
The big news I'm getting out of this is that Volta has had a third process switch and it's now again at 16/12nm. Which means it will most likely come for consumer products earlier than the end of Q1 2018, as initially predicted. The gaming teraflop numbers aren't exactly crazy though, an overclocked 1080Ti will get close to these. That's pretty much a 2GHz 1080Ti. It's not as huge a leap as I imagined it to be, so I'm curious what different things the actual architecture holds. They seem to have gone very wide this time, I wonder if other architectural details are different or this is "just" a wider Pascal. Interesting times ahead. And yes, that chip is massive.
If these clock similarly to Pascal, however, we may see 18.5-19 TFLOPS of performance with this chip. Also, as previously established, the die size discussed includes the HBM interposer. To your edit: they doubled the tensor core count from GP100. As far as I know they were not a part of GP102. Does AMD include the HBM interposer in their die measurements? I would estimate GV100 at 580~610mm^2 without HBM.
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
https://devblogs.nvidia.com/parallelforall/inside-volta/?ncid=so-twi-vt-13918 Has some more information about architectural changes.
New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
Volta features a major new redesign of the SM processor architecture that is at the center of the GPU. The new Volta SM is 50% more energy efficient than the previous generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope. New Tensor Cores designed specifically for deep learning deliver up to 12x higher peak TFLOPs for training. With independent, parallel integer and floating point datapaths, the Volta SM is also much more efficient on workloads with a mix of computation and addressing calculations. Volta's new independent thread scheduling capability enables finer-grain synchronization and cooperation between parallel threads. Finally, a new combined L1 Data Cache and Shared Memory subsystem significantly improves performance while also simplifying programming.

Second-Generation NVLink™
The second generation of NVIDIA's NVLink high-speed interconnect delivers higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. GV100 supports up to 6 NVLink links at 25 GB/s for a total of 300 GB/s. NVLink now supports CPU mastering and cache coherence capabilities with IBM Power 9 CPU-based servers. The new NVIDIA DGX-1 with V100 AI supercomputer uses NVLink to deliver greater scalability for ultra-fast deep learning training.

HBM2 Memory: Faster, Higher Efficiency
Volta's highly tuned 16GB HBM2 memory subsystem delivers 900 GB/sec peak memory bandwidth. The combination of both a new generation HBM2 memory from Samsung, and a new generation memory controller in Volta, provides 1.5x delivered memory bandwidth versus Pascal GP100 and greater than 95% memory bandwidth efficiency running many workloads.

Volta Multi-Process Service
Volta Multi-Process Service (MPS) is a new feature of the Volta GV100 architecture providing hardware acceleration of critical components of the CUDA MPS server, enabling improved performance, isolation, and better quality of service (QoS) for multiple compute applications sharing the GPU. Volta MPS also triples the maximum number of MPS clients from 16 on Pascal to 48 on Volta.

Enhanced Unified Memory and Address Translation Services
Unified Memory technology in Volta GV100 includes new access counters to allow more accurate migration of memory pages to the processor that accesses the pages most frequently, improving efficiency for accessing memory ranges shared between processors. On IBM Power platforms, new Address Translation Services (ATS) support allows the GPU to access the CPU's page tables directly.

Cooperative Groups and New Cooperative Launch APIs
Cooperative Groups is a new programming model introduced in CUDA 9 for organizing groups of communicating threads. Cooperative Groups allows developers to express the granularity at which threads are communicating, helping them to express richer, more efficient parallel decompositions. Basic Cooperative Groups functionality is supported on all NVIDIA GPUs since Kepler. Pascal and Volta include support for new Cooperative Launch APIs that support synchronization amongst CUDA thread blocks. Volta adds support for new synchronization patterns.

Maximum Performance and Maximum Efficiency Modes
In Maximum Performance mode, the Tesla V100 accelerator will operate unconstrained up to its TDP (Thermal Design Power) level of 300W to accelerate applications that require the fastest computational speed and highest data throughput. Maximum Efficiency Mode allows data center managers to tune power usage of their Tesla V100 accelerators to operate with optimal performance per watt. A not-to-exceed power cap can be set across all GPUs in a rack, reducing power consumption dramatically, while still obtaining excellent rack performance.

Volta Optimized Software
New versions of deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others harness the performance of Volta to deliver dramatically faster training times and higher multi-node training performance. Volta-optimized versions of GPU accelerated libraries such as cuDNN, cuBLAS, and TensorRT leverage the new features of the Volta GV100 architecture to deliver higher performance for both deep learning and High Performance Computing (HPC) applications. The NVIDIA CUDA Toolkit version 9.0 includes new APIs and support for Volta features to provide even easier programmability.
Seems like PRMinister was right - they kind of copied AMD's implementation of thread scheduling although it isn't clear how similar it is to what AMD does. Probably won't find out till closer to launch.
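The 900 GB/sec HBM2 figure in the quoted blog post can be sanity-checked from the interface width. A minimal sketch, assuming four 1024-bit HBM2 stacks (the same physical layout GP100 used); the per-pin data rates are common HBM2 speed grades used here only as assumptions:

```python
# Sanity-checking the 900 GB/s HBM2 figure from the quoted blog text.
# Assumes 4 HBM2 stacks, each with a 1024-bit interface.
STACKS = 4
BITS_PER_STACK = 1024

def peak_bw_gbs(pin_rate_gbps):
    """Peak bandwidth in GB/s for a given per-pin data rate in Gbit/s."""
    return STACKS * BITS_PER_STACK * pin_rate_gbps / 8

print(peak_bw_gbs(1.75))  # ~896 GB/s -> the ~900 GB/s quoted for V100
print(peak_bw_gbs(1.4))   # ~717 GB/s -> GP100's ~720 GB/s peak, for comparison

# Note: the "1.5x delivered bandwidth" claim in the quote is about *effective*
# bandwidth (>95% efficiency on Volta vs. lower efficiency on GP100), not the
# 900/720 = 1.25x peak ratio.
```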
data/avatar/default/avatar29.webp
If these clock similarly to Pascal, however, we may see 18.5-19 TFLOPS of performance with this chip. Also, as previously established, the die size discussed includes the HBM interposer.
It's hard to conclude anything about the GTX line from this GV100 data, the two being completely bifurcated and GV100 carrying a significant amount of extra weight. One of the few things they do share is the Volta SM (Streaming Multiprocessor): new mixed-precision FP16/FP32 Tensor Cores purpose-built for deep learning matrix arithmetic; enhanced L1 data cache for higher performance and lower latency; streamlined instruction set for simpler decoding and reduced instruction latencies; higher clocks and higher power efficiency.
https://devblogs.nvidia.com/parallelforall/inside-volta/?ncid=so-twi-vt-13918
https://forums.guru3d.com/data/avatars/m/259/259654.jpg
If these clock similarly to Pascal, however, we may see 18.5-19 TFLOPS of performance with this chip. Also, as previously established, the die size discussed includes the HBM interposer. To your edit: they doubled the tensor core count from GP100. As far as I know they were not a part of GP102. Does AMD include the HBM interposer in their die measurements? I would estimate GV100 at 580~610mm^2 without HBM.
These are apparently the key new features of GV100:
New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
Second-Generation NVLink™
HBM2 Memory: Faster, Higher Efficiency
Volta Multi-Process Service
Enhanced Unified Memory and Address Translation Services
Cooperative Groups and New Cooperative Launch APIs
Maximum Performance and Maximum Efficiency Modes
Volta Optimized Software

Out of all of that, almost zero seems to deal with new deeper architectural stuff. They pretty much seem to have a more optimized/refined memory controller and that's it. The Maximum Performance and Maximum Efficiency modes remind me of similar wording from AMD about Hawaii and Fiji. I don't believe that this one will clock that high this time. We'll see.
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
These are apparently the key new features of GV100:
New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
Second-Generation NVLink™
HBM2 Memory: Faster, Higher Efficiency
Volta Multi-Process Service
Enhanced Unified Memory and Address Translation Services
Cooperative Groups and New Cooperative Launch APIs
Maximum Performance and Maximum Efficiency Modes
Volta Optimized Software

Out of all of that, almost zero seems to deal with new deeper architectural stuff. They pretty much seem to have a more optimized/refined memory controller and that's it. The Maximum Performance and Maximum Efficiency modes remind me of similar wording from AMD about Hawaii and Fiji. I don't believe that this one will clock that high this time. We'll see.
New Streaming Multiprocessor (SM) Architecture Optimized for Deep Learning
Volta features a major new redesign of the SM processor architecture that is at the center of the GPU. The new Volta SM is 50% more energy efficient than the previous generation Pascal design, enabling major boosts in FP32 and FP64 performance in the same power envelope.
Something in the architecture changed to enable a 50% increase in efficiency.
https://forums.guru3d.com/data/avatars/m/238/238795.jpg
I'll get excited when the software catches up to the hardware. I've fallen for hardware hype too many times, only to sit on it playing console ports. Sounds impressive, but until I see a game that NEEDS it, I'm not impressed. Why buy a V8 if you're only driving to work?
data/avatar/default/avatar17.webp
Holy crap, it really is 815mm^2. And TSMC has made 12FFN exclusively for Nvidia. According to JHH, it's at the reticle limit - you can't make a bigger die. Getting one working die per 12'' wafer is... unlikely.