Nvidia Announces PCI-Express version of Tesla V100 accelerator


Nvidia has announced a PCI Express 'card' version of its Tesla V100 accelerator, to be released later this year. The unit carries 16GB of HBM2 memory, and its Volta GPU is fitted with 5120 shader processors.



The PCI-Express version of the Tesla V100 is intended as a server part for workloads such as deep learning, research and analytics. The card is clocked slightly lower than the original SXM2 module and comes with a TDP of 250 watts. Nvidia says the cards will be released 'later this year'.

Specifications of the PCIe form factor include:

  • 7 teraflops of double-precision performance, 14 teraflops of single-precision performance and 112 teraflops of Tensor Core (deep learning) performance with NVIDIA GPU BOOST™ technology
  • 16GB of CoWoS HBM2 stacked memory, delivering 900GB/sec of memory bandwidth
  • Support for PCIe Gen 3 interconnect (up to 32GB/sec bi-directional bandwidth)
  • 250 watts of power
                                   | Tesla V100 (SXM2)     | Tesla V100 (PCIe)     | Tesla P100 (SXM2)     | Tesla P100 (PCIe)
Architecture                       | Volta                 | Volta                 | Pascal                | Pascal
GPU                                | GV100 (815mm2)        | GV100 (815mm2)        | GP100 (610mm2)        | GP100 (610mm2)
Shader cores                       | 5120                  | 5120                  | 3584                  | 3584
Tensor cores                       | 640                   | 640                   | N/A                   | N/A
Core speed                         | ?                     | ?                     | 1328MHz               | ?
Boost clock                        | 1455MHz               | ~1370MHz              | 1480MHz               | 1300MHz
Memory speed                       | 1.75Gbps HBM2         | 1.75Gbps HBM2         | 1.4Gbps HBM2          | 1.4Gbps HBM2
Memory bus                         | 4096-bit              | 4096-bit              | 4096-bit              | 4096-bit
Memory bandwidth                   | 900GB/sec             | 900GB/sec             | 720GB/sec             | 720GB/sec
VRAM                               | 16GB                  | 16GB                  | 16GB                  | 16GB
L2 cache                           | 6MB                   | 6MB                   | 4MB                   | 4MB
Half precision                     | 30 TFLOPS             | 28 TFLOPS             | 21.2 TFLOPS           | 18.7 TFLOPS
Single precision                   | 15 TFLOPS             | 14 TFLOPS             | 10.6 TFLOPS           | 9.3 TFLOPS
Double precision                   | 7.5 TFLOPS (1/2 rate) | 7 TFLOPS (1/2 rate)   | 5.3 TFLOPS (1/2 rate) | 4.7 TFLOPS (1/2 rate)
Tensor performance (deep learning) | 120 TFLOPS            | 112 TFLOPS            | N/A                   | N/A
Transistors                        | 21 billion            | 21 billion            | 15.3 billion          | 15.3 billion
TDP                                | 300W                  | 250W                  | 300W                  | 250W
Form factor                        | Mezzanine (SXM2)      | PCIe                  | Mezzanine (SXM2)      | PCIe
Process                            | TSMC 12nm FFN         | TSMC 12nm FFN         | TSMC 16nm FinFET      | TSMC 16nm FinFET
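
The headline throughput and bandwidth figures follow directly from the raw specifications. The snippet below is a back-of-the-envelope sketch (not Nvidia code; the clock, core-count and memory figures are simply plugged in from the table above): an FMA counts as two floating-point operations, Volta runs FP16 at twice and FP64 at half the FP32 rate, and memory bandwidth is bus width times per-pin data rate.

```cpp
// Back-of-the-envelope check of the table above (illustrative only).
// Peak FP32 = shader cores x 2 FLOPs per FMA x boost clock.
// Bandwidth = (bus width / 8) x per-pin data rate.
#include <cstdio>

int main() {
    // Tesla V100 PCIe figures taken from the table above.
    const double shader_cores    = 5120;   // 80 SMs x 64 FP32 cores
    const double boost_clock_ghz = 1.370;  // ~1370MHz boost clock
    const double bus_width_bits  = 4096;   // HBM2 interface width
    const double mem_rate_gbps   = 1.75;   // per-pin data rate

    const double fp32_tflops = shader_cores * 2.0 * boost_clock_ghz / 1000.0;
    const double fp16_tflops = fp32_tflops * 2.0;  // Volta: FP16 at 2x FP32 rate
    const double fp64_tflops = fp32_tflops / 2.0;  // Volta: FP64 at 1/2 FP32 rate
    const double bandwidth_gbs = bus_width_bits / 8.0 * mem_rate_gbps;

    printf("FP32 %.1f TFLOPS, FP16 %.1f TFLOPS, FP64 %.1f TFLOPS\n",
           fp32_tflops, fp16_tflops, fp64_tflops);
    printf("Memory bandwidth: %.0f GB/s\n", bandwidth_gbs);
    return 0;
}
```

That works out to roughly 14, 28 and 7 TFLOPS and about 896GB/sec, which lines up with the figures quoted for the PCIe card.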

  
One of the biggest innovations of the V100 over the P100 is the all-new Tensor Cores: the GV100 GPU has 640 of them, eight per SM. Nvidia claims huge performance gains for applications that can make use of them. In regular FP32 and FP64 calculations the GV100 is roughly 1.5 times as fast as the GP100. NVIDIA Tesla V100 GPU accelerators for PCIe-based systems are expected to be available later this year from NVIDIA reseller partners and system manufacturers, including Hewlett Packard Enterprise (HPE).
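
As an illustration of what 'applications that can make use of them' means in practice: the Tensor Cores are exposed to developers through libraries such as cuBLAS and cuDNN, and directly through the warp-level WMMA API introduced in CUDA 9. The kernel below is a minimal, generic sketch (not Nvidia sample code): one warp computes a single 16x16 tile of D = A x B + C with FP16 inputs and FP32 accumulation, which is the mixed-precision operation a Tensor Core executes.

```cuda
// Minimal WMMA sketch (compile with: nvcc -arch=sm_70 wmma_sketch.cu).
// One warp multiplies one 16x16 FP16 tile and accumulates in FP32 on the Tensor Cores.
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void tile_mma(const half *a, const half *b, float *d) {
    // Per-warp fragments for a 16x16x16 matrix multiply-accumulate.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                  // start with C = 0
    wmma::load_matrix_sync(a_frag, a, 16);                // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);   // executes on the Tensor Cores
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *d;
    cudaMalloc((void **)&a, 16 * 16 * sizeof(half));
    cudaMalloc((void **)&b, 16 * 16 * sizeof(half));
    cudaMalloc((void **)&d, 16 * 16 * sizeof(float));
    cudaMemset(a, 0, 16 * 16 * sizeof(half));             // inputs left at zero for brevity
    cudaMemset(b, 0, 16 * 16 * sizeof(half));

    tile_mma<<<1, 32>>>(a, b, d);                         // one warp handles one tile
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(d);
    return 0;
}
```

Real workloads tile much larger matrices across many warps, but the fragment/load/mma/store pattern is the same.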


