Back in May, Nvidia announced its Tesla Volta V100 processor with Tensor architecture. The graphics processor, baked on TSMC’s 12nm FinFET process, has 5120 shader processors activated out of a total of 5376 cores. Scores have now surfaced in Geekbench, and they are impressive.
So a fully enabled GV100 GPU actually consists of six GPCs, 84 Volta SMs, 42 TPCs (each including two SMs), and eight 512-bit memory controllers (4096 bits total). Each SM has 64 FP32 Cores, 64 INT32 Cores, 32 FP64 Cores, and 8 new Tensor Cores. Each SM also includes four texture units.
With 84 SMs, a full GV100 GPU thus has a total of 5376 FP32 cores, 5376 INT32 cores, 2688 FP64 cores, 672 Tensor Cores, and 336 texture units. Each memory controller is attached to 768 KB of L2 cache, and each HBM2 DRAM stack is controlled by a pair of memory controllers. The full GV100 GPU includes a total of 6144 KB of L2 cache. The figure above shows a full GV100 GPU with 84 SMs (different products can use different configurations of GV100). The Tesla V100 accelerator uses 80 SMs.
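The totals above are simply the per-SM counts multiplied by the number of SMs, so they are easy to check. Here is a minimal Python sketch of that arithmetic. The names `PER_SM` and `gpu_totals` are made up for illustration, and the 80-SM figures assume the Tesla V100's enabled SMs are otherwise identical to those of the full chip.

```python
# Per-SM resources as quoted in the article.
PER_SM = {"fp32": 64, "int32": 64, "fp64": 32, "tensor": 8, "texture": 4}

# Memory subsystem figures as quoted in the article.
MEMORY_CONTROLLERS = 8
L2_KB_PER_CONTROLLER = 768
BUS_BITS_PER_CONTROLLER = 512


def gpu_totals(sm_count):
    """Multiply each per-SM unit count by the number of enabled SMs."""
    return {unit: count * sm_count for unit, count in PER_SM.items()}


full_gv100 = gpu_totals(84)  # fully enabled GV100
tesla_v100 = gpu_totals(80)  # Tesla V100 accelerator (assumed identical SMs)

l2_cache_kb = MEMORY_CONTROLLERS * L2_KB_PER_CONTROLLER
memory_bus_bits = MEMORY_CONTROLLERS * BUS_BITS_PER_CONTROLLER

print("Full GV100:", full_gv100)
print("Tesla V100:", tesla_v100)
print("L2 cache (KB):", l2_cache_kb)
print("Memory bus (bits):", memory_bus_bits)
```

With 84 SMs this reproduces the 5376 FP32 cores, 672 Tensor Cores, and 336 texture units listed above. With 80 SMs it gives the 5120 shader processors of the Tesla V100.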
Now then, the Nvidia DGX-1 unit spotted in Geekbench uses Tesla V100 cards with 5,120 shader processors each. For the record, a DGX-1 setup currently costs roughly $129K and houses eight Tesla V100 cards, two Intel Xeon E5-2698 v4 processors, 512GB of DDR4, four 1.92TB SSDs in RAID 0, and a power supply of more than three kilowatts. So let's call it what it is: a supercomputer in a box.
An entry was spotted in Geekbench 4, with both the OpenCL and CUDA APIs. Where Tesla P100 systems score up to 320,000 points, the DGX-1 reached 481,504 points with OpenCL and 746,537 points using the CUDA API. The numbers are just staggering.
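To put those scores in perspective, here is a rough Python sketch that turns them into ratios. The `speedup` helper is purely illustrative. The P100 figure is the "up to" number quoted above, which does not say which API it was measured with, so treat the comparison as a ballpark rather than a like-for-like result.

```python
# Geekbench 4 compute scores as quoted in the article.
P100_SCORE = 320_000   # "up to" figure; API not specified in the article
DGX1_OPENCL = 481_504  # DGX-1 (Tesla V100), OpenCL
DGX1_CUDA = 746_537    # DGX-1 (Tesla V100), CUDA


def speedup(new_score, old_score):
    """Return how many times higher new_score is than old_score."""
    return new_score / old_score


opencl_vs_p100 = speedup(DGX1_OPENCL, P100_SCORE)
cuda_vs_p100 = speedup(DGX1_CUDA, P100_SCORE)
cuda_vs_opencl = speedup(DGX1_CUDA, DGX1_OPENCL)

print(f"OpenCL vs P100: {opencl_vs_p100:.2f}x")
print(f"CUDA vs P100:   {cuda_vs_p100:.2f}x")
print(f"CUDA vs OpenCL: {cuda_vs_opencl:.2f}x")
```

That works out to roughly 1.5 times the P100 figure with OpenCL and about 2.3 times with CUDA.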