Architecture and Specifications
Product overview
Vega 10 is being fabbed on Global Foundries’ 14nm FinFET-based process technology, which is the densest foundry process available. FinFET transistors are crucial to reducing power consumption and enabling operating voltages that are 150mV lower than the previous generation, thereby cutting active power by 30% from a 1V baseline. For the last six years, graphics processors have relied on 28nm high-k/metal nodes. We have moved towards the last gen 14nm FinFET. The most affordable model will be the Radeon RX Vega 56 which will offer 10.5 TFLOPS of fp32 performance at a price of 399 USD. All three cards have been fitted with 8 GB of HBM2 graphics memory, two stacks thus with a 2048-bit memory bus. The 64 editions will offer 484 GB/s of bandwidth, the 56 shader cluster edition get 410 GB/s of graphics memory bandwidth. So when we break down the products in a more simplistic way we see the following:
Basically, the Vega GPU is based on 64 shader clusters with 64 shader processors each. So that makes a nice 4096 shader processors in total. These shader partitions are tied to 64 ROPs. TMUs wise for a fully enabled unit AMD has always used a 4:1 ratio meaning 64 CUs x 4 = 256 texture units, which is a good number. Memory sits on a 2048-bit wide bus (two memory stacks of 1024-bits each) spread over the 64-bit controllers.
An bit more architecture
For the Radeon RX Vega 64 - sixty-four shader clusters with 64 shaders per cluster equal to 4096 shader processors that turbo to the 1.545 GHz dynamic clock frequency.
The Vega 10 GPU is based on a whopping 12.5 billion transistors and is fabbed on a 14nm FinFET LPP process. It comes with over 45MB of SRAM cache across the chip and thus holds either an 8 GB (consumer) or 16 GB (Pro) configuration. While we'll get into the details of Vega in our review, the GPU is based on a design with 4 synchronous compute pipes, four next gen geometry engines and for the fully activated SKU, 64 next gen compute units which will get you 4096 shaders processors and 56 x 64 = 3584 shader procs for the 56 model.
The full 64 model would be tied to 64 ROPs and 256 / 224 texture memory units. G33k stuff: L2 cache has been doubled to 4 MB. Render back-ends are now clients of the L2.
To the left, the new Vega 10 GPU with two 4 GB HBM2 stacks - to the right, the older Fiji GPU (Fury) holding four 1 GB stacks
Vega is fitted with HBM2 memory (vertically stacked graphics memory placed on die) as well as using a new IO gateway, the graphics memory cache is a synonym for High Bandwidth Cache. Earlier on, AMD stated that Vega is a GPU with 200 new features. A number of things are key for AMD with this scalable new GPU architecture. Vega 10 is using High Bandwidth Video Memory, VRAM, graphics memory or whatever you like to call it. The specific type used is HBM2 memory. The graphics engineers from AMD claim that HBM2 will offer you 5x the power efficiency compared to any other graphics memory including GDDR5, and yes, that is huge.
Another benefit obviously is capacity. HBM is on-chip vertically stacked (slabs of memory cells placed on top of each other) and you know it, when it comes to caches and memory bigger is simply better (in terms of storage volume). With the second iteration of HBM, HBM2, AMD now has 8x the density per stack with a 50% smaller footprint.