AMD Radeon R9 Fury X review



High Bandwidth Memory (HBM)

HBM, short for High Bandwidth Memory, can best be described as the turbo-charged offspring of the GDDR5 memory used on today's graphics cards. AMD started working on the technology roughly seven years ago, and with fabrication processes shrinking the time was right for the company to move forward with its first implementation; first-generation HBM makes its debut on the product reviewed here, the Radeon R9 Fury X. There are several advantages to HBM, but of course also some concerns. First and foremost, the memory bandwidth this technology can potentially offer is colossal. So why the move to HBM? AMD and other manufacturers have been facing the issue that GDDR5 is slowly becoming rather inefficient in terms of power used versus performance delivered: more chips, more density, more voltage. HBM offers a three-fold improvement in performance per watt compared to GDDR5 and roughly 50 percent lower memory power consumption, according to AMD.

So while GDDR5 is still capable (over the past few years GDDR5 has become faster and DRAM densities have increased), ever more ICs are used and the DDR packages have grown bigger; all of this requires more power and, along with it, larger voltage regulators, higher latency and what not. With HBM moving on-package rather than on-card, it enables a lot of really interesting form factors and uses less power, power that in turn can be spent on other things like the GPU. On-package memory, that will be something to get used to. The approach is vastly different, as the memory is no longer spread out over the PCB in separate DDR ICs but sits right next to the GPU on the same package. That means smaller PCBs, and with the graphics memory so close to the GPU, shorter wires mean less latency and fewer efficiency issues as well. Being so close to the GPU/APU/SoC is efficient, and manufacturers are not tied to incredibly complex trace routing; the interconnect can be made much wider, and as such you'll see a move from, say, a 256-bit wide memory bus towards a 1024-bit wide memory bus with High Bandwidth Memory (per package).

HBM is limited to 4GB of graphics memory

These stacked memory packages have limitations. In the first generation you are looking at four DRAM layers per stack, with two 128MB partitions in each layer, so that is 256MB per layer. Four times 256MB equals 1024MB (1 Gigabyte) per accumulated DRAM stack/package. Current chip designs allow for four stacks per GPU, so that is 4 packages x (4 x 256MB) = 4096MB. Hence the one limitation (if you can call it that): first-generation HBM tops out at a maximum of 4GB of graphics memory for the graphics card.
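
To make that capacity math easy to retrace, here is a minimal Python sketch using only the numbers from the paragraph above; the variable names are mine, not AMD terminology.

```python
# Capacity math for first-generation HBM as described above.
MB_PER_PARTITION = 128       # two 128 MB partitions per DRAM layer
PARTITIONS_PER_LAYER = 2
LAYERS_PER_STACK = 4         # four DRAM dies stacked per package
STACKS_PER_GPU = 4           # Fiji carries four HBM stacks

mb_per_layer = MB_PER_PARTITION * PARTITIONS_PER_LAYER   # 256 MB
mb_per_stack = mb_per_layer * LAYERS_PER_STACK            # 1024 MB = 1 GB
total_mb = mb_per_stack * STACKS_PER_GPU                  # 4096 MB = 4 GB

print(f"{mb_per_layer} MB per layer, {mb_per_stack} MB per stack, {total_mb} MB total")
```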

But is it 1024-bit or 4096-bit?

Here we'll need to 'zoom in' on one DRAM layer first. Each layer has two memory partitions, each with a 128-bit interface, meaning HBM uses 128-bit wide channels; eight of them per stack add up to a full 1024-bit interface. Total bandwidth is in the 128GB/s range per stack of four DRAM dies. Important to know is that each memory channel is independently timed and controlled. So 256-bit per layer x 4 layers = 1024-bit per package. If an SoC/processor/GPU/APU is fitted with four stacked packages, that boils down to a 4096-bit wide IO. HBM would allow for up to 512 GB/s if optimal, which means future GPUs built with HBM might reach 512GB/s to 1TB/s of main memory bandwidth in later revisions, and that is huge. If we take the latest flagship product from the competition, a GeForce Titan X with its 384-bit wide bus and 7 Gbps GDDR5 memory will get you 337 GB/s. As you can see, many roads lead to Rome and HBM is one of them. A big difference, however, is that HBM brings more bandwidth at roughly 50% less power and with lower latency.

Since HBM is fabbed by stacking multiple DRAM dies on top of each other, these dies must be interconnected; that is done through what are called TSVs (Through Silicon Vias), vertical connections that allow the memory to transfer more data per cycle. So yes, while the memory itself is clocked at a significantly lower frequency than GDDR5, the overall interface can be up to 8, maybe 9 times faster.
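
As a quick sanity check of those figures, the sketch below redoes the bandwidth arithmetic: bus width times per-pin data rate, divided by eight to get bytes. The 1 Gbps per-pin rate for first-generation HBM is an assumption used here purely for illustration; the Titan X numbers are the ones quoted above.

```python
# Back-of-the-envelope memory bandwidth: bus width (bits) * per-pin rate (Gbps) / 8 -> GB/s.
def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits * gbps_per_pin / 8

# HBM1 on Fiji: four 1024-bit stacks = 4096-bit total; ~1 Gbps per pin is assumed here
# to show how the 512 GB/s figure comes about.
hbm_fury_x = bandwidth_gb_s(4096, 1.0)     # -> 512.0 GB/s

# GeForce Titan X: 384-bit bus with 7 Gbps GDDR5, as quoted in the text (~336-337 GB/s).
gddr5_titan_x = bandwidth_gb_s(384, 7.0)   # -> 336.0 GB/s

print(f"Fury X (HBM1): {hbm_fury_x:.0f} GB/s, Titan X (GDDR5): {gddr5_titan_x:.0f} GB/s")
```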

 



You know how, when you build a house with Lego, you need that big flat base plate to build it on? That's the interposer. The interposer is a very thin layer on which all these components sit and communicate with each other; it is essentially one big slab of electrically conductive wiring. So you'll see:

  • Top level - 1x CPU/GPU with PHY (interconnect)
  • Top level - Four DRAM stacks, each with 4 layers and at the bottom a logic die with PHY (interconnect)
  • Middle level - Interposer connecting it all
  • Lower level - Package substrate

For those wondering whether the HBM DRAM stacks are much taller (in physical size) than, say, the GPU, as if the HBM stack were a skyscraper next to a GPU the size of a house: we asked AMD, and the answer was no. The difference looks exaggerated in the slides and diagrams, but the real-world height difference is not big enough to be substantial or relevant. Also bear in mind that both the GPU and the DRAM packages get material placed around them, so concerns like heat conduction towards the cooler are not a relevant issue, according to AMD.
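
For readers who prefer the layer cake spelled out, here is a toy Python model of that package layout as described in the list above; the field names are purely illustrative and do not come from any AMD documentation.

```python
# Toy model of the 2.5D package layout: substrate at the bottom, interposer in the
# middle, GPU plus four HBM stacks side by side on top.
package = {
    "lower_level": "package substrate",
    "middle_level": "interposer routing all die-to-die wires",
    "top_level": {
        "gpu": {"phy": "one 1024-bit interface per HBM stack"},
        "hbm_stacks": [
            {"logic_die": "base logic die with PHY", "dram_layers": 4}
            for _ in range(4)
        ],
    },
}

print(len(package["top_level"]["hbm_stacks"]), "HBM stacks sit next to the GPU on the interposer")
```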
 

 
A photo I took of a Fiji die; you can clearly see the four stacks of HBM memory surrounding the GPU core.

The primary thing to remember is that HBM achieves higher bandwidth at lower power than DDR4 and GDDR5 thanks to stacking several memory dies. Whether or not the difference will be significant in terms of performance with this first generation remains to be seen. The AMD Radeon R9 Fury X has four stacks of HBM, adding up to 4GB of graphics memory on a 4096-bit wide bus.
