The Threadripper Processor Series
Threadripper 3rd gen
Threadripper processors are CPUs built from the same Zen 2 dies that you know from the 'regular' Ryzen 3000 series processors. Zen 2 is an advancement of Zen, which had some bottlenecks that needed to be dealt with. These have been addressed in this design and, thanks to the smaller 7nm transistors, AMD was also able to add extra functionality in important places. There are differences across the three cache levels. The L1 instruction cache has become smaller at 32 kB, while the L1 data cache is the same as last gen at 32 kB, both per core of course. A lot of IO changes have been made to facilitate it, but basically on-chip you'll spot four (or eight, for the 64-core part) 8-core Ryzen processor dies sitting around a big IO chip, all in one package.
This means the 64-core 3990X is set up in an 8+8+8+8+8+8+8+8 (8x8) fashion. The processor dies are physically identical to those of the 8-core Ryzen 3000 / Zen 2 design; it is the very same die that is used. There is one distinction, however: the 8-core dies have all been sorted for the best-performing cores at the lowest possible voltage.
So, to recap the caches: AMD reduced the L1 instruction cache from 64 kB to 32 kB per core. The instruction cache holds the x86 instructions that are fetched from memory for processing. AMD makes up for the smaller size by raising the associativity from 4-way to 8-way, so each cache line has more places where it can be stored. Also, by optimizing the algorithms for pre-fetching instructions and increasing the caches at other levels (like the L3 cache), the effect of the smaller instruction cache is limited. The L1 data cache was 32 kB in Zen and remains at 32 kB for Zen 2. Unchanged as well is the L2 cache, which is still 512 kB per core. The L3 cache is shared by the cores and that one has doubled in size. Four cores are partitioned together in a group called a core complex (CCX). On the earlier generation Zen processors each CCX had 8 MB of L3 cache; this has been doubled to a whopping 16 MB of L3 cache per CCX, which makes 32 MB per 8-core die (CCD). Why the doubled L3 cache? Well, with the chiplet design the memory controller is physically located in a different chip, so AMD needed to offset the higher latency of accessing working memory, ergo a doubled L3 cache. Increasing any sort of cache is costly. It takes up a substantial portion of the available transistor budget, and here is where 7nm helps out greatly.
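To put the per-core and per-CCX figures above into perspective, here is a small sketch that adds them up for a whole package. It assumes two CCXs per 8-core die and that every core on every die is enabled; the function name `cache_totals` is ours, purely for illustration.

```python
# Per-core and per-CCX figures as quoted in the text (Zen 2).
L2_KB_PER_CORE = 512
L3_MB_PER_CCX = 16
CORES_PER_CCX = 4
CCX_PER_CCD = 2  # assumption: two 4-core CCXs make up one 8-core die


def cache_totals(ccd_count):
    """Return core count and total L2/L3 (in MB) for a package with ccd_count dies.

    Assumes all cores on every die are enabled.
    """
    ccx_count = ccd_count * CCX_PER_CCD
    cores = ccx_count * CORES_PER_CCX
    l2_mb = cores * L2_KB_PER_CORE / 1024
    l3_mb = ccx_count * L3_MB_PER_CCX
    return {"cores": cores, "l2_mb": l2_mb, "l3_mb": l3_mb}


# Four dies and eight dies, the two package layouts mentioned in the text.
for dies in (4, 8):
    print(dies, "dies:", cache_totals(dies))
```

For the eight-die, 64-core part this works out to 32 MB of L2 and 256 MB of L3 in total.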
Starting with the Zen 2 architecture, AMD moved towards a chiplet design. The dies holding the CPU cores are paired with other chips in one package. So, for Threadripper 3000, that would be four processor dies interconnected by an IO chip; that IO chip is similar to the chipset IC. It’s one of the many answers to the slowing of Moore's Law, now and in the future. AMD was already using the technology to connect multiple dies in Threadripper and, for servers, Epyc. Intel has done something similar with Kaby Lake-G. Chiplets are multiple chips put together in one package that forms the actual processor. With Zen 2, that means an I/O die along with 7nm CPU chiplets (each holding eight cores per die). To be able to accomplish that, AMD has been updating its Infinity Fabric, which connects the different dies that hold the cores. Current Epyc, Ryzen and Threadripper CPUs are all connected via the Infinity Fabric. With the Zen 2 architecture, AMD places one I/O die in the middle, which is connected to four 8-core dies and, with the 64-core part, a staggering eight 8-core dies. These CPU chiplets are connected through Infinity Fabric (the interlink wires that connect them all). Why chiplet designs? One of the bigger issues at hand when manufacturing large monolithic CPU/GPU dies is that yields decrease nearly exponentially with die size and costs go up due to non-working dies. Multiple smaller chips in one package have higher yields, less loss and thus can be more profitable.
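The yield argument can be illustrated with a simple Poisson defect model, in which the chance of a die being defect-free falls off exponentially with its area. This is a textbook approximation, not AMD's or the foundry's actual model, and the defect density and die areas below are hypothetical numbers chosen only to show the trend.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under a simple Poisson defect model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical values for illustration only, not AMD or foundry data.
DEFECTS_PER_CM2 = 0.2
CHIPLET_AREA_MM2 = 80.0
CHIPLETS = 8
MONOLITHIC_AREA_MM2 = CHIPLET_AREA_MM2 * CHIPLETS

chiplet_yield = poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_CM2)
monolithic_yield = poisson_yield(MONOLITHIC_AREA_MM2, DEFECTS_PER_CM2)

print(f"single chiplet yield:  {chiplet_yield:.1%}")
print(f"monolithic die yield:  {monolithic_yield:.1%}")
```

With these made-up inputs, roughly 85% of the small dies come out clean versus roughly 28% of the big ones. A defect on a chiplet costs you one small die; the same defect on a monolithic part costs you the entire chip.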
The Ryzen Threadripper processor family
On the market, you will spot Ryzen 3, 5, 7 and 9 series 3000 processors and now Threadripper series 3000 processors, all based on the Zen 2 architecture. It’s plain and simple and, as always, that works out as the best way to understand the product positioning. Below, an overview of the Threadripper lineup.
Unlocked & loaded
All Threadripper processors are unlocked. The motherboards need a chipset that is unlocked as well, but TRX40 covers all of that. Be warned though, all-core overclocks on so many cores... let's just say we advise you to stick to proper water-cooling and let XFR2 do its thing, as overclocking really isn't much of an option anymore with this many cores. Despite that fact, we'll give it a go later on in the article.
TRX40 Chipset - TREX
With Gen3 Threadripper processors came TREX. TRX40 is specifically for Threadripper 3000 and future products. It was imperative for AMD to get the most out of Threadripper 3000 and thus they wanted to double up the interlink between the processor and motherboard chipset. This chipset has a PCIe 4.0 x8 interlink, which is unheard of and creates massive possibilities for things like storage. With twice the lanes, each twice as fast, the bandwidth between the processor and the chipset has quadrupled compared to the previous Threadripper platform. As a result, much more bandwidth is available for all I/O options offered by the chipset. What you are also going to notice is a further increase in PCIe Gen4 lanes. Threadripper 3000 brings 64 PCIe Gen4 lanes to the table, 8 of which are reserved for the chipset link, and then the chipset brings in another 24 PCIe Gen 4 lanes, with 8 reserved for that same interconnect. In total, you are looking at 88 lanes, with 72 lanes available to the end-user. So yes, PCIe Gen 4.0 everywhere. The socket has been named sTRX4, the chipset TRX40. You are going to see a number of motherboard announcements today; the new Threadripper processors and platforms will become available by the 25th of November.
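The lane math is easy to lose track of, so here it is spelled out as a tiny sketch. The numbers come straight from the paragraph above; the variable names are ours.

```python
CPU_LANES = 64        # PCIe Gen4 lanes provided by the processor
CHIPSET_LANES = 24    # PCIe Gen4 lanes provided by the TRX40 chipset
LINK_WIDTH = 8        # x8 CPU-to-chipset link, reserved on both ends

total_lanes = CPU_LANES + CHIPSET_LANES
reserved_lanes = 2 * LINK_WIDTH
usable_lanes = total_lanes - reserved_lanes

print(f"total: {total_lanes}, reserved: {reserved_lanes}, usable: {usable_lanes}")
```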
|PCIe Version||Line Code||Transfer Rate||x1 Bandwidth||x4||x8||x16|
|1.0||8b/10b||2.5 GT/s||250 MB/s||1 GB/s||2 GB/s||4 GB/s|
|2.0||8b/10b||5 GT/s||500 MB/s||2 GB/s||4 GB/s||8 GB/s|
|3.0||128b/130b||8 GT/s||984.6 MB/s||3.938 GB/s||7.877 GB/s||15.754 GB/s|
|4.0||128b/130b||16 GT/s||1.969 GB/s||7.877 GB/s||15.754 GB/s||31.508 GB/s|
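The figures in the table follow directly from the transfer rate and the line code: each lane moves one bit per transfer, and the encoding determines how many of those bits are payload. A minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and bandwidth per direction; the function name is ours.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s, payload_bits, encoded_bits, lanes=1):
    """Usable bandwidth per direction in GB/s for a PCIe link.

    transfer_rate_gt_s: raw rate per lane in GT/s (one bit per transfer)
    payload_bits / encoded_bits: line code efficiency, e.g. 8/10 or 128/130
    """
    usable_gbit_s = transfer_rate_gt_s * payload_bits / encoded_bits
    return usable_gbit_s / 8 * lanes  # 8 bits per byte


# (transfer rate in GT/s, payload bits, encoded bits) per PCIe version
GENERATIONS = {
    "1.0": (2.5, 8, 10),
    "2.0": (5.0, 8, 10),
    "3.0": (8.0, 128, 130),
    "4.0": (16.0, 128, 130),
}

for version, (rate, payload, encoded) in GENERATIONS.items():
    row = [pcie_bandwidth_gb_s(rate, payload, encoded, n) for n in (1, 4, 8, 16)]
    print(version, [round(value, 3) for value in row])
```

The x8 column for PCIe 4.0 is the one that matters for the TRX40 chipset link.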
Quad-channel DDR4 memory
AMD’s DDR4 support is good these days and with Zen 2 it has become great - pretty much all brands are supported, with an increase in frequency support as well as a drop in latency. Obviously you get quad-channel memory support, with an official (JEDEC) rating of up to DDR4-3200 (3200 MT/s, commonly written as 3200 MHz). Much like Ryzen 3000, a 2:1 ratio between the memory and Infinity Fabric clocks switches on at DDR4-3733 or higher frequencies, so do keep in mind that it will have an effect on the speed at which the various core complexes within the CPU can communicate with each other. As for capacity, the platform can now hold up to 256 GB. A 4x8 GB Single Rank configuration is supported out of the box at 3200 MHz. Of course, the memory used in real practice can go faster; in fact, we'll be using a 64GB 3600 MHz CL16 kit from Corsair (Dominator) on the platform. You can even go 256GB in an 8x32 GB Dual Rank configuration; here, however, the JEDEC spec drops to 2667 MHz (but you can run higher frequencies).
|Memory config||Rank||Official JEDEC frequency support|
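To get a feel for what quad-channel memory means in theory, the peak bandwidth can be estimated from the transfer rate and the channel count. This is a back-of-the-envelope sketch assuming the standard 64-bit DDR4 channel width; real-world throughput will be lower, and the function name is ours.

```python
def peak_memory_bandwidth_gb_s(transfer_rate_mt_s, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (decimal) for a DDR memory setup."""
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_second / 1e9


# The two JEDEC ratings mentioned in the text, both in quad-channel.
for rate in (3200, 2667):
    bandwidth = peak_memory_bandwidth_gb_s(rate, channels=4)
    print(f"DDR4-{rate} quad-channel: {bandwidth:.1f} GB/s")
```

At DDR4-3200 that comes to 102.4 GB/s of theoretical peak bandwidth across the four channels.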
What's the difference between Single and Dual Rank memory? It is a question we receive often. Speaking in theory, Single Rank memory is faster than Dual Rank memory. Explained extremely simply, when a computer accesses Single Rank memory it only has to go around 'its' track once, whereas with Dual Rank it would have to go around the track twice, as the second rank is a separate circuit. See it as two DDR4 DIMMs on one DIMM PCB.
- A Single Rank DIMM has one set of memory chips that is accessed while writing to or reading from the memory. A Dual Rank DIMM is similar to having two Single Rank DIMMs on the same module, with only one rank accessible at a time. There's also a Quad Rank DIMM these days, effectively, two Dual Rank DIMMs on the same module. Only one rank is accessible at a time.
- Dual and Quad Rank DIMMs provide the greatest capacity with the existing memory technology. For example, if current DRAM technology supports 8 GB Single Rank DIMMs, a Dual Rank DIMM would be 16 GB, and a Quad Rank DIMM would be 32 GB.
The main idea behind memory ranking is to cram more memory into a single-slot module, decreasing the number of banks needed. Ranks have more to do with density and pricing than actual performance. Obviously, always check with your mainboard manufacturer if the DDR4 modules are supported; they often offer a QVL. Also, ECC DDR4 is supported on the Threadripper platform.
AMD has further secured their technology in hardware: Zen 2 has hardening for Spectre v4 exploits built into the processor, which is awesome news. The AMD processors are also less susceptible to other security issues such as Meltdown, Foreshadow, and MDS.