AMD Releases More Architecture Details on ZEN

As we get closer to release, AMD has once again shared more info about Zen. Yesterday AMD briefed the media on its ZEN microarchitecture deep-dive presentation from Hot Chips, and shared a lot of slides.

A warning first: this is a bit of a technology deep-dive.

The headline figure is a 40% per-core performance gain compared to Excavator. As you know, the new “Zen” architecture is designed to scale across all of AMD’s CPU business, from enthusiast desktop platforms to enterprise-class servers and notebooks, to embedded and semi-custom products. The initial “Zen” CPU core is stated to deliver more than a 40 percent improvement in instructions per clock cycle over the previous-generation cores and will come to market first in an 8-core, 16-thread system-on-chip for desktops (Summit Ridge). AMD focused on three key areas when designing the new architecture:

  • Performance of the engine itself with completely new branch prediction, introduction of a micro-op cache and a much wider instruction window;
  • Throughput, to keep that high-performance engine fed with data and instructions out of memory through pre-fetching and a completely new cache memory hierarchy with 8 MB of L3 cache; and finally,
  • Efficiency, delivering that performance and throughput without increasing power, by leveraging a 14nm FinFET process and a wealth of power-saving design techniques in the architecture.

If you look closely at the block diagram of the ZEN core, you'll notice something completely different from Bulldozer, which was basically a mixed and merged design as far as cores go. ZEN is in essence a more traditional processor design, albeit a far more complex and advanced one.

Next to the cores, the next-gen ZEN architecture introduces the CPU Complex (CCX): four cores sharing an 8 MB L3 cache. This is very similar to Intel processors, where the cores share nothing beyond the L3 cache, making them independent. ZEN, according to AMD, should offer a better core overall in both efficiency and performance.

For us performance consumers, Summit Ridge is likely going to be the processor that appeals the most, with its 8 cores and 16 threads. Bristol Ridge will become the 4-core, 8-thread processor series. Expectations are high and Summit Ridge may prove to be a make-or-break product for AMD. The Zen architecture will be built on a more efficient 14 nanometer FinFET process at GloFo, rather than the 32 nm and 28 nm processes of previous AMD FX CPUs and AMD APUs, respectively. Four variants of Zen engineering samples (ES) have already been spotted in the industry:

  • AM4 8 cores with 95W TDP (Summit Ridge)
  • AM4 4 cores with 65W TDP (Bristol Ridge)
  • SP3 24 cores with 150W TDP
  • SP3 32 cores with 180W TDP (Naples)

The "Summit Ridge" Zen family will share a unified AM4 socket with its GPU-equipped "Bristol Ridge" APU counterparts, and will feature DDR4 support and an expected 95W TDP. Newer roadmaps don't confirm the TDP for desktop products; they suggest a range of 5 to 15W for low-power mobile products with up to two Zen cores, and 15 to 35W for performance-oriented mobile products with up to four Zen cores.

Each Zen core will have four integer units used for calculations (ALUs), two load/store units and two floating point units, and the decoder can decode four instructions per clock cycle.

Cache-wise things have improved. The data prefetchers are much better: they predict what data the instructions will need in the short term and fetch it from the relatively slow RAM ahead of time. As usual you'll have your typical L1, L2 and L3 caches. The L stands for level, though I like to think of it as latency: the lower the number, the lower the latency and the faster the cache. The L1 data cache is 32 KiB, with a separate 64 KiB instruction cache, and L1 is now write-back (not write-through). Each core has a dedicated 512 KB L2 cache, and the four cores in a CCX (CPU Complex) share 8 MB of L3 cache (2 MB per core). Within a CCX the L3 cache has 16 parallel communication paths, and each core can address that L3 pool at the same speed and latency. The 8-core Summit Ridge processors would thus get two CCX units. This is twice as much as Intel processors offer. The combination of these improved caches should give AMD 5x more bandwidth compared to last-gen products.
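To make the cache numbers a bit more tangible, here is a minimal Python sketch that adds up the totals for a given core count. It only uses the per-core and per-CCX sizes quoted above. The names (`CacheTotals`, `cache_totals`) are made up for illustration, and the rule that a chip is built from whole four-core CCX units is an assumption of this sketch, not something AMD has confirmed for every product.

```python
from dataclasses import dataclass

# Figures quoted in the article
CORES_PER_CCX = 4        # four cores share one L3 pool
L1D_PER_CORE_KIB = 32    # L1 data cache per core
L1I_PER_CORE_KIB = 64    # L1 instruction cache per core
L2_PER_CORE_KIB = 512    # dedicated L2 per core
L3_PER_CCX_MIB = 8       # shared L3 per CCX


@dataclass
class CacheTotals:
    """Hypothetical container for chip-wide cache totals."""
    ccx_count: int
    l1d_kib: int
    l1i_kib: int
    l2_mib: float
    l3_mib: int


def cache_totals(cores: int) -> CacheTotals:
    """Sum the caches of a chip built from whole CCX units (assumption)."""
    if cores <= 0 or cores % CORES_PER_CCX != 0:
        raise ValueError("this sketch only handles multiples of four cores")
    ccx_count = cores // CORES_PER_CCX
    return CacheTotals(
        ccx_count=ccx_count,
        l1d_kib=cores * L1D_PER_CORE_KIB,
        l1i_kib=cores * L1I_PER_CORE_KIB,
        l2_mib=cores * L2_PER_CORE_KIB / 1024,
        l3_mib=ccx_count * L3_PER_CCX_MIB,
    )


if __name__ == "__main__":
    # 8-core Summit Ridge: two CCX units, 4 MiB of L2 and 16 MiB of L3 in total
    print(cache_totals(8))
```

For the 8-core part this works out to two CCX units with 16 MB of L3 in total, although each core only has equal-latency access to the 8 MB pool inside its own CCX.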

ZEN will also offer SMT (simultaneous multi-threading, comparable to Intel's Hyper-Threading): each core is presented as two threads, with both threads competing for the resources on the core. Thus an 8-core processor is seen by Windows as having 16 logical processors. The processors are fabbed at 14nm FinFET, and AMD focused on power draw from the start of the ZEN project. ZEN processors have aggressive clock gating. 14nm should bring a lot of additional power savings to ZEN; obviously ZEN is intended for mobile platforms as well.
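The thread count the operating system reports follows directly from this. A small Python sketch, assuming two threads per core as described above; `logical_processors` is a hypothetical helper, while `os.cpu_count()` is the standard-library call that returns the number of logical processors on whatever machine you run it on.

```python
import os

THREADS_PER_CORE = 2  # SMT as described for ZEN: two threads per core


def logical_processors(physical_cores: int, smt_enabled: bool = True) -> int:
    """Number of logical processors the OS would list (illustrative helper)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return physical_cores * (THREADS_PER_CORE if smt_enabled else 1)


if __name__ == "__main__":
    print("8-core part with SMT:", logical_processors(8))
    # Logical processor count of the machine running this script
    print("This machine:", os.cpu_count())
```

Keep in mind that the two threads share one core's execution resources, so 16 threads on 8 cores will not behave like 16 full cores.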

Instruction sets will be plentiful as well. ZEN will support all SSE and 128-bit AVX, and to name a few: AVX, AVX2, BMI1, BMI2, AES, RDRAND, SMEP, SHA1/SHA256, ADX, CLFLUSHOPT, XSAVEC/XSAVES/XRSTORS, and SMAP. There will also be new AMD-exclusive additions, including CLZERO and PTE Coalescing.
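Once hardware is out, you could check which of these extensions a CPU actually reports. The sketch below is Linux-only and reads the `flags` line from `/proc/cpuinfo`. The lowercase flag names in `FLAGS_OF_INTEREST` are my assumption of how the kernel labels the features listed above (for example `sha_ni` for the SHA extensions), so verify them against your own kernel; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Set

# Assumed Linux flag names for the extensions named in the article
FLAGS_OF_INTEREST = [
    "avx", "avx2", "bmi1", "bmi2", "aes", "rdrand", "smep",
    "sha_ni", "adx", "clflushopt", "xsavec", "xsaves", "smap", "clzero",
]


def parse_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def report(flags: Set[str]) -> Dict[str, bool]:
    """Map each flag of interest to whether the CPU reports it."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in report(parse_flags(cpuinfo.read_text())).items():
            print(f"{name:12s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux")
```

PTE Coalescing is not included because it is a feature of the page-table handling rather than an instruction, and I am not aware of a flag for it.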

As stated, AMD ZEN is 40% faster per core compared to Excavator, but AMD doesn't leave it at that. Further in the future there will be ZEN+, with even more advanced designs offering even faster performance. Obviously we cannot wait until the first ZEN processors are released; that would be Summit Ridge, an 8-core processor with 16 threads.
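A note on that 40% figure: AMD quotes it as an improvement in instructions per clock, so what you end up seeing per core also depends on clock frequency, which AMD has not disclosed. As a rough sketch, assuming single-thread performance scales with IPC times clock speed (a simplification), the Python below shows how the two combine. The function name and the 0.9 clock ratio are purely illustrative.

```python
def relative_per_core_performance(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """Per-core performance relative to the old core.

    ipc_gain    -- fractional IPC improvement, e.g. 0.40 for +40%
    clock_ratio -- new clock divided by old clock (hypothetical input)
    """
    if ipc_gain <= -1.0 or clock_ratio <= 0.0:
        raise ValueError("inputs out of range")
    return (1.0 + ipc_gain) * clock_ratio


if __name__ == "__main__":
    # Same clocks as the previous generation
    print(round(relative_per_core_performance(0.40), 2))
    # Illustrative only: +40% IPC at 10% lower clocks
    print(round(relative_per_core_performance(0.40, 0.9), 2))
```

In other words, the full 40% per-core gain only materializes if ZEN runs at clocks similar to Excavator.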

Have a peek at the slides I captured during the presentation.
