Intel Architecture Day 2021: Alder Lake



Architecture - Power efficient Gracemont cores

Alder Lake Core architecture

Below is a set of slides from the presentation. As previously indicated, we received this information only hours before the embargo was lifted and will fill in the gaps throughout the day.

Gracemont power-efficient cores

Alder Lake is Intel's codename for the 12th generation of Intel Core processors, based on a hybrid architecture that combines Golden Cove high-performance cores with Gracemont power-efficient cores. According to Intel, Alder Lake is a "performance hybrid" in its portfolio, as it is focused on performance rather than power consumption. Intel also says that Alder Lake will provide the most performance per watt of any of its processors.

Gracemont, wasn't that Intel Atom related? It is. Gracemont is an upcoming microarchitecture for low-power processors that will also be used in Intel's systems on a chip (SoCs), and it is the successor to the Tremont microarchitecture. Like its predecessor, it will additionally be deployed as the low-power cores in a hybrid architecture, in this case Intel's Alder Lake processors. Gracemont is actually the 4th-generation out-of-order low-power Atom microarchitecture, built on the Intel 7 manufacturing process, and the cores have been further enhanced.

You're going to notice a number of things, among them increases in the L1 caches: the instruction cache, for example, was doubled to 64KB and is paired with an L2 cache of up to 4MB. Remember, we're still talking about the energy-friendly cores here. Microsoft will have to add support to x86-64 Windows for the more sophisticated scheduling that this next-generation hardware calls for.

Key changes:

  • 64KB per core Level 1 instruction cache
  • DDR5 memory
  • PCIe 5.0 support
  • Support for AVX, AVX2, and AVX-VNNI instructions
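
As a rough illustration of the instruction-set item above, here is a minimal sketch of checking whether a machine reports AVX, AVX2, and AVX-VNNI. It assumes a Linux-style `/proc/cpuinfo` text dump, where these features are listed as `avx`, `avx2`, and `avx_vnni`; the `SAMPLE` string and the function names are hypothetical and only serve the example.

```python
# Flag spellings follow the Linux /proc/cpuinfo convention (an assumption
# about the host OS; other systems expose CPU features differently).
REQUIRED_FLAGS = ("avx", "avx2", "avx_vnni")

# Hypothetical, shortened cpuinfo excerpt used only for demonstration.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx_vnni\n"


def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """List the required flags that the cpuinfo text does not report."""
    have = parse_flags(cpuinfo_text)
    return [flag for flag in required if flag not in have]


if __name__ == "__main__":
    print("missing:", missing_flags(SAMPLE))
```

On a real Linux system the same functions could be fed the contents of `/proc/cpuinfo` instead of `SAMPLE`.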

The Hybrid Architecture in Alder Lake differs considerably from the hybrid concepts we are familiar with in smartphones, such as Arm's big.LITTLE. In the smartphone world, the most important goal is to save energy. That is undeniably one of the benefits of Intel's Hybrid Architecture as well, but here the increased efficiency is also meant to result in a higher overall performance level.
  


  

The small, swift cores, on the other hand, can be kept just as busy. With a fully multithreaded workload, it is possible for the smallest cores to be fully loaded as well. The Gracemont cores are grouped in fours, and each group shares an L2 cache of up to 4MB. The L1 cache is larger than Golden Cove's for instructions but smaller for data. In comparison to Tremont, Intel doubles the instruction cache. As with its predecessor, it decodes up to six instructions per clock cycle, and it features a whopping seventeen execution ports for parallel processing.



   

As the slides show, Gracemont lacks a micro-op cache. Instead, it incorporates two 3-wide decoders that together can process up to six instructions per clock cycle.




Each decoder retrieves instructions from the L1 cache, but unlike the larger Performance core, Gracemont has no op-cache for reusing previously decoded instructions. Gracemont is definitely not a scaled-down version of Golden Cove. Both decoder clusters are fed from a relatively large 64kB instruction cache.
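
To make the "two 3-wide decoders, six instructions per clock" figure more concrete, here is a toy model of a clustered front end. It assumes that the instruction stream is cut into blocks, that blocks are handed to the two decoders in alternation, and that one block cannot be split across decoders; that hand-off rule and all names are assumptions for illustration, not something the slides spell out.

```python
from math import ceil

DECODE_CLUSTERS = 2      # two decoders, as described in the text
WIDTH_PER_CLUSTER = 3    # each decoder is 3-wide


def peak_decode_width(clusters=DECODE_CLUSTERS, width=WIDTH_PER_CLUSTER):
    """Best-case instructions decoded per clock across all decoders."""
    return clusters * width


def decode_cycles(block_lengths, clusters=DECODE_CLUSTERS,
                  width=WIDTH_PER_CLUSTER):
    """Cycles needed when blocks are dealt round-robin to the decoders.

    Each decoder works through its own blocks at `width` instructions per
    cycle; the decoders run in parallel, so the slowest one sets the total.
    """
    busy = [0] * clusters
    for index, length in enumerate(block_lengths):
        busy[index % clusters] += ceil(length / width)
    return max(busy)


if __name__ == "__main__":
    print(peak_decode_width())        # both decoders busy
    print(decode_cycles([6, 6]))      # two blocks, one per decoder
    print(decode_cycles([12]))        # one long block, one decoder idle
```

In this simplified model the six-per-clock figure is a peak: twelve instructions take two cycles when split into two blocks, but four cycles when they form a single block that only one decoder can work on.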



 

Then the backend; this is where the math takes place and instructions actually get executed, if you will. You'll find seventeen execution ports here. Five are dedicated to floating-point calculations and twelve to integer work (and memory access). The out-of-order buffer can hold up to 256 entries, up from 208 in Tremont, off the top of my head.

   


  

Each core now has four integer ALUs (up from three), four AGUs (up from two), two jump ports (up from one), and two store-data ports (up from one).

 


For floating-point operations, two additional store-data ports and three ALUs are available, although the functionality of the new third ALU has been kept very minimal. The four AGUs provide two loads and two stores, backed by a 32kB data cache and the L2 cache of up to 4MB.
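
The port counts quoted above can be cross-checked with a bit of arithmetic. The sketch below simply tallies the figures from this article (the dictionary labels are my own, and the 208-entry Tremont figure is the recollection quoted above rather than a verified number).

```python
# Port counts as quoted in the article; the labels are illustrative.
INTEGER_AND_MEMORY_PORTS = {
    "integer ALU": 4,
    "AGU": 4,
    "jump": 2,
    "store data": 2,
}
FLOATING_POINT_PORTS = {
    "FP ALU": 3,
    "FP store data": 2,
}

# Out-of-order buffer sizes as quoted (entries).
GRACEMONT_BUFFER_ENTRIES = 256
TREMONT_BUFFER_ENTRIES = 208


def total_ports(ports):
    """Sum the port counts in one group."""
    return sum(ports.values())


int_ports = total_ports(INTEGER_AND_MEMORY_PORTS)
fp_ports = total_ports(FLOATING_POINT_PORTS)
all_ports = int_ports + fp_ports

# Relative growth of the out-of-order buffer over Tremont.
buffer_growth = (
    (GRACEMONT_BUFFER_ENTRIES - TREMONT_BUFFER_ENTRIES)
    / TREMONT_BUFFER_ENTRIES
)

if __name__ == "__main__":
    print(f"integer/memory ports: {int_ports}")
    print(f"floating-point ports: {fp_ports}")
    print(f"total ports: {all_ports}")
    print(f"buffer growth: {buffer_growth:.1%}")
```

The tallies line up with the earlier totals: twelve integer and memory ports plus five floating-point ports give the seventeen execution ports, and the out-of-order buffer grows by roughly 23 percent.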


