The APU - Kaveri & Steamroller
An APU is a processor with graphics integrated into the processor die, an idea much like what Intel did with its Sandy Bridge, Ivy Bridge and Haswell processors. APU is an abbreviation for Accelerated Processing Unit. Back in 2006, when AMD bought ATI, rumors immediately popped up about the technology you will learn about today. Back in 2011 you may have noticed the introduction of APUs like the E350 (Zacate and Ontario chips); these, however, can be seen as "Atom"-like processors for netbooks and entry-level notebooks. The Llano series was intended to address the entry-level to mid-range segment of both the notebook and the desktop market. The APU reviewed today, for example, is targeted against Intel's Haswell Core i3 and i5 processors. Now, in terms of raw processor performance things are simple: Intel has the upper hand. But the GPU architecture in Kaveri, with its compute units, is quite interesting. That makes the product very strong on the IGP side and in compute functionality, so the overall experience is much more powerful. Combine that with a much more advanced motherboard chipset and the new 8 series motherboards, and you'll notice that AMD has a lot to bring to the table. I do need to note that things have changed a bit for AMD with Kaveri: it is no longer a CPU and a GPU sitting separately on the die, it is a true hybrid convergence of the two units. I'd almost state that the CPU and GPU function as co-processors, embedded with each other and sharing the very same data pool, which makes for a very efficient product.
So What Is Kaveri?
The previous generations of AMD APUs were labeled Trinity and Richland, the 2nd and 3rd generation APUs. We have now arrived at the 4th generation of AMD APUs, which is a truly new architecture. Trinity and Richland were built on a 32nm process. Kaveri has moved on to a smaller 28nm SHP process, fabbed at GlobalFoundries. On the CPU side this APU moves away from Piledriver cores (FX) to faster Steamroller cores. The APUs all have four of these, and compared to the last generation that alone should bring up to 25% extra performance on the processor side of things: the multi-threaded Steamroller architecture focuses on enhancing IPC (instructions per cycle) by up to 25%.
We'll talk more about Steamroller in the next chapter though. The most important thing with Kaveri is that the chip as a whole is HSA compliant. This architecture allows the CPU and GPU cores to work together on some tasks, utilising the hUMA memory system. All summed up, the new Kaveri chips should offer a total compute power of 856 gigaflops when you combine their up to 12 compute cores (4 CPU cores and up to 8 Radeon R7 GPU cores). Below is a quick overview of the models that will be released and their specifications.
|APU||Boost Clock||Base Clock||CPU cores||GPU cores||Shader Cores||GPU Clock||TDP||L2 Cache||Process|
|A10-7850K||4.0 GHz||3.7 GHz||4||8||512||720 MHz||95W||4MB||28nm|
|A10-7700K||3.8 GHz||3.4 GHz||4||6||384||720 MHz||95W||4MB||28nm|
|A8-7600||3.8 GHz||3.3 GHz||4||6||384||720 MHz||65W||4MB||28nm|
|A8-7600||3.3 GHz||3.1 GHz||4||6||384||720 MHz||45W||4MB||28nm|
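As a sanity check, the 856 gigaflops figure can be roughly reproduced from the A10-7850K row of the table. The sketch below is my own back-of-the-envelope arithmetic, not an official AMD formula: it assumes each shader core retires one fused multiply-add (2 FLOPs) per clock and each CPU core retires 8 single-precision FLOPs per clock at the base clock. Neither per-clock factor is stated in this article.

```python
# Rough peak-throughput estimate for the A10-7850K row of the table.
# ASSUMPTIONS (not from the article): 2 FLOPs per shader core per clock
# (one fused multiply-add) and 8 FLOPs per CPU core per clock.
FLOPS_PER_SHADER_PER_CLOCK = 2
FLOPS_PER_CPU_CORE_PER_CLOCK = 8


def peak_gflops(units, clock_ghz, flops_per_clock):
    """Peak GFLOPS = execution units x clock (GHz) x FLOPs per unit per clock."""
    return units * clock_ghz * flops_per_clock


# Table values: 512 shader cores at 720 MHz, 4 CPU cores at 3.7 GHz base.
gpu_gflops = peak_gflops(512, 0.720, FLOPS_PER_SHADER_PER_CLOCK)
cpu_gflops = peak_gflops(4, 3.7, FLOPS_PER_CPU_CORE_PER_CLOCK)
total_gflops = gpu_gflops + cpu_gflops

print(f"GPU:   {gpu_gflops:.1f} GFLOPS")
print(f"CPU:   {cpu_gflops:.1f} GFLOPS")
print(f"Total: {total_gflops:.1f} GFLOPS")
```

Under those assumptions the GPU contributes roughly 737 GFLOPS and the CPU roughly 118 GFLOPS, which lands within rounding distance of the quoted 856. It also shows how lopsided the split is: the bulk of the theoretical compute sits in the GPU.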
For the GPU part of the APU things have changed a bit. Kaveri now has a GCN-based Radeon GPU with 384 to 512 shader cores. The distinct difference is that Llano had an architecture based on the Radeon 5000 series, while Trinity and Richland make use of the Radeon 6000 architecture, and Kaveri now uses the GCN architecture, similar to the Radeon 7000 series products. An upgraded video encoder has been integrated as well.
- AMD calls the GPU embedded into the A10 5800 the Radeon HD 7660D, it runs at 800 MHz.
- AMD calls the GPU embedded into the A10 6800K the Radeon HD 8670D, it runs at 844 MHz.
- AMD calls the GPU embedded into the A10 6790K the Radeon HD 8670D, it runs at 844 MHz.
- AMD calls the GPU embedded into the A8 7600 the Radeon Series 7 384 SP, it runs at 720 MHz.
- AMD calls the GPU embedded into the A10 7700K the Radeon Series 7 384 SP, it runs at 720 MHz.
- AMD calls the GPU embedded into the A10 7850K the Radeon Series 7 512 SP, it runs at 720 MHz.
- Trinity came with a dual-channel memory controller with official support for memory up to 1866 MHz; new at the time was low-voltage memory support, which makes the use of 1.25V and 1.5V memory very easy.
- Richland comes with a dual-channel memory controller with official support for memory up to 2133 MHz, plus low-voltage memory support, which makes the use of 1.25V and 1.5V memory very easy.
- Kaveri comes with a dual-channel memory controller with official support for memory up to 2400 MHz, plus low-voltage memory support, which makes the use of 1.25V and 1.5V memory very easy.
- Trinity and Richland have a 32nm 246 mm² die and 1.3 billion transistors.
- Intel has a 216 mm² die on its 32nm Sandy Bridge processors, which have 1.16 billion transistors.
- Ivy Bridge and Haswell (22nm) have 1.4 billion transistors.
- Kaveri has a 28nm 245 mm² die and 2.41 billion transistors.
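Two groups of numbers in the list above lend themselves to some quick arithmetic: transistor density and theoretical memory bandwidth. The sketch below is only illustrative. The transistor counts and die sizes are the ones listed above; the bandwidth formula is my own assumption that each DDR3 channel is 64 bits (8 bytes) wide and that the quoted "MHz" ratings are really mega-transfers per second. The function names are made up for this example.

```python
# Derived figures from the list above (illustrative only).
DIES = {
    # name: (transistors in billions, die area in mm^2), as listed in the article
    "Trinity/Richland (32nm)": (1.30, 246),
    "Sandy Bridge (32nm)": (1.16, 216),
    "Kaveri (28nm)": (2.41, 245),
}

# Officially supported memory speeds from the list, per APU generation.
MEMORY_SPEEDS = {"Trinity": 1866, "Richland": 2133, "Kaveri": 2400}


def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Million transistors per square millimetre of die."""
    return transistors_billion * 1000 / area_mm2


def peak_bandwidth_gbs(mega_transfers, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s.

    ASSUMPTION: 64-bit (8-byte) channels, rating given in MT/s.
    """
    return mega_transfers * channels * bytes_per_transfer / 1000


for name, (transistors, area) in DIES.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.2f} MTr/mm2")

for name, speed in MEMORY_SPEEDS.items():
    print(f"{name} dual-channel @ {speed}: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

Under these assumptions Kaveri packs roughly 9.8 million transistors per mm², against about 5.3 for Trinity and Richland on an almost identical die size. The memory side matters just as much for an IGP: at the officially supported 2400 setting the theoretical peak is 38.4 GB/s, shared between the CPU and GPU.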
So the 2.41 billion transistors really stand out. A lot of that goes into cache and the IGP. One of the biggest improvements for Kaveri is its hUMA shared memory controller, which feeds data to both the x86 and GCN processing units and also works like a hub. The new APUs also support heterogeneous queuing, which defines how the processors cooperate as equals and enables uniform visibility into the entire memory domain for both the GPU and CPU. Thanks to hUMA and heterogeneous queuing, applications can run processes on various types of cores, such as x86 CPU cores or GCN GPU compute units. Keep in mind, though, that not all applications can offload computing to the stream processors of the GPU, so a number of programs will only take advantage of the four x86 cores.
Steamroller CPU Cores
We now leave Piledriver and Bulldozer cores behind us: Kaveri comes with up to 4 “Steamroller” x86 computing cores arranged as two dual-core modules.
- Support for the latest ISA instructions including FMA4/3, AVX, AES, XOP
- Up to 2MB L2 cache per dual-core module (up to 4MB total)
- Maximum Turbo Frequencies up to 4GHz
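If you want to check these instruction set extensions on your own machine, a small script can do it on Linux. This is a minimal sketch, assuming the lowercase flag names that, to my knowledge, Linux reports in /proc/cpuinfo (`fma`, `fma4`, `avx`, `aes`, `xop`); the helper names are hypothetical and it won't work as-is on Windows.

```python
# Sketch: report which of the listed ISA extensions a Linux system exposes.
# ASSUMPTION: the flag spellings below match the "flags" line of /proc/cpuinfo.
WANTED = {"FMA3": "fma", "FMA4": "fma4", "AVX": "avx", "AES": "aes", "XOP": "xop"}


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(cpuinfo_text):
    """Map each extension name from the list above to True/False."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in WANTED.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    for name, present in supported_extensions(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

On a Kaveri system you would expect all five to show up. FMA4 and XOP in particular are AMD-specific extensions, so an Intel processor will normally report them as missing.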
Inside the APU you'll spot two clusters, the modules. Inside each module we see two integer CPU cores which share a floating point unit. The L1 instruction cache size was increased from 64KB to 96KB. The L2 cache is 2MB per dual-core module, so that's 1MB per core, so to speak. Basically, this is the on-die cache hierarchy:
- L1 Code cache 96KB x2
- L1 Data cache 16KB x4
- L2 Cache 2048KB x2
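To see how these numbers line up with the 4MB L2 listed in the specification table, the hierarchy above can simply be tallied. The snippet below is a small sketch using only the sizes from the list; the variable names are made up for illustration.

```python
# Tally the on-die cache hierarchy listed above.
CACHES_KB = {
    # level: (size per instance in KB, number of instances)
    "L1 code": (96, 2),   # one per dual-core module
    "L1 data": (16, 4),   # one per core
    "L2": (2048, 2),      # one per dual-core module
}

totals_kb = {level: size * count for level, (size, count) in CACHES_KB.items()}
l2_per_core_kb = totals_kb["L2"] // 4  # four Steamroller cores share the L2 pool

for level, total in totals_kb.items():
    print(f"{level}: {total} KB total")
print(f"L2 per core: {l2_per_core_kb} KB")
print(f"All caches: {sum(totals_kb.values())} KB")
```

That works out to 192KB of L1 code cache, 64KB of L1 data cache and 4096KB (4MB) of L2, so 1MB of L2 per core, matching the figures quoted above.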
What you will see with Kaveri are performance benefits over competing and older-generation architectures in the more generic compute applications, especially those utilizing OpenCL for calculations where a lot of numbers need to be crunched. The joint architecture also retains support for instruction set extensions including FMA4/3, AVX, AES, and XOP. It all comes together in one package that is more than powerful enough for desktop computing applications, yet does not neglect the essentials required by typical users. One more improvement for Kaveri is the latest revision of AMD Turbo Core, now at v3.0. The Turbo mode can clock the processor cores up and down very quickly when power usage and temperature allow for it. Kaveri has three primary elements merged into the APU: the Northbridge, the CPU and the GPU. Where Intel places more focus on raw CPU performance, AMD places more focus on the multimedia experience and the compute side of things, thus the GPU. There's a heck of a lot more to be found inside the APU though: a DDR3 memory controller, Unified Video Decoder core logic, that Northbridge, a PCI Express interface and of course a DDI interface to output to digital monitors. But let's talk about the GPU as well.