AMD Next Horizon Event - Epyc 2 64 Cores - 7nm "ZEN 2" and Radeon Instinct MI60
AMD is running its 'Next Horizon' event, where it is making announcements and sharing more details about the company's upcoming 7nm products. The event is focused on server and data-center announcements, but anything that goes into EPYC will of course find its way to the desktop as well.
This news item gives an overview of what was presented. AMD will be launching the Radeon Instinct MI60, a 7nm hardware-virtualized GPU (via Twitter). But today is obviously all about data-center applications and 7nm EPYC 2.
AMD is previewing Rome, the next-generation 7nm data-center EPYC CPU. The new 7nm parts are based on what we all know as Zen 2, which will offer increased IPC (Zen brought a +52% IPC improvement over Excavator). It was briefly mentioned that a 25 percent performance boost is expected from the process compared to Zen+. Mark Papermaster calls it a new phase of high performance. Zen 2 is now sampling, and Zen 3 is on track, Papermaster mentions. Zen 3 is very likely the Zen 2 architecture on an improved fabrication process, 7nm+ (much like the tick/tock strategy we have all seen from Intel).
Zen 2 will reach the market in 2019. Papermaster called it a game changer and mentioned that AMD is betting big on 7nm, as AMD felt 10nm was not the right call. He also reconfirmed the partnership with TSMC for 7nm production.
Chiplets
The 7nm fabrication process brings twice the density, half the power (at the same performance) and 1.25x the performance (at the same power). The architecture also changes, with CPU core execution enhancements, improved branch prediction, better pre-fetching and a larger op cache. As for chiplets, Zen 2 (for EPYC) features a 14nm I/O die along with 7nm CPU chiplets. AMD also mentions that the process will bring great value.
AMD is also updating the Infinity Fabric that connects the different dies holding the cores. Current EPYC, Ryzen and Threadripper CPUs all use the Infinity Fabric for this. With the Zen 2 architecture, AMD places a single I/O die in the middle, connected to the surrounding core dies. These are called AMD CPU chiplets, and they connect to the I/O die via the second-generation Infinity Fabric. The I/O die is fabbed on a 14nm process and the CPU chiplets on 7nm; upcoming EPYC chips will include multiple Zen 2 CPU chiplets. An interesting addition: it supports eight DDR DRAM interfaces.
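To make the quoted scaling factors concrete, here is a small Python sketch of the arithmetic they imply relative to a 14nm baseline. The three constants come from AMD's figures above; the 200 mm² die is a hypothetical example, and perfect area scaling is an idealisation (I/O in particular does not shrink that well).

```python
# Sketch: what AMD's quoted 7nm scaling figures imply relative to a
# 14nm baseline normalised to 1.0. The factors are from the article;
# the example die size is hypothetical.

DENSITY_GAIN = 2.0         # "twice the density"
POWER_AT_SAME_PERF = 0.5   # "half the power (at same perf)"
PERF_AT_SAME_POWER = 1.25  # "1.25x perf (at same power)"


def area_for_same_design(area_14nm_mm2: float) -> float:
    """Area the same design would need if it scaled perfectly with density (idealised)."""
    return area_14nm_mm2 / DENSITY_GAIN


def perf_per_watt_gain_iso_perf() -> float:
    """Perf/W gain when performance is held constant and power drops."""
    return 1.0 / POWER_AT_SAME_PERF


def perf_per_watt_gain_iso_power() -> float:
    """Perf/W gain when power is held constant and performance rises."""
    return PERF_AT_SAME_POWER / 1.0


if __name__ == "__main__":
    print(area_for_same_design(200.0))    # 100.0 (hypothetical 200 mm^2 die)
    print(perf_per_watt_gain_iso_perf())  # 2.0
    print(perf_per_watt_gain_iso_power()) # 1.25
```

The two perf/W numbers are the two ends of the trade-off; a real product lands somewhere in between depending on how it is clocked.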
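For a feel of what eight DDR interfaces mean, the sketch below computes theoretical peak memory bandwidth per socket. The 64-bit channel width is standard for DDR4; the DDR4-3200 data rate is purely an assumption for illustration, since no memory speed was quoted at the event.

```python
# Sketch: theoretical peak DRAM bandwidth for a socket with eight DDR
# memory interfaces. ASSUMPTION: DDR4-3200 (3200 MT/s); the article
# gives no memory speed.

CHANNELS = 8             # "eight DDR DRAM interfaces" (from the article)
BUS_WIDTH_BYTES = 8      # one 64-bit DDR4 channel, ECC bits excluded
ASSUMED_MT_PER_S = 3200  # hypothetical DDR4-3200


def peak_bandwidth_gb_s(channels: int, mt_per_s: int,
                        bus_bytes: int = BUS_WIDTH_BYTES) -> float:
    """Peak bandwidth in GB/s (10**9 B/s): channels x transfers/s x bytes/transfer."""
    return channels * mt_per_s * 1_000_000 * bus_bytes / 1e9


if __name__ == "__main__":
    print(peak_bandwidth_gb_s(CHANNELS, ASSUMED_MT_PER_S))  # 204.8
    print(peak_bandwidth_gb_s(1, ASSUMED_MT_PER_S))         # 25.6 per channel
```

Sustained bandwidth in practice is always lower than this peak figure.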
David Wang, the new SVP of Engineering for the Radeon Technologies Group, takes the stage. He's talking about AMD Radeon graphics moving into the data center.
Hey now, PCIe 4.0! The MI60 also offers 1TB/s of memory bandwidth and end-to-end ECC. Built on 7nm, it is fitted with 32GB of HBM2 and offers Infinity Fabric GPU-to-GPU links at 100 GB/s per link. There was a lot of talk about machine learning and the like. It will be available in Q4.
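A quick back-of-the-envelope sketch of the MI60 numbers: how long it would take, at theoretical peak, to sweep the full 32GB of HBM2 once versus copying it to another GPU over a single Infinity Fabric link. Decimal units are assumed, and real-world transfers will be slower.

```python
# Sketch: peak-rate transfer times from the MI60 figures in the article.
# Decimal units assumed: 1 GB = 10**9 bytes, 1 TB/s = 1000 GB/s.

HBM2_CAPACITY_GB = 32      # "32GB of HBM2"
HBM2_BANDWIDTH_GB_S = 1000 # "1TB/s bandwidth"
IF_LINK_GB_S = 100         # "100 GB/s per link"


def transfer_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Time in seconds to move size_gb at a constant bandwidth_gb_s."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_s


if __name__ == "__main__":
    # Reading all of the HBM2 once at peak memory bandwidth.
    print(transfer_seconds(HBM2_CAPACITY_GB, HBM2_BANDWIDTH_GB_S))  # 0.032
    # Copying the same 32 GB to a neighbouring GPU over one link.
    print(transfer_seconds(HBM2_CAPACITY_GB, IF_LINK_GB_S))         # 0.32
```

The tenfold gap is why GPU-to-GPU link bandwidth matters so much when training is spread over several cards.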
AMD Rome photos courtesy of AnandTech / Tom's Hardware
Lisa Su has taken back the stage and is announcing up to 64 Zen 2 cores (128 threads) per socket with Rome. It uses eight 8-core 7nm chiplets arranged around the central 14nm I/O die. It is a PCIe 4.0-capable x86 server CPU with increased IPC. The road to Rome runs through Naples: the two are socket compatible. Lisa mentions 2x performance per socket. It can be paired with up to 4TB of DRAM. AMD showed a demo of a single Rome CPU clearly outclassing a dual-socket Intel server system. The 64-core AMD setup was air-cooled.
A very impressive presentation on Rome indeed, and again, what goes into the data center ends up on the desktop. 2019 is going to be a very exciting year.
That's it for this update; we'll follow up with press releases and such in the morning.
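The per-socket figures can be reproduced with a tiny model of a chiplet socket, assuming eight 8-core chiplets, two threads per core and the 4TB DRAM ceiling. The class and its field names are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketConfig:
    """Hypothetical model of a chiplet socket: CPU chiplets around one I/O die."""
    chiplets: int
    cores_per_chiplet: int
    threads_per_core: int
    max_dram_tb: float

    @property
    def cores(self) -> int:
        return self.chiplets * self.cores_per_chiplet

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    def dram_gb_per_core(self) -> float:
        # Binary units for the per-core figure: 1 TB = 1024 GB.
        return self.max_dram_tb * 1024 / self.cores


# Rome as described above: 8 x 8-core chiplets, SMT2, up to 4TB DRAM.
rome = SocketConfig(chiplets=8, cores_per_chiplet=8, threads_per_core=2, max_dram_tb=4.0)

if __name__ == "__main__":
    print(rome.cores)               # 64
    print(rome.threads)             # 128
    print(rome.dram_gb_per_core())  # 64.0
```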
Senior Member
Posts: 13040
Joined: 2003-05-11
AMD Gimped the Tesla V100
https://wccftech.com/amd-radeon-mi60-resnet-benchmarks-v100-tensor-not-used/
Senior Member
Posts: 999
Joined: 2001-08-12
From my understanding, the Tensor cores can only be used in certain circumstances, and in this case AMD picked one where they couldn't be used in order to show off its own advantage. So it was cherry-picked really; after all, why would you pick the circumstances where your competitor has an edge when showcasing your own product?
Tensor cores apparently don't work in FP32; they only work in FP16 or in mixed FP16/FP32 workloads, where the Tensor cores do the FP16 work and the rest of the card handles FP32. So the test is legitimate, but only if you use pure FP32 instructions.
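To illustrate the FP16 versus mixed FP16/FP32 point, the sketch below emulates a dot product with FP16-rounded inputs using Python's struct module (format 'e' is IEEE 754 half precision) and compares accumulating the sum in FP16 against accumulating it in FP32. This is a simplified numerical emulation, not how Tensor cores or the ResNet benchmark actually compute.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, accumulate):
    """Dot product with FP16-rounded inputs; `accumulate` rounds the running sum."""
    total = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)    # FP16 operands
        total = accumulate(total + product)  # round the sum to the chosen format
    return total


if __name__ == "__main__":
    a = [0.1] * 4096
    b = [1.0] * 4096
    print(dot(a, b, to_fp16))  # pure FP16: the sum stalls far below the true value
    print(dot(a, b, to_fp32))  # FP16 inputs, FP32 accumulation: close to 409.5
```

Once the FP16 sum grows large enough, each small product falls below half the spacing between representable values and is rounded away, which is why mixed precision keeps the accumulator in FP32.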
Senior Member
Posts: 2087
Joined: 2006-09-28
I have to admit I'm very uncertain myself at this point.
One of the AMD slides claimed that the IO chiplet design had "improved latencies and power", which I take to mean "reduced" when it comes to latency. Of course there's no telling what kind of latency they're talking about here, it might not refer to memory at all and even if it does it could mean improved compared to cross-CCX latency. *shrug* It's all marketing stuff so likely to show the ideal case.
That said I can quite easily imagine Ryzen 3k ending up with the same 7nm 8-core CPU chiplet as Rome and a different IO die. It would allow AMD to keep reusing the same 7nm dies and it's presumably cheaper to design different IO dies on 14nm than a monolithic Ryzen 3k SoC on 7nm, given that the IO doesn't scale as well.
I suppose it primarily comes down to:
- whether the IO latency (especially memory latency) of the CPU + IO chiplet combination is competitive with, or better than, a monolithic design at lower core counts,
- whether the CPU chiplet used in Rome can run at the higher clocks/voltages suitable for desktop computing without any design tweaks, and
- the cost/size constraints of each approach.
If the CPU chiplet requires a major redesign for desktop usage regardless, I could well see a monolithic design for Ryzen 3k.
Then again, two "high performance" 4-core CPU chiplets and a separate IO die might also make sense in that scenario.
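On the cost/size question, a textbook Poisson yield model (Y = exp(-D * A)) shows why splitting a design into smaller dies can pay off. The defect density and die areas below are hypothetical, not AMD or TSMC numbers, and the model ignores packaging, interconnect and I/O die costs.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson model: Y = exp(-D * A)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-defects_per_cm2 * area_cm2)


def silicon_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Relative silicon cost of one good die: area / yield (wafer edge losses ignored)."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)


def split_cost(total_area_mm2: float, dies: int, defects_per_cm2: float) -> float:
    """Silicon cost of building the same total area from `dies` equal chiplets."""
    return dies * silicon_per_good_die(total_area_mm2 / dies, defects_per_cm2)


if __name__ == "__main__":
    D = 0.2       # hypothetical defects per cm^2
    AREA = 400.0  # hypothetical total logic area in mm^2
    print(f"monolithic: {split_cost(AREA, 1, D):.1f}")  # monolithic: 890.2
    print(f"4 chiplets: {split_cost(AREA, 4, D):.1f}")  # 4 chiplets: 488.6
```

Because yield falls exponentially with area, several small dies waste less silicon than one big one; whether that outweighs the extra packaging and latency cost is exactly the open question above.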
Bah, too many options at this point! It's been a long time since I felt as invested in tech news as I am today.