AMD is looking into chiplets to connect future CPUs and GPUs
Multi-die chips are one way to keep scaling performance as Moore's Law slows down. AMD already uses the technology to connect multiple dies in Threadripper and its Epyc server processors, and Intel does the same with Kaby Lake-G. The new research focus, however, is chiplets: smaller pieces of a chip mounted together on an interposer that collectively form the entire chip.
One of the bigger issues when manufacturing large CPU/GPU dies is that yields decrease and costs go up as the die grows. So why not combine multiple smaller dies into one package? In the example from the paper (below), four GPUs and a CPU partition are combined into one chip, with an interposer moving the right data to the right place. AMD published a new study on chiplets for multi-die CPU and GPU designs, as reported at spectrum.ieee.org; check it out:
The time may be coming when computers and other systems are made not from individually packaged chips attached to a printed circuit board but from bare ICs interconnected on a larger slice of silicon. Researchers have been developing this concept called “chiplets” with the idea that it will let data move faster and freer to make smaller, cheaper, and more tightly integrated computer systems. The idea is that individual CPUs, memory, and other key systems can all be mounted onto a relatively large slice of silicon, called an active interposer, which is thick with interconnects and routing circuits.
“In some sense if this were to pan out it’s somewhat similar to the integration story—Moore’s Law and everything else—that we’ve been writing for decades,” says Gabriel Loh, Fellow Design Engineer at AMD. “It allows the industry to take a variety of system components and integrate them more compactly and more efficiently together.”
There’s (at least) one problem: Though each chiplet’s own on-chip routing system can work perfectly, when they’re all connected together on the interposer’s network a situation can arise where a network tries to route data in such a way that a traffic jam occurs that winds up seizing up the computer. “A deadlock can happen basically where you have a circle or a cycle of different messages all trying to compete for same sorts of resources causing everyone to wait for everyone else,” Loh explains.
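The "cycle of messages all waiting for each other" that Loh describes can be made concrete with a small sketch. This is a generic illustration, not AMD's analysis: a deadlock exists exactly when the "waits-for" graph (who is blocked on whom) contains a cycle. The router names and topology below are invented for the example.

```python
# Hypothetical sketch: a network deadlock is a cycle in the "waits-for" graph.
# Router names and the example topology are invented for illustration.

def has_deadlock(waits_for):
    """Detect a cycle in a directed waits-for graph.

    waits_for: dict mapping a node to the nodes whose resources it waits on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in waits_for}

    def visit(n):
        color[n] = GRAY
        for m in waits_for.get(n, ()):
            if color.get(m, WHITE) == GRAY:   # back edge: a cycle, so deadlock
                return True
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(waits_for))

# Four routers each waiting on the next one's buffer: a cycle, so deadlock.
print(has_deadlock({"r0": ["r1"], "r1": ["r2"], "r2": ["r3"], "r3": ["r0"]}))  # True
# A chain with a free endpoint eventually drains: no deadlock.
print(has_deadlock({"r0": ["r1"], "r1": ["r2"], "r2": []}))                    # False
```

Each chiplet's designers can prove their own graph is cycle-free, but stitching several such networks together on an interposer creates new edges, and with them, potentially, new cycles.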
“Each of those individual [chiplets] could be designed so that they never have deadlocks,” says Loh. “But once I put them together, there are now new paths and new routes that no individual had planned for ahead of time.” Trying to avoid these new deadlocks by designing all the chiplets together with a particular interposer network in mind would defeat the advantages of the technique: Chiplets, then, couldn’t be designed and optimized easily by separate teams, and they couldn’t easily be mixed and matched to quickly form new systems. At the International Symposium on Computer Architecture earlier this month, engineers at AMD presented a potential solution to this impending problem.
The AMD team found that deadlocks on active interposers basically disappear if you follow a few simple rules when designing on-chip networks. These rules govern where data is allowed to enter and leave the chip and also restrict which directions it can go when it first enters the chip. Amazingly, if you follow those rules you can pretend everything else on the interposer—all the other logic chiplets, memory, the interposer’s own network, everything—is just one node on the network. Knowing that, separate teams of engineers can design chiplets without having to worry about how the networks on other chiplets work or even how the network on the active interposer works.
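The general idea of restricting directions to rule out deadlock is a classic one, and a textbook instance can be sketched (this is the standard dimension-order "XY" routing example, not AMD's actual interposer rules): if packets on a mesh always travel horizontally first and vertically second, certain turns never occur, and the resulting channel-dependency graph is acyclic, which is the usual condition for deadlock freedom. The mesh size and all names below are illustrative.

```python
# Hypothetical sketch of turn-restriction routing (textbook XY routing,
# not AMD's actual rules): forbidding Y->X turns keeps the
# channel-dependency graph acyclic, hence deadlock-free.

from itertools import product

W = H = 3  # illustrative mesh dimensions

def xy_route(src, dst):
    """Return the channels (node->node hops) taken by XY routing."""
    (x, y), path = src, []
    while x != dst[0]:                      # X dimension first
        nx = x + (1 if dst[0] > x else -1)
        path.append(((x, y), (nx, y))); x = nx
    while y != dst[1]:                      # then Y dimension
        ny = y + (1 if dst[1] > y else -1)
        path.append(((x, y), (x, ny))); y = ny
    return path

# Channel-dependency graph: channel a depends on channel b when some packet
# can hold a while waiting for b (i.e., b directly follows a on a route).
deps = {}
for src, dst in product(product(range(W), range(H)), repeat=2):
    hops = xy_route(src, dst)
    for a, b in zip(hops, hops[1:]):
        deps.setdefault(a, set()).add(b)

def acyclic(graph):
    """Depth-first search for a back edge (a cycle) in the dependency graph."""
    state = {}
    def visit(n):
        state[n] = 1                         # in progress
        for m in graph.get(n, ()):
            if state.get(m) == 1 or (state.get(m) is None and not visit(m)):
                return False
        state[n] = 2                         # done
        return True
    return all(state.get(n) == 2 or visit(n) for n in list(graph))

print(acyclic(deps))  # True: the turn restriction rules out deadlock cycles
```

AMD's contribution, per the article, is an analogous set of entry/exit rules strong enough that each chiplet can treat the rest of the interposer as a single opaque node.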
It may be some time before this trick is even needed. So-called passive interposers—silicon that contains interconnects but no network circuits—are already in use; AMD has been using one for its Radeon R9 series, for example. But adding an intelligent network to the interposer could lead to a big change in how systems are designed and what they can do.
Senior Member
Posts: 1309
Joined: 2003-09-14
Exciting times ahead, for sure

Junior Member
Posts: 2
Joined: 2016-10-12
This is actually rather old news. Last year AMD published a white paper, co-authored by G. H. Loh and others, titled "Design and Analysis of an APU for Exascale Computing". It describes in detail the multi-chip module for the Exascale Node Architecture.
A pdf can be found here: http://www.computermachines.org/joe/publications/pdfs/hpca2017_exascale_apu.pdf
Senior Member
Posts: 1992
Joined: 2013-06-04
This is actually rather old news. Last year AMD published a white paper, co-authored by G. H. Loh and others, titled "Design and Analysis of an APU for Exascale Computing". It describes in detail the multi-chip module for the Exascale Node Architecture.
A pdf can be found here: http://www.computermachines.org/joe/publications/pdfs/hpca2017_exascale_apu.pdf
Nice way to make your first post.
Senior Member
Posts: 162
Joined: 2018-06-15
Future looks bright.
Senior Member
Posts: 2039
Joined: 2008-07-16
And so the cycle repeats...
1) First computers were made out of individual components (vacuum tubes and relays, then transistors)
2) Many of these transistors eventually got integrated into "integrated circuits", which live to this day (a CPU or GPU is still an I.C.)
3) Eventually we learned to put so much stuff on one single I.C. that it became the computer itself (System-on-a-Chip, or SoC)
4) But we're reaching the limit of how much stuff can we put on one single chip, so... it's getting split into these "chiplets"
Future:
5) Eventually, with new technology (beyond silicon), we'll be able to put so many "chiplets" on one interposer that it becomes an "integrated chiplet array" or some weird name like that, no different from one I.C. of today.
6) Even further into the future *, we'll be able to stack chiplets and cooling inside the same "array", resulting in gigantic computing power in the shape of a cube of sorts, cooled by water flowing through the chip itself.
* IBM has already experimented with this, but it was never produced at large scale (too expensive).