Nvidia might be moving to Multi-Chip-Module GPU design

I'm pretty sure Infinity Fabric uses PCIe lanes for communication, though maybe it can use other transports as well. Between the CPUs on an Epyc chip there are 64 PCIe lanes running between each CPU, if I read the slides correctly. They can cut the latencies thanks to the short hops between the on-package CPUs, and the bandwidth should be plenty. A GPU that has 2x16 PCIe lanes could use the second set for intra-GPU signalling. Ideally you'd want four sets, like the North/South/East/West links on those DEC Alpha chips. That way each GPU die would be only one hop from any other, up to a certain number of dies.
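To put rough numbers on that "one hop" idea (just a back-of-the-envelope sketch of my own, nothing from the article): with four point-to-point links per die, a fully connected group tops out at five dies, and beyond that you're into mesh territory where hop counts grow with distance.

# Back-of-the-envelope sketch: how far "one hop from any other die" scales
# when each die has a fixed number of point-to-point links.

def max_fully_connected_dies(links_per_die: int) -> int:
    # Fully connected = every die links directly to every other die,
    # so each die needs (n - 1) links: n <= links_per_die + 1.
    return links_per_die + 1

def mesh_worst_case_hops(width: int, height: int) -> int:
    # With only N/S/E/W links (a 2D mesh), the worst case is the
    # Manhattan distance between opposite corners.
    return (width - 1) + (height - 1)

print(max_fully_connected_dies(4))   # 5 -> up to 5 dies can all be one hop apart
print(mesh_worst_case_hops(2, 2))    # 2 -> even a 2x2 mesh needs 2 hops corner to corner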
The IF protocol can be implemented over different kinds of links. The Epyc and Threadripper MCM dies are connected to each other over GMI links on the package (~42 GB/s bidirectional per link, and each Zeppelin die has four GMI controllers), which are independent of the PCIe controllers. IF runs over PCIe lanes only in the dual-socket configuration of Epyc; in that configuration it is known as xGMI. A GPU MCM by AMD would probably use the same or similar GMI controllers. It's possible that Vega already has these, since not much is known about the die and AMD has stated they are using IF on Vega.
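If those figures are right (they're just the numbers quoted in this thread, not official specs), the per-die aggregate is easy to work out:

# Rough arithmetic based on the figures mentioned above (not official specs):
GMI_LINK_GB_S = 42        # ~GB/s bidirectional per GMI link, as stated above
GMI_LINKS_PER_DIE = 4     # GMI controllers per Zeppelin die, as stated above

aggregate_gb_s = GMI_LINK_GB_S * GMI_LINKS_PER_DIE
print(f"~{aggregate_gb_s} GB/s aggregate die-to-die bandwidth per die")  # ~168 GB/s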
You can forget about that. It will be the same as in the CPU case: devs will need to do multi-core-style work across a cluster of GPUs (in this case!), so we will have GPU multithreading too.
I think game devs wait until someone does the hardest jobs for them. Like Microsoft doing DirectX for "them". Like Nvidia "helping" with GameWorks. Like Unreal Engine providing the engine, etc., etc. That's why I actually call game developers lazy 😀 They give up on taking the initiative in programming, mostly for fast profits. Multithreading in GPU tasks makes sense; CPU architectures already work that way with multitasking.
I think game devs wait until someone does the hardest jobs for them. Like Microsoft doing DirectX for "them". Like Nvidia "helping" with GameWorks. Like Unreal Engine providing the engine, etc., etc. That's why I actually call game developers lazy 😀 They give up on taking the initiative in programming, mostly for fast profits. Multithreading in GPU tasks makes sense; CPU architectures already work that way with multitasking.
Game developers are not lazy. During crunch time on a game, they often work 6-7 days a week for 12+ hours a day for several months. http://kotaku.com/crunch-time-why-game-developers-work-such-insane-hours-1704744577

That's not to mention that there are only a handful of developers with the knowledge to build low-level APIs, low-level engine systems, network code that can scale across thousands of servers and clients, etc. They don't teach any of that in game design school - they teach you things like Lua scripting and some C++/Java. The really good work comes from people specialized in certain fields, for example network engineering, who happen to take an interest in gaming. So when the Unreal developers, or Nvidia with GameWorks, or AMD with GPUOpen come in and build out a bunch of libraries for developers, it's extremely helpful. It shouldn't reflect poorly on the developers that utilize it.

Honestly, the series of videos that Star Citizen has been putting out lately provides excellent insight into what it takes to build and scale a game out over multiple studios. They show how they have to build a production pipeline with a few extremely talented people before they even think about hiring a mass of artists and designers for content check-in. Just scheduling and bringing new hires up to speed on the engine, scripting, and the design of races/ships/etc. takes months. I would argue that the level of production/talent/work in modern AAA games probably exceeds what most big-budget movie studios are doing.
Intel seems to be slowly heading in this direction too. AMD woke the sleeping giant. The current CPUs coming out are old tech. I'm very interested to see what's coming in two years when we're on 10/7nm tech. I'll still get a Volta Titan or Ti when it's out, but I'm guessing the first card to use this tech will be whatever comes after Volta. Exciting times coming in 2020...
x86 CPUs haven't changed much, but there are still extremely advanced SPARC, and now ARM, CPUs being made. The SPARC M7 and SPARC XIfx come to mind as two of the most advanced CPUs around, and we are seeing huge performance increases from generation to generation in architectures other than x86. NEC is even announcing a new vector CPU today. Memory bandwidth and capacity, as well as data locality, are going to be the next big things to focus on, because shrinking transistors isn't what it used to be. Going smaller isn't all positives any more.
Edit: On second thought, best to not get too far off track.
They explain all this in the PDF in the article.
outperforms the baseline multi-GPU by an average of 51.9%
They already benched this paper GPU?! Since these companies like to take best-case scenarios, the number would probably be lower in general. On top of that, this is probably the difference over the performance gain from a 2nd GPU, and we already know how well multi-GPUs scale, especially in games.
Doing work because you have to is very different from working hard all the time. Video game development scheduling is notoriously bad too. Don't get me wrong, I have plenty of respect for the skilled people doing low-level stuff, designing engines, libraries, and other hard work. What I can't respect is pushing out half-broken games with poorly implemented features (e.g. bad positional audio), game-breaking bugs, or an endless stream of moderate bugs (annoying and constant but not game-breaking). I'd rather see longer development cycles if that's what needs to be done to get a good end result.
I definitely agree that games either need longer development cycles or should do what the Hellblade dev is doing and cut the content down so they don't need to sacrifice quality for it.
They already benched this paper GPU?! Since these companies like to take best-case scenarios, the number would probably be lower in general. On top of that, this is probably the difference over the performance gain from a 2nd GPU, and we already know how well multi-GPUs scale, especially in games.
Nvidia has a Cadence Palladium system that allows them to design, prototype, and validate virtual GPUs without having to build one. They can simulate performance with a high degree of accuracy across a number of different benchmarks. They've designed/prototyped every GPU since Kepler on Cadence EDA tools/hardware. https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification.html As for your second paragraph, it's also answered in the PDF that apparently no one is reading but everyone feels the need to comment on:
A system with 256 SMs can also be built by interconnecting two maximally sized discrete GPUs of 128 SMs each. Similar to our MCM-GPU proposal, each GPU has a private 128KB L1 cache per SM, an 8MB memory-side cache, and 1.5 TB/s of DRAM bandwidth. We assume such a configuration as a maximally sized future monolithic GPU design. We assume that two GPUs are interconnected via the next generation of on-board level links with 256 GB/s of aggregate bandwidth, improving upon the 160 GB/s commercially available today [17]. We assume the multi-GPU to be fully transparent to the programmer. This is accomplished by assuming the following two features: (i) a unified memory architecture between two peer GPUs, where both GPUs can access local and remote DRAM resources with load/store semantics, (ii) a combination of system software and hardware which automatically distributes CTAs of the same kernel across GPUs. In such a multi-GPU system the challenges of load imbalance, data placement, workload distribution and interconnection bandwidth discussed in Sections 3 and 5, are amplified due to severe NUMA effects from the lower inter-GPU bandwidth. Distributed CTA scheduling together with the first-touch page allocation mechanism (described respectively in Sections 5.2 and 5.3) are also applied to the multi-GPU.
So it's essentially a best-case multi-GPU setup, an optimized version of said setup that they also simulated, and a simulated MCM design. They don't test games, so the issue of SLI scaling due to memory or previous-frame data being required doesn't apply.
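For reference, here's the baseline configuration from that excerpt in one place (a quick sketch of my own; the field names are mine, the numbers are the paper's):

# Restating the quoted baseline multi-GPU configuration from the excerpt above.
from dataclasses import dataclass

@dataclass
class SimulatedGPU:
    sms: int                   # streaming multiprocessors per GPU
    l1_per_sm_kb: int          # private L1 cache per SM
    memory_side_cache_mb: int  # memory-side cache per GPU
    dram_bw_tb_s: float        # DRAM bandwidth per GPU, TB/s

baseline_gpu = SimulatedGPU(sms=128, l1_per_sm_kb=128,
                            memory_side_cache_mb=8, dram_bw_tb_s=1.5)

# Two of these are joined with 256 GB/s of aggregate on-board link bandwidth
# to form the 256-SM baseline that the MCM-GPU proposal is compared against.
INTER_GPU_BW_GB_S = 256
NUM_GPUS = 2
total_sms = NUM_GPUS * baseline_gpu.sms   # 256 SMs
print(total_sms, INTER_GPU_BW_GB_S)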
It's kind of weird that they talk about next-generation board-level links being 256 GB/s when NVLink 2.0 is basically out and has 300 GB/s link speed. It's also crazy how far ahead of everyone else Fujitsu is. In 2015 they started shipping SPARC XIfx systems with 250 GB/s link speeds using optical links. I can't wait to see their next generation.
You can forget about that. It will be the same as in the CPU case: devs will need to do multi-core-style work across a cluster of GPUs (in this case!), so we will have GPU multithreading too.
Well, that is if the GPU manufacturer opts to glue together full stand-alone GPUs. But once there is one main I/O GPU block with modules attached to it (which on their own can't do anything), you get one GPU from a practical standpoint. In AMD's case, such a module would be an ACE and everything under its control. And since AMD has Infinity Fabric, even the memory controller can be well distributed. I can't tell you how good this approach would be for gaming, but from a compute standpoint I expect no loss in performance (compared to a monolithic GPU) due to the high granularity of the workloads.
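A toy way to picture that granularity argument (my own sketch, not AMD's scheduler): if one front-end block hands out many small work items, the compute modules stay evenly loaded no matter how many modules sit behind it.

# Toy illustration: one front-end dispatching many small work items
# across compute modules keeps them evenly loaded.
def dispatch(work_items, num_modules):
    queues = [[] for _ in range(num_modules)]
    for i, item in enumerate(work_items):
        queues[i % num_modules].append(item)   # simple round-robin
    return queues

# 10,000 small workgroups across 4 compute modules behind one I/O block:
queues = dispatch(range(10_000), num_modules=4)
print([len(q) for q in queues])   # [2500, 2500, 2500, 2500] -> balanced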
A Multi-Chip-Module GPU will be cheaper to produce; I hope it will be easier to cool too.
A Multi-Chip-Module GPU will be cheaper to produce; I hope it will be easier to cool too.
They'll probably have extremely high transistor density and therefore power density as well. Most of these will go into liquid-cooled (with proper brazed-copper or Asetek-made coolers with micro-finned cold plates) or chilled datacenters.
They will just adopt the "CPU model" for GPUs: one GPU with many cores and threads, functioning in a similar way to Intel's i9 or an AMD Ryzen CPU, for example. Sooner or later GPU manufacturers will be forced to go with "GPU multi-core" and "GPU multi-threading" as well, no surprise here.
Game developers are not lazy. During crunch time on a game, they often work 6-7 days a week for 12+ hours a day for several months. http://kotaku.com/crunch-time-why-game-developers-work-such-insane-hours-1704744577

That's not to mention that there are only a handful of developers with the knowledge to build low-level APIs, low-level engine systems, network code that can scale across thousands of servers and clients, etc. They don't teach any of that in game design school - they teach you things like Lua scripting and some C++/Java. The really good work comes from people specialized in certain fields, for example network engineering, who happen to take an interest in gaming. So when the Unreal developers, or Nvidia with GameWorks, or AMD with GPUOpen come in and build out a bunch of libraries for developers, it's extremely helpful. It shouldn't reflect poorly on the developers that utilize it.

Honestly, the series of videos that Star Citizen has been putting out lately provides excellent insight into what it takes to build and scale a game out over multiple studios. They show how they have to build a production pipeline with a few extremely talented people before they even think about hiring a mass of artists and designers for content check-in. Just scheduling and bringing new hires up to speed on the engine, scripting, and the design of races/ships/etc. takes months. I would argue that the level of production/talent/work in modern AAA games probably exceeds what most big-budget movie studios are doing.
I'd like to agree, but I just can't. A lot of people complain these days: about bad PC ports, games broken on day one, patches bringing more bugs, series getting boring because they're so formulaic, using the same effects, same engines, same procedures, etc. It's also funny when a developer says "multi-GPU support is impossible" and then some days later we can find SLI bits on the net that work fine. I still remember times when titles were fully functional on day one and v1.0 meant something. Today?! It's a huge joke (in the consumer's face): devs releasing expensive DLC while the game is in bad condition / unplayable / bugged / unoptimized, etc. Why do so many small studios open while Crytek, Microsoft, and some others cut jobs? I don't even expect them to tell the whole truth. We're living in times when small studios can bring more fresh ideas to the table than huge devs. It's a real shame. It's a shame programmers work under pressure and stress; it has a very negative effect on everything, especially product quality. I remember older times (10-20 years back) when a lot of games used OpenGL, and they looked and performed really amazingly for the time. Sorry for the somewhat long post, and greetings 🙂
Thanks for sharing, it's a great answer. Looking forward to seeing what they achieve with their first commercial Multi-Chip-Module GPU. LOL
Both companies need to mask the number of GPUs at the OS/driver level so the system only sees one, with the GPU's onboard BIOS deciding how the GPU dishes out the utilization; otherwise we will be stuck waiting and hoping the developers figure it out. The same goes for CPUs. I really want to find the documents on this; it was discussed way back in the mid-2000s how it's possible, but no one wants to do it. And from what I can find it has been done back in the earlier days, i.e. Voodoo and some other company I forget which, where the OS and drivers only saw it as one GPU.
That makes no sense at all.
That makes no sense at all.
He didn't put it very elegantly, but it makes perfect sense that data locality and latency need to be handled at as low a level as possible, so that software developers don't need to specially code their software to scale properly across all GPM partitions. It's actually discussed in the PDF, if anyone actually read it. It would be something like automatic vectorization.
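Something like this, conceptually (my own sketch, not the paper's code): the driver exposes a single logical device and quietly carves the grid of thread blocks (CTAs) up across the hidden modules.

# Sketch: one logical GPU, with CTA distribution hidden below the driver.
class LogicalGPU:
    def __init__(self, num_modules: int):
        self.num_modules = num_modules   # hidden from the application

    def device_count(self) -> int:
        return 1                         # the OS/app only ever sees one GPU

    def launch(self, num_ctas: int):
        # Contiguous chunks keep neighbouring CTAs (and, with first-touch
        # page allocation, the pages they touch) on the same module.
        per_module = -(-num_ctas // self.num_modules)  # ceiling division
        return {m: range(m * per_module, min((m + 1) * per_module, num_ctas))
                for m in range(self.num_modules)}

gpu = LogicalGPU(num_modules=4)
print(gpu.device_count())         # 1
print(gpu.launch(num_ctas=1000))  # CTA ranges per hidden module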
I'd like to agree, but I just can't. A lot of people complain these days: about bad PC ports, games broken on day one, patches bringing more bugs, series getting boring because they're so formulaic, using the same effects, same engines, same procedures, etc. It's also funny when a developer says "multi-GPU support is impossible" and then some days later we can find SLI bits on the net that work fine. I still remember times when titles were fully functional on day one and v1.0 meant something. Today?! It's a huge joke (in the consumer's face): devs releasing expensive DLC while the game is in bad condition / unplayable / bugged / unoptimized, etc. Why do so many small studios open while Crytek, Microsoft, and some others cut jobs? I don't even expect them to tell the whole truth. We're living in times when small studios can bring more fresh ideas to the table than huge devs. It's a real shame. It's a shame programmers work under pressure and stress; it has a very negative effect on everything, especially product quality. I remember older times (10-20 years back) when a lot of games used OpenGL, and they looked and performed really amazingly for the time. Sorry for the somewhat long post, and greetings 🙂
I think it all comes down to the global tendency toward quicker ROI. Why risk innovating in a good and polished game when you can just release the same crap with a different hat (Simpsons pun) and get a lot of $$ anyway... Small studios open because they have a passion for games or whatever they do; they are willing to risk their work hours and the few "cents" they have, while big companies only care about maximizing earning margins in the shortest term possible. Add to that that a huge portion of customers don't give a s... and only care about the graphics, so AAA game studios (AAA in graphics and C in everything else) invest too much in artists and too little in technical/story stuff. Back on topic: since rendering is inherently positional, there's a lot of room in driver/engine land to play with data locality. For instance, divide the screen into 4 pieces, keep the mesh+texture data of an object in the 1st quadrant on chip A, and move it to chip B once it moves (in game) into the 2nd quadrant, etc.
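Something like this, as a toy example (the chip names and screen size are made up, just to show the quadrant idea):

# Toy sketch: assign each object's mesh/texture data to the chip that owns
# the screen quadrant it currently falls in, and migrate when it crosses.
def quadrant(x: float, y: float, width: float, height: float) -> int:
    # 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
    return (1 if x >= width / 2 else 0) + (2 if y >= height / 2 else 0)

def assign_chip(obj_pos, screen=(1920, 1080), chips=("A", "B", "C", "D")):
    q = quadrant(obj_pos[0], obj_pos[1], *screen)
    return chips[q]

print(assign_chip((300, 200)))    # 'A' -> data lives on chip A
print(assign_chip((1500, 200)))   # 'B' -> object moved right, migrate to chip B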