AMD EPYC Processors Integrated into New NVIDIA DGX A100

AMD+Nvidia is a very powerful combination...
Nvidia already announced this a while back. I guess AMD was waiting for a slow news day to have something to tell.
That's $14,000 of the $200,000 machine that everyone wants going to AMD. Minus any discount given to Nvidia.
Not that I really care but isn't 128 cores overkill for a GPGPU server? CPUs don't tend to work very hard if the GPUs are crunching big numbers. You don't gain any usable PCIe lanes when adding a 2nd socket. But, Nvidia must know what they're doing - the price premium going from single socket to dual socket EPYC is hefty (though amusingly, still super cheap compared to Intel).
For a moment, I took the word "integrated" seriously there and was wondering if there is some chip that uses Nvidia's IP in the form of NVLink.
schmidtbag:

Not that I really care but isn't 128 cores overkill for a GPGPU server? CPUs don't tend to work very hard if the GPUs are crunching big numbers. You don't gain any usable PCIe lanes when adding a 2nd socket. But, Nvidia must know what they're doing - the price premium going from single socket to dual socket EPYC is hefty (though amusingly, still super cheap compared to Intel).
Not really overkill considering its workload. Last year, I did a lot of profiling on multiple server configurations to run the MLPerf benchmark (a SPEC-like benchmark for machine learning). The CPUs do the data preprocessing and batching before sending the data to the GPUs, which perform the major calculations. I saw that a dual Xeon Platinum setup with 24 cores each sat at around 60% utilization feeding a 4x NVIDIA Tesla V100 system, and that was for a single user only. In the case of the DGX A100, I believe it supports multiple users doing multiple different things, so CPU performance also plays an important role here, especially in being able to feed enough data to 8x A100 GPUs. Finally, the number of PCIe lanes also matters; they connect the GPUs to the CPUs as well as to the NVMe storage and Mellanox NICs. Having 128 lanes of PCIe 4.0 should be really helpful.
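To make the CPU-side role concrete, here is a minimal sketch of that preprocessing-and-batching pipeline, assuming PyTorch; the dataset class, worker count, and batch size are purely illustrative, not taken from MLPerf or the DGX configuration:

```python
# Minimal sketch (PyTorch assumed; dataset, worker count, and batch size are
# illustrative) of the CPU-side preprocessing/batching pipeline described above.
import torch
from torch.utils.data import Dataset, DataLoader

class FakeImageDataset(Dataset):
    """Stand-in dataset; a real pipeline would decode and augment images here (CPU-bound work)."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # Simulated preprocessing: in practice this is JPEG decode + resize + normalize.
        return torch.randn(3, 224, 224), idx % 1000

if __name__ == "__main__":
    loader = DataLoader(
        FakeImageDataset(),
        batch_size=256,
        num_workers=16,   # CPU worker processes doing the preprocessing and batching
        pin_memory=True,  # page-locked host memory speeds up host-to-GPU copies
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in loader:
        # Batches prepared by the CPU workers are copied to the GPU; with too few
        # CPU cores the GPU sits idle waiting for the next batch to arrive.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... forward/backward pass on the GPU would go here ...
```

The point of the sketch is that the preprocessing workers scale with CPU core count, which is why a box feeding eight GPUs (plus storage and NICs) can put real pressure on the host CPUs.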
Bagus Hanindhito:

Not really overkill considering its workload. Last year, I did a lot of profiling on multiple server configurations to run the MLPerf benchmark (a SPEC-like benchmark for machine learning). The CPUs do the data preprocessing and batching before sending the data to the GPUs, which perform the major calculations. I saw that a dual Xeon Platinum setup with 24 cores each sat at around 60% utilization feeding a 4x NVIDIA Tesla V100 system, and that was for a single user only. In the case of the DGX A100, I believe it supports multiple users doing multiple different things, so CPU performance also plays an important role here, especially in being able to feed enough data to 8x A100 GPUs. Finally, the number of PCIe lanes also matters; they connect the GPUs to the CPUs as well as to the NVMe storage and Mellanox NICs. Having 128 lanes of PCIe 4.0 should be really helpful.
Understood; I have a PC built for BOINC and I'm aware some projects can have temporarily heavy CPU utilization while preparing the workload. But that's really only a problem when all the workloads start at the same time. Once the GPU workloads are running, there isn't a whole lot of CPU usage going on. So, as long as the workloads are different enough that they don't all complete at the same time, that ought to give the CPU enough breathing room between starting and completing each one. In other words, only when you first initialize the system will you see most of the cores getting utilized; CPU usage becomes more "spread out" after each workload completes.

Of course, I'm making a lot of assumptions here. It's very possible that all the workloads Nvidia intends to run will complete at the same time (or close to it). It's also possible there is cross-communication with the rest of the system while GPU workloads are running, which would increase CPU usage throughout the job. But even then... 256 threads? I know this is some powerful hardware Nvidia is working with, but it just seems very surprising to me that a single 64-core EPYC would be a bottleneck.

Also, I'm only questioning the core count; I totally understand Nvidia's demand for the 128 PCIe 4.0 lanes, and I'd set the same priority in their shoes. But whether you use dual 64-core EPYCs or a single 32-core, you're still getting 128 usable PCIe lanes. So, if a single 64-core EPYC isn't going to be a bottleneck, doubling up on them seems like a very hefty expense for no real gain. But like I said, Nvidia obviously knows what they're doing if they're spending that much more.
It probably is a bit of overkill, but DGX systems are, for the most part, intended for containerized usage, where many users run lots of different jobs on them. A dual-CPU system with these processors could support a LOT of users, and that's what they are built for. This is THE ideal machine for GPU computing as a service on a large scale.