AMD EPYC Processors Integrated into New NVIDIA DGX A100

AMD+Nvidia is a very powerful combination...
Nvidia already announced this a while back. I guess AMD was waiting for a slow news day to have something to tell.
That's $14,000 of the $200,000 machine that everyone wants going to AMD. Minus any discount given to Nvidia.
Not that I really care but isn't 128 cores overkill for a GPGPU server? CPUs don't tend to work very hard if the GPUs are crunching big numbers. You don't gain any usable PCIe lanes when adding a 2nd socket. But, Nvidia must know what they're doing - the price premium going from single socket to dual socket EPYC is hefty (though amusingly, still super cheap compared to Intel).
For a moment, I took the word "integrated" seriously there and was wondering if there is some chip that uses Nvidia's IP in the form of NVLink.
schmidtbag:

Not that I really care but isn't 128 cores overkill for a GPGPU server? CPUs don't tend to work very hard if the GPUs are crunching big numbers. You don't gain any usable PCIe lanes when adding a 2nd socket. But, Nvidia must know what they're doing - the price premium going from single socket to dual socket EPYC is hefty (though amusingly, still super cheap compared to Intel).
Not really overkill considering its workload. Last year, I did a lot of profiling on multiple server configurations to run the MLPerf benchmark (a SPEC-like benchmark for machine learning). The CPUs do the data preprocessing and batching before sending the data to the GPUs, which perform the major calculations. I saw that a dual Xeon Platinum setup with 24 cores each sat at around 60% utilization feeding a 4x NVIDIA Tesla V100 system, and that was for a single user only. In the case of the DGX A100, I believe it supports multiple users doing multiple different things, so CPU performance also plays an important role here, especially in being able to feed enough data to 8x A100 GPUs. Finally, the number of PCIe lanes also matters; they connect the GPUs to the CPUs as well as to the NVMe storage and Mellanox NICs. Having 128 lanes of PCIe 4.0 should be really helpful.
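To make the CPU-side role concrete, here is a minimal sketch of that preprocessing-and-batching pipeline, assuming PyTorch; the dataset class, worker count, and batch size are purely illustrative, not taken from MLPerf or the DGX configuration:

```python
# Minimal sketch (PyTorch assumed; dataset, worker count, and batch size are
# illustrative) of the CPU-side preprocessing/batching pipeline described above.
import torch
from torch.utils.data import Dataset, DataLoader

class FakeImageDataset(Dataset):
    """Stand-in dataset; a real pipeline would decode and augment images here (CPU-bound work)."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # Simulated preprocessing: in practice this is JPEG decode + resize + normalize.
        return torch.randn(3, 224, 224), idx % 1000

if __name__ == "__main__":
    loader = DataLoader(
        FakeImageDataset(),
        batch_size=256,
        num_workers=16,   # CPU worker processes doing the preprocessing and batching
        pin_memory=True,  # page-locked host memory speeds up host-to-GPU copies
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in loader:
        # Batches prepared by the CPU workers are copied to the GPU; with too few
        # CPU cores the GPU sits idle waiting for the next batch to arrive.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... forward/backward pass on the GPU would go here ...
```

The point of the sketch is that the preprocessing workers scale with CPU core count, which is why a box feeding eight GPUs (plus storage and NICs) can put real pressure on the host CPUs.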
Bagus Hanindhito:

Not really overkill considering its workload. Last year, I did a lot of profiling on multiple server configurations to run the MLPerf benchmark (a SPEC-like benchmark for machine learning). The CPUs do the data preprocessing and batching before sending the data to the GPUs, which perform the major calculations. I saw that a dual Xeon Platinum setup with 24 cores each sat at around 60% utilization feeding a 4x NVIDIA Tesla V100 system, and that was for a single user only. In the case of the DGX A100, I believe it supports multiple users doing multiple different things, so CPU performance also plays an important role here, especially in being able to feed enough data to 8x A100 GPUs. Finally, the number of PCIe lanes also matters; they connect the GPUs to the CPUs as well as to the NVMe storage and Mellanox NICs. Having 128 lanes of PCIe 4.0 should be really helpful.
Understood; I have a PC built for BOINC and I'm aware some projects can have temporarily heavy CPU utilization while preparing the workload. But that's really only a problem when all the workloads start at the same time. Once the GPU workloads are running, there isn't a whole lot of CPU usage going on. So, as long as the workloads are different enough that they don't all complete at the same time, that ought to give the CPU enough breathing room between starting and completing each one. In other words, only when you first initialize the system will you see most of the cores getting utilized; CPU usage becomes more "spread out" after each workload completes.

Of course, I'm making a lot of assumptions here. It's very possible that all the workloads Nvidia intends to run will complete at the same time (or close to it). It's also possible there is cross-communication with the rest of the system while GPU workloads are running, which would increase CPU usage throughout the job. But even then... 256 threads? I know this is some powerful hardware Nvidia is working with, but it just seems very surprising to me that a single 64-core EPYC would be a bottleneck.

Also, I'm only questioning the core count; I totally understand Nvidia's demand for the 128 PCIe 4.0 lanes, and I'd set the same priority in their shoes. But whether you use dual 64-core EPYCs or a single 32-core, you're still getting 128 usable PCIe lanes. So, if a single 64-core EPYC isn't going to be a bottleneck, doubling up on them seems like a very hefty expense for no real gain. But like I said, Nvidia obviously knows what they're doing if they're spending that much more.
It probably is a bit of overkill, but DGX systems are, for the most part, intended for containerized usage, where many users run lots of different jobs on them. A dual-CPU system with these processors could support a LOT of users, and that's what they are built for. This is THE ideal machine for GPU computing as a service on a large scale.