NVIDIA announces RTX IO, GPU to Directly Access SSD

Fox2232:

I used to have storage statistics via MSI Afterburner. So I kind of know at what rate games loaded data and when. And games gained almost nothing from my NVMe drives.
I'm not surprised - storage is hardly a bottleneck in games nowadays. I'm still using SATA because I know most games barely load faster from NVMe. It's everything that comes after storage (decompression, transferring over PCIe, dropping into VRAM, etc.) that slows things down. As the article mentions, you could speed things up by removing some of that overhead. For games that don't have official support, my "prefetch" idea (I meant to say prefetch, not paging file) with already-decompressed data could make a measurable performance improvement. In theory, storage should be the bottleneck, but right now it isn't; eliminate the very long and complicated path that game data takes to reach its destination, and storage will likely become the bottleneck again. That isn't such a bad thing either - you want the slowest part in the system to be under 100% load. The fact that it isn't is the problem. DS can help alleviate that problem.
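To illustrate the point that the post-storage pipeline (CPU decompression in particular) dominates load time, here is a rough, self-contained Python sketch. zlib stands in for a game's asset compression and the data is synthetic; the timings are illustrative, not benchmarks:

```python
import time
import zlib

# Build a compressible "asset" in memory (stand-in for game data on disk).
asset = (b"texture block " * 4096) * 64   # a few MB of repetitive data
compressed = zlib.compress(asset, level=6)

# Time the CPU-side decompression step of the load path.
t0 = time.perf_counter()
restored = zlib.decompress(compressed)
cpu_decompress_s = time.perf_counter() - t0

# A "prefetched, already-decompressed" load is essentially just a memory copy.
t0 = time.perf_counter()
prefetched = bytes(asset)
memcpy_s = time.perf_counter() - t0

assert restored == asset  # decompression round-trips correctly
print(f"decompress: {cpu_decompress_s*1e3:.2f} ms, plain copy: {memcpy_s*1e3:.2f} ms")
```

On most machines the decompression step is several times slower than the plain copy, which is the overhead the "prefetch already-decompressed data" idea would skip.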
DirectStorage is coming to PC Sept 1, 2020
We’re excited to bring DirectStorage, an API in the DirectX family originally designed for the Velocity Architecture to Windows PCs! DirectStorage will bring best-in-class IO tech to both PC and console just as DirectX 12 Ultimate does with rendering tech. With a DirectStorage capable PC and a DirectStorage enabled game, you can look forward to vastly reduced load times and virtual worlds that are more expansive and detailed than ever.
https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
Cplifj:

Did nvidia just copy or license the AMD HBCC technology ?
It's not the same. GPUDirect completely bypasses system memory, allowing the GPU to pull from the drive directly. HBCC uses system memory as a VRAM cache and intelligently pulls from it. They're two completely different technologies. Also, DirectStorage is made by Microsoft, not AMD/Nvidia. As for everyone else talking about how they copied AMD: Nvidia announced GPUDirect as part of its Magnum IO API stack back in November last year.
So is this going to be the answer to the faster loading on the PS5/Xbox consoles? Is it all built into the drivers and Windows, or "extra" software that needs to be installed? And since it involves DX12, do I need a newer version of Windows (still on 1907 here)? Is this going to be a universal thing - will old games support it, or will a game have to be patched to support it? Seeing as it involves DX12, what about DX9/10/11 games? Games still use DX9 to this day, DX10 to a lesser degree, and DX11 more than the other two, as far as I can tell.
Cplifj:

Radeon Pro SSG did something similar, using an SSD of up to 1TB for storage via its own M.2 slot. That I'd call similar to this Nvidia tech, just slightly different since Nvidia uses the system SSD.
The SSG with the 1TB storage was kind of similar - it had a PCI-E switch on it and essentially bypassed the CPU to write data directly to the SSD - but it's not like Windows could see that drive or you could install games to it.
Undying:

PS5: we have instant loading! Nvidia: hold my beer...
Just to note that the Xbox Series X has similar PCIe 4.0 NVMe accelerated decompression. It's not just on the PS5.
Denial:

It's not the same. GPUDirect completely bypasses system memory, allowing the GPU to pull from the drive directly. HBCC uses system memory as a VRAM cache and intelligently pulls from it. They're two completely different technologies. Also, DirectStorage is made by Microsoft, not AMD/Nvidia. As for everyone else talking about how they copied AMD: Nvidia announced GPUDirect as part of its Magnum IO API stack back in November last year.
The HBCC can pull directly from any storage device according to some early slides - it can specifically use any storage as a cache (including things like network storage). The HBCC is aware of the different available memory pools and uses a tiered-storage-like solution presented as VRAM. The whitepaper doesn't detail using anything other than NVRAM or RAM, though, so maybe that was cancelled or subject to some erratum. [spoiler] https://www.custompcreview.com/wp-content/uploads/2017/01/amd-vega-ces-2017-press-deck_Page_36.jpg [/spoiler] It's not the same as RTX IO/DirectStorage, though maybe AMD can implement support for DirectStorage in the same or a similar way.
user1:

The HBCC can pull directly from any storage device according to some early slides - it can specifically use any storage as a cache (including things like network storage). The HBCC is aware of the different available memory pools and uses a tiered-storage-like solution presented as VRAM. The whitepaper doesn't detail using anything other than NVRAM or RAM, though, so maybe that was cancelled or subject to some erratum. It's not the same as RTX IO/DirectStorage, though maybe AMD can implement support for DirectStorage in the same or a similar way.
HBCC creates what AMD calls an HBC (High Bandwidth Cache), which resides in both VRAM and system RAM in a tiered hierarchy, with VRAM as the last-level cache. If the GPU requires an asset that's outside of this cache, the controller can request the CPU to fetch it and pull it into the HBC; then the GPU can utilize it. So while it can request data from any location, the data is moved into the HBC first, and it's all done by the CPU. It's really not that much different from how GPUs worked prior to HBCC, but HBCC creates the storage tiers and manages pages/swaps/etc. for the developer. https://www.reddit.com/r/Amd/comments/7x552w/exploring_vega_hbcc_and_its_effect_on_the_system/ This post does a good job investigating the effects of HBCC on the CPU. _ GPUDirect Storage, on the other hand, allows the DMA engine on the NVMe drive to push the requested data directly into the GPU's memory, bypassing system memory, the CPU, and the GPU's DMA engine entirely. I think this section from Nvidia explains it pretty well:
The PCI Express (PCIe) interface connects high-speed peripherals such as networking cards, RAID/NVMe storage, and GPUs to CPUs. PCIe Gen3, the system interface for Volta GPUs, delivers an aggregated maximum bandwidth of 16 GB/s. Once the protocol inefficiencies of headers and other overheads are factored out, the maximum achievable data rate is over 14 GB/s.

Direct memory access (DMA) uses a copy engine to asynchronously move large blocks of data over PCIe rather than loads and stores. It offloads computing elements, leaving them free for other work. There are DMA engines in GPUs and storage-related devices like NVMe drives and storage controllers but generally not in CPUs. In some cases, the DMA engine cannot be programmed for a given destination; for example, GPU DMA engines cannot target storage. Storage DMA engines cannot target GPU memory through the file system without GPUDirect Storage.

DMA engines, however, need to be programmed by a driver on the CPU. When the CPU programs the GPU's DMA, the commands from the CPU to the GPU can interfere with other commands to the GPU. If a DMA engine in an NVMe drive or elsewhere near storage can be used to move data instead of the GPU's DMA engine, then there's no interference in the path between the CPU and GPU. Our use of DMA engines on local NVMe drives vs. the GPU's DMA engines increased I/O bandwidth to 13.3 GB/s, which yielded around a 10% performance improvement relative to the CPU to GPU memory transfer rate of 12.0 GB/s shown in Table 1 below.
The technologies are similar in that they both work to provide data to the GPU, but the similarities kind of end there. HBCC creates a tiered VRAM/system RAM cache and simply requests data the traditional way, but intelligently manages that cache. GPUDirect Storage allows the data on - what I think is any device with a DMA engine - to be written directly to the GPU's memory.
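The HBCC tiered-cache behavior described above can be sketched as a toy model. This is purely a hypothetical illustration (not AMD's implementation): a small "VRAM" tier backed by a larger "system RAM" tier, where a miss means the "CPU" fetches the page from "storage" into the cache before the "GPU" can touch it:

```python
from collections import OrderedDict

class TieredCache:
    """Toy HBCC-style model: VRAM is the small fast tier, system RAM the
    larger second tier; anything else is fetched from 'storage' by the CPU."""
    def __init__(self, vram_pages, ram_pages):
        self.vram = OrderedDict()   # page -> data, kept in LRU order
        self.ram = OrderedDict()
        self.vram_pages = vram_pages
        self.ram_pages = ram_pages
        self.storage_fetches = 0

    def _evict(self, tier, limit, lower):
        while len(tier) > limit:
            page, data = tier.popitem(last=False)  # evict least recently used
            if lower is not None:
                lower[page] = data                 # demote to the next tier

    def read(self, page):
        if page in self.vram:                      # hit in the fast tier
            self.vram.move_to_end(page)
            return self.vram[page]
        if page in self.ram:                       # promote from second tier
            data = self.ram.pop(page)
        else:                                      # miss: "CPU" fetches from storage
            self.storage_fetches += 1
            data = f"data-{page}"
        self.vram[page] = data
        self._evict(self.vram, self.vram_pages, self.ram)
        self._evict(self.ram, self.ram_pages, None)
        return data

cache = TieredCache(vram_pages=2, ram_pages=4)
for p in [0, 1, 2, 0, 1]:
    cache.read(p)
print(cache.storage_fetches)  # prints 3: pages 0, 1, 2 each fetched once
```

The re-reads of pages 0 and 1 hit the RAM tier instead of storage, which is exactly the management HBCC does for the developer; GPUDirect Storage's point is to skip the bounce through these CPU-managed tiers entirely.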
Denial:

The SSG with the 1TB storage was kind of similar - it had a PCI-E switch on it and essentially bypassed the CPU to write data directly to the SSD - but it's not like Windows could see that drive or you could install games to it.
I'd like to see a solution where an SSD is installed on the graphics card and accessible by Windows...
Denial:

HBCC creates what AMD calls an HBC (High Bandwidth Cache), which resides in both VRAM and system RAM in a tiered hierarchy, with VRAM as the last-level cache. If the GPU requires an asset that's outside of this cache, the controller can request the CPU to fetch it and pull it into the HBC; then the GPU can utilize it. So while it can request data from any location, the data is moved into the HBC first, and it's all done by the CPU. It's really not that much different from how GPUs worked prior to HBCC, but HBCC creates the storage tiers and manages pages/swaps/etc. for the developer. https://www.reddit.com/r/Amd/comments/7x552w/exploring_vega_hbcc_and_its_effect_on_the_system/ This post does a good job investigating the effects of HBCC on the CPU. _ GPUDirect Storage, on the other hand, allows the DMA engine on the NVMe drive to push the requested data directly into the GPU's memory, bypassing system memory, the CPU, and the GPU's DMA engine entirely. I think this section from Nvidia explains it pretty well: The technologies are similar in that they both work to provide data to the GPU, but the similarities kind of end there. HBCC creates a tiered VRAM/system RAM cache and simply requests data the traditional way, but intelligently manages that cache. GPUDirect Storage allows the data on - what I think is any device with a DMA engine - to be written directly to the GPU's memory.
Thing is, accessing system memory in any way requires using the CPU, so it's not really useful to show that turning on HBCC uses more CPU energy/cycles - fundamentally there is no other way to access that memory. The fact that the SSG variant has its own SSD it can read from via PCIe, managed by the HBCC, and that the slides show network access, PCIe, XDMA, etc., strongly suggests that it doesn't have to talk to the CPU in order to use storage as a cache - kind of like how AMD used to use XDMA engines for CrossFire over the PCIe bus without CPU involvement. Also found this slide from the SSG press release: [spoiler] https://pics.computerbase.de/7/9/3/5/2/1-630.3959041560.png [/spoiler] So the question remains whether the inclusion of the CPU block in this diagram for accessing "storage" is due to a lack of API/OS support, or a hard limitation.
Don't forget that the GPU is physically connected to the CPU... the 16 lanes come from the CPU's I/O area (internal north bridge), and in the case of Zen 2, that's a dedicated die. Even if the GPU accesses the SSD -directly-, without involving the CPU cores, it will still happen through the CPU's I/O (but not through execution of CPU code).
This is a pretty big game changer regardless of who got there first. It may not be as sexy as ray tracing to demo but this kind of tech will be the unsung hero as textures get ever larger over the foreseeable future. And I agree that we will probably see it sooner than we expect. These kinds of low level features and enhancements can be added without necessarily altering core storage access APIs.
Excuse my ignorance on the subject, but can someone tell me how much of a difference it makes versus the other way of going through the CPU? Seconds, milliseconds, can/can't tell the difference while gaming? Will it give you an edge over someone online using the CPU method? Is this a game changer, pardon the pun, or who cares?
NewTRUMP Order:

Excuse my ignorance on the subject, but can someone tell me how much of a difference it makes versus the other way of going through the CPU? Seconds, milliseconds, can/can't tell the difference while gaming? Will it give you an edge over someone online using the CPU method? Is this a game changer, pardon the pun, or who cares?
I saw it mentioned that it's 10% faster getting the data directly to the GPU vs going through RAM and the CPU. Plus there will be benefits from using less CPU time and RAM bandwidth/space.
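For reference, that figure lines up with the numbers in the GPUDirect Storage excerpt quoted earlier in the thread: 13.3 GB/s using the NVMe drives' DMA engines versus 12.0 GB/s for the CPU-managed path. Worked out:

```python
# Numbers from Nvidia's GPUDirect Storage write-up quoted earlier in the thread.
pcie3_x16_raw = 16.0      # GB/s, aggregate PCIe Gen3 x16 bandwidth
pcie3_x16_usable = 14.0   # GB/s, roughly, after protocol overhead

nvme_dma_path = 13.3      # GB/s, NVMe DMA engines writing straight to GPU memory
cpu_bounce_path = 12.0    # GB/s, CPU-managed transfer through system memory

speedup = nvme_dma_path / cpu_bounce_path - 1.0
print(f"improvement: {speedup:.1%}")   # prints "improvement: 10.8%", i.e. "around 10%"

# Sanity check: the faster path is still under the practical PCIe ceiling.
assert nvme_dma_path < pcie3_x16_usable
```

So the ~10% is a bandwidth gain on the transfer itself; the reduced CPU load and freed RAM are on top of that.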
NewTRUMP Order:

Excuse my ignorance on the subject, but can someone tell me how much of a difference it makes versus the other way of going through the CPU? Seconds, milliseconds, can/can't tell the difference while gaming? Will it give you an edge over someone online using the CPU method? Is this a game changer, pardon the pun, or who cares?
Online games usually preload all the data for a given level (the loading screen with a progress bar for each player). That means no benefit at all unless everyone has the same loading capability (except the feeling that you were fastest). But there are games which take 5~8 seconds to load even from NVMe because the CPU is the limiting factor. Were there no CPU bottleneck, such a game would load within a second. Then there is compression ratio. Once the GPU takes care of the data, better compression can be used, which means that even where storage is the limiting factor, more data will be extracted per second. But the problem is again with people who have no access to this decompression. So it either has to be dynamic compression decided on a per-system basis, or the decompression can't exceed reasonable CPU requirements.
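The compression-ratio point above can be made concrete with some illustrative (entirely made-up) numbers: the effective asset bandwidth is the raw read speed multiplied by the compression ratio, capped by whatever throughput the decompressor can sustain:

```python
def effective_bandwidth(read_gbs, ratio, decompress_gbs):
    """Effective uncompressed-asset bandwidth: the drive delivers read_gbs of
    compressed data (worth read_gbs * ratio uncompressed), but the decompressor
    caps how much of that stream can actually be consumed."""
    return min(read_gbs * ratio, decompress_gbs)

# Illustrative numbers, not measurements:
nvme_read = 7.0          # GB/s raw NVMe read speed
ratio = 2.0              # 2:1 compression

cpu_decompress = 3.0     # GB/s a CPU might sustain
gpu_decompress = 20.0    # GB/s with GPU-accelerated decompression

print(effective_bandwidth(nvme_read, ratio, cpu_decompress))  # 3.0 - CPU is the bottleneck
print(effective_bandwidth(nvme_read, ratio, gpu_decompress))  # 14.0 - storage is the bottleneck again
```

With CPU decompression, raising the compression ratio buys nothing once the CPU saturates; with GPU decompression, a higher ratio directly multiplies effective storage bandwidth, which is the trade-off described above.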
So how many people in the world have NVMe disks in their rigs? 100%?
Mufflore:

I saw it mentioned that it's 10% faster getting the data directly to the GPU vs going through RAM and the CPU. Plus there will be benefits from using less CPU time and RAM bandwidth/space.
Well, it's more that the current method uses the CPU for decompression, which adds latency to getting the data onto the GPU.
GPUDirect Storage is not RTX IO. RTX IO is derived from it to a degree, but whereas GPUDirect Storage is a full-stack Nvidia implementation, RTX IO cuts out the front end and replaces it with Microsoft's DirectStorage API.
Astyanax:

GPUDirect Storage is not RTX IO. RTX IO is derived from it to a degree, but whereas GPUDirect Storage is a full-stack Nvidia implementation, RTX IO cuts out the front end and replaces it with Microsoft's DirectStorage API.
Human language please:D!
Caesar:

Human language please:D!
The machines using GPUDirect run Linux; Nvidia and their device partners for the DGX systems are handling all the work themselves.