AMD Could Do DLSS Alternative with Radeon VII through DirectML API

Part of me feels they did this just to spite Huang's comment. Not that I'm complaining.
ATM, after moving from AMD to NV, it looks like this: NV comes up with G-Sync and charges for it; AMD sees it and, not wanting to be behind, introduces FreeSync. NV starts to push ray tracing; AMD says we can do this too. NV comes up with DLSS and AMD says we can do that too. It's hard for me to think of a tech that AMD came up with that NV ripped off in the last few years. Well, I purchased a 290X because Mantle and TrueAudio were things that AMD came up with. I played BF4 on Mantle, and TrueAudio was DOA; I haven't played anything that uses it. I guess this is what happens when you are the leader in the market: you put some tech out and the chasing competition needs to adopt the technology or they will lose market share. I knew NV would keep the FreeSync option as an ace card for when they need it, i.e. when RTX doesn't sell as well as they expected.
lord_zed:

It's hard for me to think of a tech that AMD came up with that NV ripped off in the last few years. Well, I purchased a 290X because Mantle and TrueAudio were things that AMD came up with. I played BF4 on Mantle, and TrueAudio was DOA; I haven't played anything that uses it. I guess this is what happens when you are the leader in the market: you put some tech out and the chasing competition needs to adopt the technology or they will lose market share.
Not sure what your point is. Are you suggesting that these royalty-free alternatives are a problem? Are you suggesting these technologies are the only reason to buy a product? AMD doesn't typically come up with new technologies because they don't have the time and money to research such things. Their first priority is (or should be, anyway) to get something with good all-around performance. I don't see that as a problem. I appreciate Nvidia trying to push new technologies, but I personally have no interest in funding them if they're proprietary. But anyway, I'm pretty sure AMD knew Mantle was DOA before they even released it; it was supposed to be a proof of concept, at which it was a success. Thanks to Mantle, we have DX12 and Vulkan. As far as I'm concerned, that [so far] was a greater success than Raytracing or DLSS.
BlackZero:

Also, having it additionally run on the CPU could be of huge benefit for older cards if they could run it concurrently on the GPU and CPU.
Yeah, good luck running an ML matrix multiplication load on a CPU with any kind of satisfactory performance.
IMHO, AMD was probably pushed a tad by Google. They've been working very closely together on Project Stream, which at the moment is damn good (still beta): racks and racks of servers all running Radeon Pro, delivering AC: Origins at 1080p 80+ fps in a browser. My work computer gets AC: Origins in a browser looking as good as it does running from the SSD on an RX 580... but then again, I live very close to Google and the server farm. The "DLSS" feature could be implemented through streaming with no performance hit (other than whatever latency you get from your ISP and the "distance" from the server).
dr_rus:

Yeah, good luck running an ML matrix multiplication load on a CPU with any kind of satisfactory performance.
That's the least of the problems. GPU-to-CPU readback and then CPU-to-GPU upload is what would make it unsuitable for gaming.
BlackZero:

Because sending a lot of 0s and 1s takes up huge amounts of CPU time and bandwidth?
It's the latency: it's syncing the data to the CPU, running code, sending it back to the GPU, recombining it with GPU data and doing all of that in 2-3 ms before you render it out. It's a nightmare to code for, and at best you get basically no performance gain, because the copy to the CPU and back takes longer than just letting the GPU spend another ms on the task and keeping it all there. It's the same reason why mGPU will never take off: it takes too long to transfer the data, and when you only have 16 ms to do it plus recombine, it's just not worth the effort. That's why they just do alternate frame rendering, but that's basically broken with any interframe post-process effect.
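To put rough numbers on that round trip, here's a back-of-envelope sketch. The buffer format, PCIe throughput and sync overhead below are assumed values for illustration only, not measurements from any game or driver:

```python
# Back-of-envelope: can a full-resolution buffer make a CPU round trip inside a 2-3 ms budget?
# All figures are rough assumptions for illustration, not measurements.

FRAME_W, FRAME_H = 3840, 2160      # 4K render target
BYTES_PER_PIXEL = 8                # assume an RGBA FP16 intermediate buffer
PCIE_BYTES_PER_S = 13e9            # rough effective PCIe 3.0 x16 throughput (~13 GB/s of 15.75 GB/s theoretical)
SYNC_OVERHEAD_S = 100e-6           # assumed per-transfer driver/sync overhead

buffer_bytes = FRAME_W * FRAME_H * BYTES_PER_PIXEL
one_way_s = buffer_bytes / PCIE_BYTES_PER_S + SYNC_OVERHEAD_S
round_trip_s = 2 * one_way_s       # readback + upload, ignoring any CPU compute time

print(f"buffer size : {buffer_bytes / 1e6:.1f} MB")
print(f"one-way copy: {one_way_s * 1e3:.2f} ms")
print(f"round trip  : {round_trip_s * 1e3:.2f} ms  (budget: 2-3 ms, whole frame: ~16.7 ms)")
```

With those assumptions the two copies alone come to roughly 10 ms, several times the 2-3 ms window described above, before the CPU has computed anything.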
Denial:

It's the latency: it's syncing the data to the CPU, running code, sending it back to the GPU, recombining it with GPU data and doing all of that in 2-3 ms before you render it out. It's a nightmare to code for, and at best you get basically no performance gain, because the copy to the CPU and back takes longer than just letting the GPU spend another ms on the task and keeping it all there. It's the same reason why mGPU will never take off: it takes too long to transfer the data, and when you only have 16 ms to do it plus recombine, it's just not worth the effort. That's why they just do alternate frame rendering, but that's basically broken with any interframe post-process effect.
Isn't this one of the reasons AMD is working on HBCC and IF, to be able to share data with minimal latency? Plus, I am pretty sure data transfers and operations between CPU and GPU take microseconds and not milliseconds, unless said operation takes hundreds of clock cycles - especially on Zen, where through IF different resources could have direct access without needing to wait for the CPU cores to do all the actions. I am not saying that the CPU must be able to do it (performance-wise), but the latency between CPU, GPU, memory and cache should not be a big deal, unlike the operations themselves. The question here is about controlling it all, but this is something that needs to be solved anyway to be able to use chiplets in a GPU and still appear as one GPU, unlike current SLI/CF, and it's something AMD is likely working on, and Nvidia probably too. BTW, in big data and ERP multi-node systems you have server-to-server (each a different physical frame) data latency in the range of 100+ microseconds, and that's for two systems that have to talk over a network where network latency is the bottleneck.
lord_zed:

ATM, after moving from AMD to NV, it looks like this: NV comes up with G-Sync and charges for it; AMD sees it and, not wanting to be behind, introduces FreeSync. NV starts to push ray tracing; AMD says we can do this too. NV comes up with DLSS and AMD says we can do that too. It's hard for me to think of a tech that AMD came up with that NV ripped off in the last few years. Well, I purchased a 290X because Mantle and TrueAudio were things that AMD came up with. I played BF4 on Mantle, and TrueAudio was DOA; I haven't played anything that uses it. I guess this is what happens when you are the leader in the market: you put some tech out and the chasing competition needs to adopt the technology or they will lose market share. I knew NV would keep the FreeSync option as an ace card for when they need it, i.e. when RTX doesn't sell as well as they expected.
Read the Direct3D change log for feature levels. That's not Microsoft's wish list; that's what has been developed in cooperation with AMD/Intel/nVidia and game studios, based on it being feasible for HW implementation down the road or on HW already being ready for such operations. Just because AMD is not publicly trumpeting each and every feat of technology/software does not mean they sit idle. Quite the contrary: a great number of revolutionary technologies which are actually important came from AMD. And not petty stuff like "let's try ray tracing again" or "a new way to do image filtering/upscaling"... AMD's feats are closer to the core of innovation itself. Here's a bit: https://developer.amd.com/tools-and-sdks/ or https://www.amd.com/en/technologies/store-mi
From recent years: HBM, interposers, chiplets, a real working MCM for desktops. Going back further: AMD64, HSA, ... I wonder how you would play games on IA-64 processors. As for the extra compute performance required for ray tracing: AMD pushed that kind of compute in the consumer market long before nVidia.
As for TrueAudio: it has not seen adoption, but it is technology which delivers exactly what it promises. I would prefer that in games instead of "too little, too soon" ray tracing. Good audio realism provides better immersion than slightly more accurate reflections of ugly objects.
BlackZero:

Because sending a lot of 0s and 1s takes up huge amounts of CPU time and bandwidth? Anyway, if they could make it happen, it could be useful.
It's more about transfer time (plus eventual decoding) than computation time. Especially from GPU to CPU, readback operations can easily become a bottleneck since they break the rendering pipeline. Also, abusing CPU-to-GPU uploads can become a problem too, especially on discrete GPUs.
I'd rather have DirectML than DLSS, at least after seeing that car reconstruction picture. The biggest reason is quality, unless you use DLSS 2x to get over the upscaling, but then it's kind of a moot point - no performance boost. I saw a really detailed review of DLSS in FFXV and, to be honest, it looked like crap 90% of the time. The worst part was fence lines shimmering and smeared pixels, with loss of texture detail and even object detail in the distance.
xrodney:

Isn't this one of the reasons AMD is working on HBCC and IF, to be able to share data with minimal latency?
Yes.
xrodney:

Plus, I am pretty sure data transfers and operations between CPU and GPU take microseconds and not milliseconds, unless said operation takes hundreds of clock cycles - especially on Zen, where through IF different resources could have direct access without needing to wait for the CPU cores to do all the actions.
The latency would depend on the size of the data, but it's not really relevant. In this case Microsoft found GPU processing with DirectML and metacommands enabled to be 275x faster than running it on the CPU. http://on-demand.gputechconf.com/siggraph/2018/video/sig1814-2-adrian-tsai-gpu-inferencing-directml-and-directx-12.html - at about 24 minutes into the presentation - the entire presentation is good though and covers a lot of the stuff being said here. The point is, even if the latency is only 100-200 us to transfer to the CPU, the GPU could have performed whatever operation was sent to the CPU multiple times over. The more data you send, the longer it takes to get it back. It's simply never worth sending it there - especially given the orders-of-magnitude difference in performance (there's a rough sketch of that arithmetic at the end of this post).
xrodney:

I am not saying that the CPU must be able to do it (performance-wise), but the latency between CPU, GPU, memory and cache should not be a big deal, unlike the operations themselves. The question here is about controlling it all, but this is something that needs to be solved anyway to be able to use chiplets in a GPU and still appear as one GPU, unlike current SLI/CF, and it's something AMD is likely working on, and Nvidia probably too.
https://hps.ece.utexas.edu/people/ebrahimi/pub/milic_micro17.pdf They are both working on it, but it requires massive amounts of bandwidth, changes to scheduling, etc., and even then it still doesn't scale perfectly in terms of performance.
xrodney:

BTW, in big data and ERP multi-node systems you have server-to-server (each a different physical frame) data latency in the range of 100+ microseconds, and that's for two systems that have to talk over a network where network latency is the bottleneck.
Do you have a source for 100 microseconds? Typically the latency between two multi-node systems is ~350-400 us for the network alone - but admittedly it's been a while since I worked on anything like this (2011/12 @ RIT).
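For what it's worth, here's a rough sketch of the transfer-vs-compute arithmetic mentioned a couple of replies up. The 275x ratio is the figure quoted from the Microsoft presentation; the per-operation GPU time and the transfer latency are invented example values:

```python
# Sketch: when does offloading an op to the CPU ever pay off?
# The GPU op time is an invented example; the 275x ratio is the figure quoted above.

gpu_op_time_us = 200.0                 # hypothetical time for the GPU to run the op itself
cpu_op_time_us = gpu_op_time_us * 275  # CPU assumed ~275x slower on the same inference workload
transfer_us = 200.0                    # assumed one-way transfer latency (top of the 100-200 us range above)

cpu_path_us = transfer_us + cpu_op_time_us + transfer_us   # readback + CPU compute + upload
print(f"GPU keeps the work: {gpu_op_time_us:.0f} us")
print(f"CPU offload path:   {cpu_path_us:.0f} us ({cpu_path_us / gpu_op_time_us:.0f}x slower)")
# Even with zero transfer cost the CPU path loses by the full 275x; the copies only make it worse.
```

Even if the transfers were free, the CPU path still loses by the full compute ratio, so shrinking the latency doesn't rescue it.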
I couldn't care less about DLSS; its image quality improvements are of questionable value. Just because you can do it doesn't mean you should...
Plus, how many Turings are out there? There are many more Pascal GPUs on the market (bought new by gamers or as second-hand ex-mining cards), so Pascal gamers will be left in the dark while new Turing cards enjoy the DirectML era, versus AMD's Vega GPUs, which would get a free performance boost over the 1080 Ti.
HWgeek:

Plus, how many Turings are out there? There are many more Pascal GPUs on the market (bought new by gamers or as second-hand ex-mining cards), so Pascal gamers will be left in the dark while new Turing cards enjoy the DirectML era, versus AMD's Vega GPUs, which would get a free performance boost over the 1080 Ti.
If DirectML is just a DX12-based API, one would assume any DX12-capable card would be compatible.
holler:

I couldn't care less about DLSS; its image quality improvements are of questionable value. Just because you can do it doesn't mean you should...
The quality of the image is based on a bunch of factors that are constantly improving. Look at neural-net imaging from 5 years ago compared to today; it's night and day.
HWgeek:

Plus, how many Turings are out there? There are many more Pascal GPUs on the market (bought new by gamers or as second-hand ex-mining cards), so Pascal gamers will be left in the dark while new Turing cards enjoy the DirectML era, versus AMD's Vega GPUs, which would get a free performance boost over the 1080 Ti.
Given Nvidia's market saturation and the time it will take AMD to pull together a GPUOpen variant of DLSS on DirectML (then train a few games and integrate it), Turing will most likely have shipped more cards than AMD has with Vega.
vbetts:

If DirectML is just a DX12-based API, one would assume any DX12-capable card would be compatible.
Capable, yes, but the performance is kind of an unknown. Microsoft showed a 275x improvement over the CPU on a Titan V using Tensor cores. Vega won't get nearly that much performance, and Pascal doesn't have packed FP16, so the performance will be even worse - if it's even worth doing at that point. Microsoft basically said that below a certain point you can't do it in real time reliably.
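To give a feel for the gap being described, here's a rough peak-throughput sketch. The TFLOPS numbers are ballpark spec-sheet figures and the per-frame inference cost is an invented example value, so treat the output as orders of magnitude only:

```python
# Rough peak FP16 inference comparison (ballpark public spec figures, assumptions noted below).
# This ignores memory bandwidth, utilisation, metacommand support, etc. - peak math rates only.

peak_fp16_tflops = {
    "Titan V (Tensor cores)": 110.0,   # advertised tensor throughput
    "Vega 64 (packed FP16)":   25.3,   # roughly 2x its ~12.7 TFLOPS FP32 rate
    "GTX 1080 Ti (FP16)":       0.18,  # consumer Pascal runs FP16 at ~1/64 of its FP32 rate
}

workload_tflop = 0.05   # hypothetical per-frame inference cost (50 GFLOP) - invented example value

for gpu, tflops in peak_fp16_tflops.items():
    ms = workload_tflop / tflops * 1e3
    print(f"{gpu:26s} ~{ms:6.2f} ms per frame at peak")
```

On those rough numbers a Tensor-core part finishes in a fraction of a millisecond, Vega lands in the low milliseconds, and a consumer Pascal card blows through the entire frame budget many times over - which is exactly the "is it even worth doing" question raised above.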
vbetts:

If DirectML is just a DX12-based API, one would assume any DX12-capable card would be compatible.
That's what MS stated during the DirectML announcement. There's also no reason why Vega or any other card would get a performance boost from DirectML.
Maybe I'm an idiot, but DirectML doesn't seem like an AMD answer to Tensor cores. Tensor cores are, at their heart, interesting for DLSS because the processing is done separately; the idea is that there's little/no performance impact as the CUDA cores continue to focus on the game.
-Tj-:

The biggest reason is quality, unless you use DLSS 2x to get over the upscaling, but then it's kind of a moot point - no performance boost.
Honestly, I think this is where Nvidia went wrong with DLSS: the marketing. The technology is quite interesting, and I was more keen on DLSS than the whole RT paradigm when the RTX cards were announced, simply because RT isn't going to see wide adoption for quite a while. Unfortunately, Nvidia chose to market DLSS as a performance uplift, in conjunction with upscaling, and compared to TAA no less - when post-FX AA has been a curse inflicted on PC gaming rather than anything that actually benefits image quality. I believe Nvidia should have sold us on DLSS not as a performance enhancer at 4K but as a way to reintroduce SSAA for modern titles with a lower performance hit. Looks almost as good as TAA at 4K? Yeah, not impressed. Computationally cheap SSAA at 1080p/1440p? Sold! Give me a way to help relegate post-FX AA to the garbage bin of history and I'll start throwing money at the screen. 🙂