AMD Might Cancel Implicit Primitive Shader Driver Support

Denial:

Maxwell supports async compute; its implementation just wasn't as robust as AMD's specific definition, so I'm not sure how that relates.
Really? The last explanation of it I heard seemed to say Maxwell does jackshit as far as actual async compute goes.
Denial:

Maxwell supports async compute; its implementation just wasn't as robust as AMD's specific definition, so I'm not sure how that relates.
An asynchronous command flush upon a static instruction scheduler, what a wonderful optimization... Wait, it is completely useless in real-time applications and it fights against driver optimizations? Never mind... [sorry, I couldn't resist...] No, it's not a conspiracy theory: "Pascal" was a huge improvement in "mixing" compute and graphics tasks.
Denial:

To be fair, these features, both DSBR and Primitive Shaders, require a lot of software development work, testing, etc. We have no idea what went on behind the scenes as far as RTG's budget and what got allocated towards the Zen project in the last few years. Typically it's about 2-3 years from the initial design of a GPU architecture to the final shipping product. So Raja most likely planned these features with the expectation that he would have a software team capable of delivering them, and then who knows what happened. I'm not even sure what RTG's way forward is at this point - I think AMD needs to be firing on all cylinders for its CPU division to keep Zen competitive, especially now that Intel is tripping up. On the GPU side, I don't think they have a chance at beating Nvidia. I think their Polaris strategy of just targeting the masses with mid/low tier cards at extremely competitive price points is really the only way forward, because it's extremely safe. I know people are hoping for a crazy MCM setup on the GPU side, but I think the engineering effort, both software and hardware, is not something AMD can afford right now; if anything goes wrong and the product is a failure, it's game over. It's going to be interesting to see how it all plays out.
All of this smells like what happens when you don't have enough budget and time to meet performance expectations; it seems like they tried a radical approach in order to hit their performance targets, at high risk. Less than a year ago they were still saying the NGG path had 10x the perf of the old path, so I can see why they did it. Ultimately, based on the hints provided by the AMDVLK code dump, while the hardware is not completely borked, there are bugs that make implementing primitive shaders and NGG difficult. Here are some examples:
// When a transition from a legacy tessellation pipeline (GS disabled) to an NGG pipeline, the broadcast logic
// to update the VGTs can be triggered at different times. This, coupled with back pressure in the SPI, can cause
// delays in the RESET_TO_LOWEST_VGT and ENABLE_NGG_PIPELINE events from being seen. This will cause a hang.
// NOTE: For non-nested command buffers, there is the potential that we could chain multiple command buffers
// together. In this scenario, we have no method of detecting what the previous command buffer's last bound
// pipeline is, so we have to assume the worst and insert this event.
https://github.com/GPUOpen-Drivers/pal/blob/28a98ba3e787278dad958afd2cadbdabf28bacfc/src/core/hw/gfxip/gfx9/gfx9WorkaroundState.cpp

// There is a bug where the WD will page fault when it writes VGT_EVENTs into the NGG offchip control sideband
// (CSB) because there is no page table mapped for the VMID that was left in an NGG pipeline state.
// Since page tables are allocated by the kernel driver per-process, when the process is terminated the page
// table mapping will be invalidated-and-erased. This will leave no page tables mapped for the current VMID, and
// the WD request for a virtual memory address translation of the CSB buffer will consequently fail.
// NOTE: This is not an issue for mid-command buffer preemption nor when another process immediately follows
// this one with rendering work, as the kernel performs an invalidate-and-swap with the page tables,
// instead of invalidate-and-erase. Since the NGG buffers are mapped into every page table, these cases
// will not cause the same page fault.
https://github.com/GPUOpen-Drivers/pal/blob/dev/src/core/hw/gfxip/gfx9/gfx9SettingsLoader.cpp
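To make the first of those comments concrete, here is a short, hypothetical C++ sketch of the conservative logic it describes (every name below is invented for illustration and is not the actual PAL code): when the driver cannot tell what the previously bound pipeline was - for example at the start of a chained command buffer - it assumes the worst and emits the VGT events so the legacy-to-NGG transition cannot hang.

#include <cstdio>
#include <string>

// Hypothetical types and names, for illustration only - not the real PAL interfaces.
enum class PipelineKind { Unknown, LegacyTess, Ngg };

struct CmdStream {
    // Stand-in for emitting a VGT_EVENT packet into the command stream.
    void WriteEvent(const std::string& name) { std::printf("emit %s\n", name.c_str()); }
};

struct CmdBufferState {
    // Unknown across chained command buffers, since the previous buffer's
    // last bound pipeline cannot be detected there.
    PipelineKind lastBoundPipeline = PipelineKind::Unknown;
};

void BindNggPipeline(CmdBufferState& state, CmdStream& cs) {
    // If the previous pipeline is unknown or was a legacy (GS-disabled) tessellation
    // pipeline, assume the worst and emit the events so the VGTs are updated.
    if (state.lastBoundPipeline != PipelineKind::Ngg) {
        cs.WriteEvent("RESET_TO_LOWEST_VGT");
        cs.WriteEvent("ENABLE_NGG_PIPELINE");
    }
    state.lastBoundPipeline = PipelineKind::Ngg;
}

The real workaround lives in the gfx9WorkaroundState.cpp file linked above; the sketch only shows the "assume the worst and insert this event" pattern the comment spells out.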
Alessio1989:

An asynchronous command flush upon a static instruction scheduler, what a wonderful optimization... Wait, it is completely useless in real-time applications and it fights against driver optimizations? Never mind... [sorry, I couldn't resist...] No, it's not a conspiracy theory: "Pascal" was a huge improvement in "mixing" compute and graphics tasks.
Neo Cyrus:

Really? The last explanation of it I heard seemed to say Maxwell does jackshit as far as actual async compute goes.
I never said it was an optimization - I said it supported it. Its support was geared towards specific applications and not gaming - but that's where AMD's marketing team came in. Ryan Smith from AnandTech wrote it best:
Moving to Maxwell, Maxwell 1 was a repeat of Big Kepler, offering HyperQ without any way to mix it with graphics. It was only with Maxwell 2 that NVIDIA finally gained the ability to mix compute queues with graphics mode, allowing for the single graphics queue to be joined with up to 31 compute queues, for a total of 32 queues. This from a technical perspective is all that you need to offer a basic level of asynchronous compute support: expose multiple queues so that asynchronous jobs can be submitted. Past that, it's up to the driver/hardware to handle the situation as it sees fit; true async execution is not guaranteed.
But then, after AOTS, the entire community became obsessed with the idea that, when implemented the way AMD has it, it increases gaming performance. Which is fine, but remember the whole complaint was that Nvidia had numerous whitepapers/slides/a statement that said Maxwell supported it (which again is technically true and even useful for some applications) - but because it didn't work specifically in games, everyone continues to believe to this day that Maxwell never supported it - which is untrue. Regardless, TJ's analogy is still bad because Nvidia didn't go around touting Async Compute as a performance gain that would be enabled in a future driver update, like AMD did with this. No one even knew what Async Compute was until the AOTS beta came out nearly a year after Maxwell launched.
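For anyone unfamiliar with how "exposing multiple queues" looks from the application side, here is a minimal D3D12 sketch (my own illustration, not from the article or the quote): the app simply creates a COMPUTE command queue next to the usual DIRECT (graphics) queue, and whether work on the two queues actually overlaps on the GPU is left to the driver and hardware, exactly as the quote says.

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create the default device plus one DIRECT (graphics) queue and one COMPUTE queue.
// Submitting command lists to the COMPUTE queue is "asynchronous compute" at the API
// level; the spec does not guarantee the GPU will actually run them concurrently.
bool CreateGraphicsAndComputeQueues(ComPtr<ID3D12Device>& device,
                                    ComPtr<ID3D12CommandQueue>& gfxQueue,
                                    ComPtr<ID3D12CommandQueue>& computeQueue)
{
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device))))
        return false;

    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // accepts graphics, compute and copy work
    if (FAILED(device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue))))
        return false;

    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // accepts compute and copy work only
    return SUCCEEDED(device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue)));
}

Maxwell 2 and GCN both expose these extra queues; the argument in this thread is only about what the hardware does with the work once it has been submitted.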
https://forums.guru3d.com/data/avatars/m/268/268248.jpg
Is it just me, or whenever they want a feature to die, do they pass it to developer judgement to implement? When that happens, you might as well assume AMD/Intel/Nvidia took feature X out into a field and shot it with a shotgun.
Denial:

it didn't work specifically in games, everyone continues to believe to this day that Maxwell never supported it - which is untrue.
That makes it a lie on Nvidia's part, because async compute in games is the only thing 99.99% of us give a single fuck about, and they know it. They knew people would think it's referring to games. It's a lie without being a strictly technical lie, even more of a lie than their "4GB" nonsense.
warlord:

Well, no performance loss, but there will be no performance gain either. Vega owners are ****** just as Fury owners were after years. A wannabe enthusiast GPU, "FINEWINE" edition. Many promises and useless features all over the place. AMD has officially won the hype race. Neither Intel nor Nvidia ever lied this much before. Look at the Ryzen and Vega slides, then look at the real performance numbers and satisfaction values. AMD chose the wrong way. They are going to lose all their RX and Ryzen users faster than it took to acquire them. RIP.
Why is AMD going to lose Ryzen users? For most of us who bought into Ryzen, it performs and functions exactly as expected. My RX 470 has performed and functioned exactly as expected as well. This only affects Vega users; it has nothing to do with Ryzen CPUs or non-Vega GPUs.
data/avatar/default/avatar29.webp
Primitive? We don't need 10,000 BC shaders.
NvidiaFreak650:

[spoiler]https://www.forum-3dcenter.org/vbulletin/attachment.php?attachmentid=62423&d=1516745733[/spoiler] Source: https://translate.googleusercontent.com/translate_c?act=url&depth=1&hl=en&ie=UTF8&prev=_t&rurl=translate.google.com&sl=auto&sp=nmt4&tl=en&u=https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11612392&usg=ALkJrhiPVlRecs7cPWOeGUUjx3UCaEbnNA#post11612392
Well, based on that, it might have just been a miscommunication somewhere down the line. Interesting that they mention that Wolfenstein's compute shaders can achieve the same effect; perhaps it won't matter anyway.
-Tj-:

Idk, it is shady, no denying that... But then again, Nvidia promoted async compute with Maxwell and you saw how that turned out...
Yes, Nvidia canceled async compute because their shaders are more efficient than AMD's, which cannot compete directly 1:1, shader vs. shader. AMD needs more shaders for the same performance.
https://forums.guru3d.com/data/avatars/m/206/206288.jpg
Time Spy gained from async being enabled, as I think it used an implementation that wasn't designed just for AMD hardware. Ironically, it was also faster than any other implementation, even on AMD GPUs. As for this, I'm sure some might be disappointed, but I don't think it's lawsuit material; it is pretty damaging for AMD image-wise, as any future features that get announced will be met with cynicism. It also fuels the idea that AMD are lazy when it comes to drivers.
Redemption80:

As for this, I'm sure some might be disappointed, but I don't think it's lawsuit material; it is pretty damaging for AMD image-wise, as any future features that get announced will be met with cynicism. It also fuels the idea that AMD are lazy when it comes to drivers.
I'm not sure. Miners don't care about such features. 🙄
https://forums.guru3d.com/data/avatars/m/206/206288.jpg
TBH, I don't think the mining craze has helped AMD's image either. I'm sure it has helped the bank account though.
Denial:

To be fair, these features, both DSBR and Primitive Shaders, require a lot of software development work, testing, etc. We have no idea what went on behind the scenes as far as RTG's budget and what got allocated towards the Zen project in the last few years. Typically it's about 2-3 years from the initial design of a GPU architecture to the final shipping product. So Raja most likely planned these features with the expectation that he would have a software team capable of delivering them, and then who knows what happened. I'm not even sure what RTG's way forward is at this point - I think AMD needs to be firing on all cylinders for its CPU division to keep Zen competitive, especially now that Intel is tripping up. On the GPU side, I don't think they have a chance at beating Nvidia. I think their Polaris strategy of just targeting the masses with mid/low tier cards at extremely competitive price points is really the only way forward, because it's extremely safe. I know people are hoping for a crazy MCM setup on the GPU side, but I think the engineering effort, both software and hardware, is not something AMD can afford right now; if anything goes wrong and the product is a failure, it's game over. It's going to be interesting to see how it all plays out.
The GPUs with the best margins are the low/midrange cards. They sell a lot more than the high-end cards, and that generates a large amount of the revenue. The "crazy MCM" setup is perfectly viable, and cheaper than a large monolithic GPU. Intel has probably paid AMD for the external dGPU used with their chips. AMD can then re-use this and pop 4 of them onto a single card. Each module has 1536 shaders, assuming that is 100% of them. I think there might be 2048 shaders on the full version, as that would allow some leeway for yields. In any case, it would enable a ~6000-8000 shader card with 16GB of VRAM. AMD won't be game over. They have investors, and their CPUs are doing very well. They are selling all the GPUs they can make atm. And things are only going to get better with the APUs arriving, and the shrinks to Zen (and Vega, for the Pros). It's going to be very interesting how it plays out 🙂 Yes.
Redemption80:

TBH, I don't think the mining craze has helped AMD's image either. I'm sure it has helped the bank account though.
They are selling all they can make. 🙂 That's good. The only problem is the price gouging, and that is not their fault. There are people out there making 100% profit on the resale of these cards... loony.
https://forums.guru3d.com/data/avatars/m/267/267787.jpg
Well, this is sort of what I knew was coming... AMD normally comes up with great ideas, just like Nvidia... The difference is that developers jump on Nvidia features immediately, while AMD features get pushed to the side. Look at Mantle, AMD TrueAudio... The thing is, why must AMD spend more money and resources on something that developers don't want to use? It doesn't sound shady at all to me, just normal business practice...
data/avatar/default/avatar14.webp
And if we did not have Mantle, we would never have had the mostly bad DX12 implementations and Vulkan in its glory with Doom. We would be stuck at 4 cores 'till 2030+. As for async on Maxwell, Nvidia may have said that Maxwell supports it, but Maxwell was already out, so I don't get why people still whine/talk about it. This was one of the marketing points for Vega, so obviously this is far shadier than Maxwell's async (or even Pascal's, for that matter). And it's not like Nvidia cards need full hardware-level async. I won't buy/recommend AMD cards ever again. As for MCM, I don't think AMD can deliver, because Vega has insane power draw (since it is literally GCN1 4.0 or whatever, with more "dead on arrival" features), and with weaker GPUs they are going to lose perf/$ (in a world without mining), because we all know how well SLI/Crossfire scales. Scaling will probably be better because of a faster link, but until we see changes in how frames are rendered, it will still be hit or miss. And if people expect devs to optimize for MCM...
Crazy Serb:

And if we did not have Mantle, we would never have had the mostly bad DX12 implementations and Vulkan in its glory with Doom. We would be stuck at 4 cores 'till 2030+. As for async on Maxwell, Nvidia may have said that Maxwell supports it, but Maxwell was already out, so I don't get why people still whine/talk about it. This was one of the marketing points for Vega, so obviously this is far shadier than Maxwell's async (or even Pascal's, for that matter). And it's not like Nvidia cards need full hardware-level async. I won't buy/recommend AMD cards ever again. As for MCM, I don't think AMD can deliver, because Vega has insane power draw (since it is literally GCN1 4.0 or whatever, with more "dead on arrival" features), and with weaker GPUs they are going to lose perf/$ (in a world without mining), because we all know how well SLI/Crossfire scales. Scaling will probably be better because of a faster link, but until we see changes in how frames are rendered, it will still be hit or miss. And if people expect devs to optimize for MCM...
It would not be the first time AMD/ATi developed something good and it ended up in the trash. TruForm was probably one of the more important technologies. It happens from time to time; you just do not notice, because support is removed without telling the community. Secondly: "I won't buy/recommend AMD cards ever again" is very shortsighted unless you believe that AMD's GPU division is going to be closed soon, and it says that you are a bit weak-minded. People buy hardware based on the performance it delivers at the time of purchase, not based on the presumption that it will perform better in a year. Your view of MCM as SLI/CF is wrong. Look at it as having a big GPU cut in half and then connected again with an interposer. It will work in exactly the same way as the whole chip, with the exception of a tiny latency increase. If it is cut in a similar way to Ryzen CPUs (interconnect bus), then there will be close to no performance impact in comparison to the whole chip. => So, having one big 8192-SP chip or 1 control chip + 4x 2048-SP chips will prove to deliver the same performance within margin of error. Actually, that glued-together MCM package may have more stable power delivery, better cooling and better clocks.
Fox2232:

Your view of MCM as SLI/CF is wrong. Look at it as having a big GPU cut in half and then connected again with an interposer. It will work in exactly the same way as the whole chip, with the exception of a tiny latency increase. If it is cut in a similar way to Ryzen CPUs (interconnect bus), then there will be close to no performance impact in comparison to the whole chip. => So, having one big 8192-SP chip or 1 control chip + 4x 2048-SP chips will prove to deliver the same performance within margin of error. Actually, that glued-together MCM package may have more stable power delivery, better cooling and better clocks.
I wouldn't say "exactly same way" when it requires a massive redesign in both hardware and drivers, to both cache hierarchy and the scheduler - the latter of which AMD has had notorious problems with. Nvidia also estimates it would require a minimum of ~768GB/s per link to approach a monolithic die's performance and that's if you included the architecture changes to optimize for the MCM setup and if the workload is highly paralizable with minimal memory intensity - aka pretty much the opposite of games. So yeah, while MCM is definitely the future it's not as simple as "just split it and performance is within margin of error" it's going to take a lot of work to get it to similar performance as monolithic dies and it's going to be easy to run into software compatibility problems (mostly related to scheduling) that are difficult to solve - which again, does not seem to be AMD's forte. It's definitely possible they will go the route regardless.. like if you had asked me about HBM on consumer cards as early as 2015, in 2013, I'd say that was crazy, clearly AMD proved me wrong - but I just don't think it's the best decision for them short term. Nvidia doesn't seem to be doing it until 2020+ and I don't think AMD needs to beat them to the punch at it.