AMD Big Navi would see two product versions with different GDDR6 sizes (12 and 16 GB)

devastator:

I meant VRAM on the GPU, and my point stands.
devastator:

Can you have 10 GB or 12 GB on a 256-bit bus by mixing RAM chip sizes, say 2 GB per chip and 1 GB per chip? Like 4x 2 GB plus 4x 1 GB giving 12 GB, or 2x 2 GB plus 6x 1 GB giving 10 GB?
Astyanax:

Not on a PC, where you don't have metal-level access to RAM usage.
Sure you can; the GTX 970 is an example (a bit different, but still). Memory is accessed by the GPU; the problem is address overlap. If you have 8 memory chips, you have full bandwidth while filling the first 8 GB, but then you end up with 4 chips that still have 1 GB free each, and that remainder is accessed at half speed. That's an undesirable situation, and it is actually an undesirable design choice in the XSX. But they can make a 20 GB version (10x 2 GB) and it would have max bandwidth from the first to the last MB.
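(A minimal sketch of that address-overlap arithmetic, assuming one chip per 32-bit channel and bandwidth proportional to the number of chips that still cover an address range; the 56 GB/s per-chip figure assumes 14 Gbps GDDR6:)

```python
# Sketch: effective bandwidth of mixed-capacity memory configurations.
# Assumptions (illustrative, not vendor specs): one chip per 32-bit
# channel, and bandwidth scales with the number of chips that still
# cover the interleaved address range.

PER_CHIP_GBPS = 56  # 14 Gbps GDDR6 on a 32-bit channel

def bandwidth_regions(chip_sizes_gb):
    """Return (region_size_gb, bandwidth_gbps) tuples from low to high addresses."""
    regions, filled = [], 0
    for level in sorted(set(chip_sizes_gb)):
        active = sum(1 for c in chip_sizes_gb if c >= level)  # chips still interleaving
        regions.append(((level - filled) * active, active * PER_CHIP_GBPS))
        filled = level
    return regions

print(bandwidth_regions([2, 2, 2, 2, 1, 1, 1, 1]))  # mixed 12 GB: [(8, 448), (4, 224)]
print(bandwidth_regions([2] * 10))                  # uniform 20 GB: [(20, 560)]
```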
Fox2232:

Sure you can; the GTX 970 is an example (a bit different, but still). Memory is accessed by the GPU; the problem is address overlap. If you have 8 memory chips, you have full bandwidth while filling the first 8 GB, but then you end up with 4 chips that still have 1 GB free each, and that remainder is accessed at half speed. That's an undesirable situation, and it is actually an undesirable design choice in the XSX. But they can make a 20 GB version (10x 2 GB) and it would have max bandwidth from the first to the last MB.
So you can't really have two different RAM chip sizes then; they all have to be the same size. A shame, since you could get different configurations that way without widening the memory bus from, say, 256-bit to 320-bit or even 384-bit.
How did the 5700 XT stack up against the GTX 1080 Ti in actual gameplay? The Ti had a bigger game buffer at 11 GB of GDDR5X versus 8 GB of GDDR6. How much faster is the XT?
Saabjock:

How did the 5700 XT stack up against the GTX 1080 Ti in actual gameplay? The Ti had a bigger game buffer at 11 GB of GDDR5X versus 8 GB of GDDR6. How much faster is the XT?
The 1080 Ti is generally about 5% faster, give or take, depending on resolution.
Saabjock:

How did the 5700 XT stack up against the GTX 1080 Ti in actual gameplay? The Ti had a bigger game buffer at 11 GB of GDDR5X versus 8 GB of GDDR6. How much faster is the XT?
VRAM amount is of no importance unless you run out of it. At 4K, there are few games proven to hit a VRAM capacity limitation somewhere between 4 GB and 6 GB. Having more VRAM results in more data being cached, but SSDs and ancient LoD technology have no problem with caching. The worst that happens when you have enough VRAM is texture LoD pop-in; an insufficient amount of VRAM results in a drastic reduction in fps. And neither the 5700 XT nor the 1080 Ti runs out of VRAM capacity in today's games.

What makes the practical difference between the 5700 XT's and 1080 Ti's memories is bandwidth: the 1080 Ti has 484 GB/s and the 5700 XT 448 GB/s. That's 8%. Then there is the latency difference, which is likely a bit better with GDDR6 (5700 XT). Then there is the GPU that processes the data: from a raw-power perspective the 1080 Ti has some 15% higher TFLOPs. At 1080p, the 1080 Ti's 8% higher memory bandwidth and 15% more raw horsepower result in only 7% higher average performance. At 1440p it is 9%, and at 4K 12%, but neither card is suitable for 4K gaming anyway.

But then comes the real problem of approximating performance from official TFLOPs. On nVidia's side, a card like the 2070 has an official boost of 1620 MHz on the GPU, but most models boost to 1800 MHz+ out of the box and some even over 2 GHz. So when one looks at benchmarks and tries to approximate actual performance per TFLOP, it is necessary to remember that the official numbers do not match reality. (In this case the 2070 has an official 7.5 TFLOPs of FP32, but in reality most of them deliver around 9.2 TFLOPs of FP32.) Applying this to the 1080 Ti: the official boost is 1582 MHz, but most of them boost to 1950 MHz out of the box, which gives the 1080 Ti a peak of about 14 TFLOPs of FP32 (43% more than the RX 5700 XT). As for the official and out-of-the-box boost of the RX 5700 XT: official is 1905 MHz, actual is around 1930 MHz for the reference model, and OCed models reach around 2030~2050 MHz without going crazy on voltage. So its TFLOPs are not understated or overstated much, and therefore its actual fps per TFLOP is much more accurate.

Pascal is definitely way behind RDNA1/Turing. From my POV, Turing is pretty close to RDNA1 in performance per TFLOP; in some cases better, in some worse. Had Turing not gone for RT/Tensor cores, it would shine in traditional rendering while being cheaper. The upcoming GPUs on both sides are surely going to be exciting, and it will be lovely to over-analyze them: performance per transistor and clock, performance per TMU/ROP/SM and clock, and performance per watt too. And while I do not think this generation is going to be prime time for DX-R, its performance should and will be considered this time around too.
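(A quick worked version of the TFLOPs arithmetic above; the shader counts are the public specs, and the "typical" clocks are the post's out-of-the-box observations rather than official figures:)

```python
# Sketch: FP32 TFLOPs = 2 ops per FMA x shader count x clock.

def tflops(shaders, clock_mhz):
    return 2 * shaders * clock_mhz * 1e6 / 1e12

print(tflops(2304, 1620))  # RTX 2070, official boost    -> ~7.5
print(tflops(2304, 2000))  # RTX 2070, typical boost     -> ~9.2
print(tflops(3584, 1582))  # GTX 1080 Ti, official boost -> ~11.3
print(tflops(3584, 1950))  # GTX 1080 Ti, typical boost  -> ~14.0
print(tflops(2560, 1905))  # RX 5700 XT, official boost  -> ~9.75
```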
Fediuld:

Maths 😛 The 2080 Ti is 35%-40% faster than 40CU RDNA1 (the 5700 XT). If AMD gets 50% more performance per watt while keeping the same power consumption on a 40CU RDNA2, that means it would be 10%-15% faster than the 2080 Ti. An 80CU RDNA2 would be double the performance of the 40CU RDNA2, so 120%-130% faster (over twice as fast) than the 2080 Ti. Not 50% 😛
I think they were trying to say 7nm EUV would allow more improvements on top of the 50% perf per watt from the architecture-related changes. I'm not even going to bother correcting the math, since it's speculative math based on rumors, so it's pointless anyhow. I think it's safe to say, based on all the rumors, that if AMD makes an 80+ CU 505 mm² chip, it is faster than the 2080 Ti even without process or architecture improvements over the 5700 XT.
A couple of thoughts on AMD's (hopeful) price structuring:

1) AMD is the only GPU manufacturer with the experience of having owned fabs. This is important because of their early and close relationship with TSMC. In plain old brass tacks, it means the uArch hits the road with deep insights into the pluses and minuses of the process as it matures, enabling far greater efficiencies at the node: less wasted silicon, meaning lower costs. Even a 1% improvement in yield (which is substantial) yields greater profit at lower cost at a far greater rate than 1% at manufacture, as it compounds at each step. And remember, we're talking about (hopefully for AMD) millions of chips from thousands of wafers.

2) Because of Nvidia's determination to be an upmarket brand (indisputable), this leaves PLENTY of room at the high end (aka Big Navi) for a manufacturer with a lower cost of manufacture (also indisputable) like AMD. AMD can have a real-world high-end card at a price point well below a 3080 (or Ti). It really doesn't matter if it has 100% of the performance of a 3080 (or better) as long as it is within the price-per-performance envelope. The reality (not "halo" products) is that each card at a lower price point sells more volume than the model above; that is a universal constant in GPU sales.

3) As a tech nerd, I hope the biggest Navi is around $750 and outperforms a 2080 Super by 20(ish)+ percent. If it does better than that, it will be a huge hit; if it does at least that, it will be a welcome relief to the marketplace. The cut-down model could be reasonably priced around $600. This would put a lot of pressure on Nvidia even if the Ampere series is better on paper, simply because Nvidia isn't about to sell their product for less for branding reasons (the RTX series). At some point gamers will realize their "holy grail" doesn't entail spending over $1k as the nodes shrink and the number of manufacturers expands. Intel will have a lot to say, but only if it treats its 1st-gen GPUs as "loss leaders", rather like Microsoft with the first Xbox.
Fediuld:

Maths 😛 The 2080 Ti is 35%-40% faster than 40CU RDNA1 (the 5700 XT). If AMD gets 50% more performance per watt while keeping the same power consumption on a 40CU RDNA2, that means it would be 10%-15% faster than the 2080 Ti. An 80CU RDNA2 would be double the performance of the 40CU RDNA2, so 120%-130% faster (over twice as fast) than the 2080 Ti. Not 50% 😛
I actually revised my math a bit since that post, because I wasn't fully thinking it through. However, the 2080 Ti is actually 50% faster than the 5700 XT at 4K... so keeping that in mind (along with the 50% performance-per-watt improvement for RDNA 2, and 7nm+ EUV allowing 20% higher density and 15% better efficiency, or 10% higher performance), it works out as follows. It could mean that RDNA 2 at 40 CUs would have the same performance as the 2080 Ti at 115 W (at 4K).

A 48CU RDNA 2 would be 20% more powerful at 161 W.
A 60CU RDNA 2 would be 50% more powerful at 230 W.
A 72CU RDNA 2 would be 80% more powerful at 299 W.
An 80CU RDNA 2 would be 100% more powerful at 345 W.

The above values assume RDNA 2 gets a 10% IPC boost (and that AMD drops frequencies by about 20%; essentially the IPC enhancements keep the performance, and without clock reductions the above power-consumption values go all the way up to 540 W for the 80CU version, because power consumption increases exponentially). If not, then frequencies would still need to drop about 20% (which would drop performance by about 10% across the board) for the above TDP values to remain the same (resulting in: a 48CU RDNA 2 being 10% more powerful, a 60CU 40% more powerful, a 72CU 70% more powerful, and an 80CU 90% more powerful than the 2080 Ti, all at 4K).

If the increase in performance per watt also includes node enhancements, then it would certainly affect the above values. By how much, I'm not entirely sure, but maybe something like this: a 48CU RDNA 2 would in that case be more like the equivalent of a 2080 Ti at 4K and 161 W (with an IPC improvement of 10%; if not, then 10% slower than the 2080 Ti at 161 W). A 60CU RDNA 2 would be about 25% faster at 230 W (with a 10% IPC improvement; if not, then 15% faster). A 72CU RDNA 2 would be about 50% faster at 299 W (with a 10% IPC improvement; if not, then 40% faster). An 80CU RDNA 2 would be about 80% faster at 350 W (with a 10% IPC improvement; if not, then 70% faster).

That's all assuming AMD even decides to use 72-80CU GPUs (and there's also the fact that AMD will have ray tracing on these new GPUs). Instead of being twice as big as the 5700 XT, the 80CU GPUs would be about 80% larger on the EUV node... and probably expensive.
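(For what it's worth, the first table above can be reproduced with a two-assumption model; this is a sketch where both constants are reverse-engineered from the post's own numbers, not anything AMD has stated:)

```python
# Sketch: reproducing the speculative table above with two assumptions
# taken from the post's own figures (neither is an AMD spec):
#  - performance scales linearly with CU count (40 CUs == 2080 Ti at 4K, 115 W)
#  - each step up in CUs costs twice the linear amount of power

def rdna2_estimate(cus):
    perf_vs_2080ti = cus / 40
    power_w = 115 * (1 + 2 * (cus / 40 - 1))
    return perf_vs_2080ti, power_w

for cus in (40, 48, 60, 72, 80):
    perf, watts = rdna2_estimate(cus)
    print(f"{cus} CUs: +{(perf - 1) * 100:.0f}% over 2080 Ti at ~{watts:.0f} W")
# -> 40: +0% @ 115 W, 48: +20% @ 161 W, 60: +50% @ 230 W,
#    72: +80% @ 299 W, 80: +100% @ 345 W
```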
JamesSneed:

I think they were trying to say 7nm EUV would allow more improvements on top of the 50% perf per watt from the architecture-related changes. I'm not even going to bother correcting the math, since it's speculative math based on rumors, so it's pointless anyhow. I think it's safe to say, based on all the rumors, that if AMD makes an 80+ CU 505 mm² chip, it is faster than the 2080 Ti even without process or architecture improvements over the 5700 XT.
True, I didn't even calculate the RDNA1-to-RDNA2 changes; it was just an estimate. TSMC N7+ allows for 17% smaller dies, so I wouldn't be surprised if the 80CU chip (if it ever becomes real) came in around 415-420 mm².
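(That estimate tracks with simple arithmetic; a sketch assuming a straight doubling of Navi 10's roughly 251 mm² die and the claimed ~17% N7+ area reduction:)

```python
navi10_mm2 = 251                 # Navi 10 (5700 XT) die size
n7plus_shrink = 0.83             # claimed ~17% area reduction on N7+
print(2 * navi10_mm2 * n7plus_shrink)  # ~417 mm2 for a doubled, shrunk die
```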
deksman2:

I actually revised my math a bit since that post, because I wasn't fully thinking it through. However, the 2080 Ti is actually 50% faster than the 5700 XT at 4K... so keeping that in mind (along with the 50% performance-per-watt improvement for RDNA 2, and 7nm+ EUV allowing 20% higher density and 15% better efficiency, or 10% higher performance), it works out as follows. It could mean that RDNA 2 at 40 CUs would have the same performance as the 2080 Ti at 115 W (at 4K).

A 48CU RDNA 2 would be 20% more powerful at 161 W.
A 60CU RDNA 2 would be 50% more powerful at 230 W.
A 72CU RDNA 2 would be 80% more powerful at 299 W.
An 80CU RDNA 2 would be 100% more powerful at 345 W.

The above values assume RDNA 2 gets a 10% IPC boost (and that AMD drops frequencies by about 20%; essentially the IPC enhancements keep the performance, and without clock reductions the above power-consumption values go all the way up to 540 W for the 80CU version, because power consumption increases exponentially). If not, then frequencies would still need to drop about 20% (which would drop performance by about 10% across the board) for the above TDP values to remain the same (resulting in: a 48CU RDNA 2 being 10% more powerful, a 60CU 40% more powerful, a 72CU 70% more powerful, and an 80CU 90% more powerful than the 2080 Ti, all at 4K).

If the increase in performance per watt also includes node enhancements, then it would certainly affect the above values. By how much, I'm not entirely sure, but maybe something like this: a 48CU RDNA 2 would in that case be more like the equivalent of a 2080 Ti at 4K and 161 W (with an IPC improvement of 10%; if not, then 10% slower than the 2080 Ti at 161 W). A 60CU RDNA 2 would be about 25% faster at 230 W (with a 10% IPC improvement; if not, then 15% faster). A 72CU RDNA 2 would be about 50% faster at 299 W (with a 10% IPC improvement; if not, then 40% faster). An 80CU RDNA 2 would be about 80% faster at 350 W (with a 10% IPC improvement; if not, then 70% faster).

That's all assuming AMD even decides to use 72-80CU GPUs (and there's also the fact that AMD will have ray tracing on these new GPUs). Instead of being twice as big as the 5700 XT, the 80CU GPUs would be about 80% larger on the EUV node... and probably expensive.
You are confused on so many levels that, after reading your post, I have no strength left at this hour to even begin listing your errors.
None of this matters if game developers keep making crap games. COD Season 5, for example:
[attached image]
Fox2232:

You are confused on so many levels that, after reading your post, I have no strength left at this hour to even begin listing your errors.
AMD stated that RDNA 2 will have 50% more performance per watt. However, they mentioned nothing about using 7nm+ EUV for Zen 3 or RDNA 2. In fact, they will be using N7P, which is an enhancement of the existing 7nm node and offers about 5-7% more performance or 10% greater efficiency. That means that without any modifications to the clocks, the 40CU RDNA2 version would equal the 2080 Ti at 4K with a TDP of 202.5 W.

Power consumption jumps exponentially compared to the percentage of clock improvement... so a 10% performance gain requires roughly a 20% clock-speed increase, resulting in 40% higher power consumption (and similarly if you want to improve power efficiency).

Taking that in mind, and RDNA 2 NOT getting the 10% IPC improvements, I get the following:
40CU RDNA 2 = 10% slower than the 2080 Ti at 4K with a TDP of 121.5 W
48CU RDNA 2 = 10% faster than the 2080 Ti at 4K with a TDP of 170.1 W
60CU RDNA 2 = 40% faster than the 2080 Ti at 4K with a TDP of 243 W
72CU RDNA 2 = 70% faster than the 2080 Ti at 4K with a TDP of 315.9 W
80CU RDNA 2 = 90% faster than the 2080 Ti at 4K with a TDP of 364.5 W

If RDNA 2 gains 10% in IPC, that would add 10% performance to the above values at the same (estimated) TDP... However, if that happens (and depending on which TDP values they target), AMD could further reduce power consumption by about 20% by dropping the clocks by about 10%, resulting in a 5% net performance gain... which would actually put an 80CU RDNA 2 at 95% faster than the 2080 Ti at 4K with a TDP of 291.6 W (or an effective 300 W if you want to round up).

That's assuming AMD decides to even use 72 or 80CU GPUs in the first place. There's also the fact that AMD will have ray tracing in RDNA 2, which could further impact available space on the GPU and available resources.
deksman2:

AMD stated that RDNA 2 will have 50% more performance per watt. However, they mentioned nothing about using 7nm+ EUV for Zen 3 or RDNA 2. In fact, they will be using N7P, which is an enhancement of the existing 7nm node and offers about 5-7% more performance or 10% greater efficiency. That means that without any modifications to the clocks, the 40CU RDNA2 version would equal the 2080 Ti at 4K with a TDP of 202.5 W.
Completely false. Performance-per-watt improvements have always included the manufacturing node and the final clock. 40 CUs will not magically gain 50% more performance without a 50% higher clock, which is out of the question. Performance-per-watt improvement in this case means that 40 CUs in RDNA2 will have some 33% lower power draw at the reference clock, which is 1.75 GHz.
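(The 33% follows directly: at unchanged performance, +50% perf/W means power drops to 1/1.5 of the original:)

```python
print(1 - 1 / 1.5)  # +50% perf/W at equal performance -> ~0.33, i.e. ~33% lower power
```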
deksman2:

Power consumption jumps exponentially compared to the percentage of clock improvement... so a 10% performance gain requires roughly a 20% clock-speed increase, resulting in 40% higher power consumption (and similarly if you want to improve power efficiency).
This is again a complete misconception. Power draw goes up in linear fashion with clock. What you ignore is voltage, which may or may not need to increase. (If a 10% higher clock needs 10% higher voltage, the voltage term alone adds 21%, since dynamic power scales with the square of voltage; roughly 33% higher power draw in total. But the chance that AMD would move from the 1.2 V limit RDNA1 has to 1.32 V with RDNA2 is rather low.) And performance goes up in linear fashion with clock as long as you can provide sufficient memory bandwidth, because everything in the GPU gets faster, including the caches. Non-linear scaling comes from having more CUs, due to the increase in scheduling complexity without increasing the clock of the caches, command processor, and so on.
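(For reference, a minimal sketch of the first-order CMOS dynamic-power relation being appealed to here, P ∝ f·V²; real silicon adds leakage and static power, so treat it as an approximation:)

```python
def power_scale(clock_ratio, voltage_ratio):
    # First-order dynamic power: P ~ f * V^2 (leakage ignored)
    return clock_ratio * voltage_ratio ** 2

print(power_scale(1.10, 1.00))  # +10% clock, same voltage -> 1.10 (linear in clock)
print(power_scale(1.00, 1.10))  # +10% voltage alone       -> 1.21
print(power_scale(1.10, 1.10))  # both together            -> ~1.33
```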
deksman2:

Taking that in mind, and RDNA 2 NOT getting the 10% IPC improvements, I get the following:
40CU RDNA 2 = 10% slower than the 2080 Ti at 4K with a TDP of 121.5 W
48CU RDNA 2 = 10% faster than the 2080 Ti at 4K with a TDP of 170.1 W
60CU RDNA 2 = 40% faster than the 2080 Ti at 4K with a TDP of 243 W
72CU RDNA 2 = 70% faster than the 2080 Ti at 4K with a TDP of 315.9 W
80CU RDNA 2 = 90% faster than the 2080 Ti at 4K with a TDP of 364.5 W
It's all completely baseless. It should be enough to put it into perspective with the 40CU example you made: what you state is that 40 CUs in RDNA2 will now provide 40% higher performance than 40 CUs of RDNA1 while eating 46% less power. That is 2.6 times the power efficiency RDNA1 has. (A complete absurdity, considering that you actually believe in a 10% IPC gain, as the rest of the performance gain would have to come from a 27% higher clock than RDNA1 has in the 5700 XT.)
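(The 2.6x figure is simply the claimed performance ratio divided by the claimed power ratio:)

```python
perf_ratio = 1.40        # claimed +40% over 40CU RDNA1
power_ratio = 1 - 0.46   # claimed 46% lower power draw
print(perf_ratio / power_ratio)  # ~2.59x perf/W versus RDNA1
```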
deksman2:

If RDNA 2 gains 10% in IPC, that would add 10% performance to the above values at the same (estimated) TDP... However, if that happens (and depending on which TDP values they target), AMD could further reduce power consumption by about 20% by dropping the clocks by about 10%, resulting in a 5% net performance gain... which would actually put an 80CU RDNA 2 at 95% faster than the 2080 Ti at 4K with a TDP of 291.6 W (or an effective 300 W if you want to round up).
An IPC increase does not go hand in hand with a power-efficiency increase, as it very often comes at the cost of adding a lot of transistors that do more work per FLOP than before.
deksman2:

That's assuming AMD decides to even use 72 or 80CU GPUs in the first place. There's also the fact that AMD will have ray tracing in RDNA 2, which could further impact available space on the GPU and available resources.
Transistors take space; therefore there is no "could impact", it will have an impact on space. And adding extra logic costs extra transistors in the parts of the GPU that take care of scheduling and moving data around. Nothing is for free, as the parts are connected into processing chains.
Fox2232:

Completely false. Performance-per-watt improvements have always included the manufacturing node and the final clock. 40 CUs will not magically gain 50% more performance without a 50% higher clock, which is out of the question. Performance-per-watt improvement in this case means that 40 CUs in RDNA2 will have some 33% lower power draw at the reference clock, which is 1.75 GHz.

This is again a complete misconception. Power draw goes up in linear fashion with clock. What you ignore is voltage, which may or may not need to increase. (If a 10% higher clock needs 10% higher voltage, the voltage term alone adds 21%, since dynamic power scales with the square of voltage; roughly 33% higher power draw in total. But the chance that AMD would move from the 1.2 V limit RDNA1 has to 1.32 V with RDNA2 is rather low.) And performance goes up in linear fashion with clock as long as you can provide sufficient memory bandwidth, because everything in the GPU gets faster, including the caches. Non-linear scaling comes from having more CUs, due to the increase in scheduling complexity without increasing the clock of the caches, command processor, and so on.

It's all completely baseless. It should be enough to put it into perspective with the 40CU example you made: what you state is that 40 CUs in RDNA2 will now provide 40% higher performance than 40 CUs of RDNA1 while eating 46% less power. That is 2.6 times the power efficiency RDNA1 has. (A complete absurdity, considering that you actually believe in a 10% IPC gain, as the rest of the performance gain would have to come from a 27% higher clock than RDNA1 has in the 5700 XT.)

An IPC increase does not go hand in hand with a power-efficiency increase, as it very often comes at the cost of adding a lot of transistors that do more work per FLOP than before.

Transistors take space; therefore there is no "could impact", it will have an impact on space. And adding extra logic costs extra transistors in the parts of the GPU that take care of scheduling and moving data around. Nothing is for free, as the parts are connected into processing chains.
Man, you really took some time out to answer everyone 🙂 I am going on record saying I truly think the full Navi 21 die will have 96 CUs, which everyone says is crazy, but let's see. I think the die will have 4 CUs disabled per repeating memory-controller/CU block (4 controllers with 24 CUs each), yielding an 80CU chip that can easily be mass-produced. This is what AMD did with the 5700 XT, as it has two repeating blocks of memory controllers with 24 CUs each, with 4 CUs disabled per block, yielding a 40CU GPU.
JamesSneed:

Man, you really took some time out to answer everyone 🙂 I am going on record saying I truly think the full Navi 21 die will have 96 CUs, which everyone says is crazy, but let's see. I think the die will have 4 CUs disabled per repeating memory-controller/CU block (4 controllers with 24 CUs each), yielding an 80CU chip that can easily be mass-produced. This is what AMD did with the 5700 XT, as it has two repeating blocks of memory controllers with 24 CUs each, with 4 CUs disabled per block, yielding a 40CU GPU.
As far as Navi 10 goes, all die shots show 20 dual CUs, therefore 40 CUs, not 48. The RX 5700 XT is the full GPU; the 5700 is the cut-down one.
The 80CU RDNA 2.0 cards will be interesting, to say the least. I'm curious about their ray-tracing performance and what they'll do about DLSS.
THANK YOU AMD! We so need to get off the 8 GB Nvidia bandwagon here. This 2070 Super is awesome: fast, power-efficient enough, and quiet, but it only has 8 GB of VRAM. When you're trying to fit an entire city, or even just sections of one, into 8 GB, and a single tunnel can be 500 MB or more (PLUS TEXTURES!), it would be nice to have double that. Otherwise you might as well just be watching a Hanna-Barbera chase scene with an ever-repeating background scrolling by.
bobblunderton:

THANK YOU AMD! We so need to get off the 8 GB Nvidia bandwagon here. This 2070 Super is awesome: fast, power-efficient enough, and quiet, but it only has 8 GB of VRAM. When you're trying to fit an entire city, or even just sections of one, into 8 GB, and a single tunnel can be 500 MB or more (PLUS TEXTURES!), it would be nice to have double that. Otherwise you might as well just be watching a Hanna-Barbera chase scene with an ever-repeating background scrolling by.
Or have a super-fast I/O subsystem and thus not really care about this. But that is console heresy, and it's not taken well on these forums :P
Completely false. Performance-per-watt improvements have always included the manufacturing node and the final clock. 40 CUs will not magically gain 50% more performance without a 50% higher clock, which is out of the question. Performance-per-watt improvement in this case means that 40 CUs in RDNA2 will have some 33% lower power draw at the reference clock, which is 1.75 GHz.
Actually, the manufacturing process AMD will use for RDNA 2 was stated to be N7P (non-EUV). That node only allows 10% higher efficiency or up to 7% higher performance (one or the other). As for the rest... well, AMD created an "enhanced Vega" for Renoir, for which they managed to get a total of about 56% higher performance per core. Putting that into context, only about 15% of that performance gain came from clock increases (which went up by about 30%); performance never scales linearly with clock increases. The remaining roughly 40% of those performance enhancements came from uArch improvements, some of which AMD said they will use in RDNA 2.
This is again a complete misconception. Power draw goes up in linear fashion with clock. What you ignore is voltage, which may or may not need to increase. (If a 10% higher clock needs 10% higher voltage, the voltage term alone adds 21%, since dynamic power scales with the square of voltage; roughly 33% higher power draw in total. But the chance that AMD would move from the 1.2 V limit RDNA1 has to 1.32 V with RDNA2 is rather low.) And performance goes up in linear fashion with clock as long as you can provide sufficient memory bandwidth, because everything in the GPU gets faster, including the caches. Non-linear scaling comes from having more CUs, due to the increase in scheduling complexity without increasing the clock of the caches, command processor, and so on.
Contrary to what you think, power consumption goes up EXPONENTIALLY with clock increases. I don't "ignore" anything... AMD has a history of not optimizing voltages on their GPUs in order to increase the number of functional dies. If they do optimize the voltages, all the better... I was giving conservative estimates for the worst-case scenario.
It's all completely baseless. It should be enough to put it into perspective with the 40CU example you made: what you state is that 40 CUs in RDNA2 will now provide 40% higher performance than 40 CUs of RDNA1 while eating 46% less power. That is 2.6 times the power efficiency RDNA1 has. (A complete absurdity, considering that you actually believe in a 10% IPC gain, as the rest of the performance gain would have to come from a 27% higher clock than RDNA1 has in the 5700 XT.)
And you completely missed my point. 40 CUs would give 50% higher performance than the 5700 XT at the SAME power consumption (225 W). The N7P node allows 10% higher efficiency, bringing that power consumption down to 202.5 W. If you drop frequencies by 20% (which drops performance by 10% and power consumption by roughly 40%), you arrive at about a 121 W TDP with 40% higher performance than the 5700 XT. Scaling up from there with more CUs (depending on how many AMD adds) increases both clocks and power consumption. Now, CUs apparently don't add as much to power consumption as frequency increases do, but they DO add something (about 5%), whereas clock increases result in an exponential increase in power. Bearing that in mind, my power and performance estimates are a rough ballpark (and, ironically enough, certain rumors released since then agree with those estimates). Of course, I keep an open mind to the possibility that I am horribly wrong... it was just a late-night musing of mine, nothing else.
An IPC increase does not go hand in hand with a power-efficiency increase, as it very often comes at the cost of adding a lot of transistors that do more work per FLOP than before.
An IPC increase usually means you can get X% more performance at the same frequencies (without increasing power consumption). That further depends on how the uArch is designed and modified to work on a given manufacturing process and at which voltages.
Transistors take space; therefore there is no "could impact", it will have an impact on space. And adding extra logic costs extra transistors in the parts of the GPU that take care of scheduling and moving data around. Nothing is for free, as the parts are connected into processing chains.
Of course nothing is for free. I took that into account.