AMD Greenland Vega10 Silicon To Have 4096 Stream Processors?


https://forums.guru3d.com/data/avatars/m/169/169957.jpg
[quote]
That clock can't happen because we have no cooling for it. And it was to support the idea that every 14nm and 16nm part will hit a certain clock ceiling at a given point in time and technological advancement, and that overclocking will be very difficult. That means AMD and nVidia are going to end up with much more similar GPU clocks than they did on 28nm; a 25% clock difference is very improbable to happen again. (The 28nm limitation for both AMD and nVidia came from transistor density, which limited stability; the 14nm clock limitation will come from cooling.)

Now, what I meant by nVidia having to do more work on the die shrink:

980Ti: 8B transistors, 1350MHz clock
Fiji: 8.9B transistors, 1050MHz clock

We can say that the 980Ti performs 5% better on average. Therefore:

980Ti perf = 1.05 × Fiji perf
8 × 1350 = 1.05 × 8.9 × 1050 × [Fiji-to-GM200 perf per transistor per clock ratio]
[Fiji-to-GM200 perf per transistor per clock ratio] ≈ 1.1

In other words, if GM200 and Fiji had the same clock and were limited to the same transistor count, Fiji would deliver 10% higher performance. And that is while Fiji is a well-rounded GPU with no significant weak points like low compute performance or pixel shader length limitations.

Given that Fiji has higher performance per clock per transistor and more transistors than GM200, doing just a die shrink and hitting a similar clock for both chips would make a 14nm Fiji much better. And as I stated before, that's why both companies continue to improve. AMD knows they have a certain inherited advantage which they can play; you can see their confidence at presentations. But if they sit on their hands, nVidia will take that advantage from them, as it is not that big.
[/quote]
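As a quick sanity check, the quoted arithmetic does work out; here is a minimal Python sketch using the poster's own figures (the 1350MHz boost clock and the 5% average lead are their assumptions, not established specs):

[code]
# Perf model from the quote: perf = transistors * clock * (perf per
# transistor per clock). Solve for the Fiji-to-GM200 ratio given an
# assumed 5% average performance lead for the 980 Ti.
gm200_tr_b, gm200_mhz = 8.0, 1350   # 980 Ti (GM200), poster's figures
fiji_tr_b, fiji_mhz   = 8.9, 1050   # Fury X (Fiji)
lead = 1.05                         # assumed 980 Ti average perf lead

ratio = (gm200_tr_b * gm200_mhz) / (lead * fiji_tr_b * fiji_mhz)
print(f"Fiji-to-GM200 perf per transistor per clock: {ratio:.2f}")  # ~1.10
[/code]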
Pretty sure the 980Ti is 1200MHz stock, and again, it makes no sense to me that you compare at clock parity when a Fiji core caps out at 1150MHz and Maxwell can do 1500MHz. Everything about next-gen clocks is conjecture; we just assume they will be higher than on 28nm. I'm agreeing that thermals will be an issue, I just have no idea how to follow the logic that leads you to making statements about Fiji vs Maxwell. Yeah, Fiji is faster per clock; it also has 1300 more SPs... and runs 400MHz slower.
https://forums.guru3d.com/data/avatars/m/243/243702.jpg
SP count is just the way AMD/nV invested transistors (the same could be said of GM200 having 32 more ROPs, 50% more than Fiji). The comparison is performance per transistor per clock, because those are the limiting factors of the manufacturing technologies. And I can be wrong; maybe GPU frequency will not be limited by heat, because heat can be adjusted partly by transistor density. But that equals cost.

Edit: as for clocks, a lot of 980Ti owners here have claimed that their card boosts to 1350~1450MHz without them doing anything. But that's probably per manufacturer: some shops list the official nVidia value, others list those higher boost clocks. Maybe you can disable your OC and see how your card boosts.
data/avatar/default/avatar03.webp
[quote]Given that Fiji has higher performance per clock per transistor and more transistors than GM200, doing just a die shrink and hitting a similar clock for both chips would make a 14nm Fiji much better.[/quote]
Hitting similar clocks! Just like that... Does that mean nVidia gets a ****load of shaders, a bazillion TFLOPS, async compute, TrueAudio and whatnot... because clean slate!? Or is the clean slate reserved only for clocks, while everything else remains relatively the same? If hitting the same clocks with GCN were that easy, AMD would have done it on 28HPM, ya know...
[quote]Now, what I meant by nVidia having to do more work on the die shrink:[/quote]
nVidia having to do more work on the die shrink? This is obviously wrong. Considering that not even using high-end HBM was enough to catch up with the 980Ti, obviously AMD had the much bigger homework to do. Kind of like what NV has already done with Kepler -> Maxwell.
[quote]980Ti: 8B transistors, 1350MHz clock
Fiji: 8.9B transistors, 1050MHz clock

We can say that the 980Ti performs 5% better on average. Therefore:

980Ti perf = 1.05 × Fiji perf
8 × 1350 = 1.05 × 8.9 × 1050 × [Fiji-to-GM200 perf per transistor per clock ratio]
[Fiji-to-GM200 perf per transistor per clock ratio] ≈ 1.1

In other words, if GM200 and Fiji had the same clock and were limited to the same transistor count, Fiji would deliver 10% higher performance.[/quote]
Exactly. There is only that little "if". Taking a design that clocks poorly, relatively speaking, and saying that all they need to do is clock it the same as the high-clocking design is beyond ridiculous.
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
[quote]SP count is just the way AMD/nV invested transistors (the same could be said of GM200 having 32 more ROPs, 50% more than Fiji). The comparison is performance per transistor per clock, because those are the limiting factors of the manufacturing technologies. And I can be wrong; maybe GPU frequency will not be limited by heat, because heat can be adjusted partly by transistor density. But that equals cost.[/quote]
I see what you mean, but it still makes no sense not to account for clocks when talking about relative performance. By that token a 390X is miles faster than a 980, yet it's consistently outperformed by it, especially at a moderate 1500MHz. Clocks matter.
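The 390X-vs-980 example can be run through the same naive on-paper throughput metric; here is a sketch in Python, where the SP counts are public specs and the 1216MHz boost and 1500MHz OC clocks are illustrative assumptions:

[code]
# Naive shader throughput (SPs x clock) for the 390X vs 980 example.
def naive_gflops(shaders, mhz):
    # 2 FLOPs per shader per cycle (FMA); clock in MHz -> GFLOPs
    return 2 * shaders * mhz / 1000

print(f"390X @ 1050MHz:    {naive_gflops(2816, 1050):.0f} GFLOPs")  # ~5914
print(f"980  @ 1216MHz:    {naive_gflops(2048, 1216):.0f} GFLOPs")  # ~4981
print(f"980  @ 1500MHz OC: {naive_gflops(2048, 1500):.0f} GFLOPs")  # 6144
# On paper the 390X is "miles faster" at stock, yet in games the 980
# outperforms it -- realized clocks (and per-clock behavior) matter.
[/code]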
[quote]Hitting similar clocks! Just like that... Taking a design that clocks poorly, relatively speaking, and saying that all they need to do is clock it the same as the high-clocking design is beyond ridiculous.[/quote]
Something else I want to point out regarding the popular view that AMD somehow has better support for future APIs: even in Ashes of the Singularity, AMD's best-case scenario, a 980Ti outperforms a Fury X when overclocked, despite 'async' being detrimental to performance in this game. Clocks matter. AMD has gotten a lot better in many respects recently; that doesn't say anything at all about nVidia.
data/avatar/default/avatar15.webp
Where are you getting that 30-50% increase? What about Intel's 22 -> 14nm shift? Increases in perf were about 5%. What's so different about GPUs?
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
On a GPU you can increase the core count and expect graphics performance to scale linearly with it. On a CPU, increasing the core count does little to nothing in most applications. Also, this is a larger node jump, from 28nm --> 16nm.
https://forums.guru3d.com/data/avatars/m/243/243702.jpg
[quote]I see what you mean, but it still makes no sense not to account for clocks when talking about relative performance. By that token a 390X is miles faster than a 980, yet it's consistently outperformed by it, especially at a moderate 1500MHz. Clocks matter.[/quote]
[quote]The comparison is performance per transistor per clock, because those are the limiting factors of the manufacturing technologies.[/quote]
I have never excluded clock, and I do not intend to. nVidia smartly used lower transistor density on 28nm and reached higher clocks. But this trick will help only a little on 14/16nm, because once over a certain clock, the cooling solution will not cope with the heat made by the GPU. And therefore no 25% higher clock for nVidia (or AMD) on 14/16nm.

It is similar with Apple's A9. People complain that the 14nm version of exactly the same transistor design runs hotter than the 16nm one. Yeah, why shouldn't it? Both chips have exactly the same NAND gate sequences connected, to achieve exactly the same result upon calling the same instructions, but each factory uses a different transistor layout, and the 14nm chip has an 8% smaller surface to radiate/transfer heat. Who's to say that Samsung's 14nm A9 would not actually run as cool as TSMC's 16nm one if they reduced the transistor density a bit? We do not have a direct comparison.
[quote]Where are you getting that 30-50% increase? What about Intel's 22 -> 14nm shift? Increases in perf were about 5%. What's so different about GPUs?[/quote]
No clue who you are referring to; the Quote button is your friend. But GloFo materials show that in the worst-case scenario (maximum clock) they still deliver 27% higher performance per watt than they did on 28nm. GPUs are in no position to get to those clocks, which means that in the 1000~1600MHz range, power efficiency should be expected to come in much higher than that value.

[youtube]9jlwvuqzhjg[/youtube]

Now, those are power consumptions of entire systems. Let's presume that AMD has as low a CPU overhead as nVidia, and therefore the CPU eats the same watts as the CPU in nVidia's system (while in reality the CPU has to work a bit harder to feed the AMD card). Pick your CPU+MB+HDD+fans consumption: 25 / 35 / 45 / 55W?

Entire nV system: 144W => 119W / 109W / 99W / 89W GPU?
Entire AMD system: 85W => 60W / 50W / 40W / 30W GPU?

I think you get the idea of the expected 28 vs 14/16nm power efficiency: it looks to be at least twice as good, possibly three times as good.
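A minimal sketch of the back-of-envelope subtraction above, assuming the 144W/85W whole-system readings from the demo and treating the rest-of-system draw as an unknown:

[code]
# GPU-power estimate from whole-system wall readings (the 144W and 85W
# figures come from the demo video above); rest-of-system draw is a guess.
nv_system_w, amd_system_w = 144, 85

for rest_w in (25, 35, 45, 55):
    nv_gpu, amd_gpu = nv_system_w - rest_w, amd_system_w - rest_w
    print(f"rest {rest_w:2d}W: nV GPU ~{nv_gpu}W, AMD GPU ~{amd_gpu}W, "
          f"ratio ~{nv_gpu / amd_gpu:.1f}x")
# ratio runs from ~2.0x (rest=25W) to ~3.0x (rest=55W) -- hence
# "at least twice as good, possibly three times".
[/code]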
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
8.9B transistors on a 596mm^2 die for Fury X; 8.0B transistors on a 601mm^2 die for Titan X. Idk, I get where everyone is coming from in general -- but I don't think heat density will be an issue. There is quite a large gap when going from air to water cooling on a GPU: my 980 went from like 78°C loaded to 35°C loaded. I'm sure they can improve air coolers further to match the necessary heat dissipation, and if not, just go with water cooling; gamers will get over it. Fury X works fine with it, I haven't heard of any major issues, and it was the first iteration.

The biggest problem with, and prevention of, larger GPUs is going to be manufacturing cost. Both companies are going to want profitable chips, and they are going to want them in similar price ranges to what we see now. I think it's pretty clear from the Polaris demos that we are going to end up with Fury X/980Ti levels of performance from ~120W/~300mm^2 chips. Hopefully architecture, small clock increases, and more bandwidth can help the newer chips edge out the older ones.

I personally don't even care if something bigger is coming. Tired of playing The Division at medium settings on my 980. I'm upgrading to whatever is the fastest single card out by the end of August; if nVidia can't get Pascal out by then, I'm not even considering them. If better stuff comes out next year that's way faster, I'll just upgrade again.
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
I'm also open to selling my 980Ti for GP104 if it's worth it; problem is I'm bound to nVidia because of CUDA and their libraries, so AMD would have to work a small miracle to win me over, unless I separate my gaming and workstation machines and just buy a card for each. Dat cash.

As for heat density, I just recognize it's an issue; just look at Skylake. They soldered the heatspreader on because delidding isn't the greatest option; the die is so small that the surface area available for thermal transfer is just insufficient to cool it effectively. Obviously this doesn't affect GPUs quite as severely because they're much bigger by nature. I think the cooling will be fine, and I also expect double everything: double transistor density, and double efficiency in terms of compute throughput. My only worry came from you, actually: that transistor cost has remained the same, if not gone slightly up. IF Polaris 10 and GP104 only MATCH the Fury X and Ti, then I'll wait for Vega and big Pascal, or maybe Volta.
https://forums.guru3d.com/data/avatars/m/243/243702.jpg
I agree, except for one thing. Those 600mm^2 28nm chips would be around 162mm^2 if done on GloFo 14nm, and around 175mm^2 if made on TSMC 16nm. And their power consumption will be as high as the cooling solution allows (within the 300W standard), because those technologies can clock damn high. If Fiji were 14nm and 2.4GHz, its power consumption would be around 515W. That's the upper limit (without increasing voltage), and it would be pretty hard to cool.

For an imaginary cooling-capacity scenario, take 12mm * 14mm = 168mm^2, which can be representative of both the Fiji and GM200 die shrinks. What are the GPUs/CPUs/ASICs of that size (or a bit larger, to give the benefit of the doubt to cooling), and what are their rated TDPs? Do we know any?

FX-9590 - 220W - 315mm^2: that's 220W in double the area, and it poses a challenge to many coolers. To have the same thermal density, an FX-9590 would have to eat 410W (Noctua rated the NH-D14 at 220W TDP, therefore slightly sub-optimal).

i7-3770K - 77W - 160mm^2: one little mistake of using TIM instead of soldering the heat spreader, and this small 77W chip can be quite a challenge to keep cool. OC'd to 4.6GHz it eats around 155W; the temperature is kind of manageable at 80~90°C under prolonged load, but definitely not a Sandy kept below 55°C. (I found someone testing an i7-3770K @4.6GHz with a D14 vs. a custom loop; both hit 90°C over the course of 12 hours of Prime95.)

Since we are used to seeing GPUs run at 70~80°C with air cooling, I say 150W is a coolable power consumption for an 8~9B transistor GPU, as I do not expect an Ivy-like TIM failure. What matters is the clock the GPU reaches by the time it hits this (or another) cooling limit. Considering 14nm Polaris has between 2 and 3 times the power efficiency of 28nm Maxwell, an 8.9B-transistor Fiji at 14nm could eat around 90~100W (rough middle estimate) @1050MHz, leaving room for higher clocks, but not much before 150W (~1450MHz ±100MHz).
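The area and power numbers in this post can be reproduced with two rough scaling rules; a sketch follows, where the ~3.7x/~3.4x density factors are backed out of the poster's own mm^2 claims, and the ~225W figure for 28nm Fiji's GPU power plus linear power-with-clock scaling at fixed voltage are assumptions that happen to land on the quoted ~515W:

[code]
# Rough reproduction of the die-shrink numbers above. The density
# factors are backed out of the poster's 600 -> ~162/~175 mm^2 claim;
# the ~225W figure for Fiji's 28nm GPU power and P ~ f (fixed voltage)
# are assumptions.
die_28nm_mm2 = 600
print(f"GloFo 14nm: ~{die_28nm_mm2 / 3.7:.0f} mm^2")    # ~162
print(f"TSMC 16nm:  ~{die_28nm_mm2 / 3.43:.0f} mm^2")   # ~175

fiji_w, fiji_mhz, target_mhz = 225, 1050, 2400
print(f"Fiji scaled to {target_mhz} MHz: "
      f"~{fiji_w * target_mhz / fiji_mhz:.0f} W")        # ~514
[/code]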
data/avatar/default/avatar11.webp
[quote]I have never excluded clock, and I do not intend to. nVidia smartly used lower transistor density on 28nm and reached higher clocks. But this trick will help only a little on 14/16nm, because once over a certain clock, the cooling solution will not cope with the heat made by the GPU. And therefore no 25% higher clock for nVidia (or AMD) on 14/16nm.[/quote]
You're saying that the company with a decent advantage in perf/W will not be able to cope with the heat (apparently there is some imaginary hard MHz wall; what does TSMC know about 16FF+ anyway), yet the company whose products have rejuvenated the Fermi "African village monthly food electrical bill" jokes, the company that had to reach for water and HBM to have some resemblance of parity, will be peachy heat-wise.

Furthermore... using lower density <- that is a trick used by nVidia to reach higher clocks? Silly me, I always thought that the traditionally higher transistor density of AMD products has been one of their advantages. When in fact all they had to do was use lower transistor density and clock sky-high like nVidia. BTW, Maxwell has higher, not lower, transistor density than Kepler.

No hard feelings, but I'm done trying to talk sense into you. Maybe some other time. Cheers.
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
Fiji has a higher transistor density than Maxwell. I also don't think the watercooling on Fiji is a boon; I hate AIOs, I'd much rather have air cooling, or go all the way and invest in a proper liquid cooling kit.
data/avatar/default/avatar26.webp
The point being that even with higher transistor density, Maxwell has better perf/W than Kepler. But Tahiti, Cypress, and RV770 also have higher transistor density compared to GK104, GF100, and GT200, yet they don't get smashed in perf/W or "clocks"; in fact, they even win. So you see, higher transistor density is a conscious design choice, with the obvious benefit of packing more *** into the same area. Not something that automatically makes your GPU spew lava.
https://forums.guru3d.com/data/avatars/m/243/243702.jpg
[code]
card        Tr.[B]   Area[mm^2]   Tr./Area[M/mm^2]
GTX 680     3.54     294          12.041
HD 7970     4.313    352          12.253

GTX 780 Ti  7.08     561          12.62
R9 290X     6.2      438          14.155

GTX 980     5.2      398          13.065
Fury X      8.9      596          14.933
980 Ti      8.0      601          13.311
[/code]
That pretty much sums it up. Both companies increased transistor density, not because they invented a magic trick, but because TSMC improved their process and reduced leakage at higher density. But at any given time, a transistor made for AMD was exactly the same transistor TSMC made for nVidia: with the same leakage-to-density ratio, the same operational voltage, the same Vdrop caused by leakage, the same range at which a transistor's voltage is considered a 1 or a 0 (simplified), and therefore the same voltage range in which a transistor is in an undetermined state, and the same time required to get from the 0 to the 1 state with a given leakage and voltage. And that time decides the minimum period required for stable operation. I hope this helps in understanding how transistor density affects maximum clock.

But maybe someone is right in thinking that Fury X's transistor density could have been used in December 2011 for the HD 7970, and therefore the HD 7970 could have had a whopping 22% higher transistor density and lower cost. And maybe someone can come up with the brilliant idea that TSMC makes a different kind of transistor for nVidia, which simply clocks higher than the ones they make for AMD, regardless of physics. (But I am happy we do not have people like that here.)
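The density column follows directly from the transistor counts and die sizes; a quick recomputation in Python:

[code]
# Recompute the Tr./Area column from the transistor counts (billions)
# and die areas (mm^2) in the table above.
cards = {
    "GTX 680":    (3.54,  294),
    "HD 7970":    (4.313, 352),
    "GTX 780 Ti": (7.08,  561),
    "R9 290X":    (6.2,   438),
    "GTX 980":    (5.2,   398),
    "Fury X":     (8.9,   596),
    "980 Ti":     (8.0,   601),
}
for name, (tr_billion, area_mm2) in cards.items():
    density = tr_billion * 1000 / area_mm2   # millions of transistors per mm^2
    print(f"{name:<10} {density:7.3f} M/mm^2")

# Fury X vs HD 7970: 14.933 / 12.253 ~ 1.22, the "22% higher density" figure.
[/code]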
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
This is all fine, and nobody was contesting it: you're limited by the switching frequency of the transistors once you're past all the other considerations. Still, the increased density of Fiji vs Maxwell doesn't justify the lower clock, especially considering the power overhead Maxwell carries by using GDDR5 vs HBM.
https://forums.guru3d.com/data/avatars/m/243/243702.jpg
A transistor ending in an uncertain state is a consequence of Vdrop (the inability to deliver enough power to that particular transistor: too low a circuit resistance while the power supply has too high an internal resistance). The 980Ti GPU draws less current than Fiji, because the Fiji GPU gets all the power budget that HBM1 saved compared to having 16 GDDR5 chips (a 512-bit bus like the R9 290). Maybe having Vcc delivered at a few more places in the GPU would allow higher stability, but on the other hand, the internal PSU resistance and the VRMs' internal resistance would probably need to improve a bit too (to close to 0 ohms).
https://forums.guru3d.com/data/avatars/m/169/169957.jpg
One of the big advantages of scaling transistors down is that the voltage required for switching is lower; at 14nm you can pretty much start counting individual electrons. The time taken for the transistor to switch also decreases due to similar effects. Anyway, my point about Fiji is... well, you basically repeated it: the Fiji core is getting more power than GM200 by quite a big margin, with GDDR5 and the IMC taking up a big chunk of Maxwell's board power, yet the 980Ti (even overclocked) consumes less and performs better while running on air. I also wanted to ask: do those 8.9 billion transistors on Fiji account for HBM?
https://forums.guru3d.com/data/avatars/m/243/243702.jpg
No, HBM is separate. And that Vdrop which does not allow Fiji to clock that high is as you wrote: more transistors seep more power in normal operation, and then the higher density worsens leakage. The lower transistor count and lower density is a nice advantage for GM200. And you are absolutely right, I now feel like a broken record 😀
https://forums.guru3d.com/data/avatars/m/242/242471.jpg
So this topic went all south... who cares how much power it will consume; all I'm "interested" in is performance, and it looks like that won't be anything special, since HBM2.0 is now reserved for Vega.
https://forums.guru3d.com/data/avatars/m/80/80129.jpg
I don't see how HBM2.0 is a requirement for "special performance". AMD could easily put out a card 40-50% faster than the current gen on HBM1, and it would be fine for the most part.