Intel Steps Away From Ringbus - Skylake-X & SP Communicate through Mesh

It'll be interesting to see if they use this on smaller chips. They've been using a mesh topology on Knights Landing for the same reason.
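The appeal of a mesh over a ring is shorter average paths as core counts grow. A toy way to see this is to compare average hop counts between cores on a bidirectional ring versus a 2D mesh. This is only a sketch under simplifying assumptions (uniform traffic, unit-cost hops, a square mesh), not a model of Intel's actual tile layout:

```python
from itertools import product

def ring_avg_hops(n):
    """Average shortest-path hops between distinct nodes on a bidirectional ring."""
    total = sum(min(abs(a - b), n - abs(a - b))
                for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))

def mesh_avg_hops(rows, cols):
    """Average Manhattan-distance hops on a 2D mesh.

    Self-pairs contribute 0 to the sum, so dividing by n*(n-1)
    gives the average over distinct ordered pairs.
    """
    nodes = list(product(range(rows), range(cols)))
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1) in nodes for (r2, c2) in nodes)
    n = len(nodes)
    return total / (n * (n - 1))

# at 64 cores the ring's average distance grows linearly with core count,
# while the mesh grows roughly with its square root
print(ring_avg_hops(64))    # ~16.25 hops
print(mesh_avg_hops(8, 8))  # ~5.33 hops
```

At small core counts the two are close, which is presumably why the ring survived as long as it did on client parts.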
Well, I wonder how this will impact the multi-threaded testing where Ryzen shined against Broadwell-E. Intel always has something sneaky up their sleeves, don't they?
This is a better interconnect than AMD is using with Ryzen, but it essentially traps Intel into using large monolithic chips. It doesn't solve the biggest issue they have, which is manufacturing scalability. Ryzen seems to be a success, and compiler adaptation for it seems to be going well, so I believe that this ship has sailed, even if Intel is emphasizing how their own chips don't require code changes to facilitate the communication speeds between cores.
Why would it limit their scalability? Their roadmap indicates the opposite and their next big Xeons are literally part of the Xeon SP family. The S being "scalable".
It limits it in terms of cost competitiveness. AMD's solution is far more modular for fabrication purposes, which makes it easier/cheaper to bin. Intel can still scale it, but building one 600mm² processor is more expensive than building four 150mm² processors and connecting them via Infinity Fabric.
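A back-of-the-envelope sketch of that cost argument, using the standard Poisson yield model. The wafer cost, defect density, and the decision to ignore edge losses and packaging costs are all illustrative assumptions, so only the relative comparison means anything:

```python
import math

WAFER_DIAMETER_MM = 300
WAFER_COST = 8000       # assumed dollars per wafer (illustrative)
DEFECT_DENSITY = 0.2    # assumed defects per cm^2 (illustrative)

def cost_per_good_die(die_area_mm2):
    """Rough cost of one defect-free die under a simple Poisson yield model.

    Ignores edge losses, scribe lines, and salvage of partially
    working dies, so it's only useful for comparing die sizes.
    """
    wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
    gross_dies = wafer_area // die_area_mm2
    yield_rate = math.exp(-DEFECT_DENSITY * die_area_mm2 / 100)  # area in cm^2
    return WAFER_COST / (gross_dies * yield_rate)

monolithic = cost_per_good_die(600)   # one big die
mcm = 4 * cost_per_good_die(150)      # four small dies, packaging cost ignored

print(f"monolithic 600mm^2:  ${monolithic:.0f}")
print(f"4 x 150mm^2 chiplets: ${mcm:.0f}")
```

With these made-up numbers the monolithic die costs more than double the four chiplets combined, because yield falls exponentially with area while dies-per-wafer only falls linearly. Salvaging defective dies as lower bins narrows the gap, which both sides do.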
That assumes that Intel, a company with its own fabs, has the same defect rates with their lithography that AMD's chips do. I'm not sure if they do or don't, to be honest. It may be the case that AMD has to go through a whole bunch of smaller chips to get one big chip consisting of four small ones, erasing the cost advantage. If they were both fabless semiconductor companies using the same fab, on the same node, then AMD's solution would definitely be more cost-competitive. I'm not saying you're wrong. I'm just thinking that people are calling this stuff prematurely. I think it will really come down to how well these 4-small-chips-in-one actually perform. I've been reading that Epyc and Threadripper are going to have eight channels of memory compared to six on Skylake-SP, but I think that's another case of people missing a key piece of information. Current E7 Xeons, and Haswell E7s, had "4-channel" memory with a special buffer chip called Jordan Creek (I think) which allowed it to effectively be 8-channel memory, or to work in mirrored 4-channel mode for better fault tolerance. I believe Skylake-SP has something similar, so that's another premature comparison.
Look at it the following way: if AMD has 3 defects on average per 600 mm² of area, they get 25% perfectly working chips on average, because their chips would be 150 mm² in Denial's scenario, and they are able to sell their top CPUs. If Intel had just 1 defect on average per 600 mm² of area, they would get no perfectly working chips on average, and therefore every single chip would have some disabled cores. That's why die size matters.

In reality there are many types of defects with different negative impacts. But if your die is 3 times as large, you have 3 times the chance of being hit by those negative effects, and each defect reduces the selling value of the chip.

AMD's Ryzen dies come out usable for selling 85% of the time. Considering that they started selling 8C/16T first, then 6C/12T, and 4C/8T came last, we can conclude that the distribution of success rates favors 8C/16T. So let's assume:
8C/16T ~35%, 6C/12T ~25%, 4C/8T ~25%, trash ~15%
If AMD used double the die size: 8C/16T ~17.5%, 6C/12T ~21.25%, 4C/8T ~23.1%, trash ~38.15%
If AMD used 4 times the die size: 8C/16T ~8.75%, 6C/12T ~12.8%, 4C/8T ~15.9%, trash ~62.55%

Those values are just approximations of approximations, because there are no publicly known probabilities per defect type per area. But they're close enough to show how huge a positive effect smaller die size has on harvesting usable (sellable) dies in the end.
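The same intuition can be made slightly more formal with a Poisson defect model: if defects land randomly with an average of λ per die, the bin distribution shifts sharply toward scrap as die area (and hence λ) grows. The binning rule below (0 defects → 8C, 1 → 6C, 2 → 4C, 3+ → scrap) is a made-up simplification for illustration, not AMD's actual salvage policy:

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k defects when the average is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def bin_distribution(lam):
    """Toy binning: 0 defects -> 8C, 1 -> 6C, 2 -> 4C, 3+ -> scrap."""
    p8 = poisson_pmf(0, lam)
    p6 = poisson_pmf(1, lam)
    p4 = poisson_pmf(2, lam)
    return {"8C": p8, "6C": p6, "4C": p4, "scrap": 1 - p8 - p6 - p4}

# doubling die area doubles the expected defect count per die,
# so lam = 0.5, 1.0, 2.0 stands in for 1x, 2x, 4x die sizes
for lam in (0.5, 1.0, 2.0):
    dist = bin_distribution(lam)
    print(lam, {bin_: round(p, 3) for bin_, p in dist.items()})
```

The top bin shrinks exponentially (e^-λ) with area while scrap grows, which is the whole argument in one line.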
That assumes that Intel, a company with its own fabs, has the same defect rates with their lithography that AMD's chips do. I'm not sure if they do or don't, to be honest. It may be the case that AMD has to go through a whole bunch of smaller chips to get one big chip consisting of four small ones, erasing the cost advantage.
They don't. The unit AMD is actually working with is the CCX, usually glued together in pairs into an octo-core Zeppelin die. All they really need to produce is a quad-core CCX; everything else is a matter of gluing stuff together.
If they were both fabless semiconductor companies using the same fab, on the same node, then AMD's solution would definitely be more cost-competitive. I'm not saying you're wrong. I'm just thinking that people are calling this stuff prematurely.
Intel does have the most advanced fabs in the world, but they're not so much more advanced that much larger dies get equal or better yields than much smaller ones.
I think it will really come down to how well these 4-small-chips-in-one actually perform. I've been reading that Epyc and Threadripper are going to have eight channels of memory compared to six on Skylake-SP, but I think that's another case of people missing a key piece of information. Current E7 Xeons, and Haswell E7s, had "4-channel" memory with a special buffer chip called Jordan Creek (I think) which allowed it to effectively be 8-channel memory, or to work in mirrored 4-channel mode for better fault tolerance. I believe Skylake-SP has something similar, so that's another premature comparison.
Jordan Creek isn't really an octo-channel solution, and it doesn't even support DDR4. My whole question regarding EPYC's performance is how it scales with memory bandwidth; everything we've seen from Ryzen suggests it likes more bandwidth.
Jordan Creek isn't even supporting DDR4.
Memory Types DDR4 1333/1600/1866, DDR3 1066/1333/1600, RDIMM/LRDIMM/LVDIMM
Aaaannd you're completely correct. 😀 It's explicitly stated to be a Xeon product.
That was usable as in 85% of dies have 8 cores usable...
I do remember the statement, but not the exact context, so you may very well be right. Anyway, it was just for illustration, to show that bigger dies reduce yields (return on investment) and increase price as a result.
They don't. The unit AMD is actually working with is the CCX, usually glued together in pairs into an octo-core Zeppelin die. All they really need to produce is a quad-core CCX; everything else is a matter of gluing stuff together. Intel does have the most advanced fabs in the world, but they're not so much more advanced that much larger dies get equal or better yields than much smaller ones. Jordan Creek isn't really an octo-channel solution, and it doesn't even support DDR4. My whole question regarding EPYC's performance is how it scales with memory bandwidth; everything we've seen from Ryzen suggests it likes more bandwidth.
It's Jordan Creek 2 that has 2 DDR4 channels per buffer, and it also supports DDR3; I forgot the "2". While the 8 channels of memory aren't handled directly on the CPU, it does effectively give it 8-channel memory in Performance mode. It has Lockstep mode for better reliability and fault tolerance, which gives 4-channel speed with each DIMM mirrored. What I was saying is that if Intel keeps the memory buffer for Skylake-SP, they'll have 6 "real" channels that can use a Jordan Creek-style buffer, giving 6-channel speeds with mirroring or 12-channel speeds without. And I agree about binning, generally speaking. But Intel has consolidated ALL of its big Xeons (E5 and E7) into a single socket and platform (Socket 3647). That means they can and will sell chips made from their large dies with a bunch of cores disabled. The net financial result depends on their wafer yields.
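The channel arithmetic behind that speculation is easy to sanity-check: peak theoretical DDR bandwidth per channel is transfer rate × bus width (8 bytes for a 64-bit channel), and the buffered figures just scale it. The DDR4-2400 speed and the 6-buffer/12-channel split below are illustrative assumptions, not confirmed Skylake-SP specs:

```python
def channel_bandwidth_gbps(transfers_mt_s, bus_bytes=8):
    """Peak theoretical bandwidth of one DDR channel in GB/s."""
    return transfers_mt_s * bus_bytes / 1000

per_channel = channel_bandwidth_gbps(2400)  # DDR4-2400 on a 64-bit channel
native_6ch = 6 * per_channel                # six channels straight off the CPU
buffered_12ch = 12 * per_channel            # six buffers exposing two channels each
mirrored_6ch = native_6ch                   # lockstep/mirrored: 6-channel speed, half capacity

print(per_channel)     # 19.2 GB/s
print(native_6ch)      # 115.2 GB/s
print(buffered_12ch)   # 230.4 GB/s
```

These are bus ceilings, not sustained numbers; a buffer chip also adds latency, which is part of why the mirrored/lockstep trade-off exists at all.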
It may be the case that AMD has to go through a whole bunch of smaller chips to get one big chip consisting of four small ones, erasing the cost advantage.
but they don't. this is a foundationless statement. not attacking you, & i understand your point... but info about Zen CCX yield rates isn't hard to find. meanwhile, fox is right anyway - the modular design is more efficient than Intel's brute-force approach.
Intel is no longer sandbagging us...great news, everyone!
Might be a bit early on that prediction....don't hold your breath.
but they don't. this is a foundationless statement. not attacking you, & i understand your point... but info about Zen CCX yield rates isn't hard to find. meanwhile, fox is right anyway - the modular design is more efficient than Intel's brute-force approach.
I said "maybe" because I don't know their yields, as I said. I wouldn't mind reading about them if you have links. I'm not sure I'd call Intel's approach brute force compared to using 4 smaller chips as one big one. It'll be interesting to see how they compare in real workloads. I haven't seen detailed info on the ccNUMA arrangements for either. x86 chips in general are brute-force architectures 😉
I'm not sure I'd call Intel's approach brute force compared to using 4 smaller chips as one big one.
OK, I have a question for you. Considering that the only way I can think of that "brute force" would apply to how processors are made is how Intel is doing it, how would you define "brute force"?
That was usable as in 85% of dies have 8 cores usable...
Do you have a source or quote for that? I remember reading that 85% number as well, but there was no context to it that suggested "usable" rather than "perfect", and they're made as individual 4-core CCXs IIRC.
OK, I have a question for you. Considering that the only way I can think of that "brute force" would apply to how processors are made is how Intel is doing it, how would you define "brute force"?
Considering the delicacy with which processors are made, I don't know that "brute force" is really applicable at all. But for CPUs I would say an approach that doesn't involve refining the chip itself or its architecture, but simply adds "more", could be considered brute force. I'm not sure that refining wafer yields to make huge chips is really worthy of being called brute force. I don't think AMD's approach is brute force either; I think it's quite clever. We need to see how it runs the kinds of workloads its competition runs before we know its value proposition, though.
It's a no-brainer that AMD's approach of gluing smaller dies together is cheaper to produce, but there's one more thing: to produce a CPU with more cores, AMD just has to glue on more small dies, whereas Intel has to design a whole new chip.