AMD Flirting with big.LITTLE processor design - files a Patent

One should distinguish by the instructions used. That "small" CPU core may happen to be faster at certain workloads. It could just as well be a minimalistic CPU with extra fixed-function hardware for some things that would make the logic in a larger Zen core too slow. What could it be missing? AVX?
Fox2232:

One should distinguish by the instructions used. That "small" CPU core may happen to be faster at certain workloads. It could just as well be a minimalistic CPU with extra fixed-function hardware for some things that would make the logic in a larger Zen core too slow. What could it be missing? AVX?
What is the point of AVX-512? Everyone writing ML code, where it could have some use, writes for GPUs, either CUDA/Tensor cores or through ROCm for GCN (and the upcoming CDNA). They don't write for CPUs, let alone customised AVX-512 code, and that would run on what? 28-core CPUs, when you can buy a 64-core CPU for less and not care about the extra AVX-512 performance since the brute force is enough? So let AVX-512 die. It is a useless pile of crap.
Fediuld:

What is the point of AVX-512? Everyone writing ML code, where it could have some use, writes for GPUs, either CUDA/Tensor cores or through ROCm for GCN (and the upcoming CDNA). They don't write for CPUs, let alone customised AVX-512 code, and that would run on what? 28-core CPUs, when you can buy a 64-core CPU for less and not care about the extra AVX-512 performance since the brute force is enough? So let AVX-512 die. It is a useless pile of crap.
Yeah, opmask and scatter, totally a useless pile of crap. I remember my first joke.
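For readers wondering what opmask actually adds, a minimal hypothetical C sketch of an AVX-512 masked update might look roughly like this; the function name and threshold logic are invented purely for illustration, and it assumes a CPU with AVX-512F and a build such as gcc -O2 -mavx512f:

#include <immintrin.h>

/* Illustrative only: add src[i] into dst[i] only where src[i] > threshold,
 * using an AVX-512 opmask instead of a per-element branch. */
void add_above_threshold(float *dst, const float *src, int n, float threshold)
{
    __m512 vthr = _mm512_set1_ps(threshold);
    int i;
    for (i = 0; i + 16 <= n; i += 16) {
        __m512 s = _mm512_loadu_ps(src + i);
        __m512 d = _mm512_loadu_ps(dst + i);
        /* opmask: one bit per lane, set where src > threshold */
        __mmask16 m = _mm512_cmp_ps_mask(s, vthr, _CMP_GT_OQ);
        /* masked add: lanes whose mask bit is 0 keep their old dst value */
        __m512 r = _mm512_mask_add_ps(d, m, d, s);
        _mm512_storeu_ps(dst + i, r);
    }
    /* scalar tail for the remaining n % 16 elements */
    for (; i < n; i++)
        if (src[i] > threshold)
            dst[i] += src[i];
}

The __mmask16 lets a single vector instruction update only the lanes that pass the comparison, which is what replaces the per-element branch; scatter instructions such as _mm512_i32scatter_ps apply the same per-lane masking idea to non-contiguous stores.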
This whole hybrid computing (it's not big.LITTLE, that is an ARM-specific term) is pointless unless you are severely TDP constrained, think tablets or smaller. You are simply one full node of process improvement away from having all large cores anyhow. The SoC fragmentation this causes is going to be a complete pain, especially if everyone does their own hybrid computing approach. I have no doubt Microsoft would screw up the kernel trying to deal with these differences. I'm not a fan of this at all. For example, we will see Intel's Tiger Lake get dominated by AMD's next-gen APUs in 2021 because of the better process and all large cores.
Yes! I'm glad AMD is doing this too, though I'm a little surprised Intel hasn't patented this in a way to cripple AMD's efforts, since Intel also appears to be going this route.
Fox2232:

One should distinguish by the instructions used. That "small" CPU core may happen to be faster at certain workloads. It could just as well be a minimalistic CPU with extra fixed-function hardware for some things that would make the logic in a larger Zen core too slow. What could it be missing? AVX?
Yes, I'm thinking the smaller cores will probably be able to clock faster for tasks that need it, while also sipping power for background workloads. There's plenty they could strip out. Just look at ARM - its instruction set is pretty sparse yet you can comfortably run a laptop experience on it, provided you have about 8 cores. I figure most calculations are pretty basic and don't need any fancy instructions.
JamesSneed:

This whole hybrid computing (it's not big.LITTLE, that is an ARM-specific term) is pointless unless you are severely TDP constrained, think tablets or smaller. You are simply one full node of process improvement away from having all large cores anyhow. The SoC fragmentation this causes is going to be a complete pain, especially if everyone does their own hybrid computing approach. I have no doubt Microsoft would screw up the kernel trying to deal with these differences. I'm not a fan of this at all. For example, we will see Intel's Tiger Lake get dominated by AMD's next-gen APUs in 2021 because of the better process and all large cores.
I agree it is a problem if everyone's approach is a little too different, but I don't think you're looking at this the right way. TDP issues are why x86 has been unable to compete in the mobile market, and it isn't too favorable in the robotics market either. For desktops, I imagine the small cores could be clocked significantly higher, so if you're running a simple workload, adding more instructions doesn't accomplish anything; it just makes the chip more expensive and less stable at higher speeds. So from a performance standpoint, these could be very useful. Meanwhile, if something doesn't demand too many "fancy instructions" but can utilize many cores, you can fit more of the small cores on a single die. So depending on your workload, you could get a significant performance uplift at a lower cost and lower wattage. (A rough sketch of steering work to specific cores follows below.)
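To make the "match the task to the right kind of core" idea concrete, here is a rough, hypothetical Linux sketch of pinning a background thread to a set of assumed small cores via CPU affinity; the core IDs 4-7 are made up for illustration, and on a real hybrid part the OS scheduler would normally make this decision itself. Build with something like gcc -pthread:

#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>

/* Background housekeeping that does not need a big core. */
static void *background_work(void *arg)
{
    (void)arg;
    /* ... low-priority work would run here ... */
    return NULL;
}

int main(void)
{
    cpu_set_t small_cores;
    pthread_attr_t attr;
    pthread_t worker;

    /* Logical CPUs 4-7 are assumed to be the small cores; purely illustrative. */
    CPU_ZERO(&small_cores);
    for (int cpu = 4; cpu <= 7; cpu++)
        CPU_SET(cpu, &small_cores);

    /* Start the worker thread already restricted to those cores. */
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof(small_cores), &small_cores);
    pthread_create(&worker, &attr, background_work, NULL);

    pthread_join(worker, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}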
schmidtbag:

Yes! I'm glad AMD is doing this too, though I'm a little surprised Intel hasn't patented this in a way to cripple AMD's efforts, since Intel also appears to be going this route.

Yes, I'm thinking the smaller cores will probably be able to clock faster for tasks that need it, while also sipping power for background workloads. There's plenty they could strip out. Just look at ARM - its instruction set is pretty sparse yet you can comfortably run a laptop experience on it, provided you have about 8 cores. I figure most calculations are pretty basic and don't need any fancy instructions.

I agree it is a problem if everyone's approach is a little too different, but I don't think you're looking at this the right way. TDP issues are why x86 has been unable to compete in the mobile market, and it isn't too favorable in the robotics market either. For desktops, I imagine the small cores could be clocked significantly higher, so if you're running a simple workload, adding more instructions doesn't accomplish anything; it just makes the chip more expensive and less stable at higher speeds. So from a performance standpoint, these could be very useful. Meanwhile, if something doesn't demand too many "fancy instructions" but can utilize many cores, you can fit more of the small cores on a single die. So depending on your workload, you could get a significant performance uplift at a lower cost and lower wattage.
You are mistaking the symptoms for the cause. Intel simply didn't prioritize Atom cores and making the millions of them that would have been needed to compete in the mobile segment. Speaking of which, Atom cores were fairly competitive with ARM. This is a good read and on point: https://www.extremetech.com/computing/227816-how-intel-lost-the-mobile-market-part-2-the-rise-and-neglect-of-atom#:~:text=The%20common%20explanation%20for%20why,compared%20with%20their%20ARM%20counterparts.&text=It's%20a%20simple%2C%20common%2Dsense,It%20mistakes%20symptoms%20for%20cause.
JamesSneed:

You are mistaking the symptoms for the cause. Intel simply didn't prioritize Atom cores and making the millions of them that would have been needed to compete in the mobile segment. Speaking of which, Atom cores were fairly competitive with ARM.
No, I'm not. x86 scales down poorly. The link you provided confirms that. Atom cores were competitive in speed but not in efficiency. In order for the Atom to use the same wattage as an ARM CPU in a phone, it had to be made slower than ARM's offerings. In theory, the Atom could offer more performance-per-watt under heavy load by taking advantage of its many additional instructions, but banking on that was Intel not understanding how that market works. Idle power usage is absolutely critical to a phone, and phones aren't meant to do complex calculations. ARM cores use little to no power when idle, and they're fast enough for everyday phone needs. Meanwhile, ARM chips are cheaper because of their simplicity and competitive licensing. So Intel was at a huge disadvantage in every way, and didn't understand the market it was trying to tap into.
Fox2232:

One should distinguish by the instructions used. That "small" CPU core may happen to be faster at certain workloads. It could just as well be a minimalistic CPU with extra fixed-function hardware for some things that would make the logic in a larger Zen core too slow. What could it be missing? AVX?
They could always pair Jaguar with Zen for a big.LITTLE implementation... Use Jaguar for background tasks and save Zen for the important stuff. Jaguar actually performs fairly well and uses almost no power. Even clocked at 2.4 GHz, its performance is good enough to handle background tasks.
It's interesting what AMD is trying to do by making the processor itself contain the logic to decide where work should run. If that works flawlessly it will be great, but otherwise it will cause more issues. In the ARM world big.LITTLE is still not that smooth either; here is how developers run into issues across different ARM cores: https://medium.com/@jaddr2line/a-big-little-problem-a-tale-of-big-little-gone-wrong-e7778ce744bb

https://en.wikipedia.org/wiki/ARM_big.LITTLE "In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible."

Now, if we're thinking of using this on Windows, with Win10's current state where each update causes more new issues than it fixes, yeah, it will be a long time before this reaches prime time. For a specific device with a custom OS (Linux), it seems it could be great though.
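The quoted Wikipedia passage notes that the CPUs "no longer have equivalent abilities", and that is where the software pain tends to come from: a runtime feature probe only tells you about the core the thread happens to be on at that instant. A small, hypothetical x86 example using the GCC/Clang builtins, assuming purely for illustration a part whose big cores expose AVX-512 while its small cores do not:

#include <stdio.h>

/* Illustrative only: probe a CPU feature at run time. On a homogeneous CPU
 * the answer is stable; on a hypothetical hybrid part with non-identical
 * cores, caching this result and then migrating to a core that lacks the
 * instructions could still lead to faults. Compile with GCC or Clang on x86. */
int main(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        printf("This core reports AVX-512F support.\n");
    else
        printf("No AVX-512F on this core; use the AVX2/SSE fallback path.\n");
    return 0;
}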