AMD Working on 16-Core Processor with Integrated PCI Express 3.0 Controller

Assuming this is gonna be a monolithic die for server applications, though if they did export it to desktop... 😀
By the time they integrate PCI-E 3.0, Intel will be on 4.0 lol.
I doubt that :P
I would hope so. As far as I'm aware, we haven't even saturated 2.0 yet. Aside from some SSDs, I don't really understand the point of releasing PCIe 3.0. AMD already has several 16-core Opterons. I'm guessing their current generation is Steamroller-based, and I figure this new 16-core will also only be an Opteron. AMD has stated before that they're not targeting the high-end/enthusiast desktop PC market anymore, and that's exactly what a 16-core would be classified as. I don't see a reason for a 16-core entering the desktop market anyway - most people still can't put an i7 to good use. But Opterons are still relatively cheap. You could probably make a pretty good desktop computer out of an Opteron system, as long as you accept that the motherboard you get will likely lack Crossfire/SLI support, built-in audio, and a slew of USB ports.
Exactly. The current 16-core Opterons are a bit like the Pentium D and Core 2 Quads, though: dual 8-core Bulldozer/Piledriver dies on one package, so having this as a monolithic CPU would be a big step up for AMD. That said, looking at Kaveri, the new 28 nm process has really shrunk the die size down quite a bit from what you'd expect, so assuming this chip would be on the same process is reasonable. We might even see something that can compete with Haswell-E's 8-core if we're lucky. If I were aiming for a cheap server, though, it would be AMD-based; the Intel Xeons are horrendously expensive, Opterons not so much.
Mantle + 16-core gaming could be good. Trim some L3, add 4 more cores (which means more L1, and even L2), add two more memory channels, add some pipeline depth, decrease frequency and increase efficiency, raise the working temperature limit so it works okay even over 75°C, make the L2 or L3 caches addressable by APIs like CUDA and OpenCL, and... better game physics. All of this could make it even better.
I would hope so. As far as I'm aware, we haven't even saturated 2.0 yet. Aside from some SSDs, I don't really understand the point of releasing PCIe 3.0..
I think you mean releasing PCIe 4.0. We've had 3.0 for a few years now, and I'm running 3.0 x8/x8 in SLI now. Intel plans on releasing 4.0 this year or next; I thought I read somewhere by 2015 at the latest. I saw 780s running in an x8/x8 2.0 rig and they were slower in benches and gaming. 4.0 may be needed for the super-enthusiasts looking to push all that bandwidth. There are already bigger and badder cards than Maxwell planned. I myself am not upgrading, though.
PCI Express 4 ??
I think you mean releasing PCIe 4.0. We've had 3.0 for a few years now, and I'm running 3.0 x8/x8 in SLI now. Intel plans on releasing 4.0 this year or next; I thought I read somewhere by 2015 at the latest. I saw 780s running in an x8/x8 2.0 rig and they were slower in benches and gaming. 4.0 may be needed for the super-enthusiasts looking to push all that bandwidth. There are already bigger and badder cards than Maxwell planned. I myself am not upgrading, though.
There is still no graphics card that can bottleneck a PCI-E x16 v2.0 slot, so why would you need v4? :puke2: I don't think you have any idea about PCI-E and its bandwidth...
Apparently Skylake is getting PCI-E 4.0. Since AMD has removed the CF bridge for the R9 series and made them bridgeless, apparently they can now saturate PCI-E 3.0 @ x16, since the cards talk over the PCI-E bus now.
There is still no graphics card that can bottleneck a PCI-E x16 v2.0 slot, so why would you need v4? :puke2: I don't think you have any idea about PCI-E and its bandwidth...
I remember reading some benches a year ago about PCI-e 2.0 vs 3.0, and if memory serves me right, there was only a 3% degradation when using 2.0 (instead of 3.0). ...Things might change this year, but I won't be upgrading anything because I expect Mantle to supplement my CPU well enough that I'll be able to ride my i7 920 (& R9 290) for two more years 🙂
Mantle + 16-core gaming could be good. Trim some L3, add 4 more cores (which means more L1, and even L2), add two more memory channels, add some pipeline depth, decrease frequency and increase efficiency, raise the working temperature limit so it works okay even over 75°C, make the L2 or L3 caches addressable by APIs like CUDA and OpenCL, and... better game physics. All of this could make it even better.
Ummmm... huh? L1, L2 and L3 aren't addressable by APIs because it would cause problems for the processor. Increasing pipeline depth would be very bad. Decreasing frequency while increasing pipeline depth would be suicide for AMD. To increase efficiency, you have to shorten the pipeline. AMD can't do anything that affects CUDA because they have no rights to it... also, allowing CUDA to access cache wouldn't improve PhysX in the least, as the system would be too unstable to be usable.
Increasing pipeline depth would be very bad.
Why can't increasing the pipeline depth increase instructions per cycle? How can we increase the total performance of a CPU then? Increasing clock frequency versus increasing IPC: which one is more future-proof? Which one is more efficient in terms of "instructions per joule"?
The longer the pipeline, the longer it takes for an instruction to complete, thus reducing performance (and efficiency). The long pipeline was among the drawbacks of Intel's NetBurst architecture; with Conroe, Intel drastically reduced the pipeline length. A shorter pipeline results in instructions completing faster, and shorter pipelines are more efficient. The shorter pipeline of the Athlon-series processors is part of the reason they were just as fast, at lower clock speeds, as the Pentium 4 and Pentium D processors.
Then a longer pipeline increases pipeline latency, so it leads to fewer instructions per second (as long as the instruction issue/fetch rate stays the same)? Then it is like:

Short pipeline (3 stages, single issue):
1 instruction = 3 cycles -> 1 instruction per 3 cycles, inefficient
2 instructions = 4 cycles -> 1 instruction per 2 cycles... OK
3 instructions = 5 cycles -> 3/5, better
4 instructions = 6 cycles -> 2/3, even better but low probability
5 instructions = 7 cycles -> 5/7, best but very hard to maintain?

Long pipeline (tenfold, i.e. 30 stages, single issue):
1 instruction = 30 cycles -> 1/30, yes, very slow
2 instructions = 31 cycles -> 2/31, nearly double the first one
3 instructions = 32 cycles -> 3/32 -> the cycle count hardly increases but the instruction count grows faster
...
10 instructions = 39 cycles -> nearly 1 instruction per 4 cycles

Long pipeline (tenfold, 20 issued):
20 instructions = 49 cycles -> 2/5, very good from the beginning
40 instructions = 69 cycles -> 4/7, even better
60 instructions = 89 cycles -> 2 instructions per 3 cycles

You are right. But faster issuing can help, can't it? By "issue" I meant instruction fetching.
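
For what it's worth, every row above follows one idealized formula: with a D-stage pipeline and one instruction entering per cycle, N instructions finish after D + N - 1 cycles (so 3 + 5 - 1 = 7, 30 + 10 - 1 = 39, and even the "20 issued" rows fit: 30 + 20 - 1 = 49). Below is a minimal Python sketch of that model, assuming a perfectly fed in-order pipeline with no stalls, hazards, or branch mispredictions (which real CPUs certainly have); the issue_width parameter is my own generalization for the wider-issue case, not something taken from the post.

# Minimal sketch: idealized in-order pipeline throughput (no stalls,
# hazards, or branch mispredictions). With a `depth`-stage pipeline and
# `issue_width` instructions entering per cycle, N instructions finish
# after roughly depth + ceil(N / issue_width) - 1 cycles.
from math import ceil

def cycles_to_finish(n_instructions, depth, issue_width=1):
    """Cycles until the last of N instructions leaves a depth-stage pipeline."""
    return depth + ceil(n_instructions / issue_width) - 1

for depth in (3, 30):                       # the 3-stage and 30-stage cases above
    for n in (1, 2, 3, 5, 10, 20, 60):
        c = cycles_to_finish(n, depth)
        print(f"{depth:2d}-stage, single issue: {n:2d} instr in {c:3d} cycles "
              f"-> {n / c:.2f} IPC")

# Throughput tends toward the issue width no matter how deep the pipeline is;
# depth mainly costs latency (and, on real CPUs, a bigger misprediction penalty).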
For all of you not understanding how bandwidth on PCI-e works, take your 3.0 port and a high-end 3.0 GPU, drop it from 16 lanes down to 8 and run benchmarks on both. You likely won't see a difference (maybe 1 or 2 FPS). Now, drop it down to 4 lanes. You might lose a few FPS here and there, but the game should still be playable. The only reason for increasing bandwidth per lane is for the PCI-e x1 devices, such as TV tuners, SSDs, Thunderbolt cards, or USB 3.x cards. Otherwise, we don't even need the bandwidth of 3.0 for modern GPUs. Assuming PCIe 4.0 will continue the trend of doubling bandwidth, one lane will be as fast as 8 lanes from the first generation, which is good enough to run most mid-range GPUs. It won't be long until something like the Titan can run off an x1 slot.
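
To put rough numbers on that lane math, here is a minimal sketch; the transfer rates and encodings are the published per-generation figures, and real-world throughput is a bit lower still because of packet and protocol overhead.

# Per-direction PCIe bandwidth by generation and lane count.
# Transfer rates (GT/s) and line encodings are the published spec values;
# actual usable throughput is lower due to protocol overhead.
GENS = {
    "1.0": (2.5, 8 / 10),     # 8b/10b encoding
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
    "4.0": (16.0, 128 / 130),
}

def bandwidth_gb_s(gen, lanes):
    """Usable bandwidth in GB/s for `lanes` lanes of a given generation."""
    gt_per_s, efficiency = GENS[gen]
    return gt_per_s * efficiency * lanes / 8  # bits per transfer -> bytes

for gen in GENS:
    print(gen, [round(bandwidth_gb_s(gen, lanes), 2) for lanes in (1, 4, 8, 16)])

That lines up with the point above: one 4.0 lane (about 2 GB/s) is roughly as fast as eight 1.0 lanes, and a 3.0 x4 link already carries about as much as a full 1.0 x16 slot.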
16 cores is definitely nice, but AMD needs to focus on single-threaded performance. I'll give AMD this, though: their server CPUs are crazy!
If deeper pipelines were better, AMD's FX series would easily beat Intel's Core series processors, and Intel's NetBurst architecture wouldn't have been replaced by Conroe... or been slower than AMD's K7 and K8 architectures. The fact of the matter is, shorter pipes provide better performance. Shorter pipelines mean instructions complete faster.
Then the main idea is just keeping all the compute cells busy through pipelining, while keeping the pipeline as short as possible? Putting a GPU inside a CPU may have a similar motivation then? Shorter paths, fewer stages between the two? Maybe that's why Nvidia will add stacked DRAM to GPUs? Then the total length of the data paths between cores becomes important if we were to add more cores? (Maybe some optimization algorithms could be used here, like "simulated annealing", to find a golden geometry of cores and the best "core to communication length" ratio?)
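
Purely as a toy illustration of that simulated-annealing idea (nothing like how CPUs are actually floorplanned), the sketch below places a handful of cores on a small grid and anneals random swaps to minimize the total Manhattan distance over a made-up list of communicating core pairs. GRID, LINKS, and the cooling schedule are all hypothetical values chosen for the example.

# Toy sketch: simulated annealing of core placement on a grid so the total
# "wire length" between communicating cores is minimized. All parameters
# here (grid size, link list, cooling schedule) are invented for illustration.
import math
import random

N_CORES = 8
GRID = 4                                   # cores live on a GRID x GRID grid
LINKS = [(0, 1), (1, 2), (2, 3), (4, 5),   # hypothetical pairs of cores
         (5, 6), (6, 7), (0, 4), (3, 7)]   # that communicate heavily

def total_length(pos):
    """Sum of Manhattan distances over all communicating core pairs."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in LINKS)

def anneal(steps=20000, t_start=2.0, t_end=0.01):
    # Start from a random placement of the cores on distinct grid cells.
    pos = random.sample([(x, y) for x in range(GRID) for y in range(GRID)], N_CORES)
    cost = total_length(pos)
    best, best_cost = list(pos), cost
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling
        i, j = random.sample(range(N_CORES), 2)
        pos[i], pos[j] = pos[j], pos[i]                     # propose: swap two cores
        new_cost = total_length(pos)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(pos), cost
        else:
            pos[i], pos[j] = pos[j], pos[i]                 # reject: undo the swap
    return best, best_cost

placement, length = anneal()
print("total link length:", length)
print("core positions:", placement)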
"CU" reads more like "compute units" than "CPU" to me, so I guess that first picture is a GPU.