PCI-SIG Announces Initial Draft of PCIe 7.0 Specification, Aiming for 2025 Release

Can't wait to get my shiny new PCIe 7.0-compliant E-ATX motherboard in 2026 and find 1 slot on it, because as bandwidth per lane has been going up, slots have been disappearing. It's a shell game for desktop and even HEDT platforms.
Ojref:

Can't wait to get my shiny new PCIe 7.0-compliant E-ATX motherboard in 2026 and find 1 slot on it, because as bandwidth per lane has been going up, slots have been disappearing. It's a shell game for desktop and even HEDT platforms.
It'll just be covered in M.2 style slots for everything
We'll be able to run everything off x1 or x4 slots at that point. I'm still @ 3.0 and have 0 need to go any higher atm, but my GPUs will start to need 4.0 going forward.
icedman:

We'll be able to run everything off x1 or x4 slots at that point. I'm still @ 3.0 and have 0 need to go any higher atm, but my GPUs will start to need 4.0 going forward.
I agree; I think for consumer-grade hardware, PCIe 4.0 ought to be the last generation where x16 slots are necessary. By the time such hardware saturates 4.0 @ x16 or 5.0 @ x8, gen 6.0 will be available. More lanes just increase the overall cost of the system, and for what? The only consumer-grade hardware that benefits from newer generations is basically video capture cards and SSDs, neither of which requires more than x4 lanes. In the server market, however, I understand there is a much greater need for more bandwidth. A lot more.
Feels like we were stuck on PCIe 3, and now they just woke up and are popping them out way faster than needed; maybe they are in a rush for ML applications, those things inhale data.
Venix:

Feels like we were stuck on PCIe 3, and now they just woke up and are popping them out way faster than needed; maybe they are in a rush for ML applications, those things inhale data.
Come to think of it, in the consumer space, OCuLink, M.2 drives, and Thunderbolt/USB4 are all demanding more and more bandwidth too.
schmidtbag:

for consumer-grade hardware, PCIe 4.0 ought to be the last generation where x16 slots are necessary
This made no sense 4 years ago, and it still makes no sense today: PCIe 6.0 Specification finalized in 2021 and 4 times faster than PCIe 4.0

Contrary to your assumptions, no-one stopped using x16 slots and implementing x16 links in PCIe 5.0 processors/mainboards/cards - nor will it happen with PCIe 6.0 devices to be released in 2025-2026, or PCIe 7.0 devices in a 5-6 year timeframe. x16 links are not going away as long as there are separate non-integrated GPUs, which still require high-speed local video memories like GDDR6/GDDR7 and HBM2/3/4 to extract maximum performance - unfortunately these are too expensive to provide in the large quantities required for wide buses, so there has to be a large enough system memory pool to complement local video memory. I guess anyone should be able to appreciate this simple fact now that mid-range video cards have firmly moved into the US$600-1000 price tier.

This would also need high enough PCIe bandwidth (and cache-coherency protocols like CXL 3.0) for optimal interoperation between CPU and GPU memory pools. There are now DDR5-8000 (PC5-64000) dual-channel memory kits that offer 120 Gbyte/s of real-world bandwidth, just as was expected 4 years ago - but unfortunately matching PCIe 6.0 x16 links are still a few years away...
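The per-generation numbers being thrown around in this thread can be reproduced with a quick back-of-envelope script. These are theoretical peaks derived from the per-lane transfer rates, so real-world throughput (like the ~120 GB/s quoted above for DDR5-8000) comes in somewhat lower:

```python
# Back-of-envelope: PCIe x16 one-way bandwidth per generation vs. the
# theoretical peak of a dual-channel DDR5-8000 kit. Encoding overheads
# beyond line coding (FLIT/FEC framing etc.) are ignored here.

def pcie_x16_gbps(gt_per_s: float, encoding_efficiency: float) -> float:
    """Theoretical one-way bandwidth of an x16 link, in GB/s."""
    return gt_per_s * encoding_efficiency / 8 * 16

# (generation, GT/s per lane, line-encoding efficiency)
generations = [
    ("3.0", 8.0, 128 / 130),   # 128b/130b encoding
    ("4.0", 16.0, 128 / 130),
    ("5.0", 32.0, 128 / 130),
    ("6.0", 64.0, 1.0),        # PAM4 signaling; framing overhead ignored
    ("7.0", 128.0, 1.0),
]

# 8000 MT/s * 8 bytes per channel * 2 channels = 128 GB/s theoretical
ddr5_8000_dual = 8000e6 * 8 * 2 / 1e9

for gen, gt, eff in generations:
    bw = pcie_x16_gbps(gt, eff)
    print(f"PCIe {gen} x16: ~{bw:.0f} GB/s "
          f"({bw / ddr5_8000_dual:.2f}x of dual-channel DDR5-8000)")
```

This matches the argument above: a PCIe 5.0 x16 link (~63 GB/s one way) is about half of dual-channel DDR5-8000's theoretical peak, and only 6.0 x16 reaches parity.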
More lanes just increase the overall cost of the system, and for what?
If you don't need it, you don't have to pay for it; there are low-end APUs and motherboards/chipsets that offer x8/x4 links in the physical x16 slots.
DmitryKo:

This made no sense 4 years ago, and it still makes no sense today: PCIe 6.0 Specification finalized in 2021 and 4 times faster than PCIe 4.0 Contrary to your assumptions, no-one stopped using x16 slots and implementing x16 links in PCIe 5.0 processors/mainboards/cards - nor will it happen with PCIe 6.0 devices to be released in 2025-2026, or PCIe 7.0 devices in a 5-6 year timeframe. x16 links are not going away as long as there are external GPUs, which still require high-speed local video memories like GDDR6/GDDR7 and HBM2/3/4 to extract maximum performance - unfortunately these are too expensive to provide in large quantities and wide buses, so there has to be a large enough system memory pool to complement local video memory. I guess anyone should be able to appreciate this simple fact now that mid-range video cards have firmly moved into the US$600-1000 price tier. This would also need high enough PCIe bandwidth (and cache-coherency protocols like CXL 3.0) for optimal interoperation between CPU and GPU memory pools. There are now DDR5-8000 (PC5-64000) dual-channel memory kits that offer 120 Gbyte/s of real-world bandwidth, just as was expected 4 years ago - but unfortunately matching PCIe 6.0 x16 links are still a few years away... If you don't need it, you don't have to pay for it; there are low-end APUs and motherboards/chipsets that offer x8/x4 links in the physical x16 slots.
I do not think a raw bandwidth number can tell the whole story. PCIe 3 x16 might be faster than PCIe 5 x4, because it has 16 smaller lanes rather than 4 bigger lanes. For example, as long as you have bidirectional communication, you have the chance of data smashing into each other, causing each side to briefly stop the communication until they transmit again; 16 smaller lanes have smaller chances of that happening, and when communication is halted, only 1/16 of the available bandwidth is stopped. Well, that would be the case with a normal network, and what happens when packets smack into each other depends on the protocol that is used. I remember an essay and a simulation we had to do to find the most effective way to connect 2 distant points; everyone used 1 or 2 Mbit lines in the simulation, and I used a 56 kbps line with 20 channels in it (so ~1 Mbit), and that thing smashed everything else! I am imagining computer communication works similarly... But the keyword is I am imagining, I might be completely wrong!
DmitryKo:

Contrary to your assumptions, no-one stopped using x16 slots and implementing x16 links in PCIe 5.0 processors/mainboards/cards - nor will it happen with PCIe 6.0 devices to be released in 2025-2026, or PCIe 7.0 devices in a 5-6 year timeframe.
Where did I assume x16 slots were going to stop being used? I said by 5.0, x16 slots weren't necessary for consumer boards, but I didn't say they were going to go away. I still stand by that: consumer hardware just doesn't need that kind of bandwidth, and trying to maintain that signal quality is just making budget motherboards needlessly expensive.
x16 links are not going away as long as there are external GPUs, which still require high-speed local video memories like GDDR6/GDDR7 and HBM2/3/4 to extract maximum performance
Several problems in what you said there:

1. External GPUs aren't going to use any more than x8 lanes. Cutting a hole in your case to slip through an x16 riser doesn't count as an external GPU.
2. The GPU can only be fed as quickly as the slowest component. In the best-case scenario, where 100% of a game's assets are loaded into DRAM and you have dual-channel DDR4 or DDR5 memory (the vast majority of gaming PCs do not have more than 2 channels), you're still limited by DRAM bandwidth. Granted, DDR5 can offer enough bandwidth to keep up with PCIe 5.0 x16, but that's assuming 100% of its bandwidth can be devoted to the GPU, or more importantly, that doing so is necessary. That leads to point #3.
3. VRAM bandwidth in this context is largely irrelevant if you're feeding the GPU core from DRAM, which is going to be substantially slower than the memory configuration in just about any PCIe 4.0+ GPU equipped with GDDR6 or better. What matters is capacity. If you can store most of the relevant assets in VRAM, you hardly need any PCIe bandwidth. A properly-equipped GPU isn't going to demand much more than a few PCIe lanes most of the time. If you're starting up the game for the first time, you're going to be bottlenecked by storage. So no matter how you look at it, there is no realistic application where you're going to need PCIe 5.0 x16 for a home GPU.
4. We're still seeing consumer-grade GPUs on gen 4.0. Considering backward compatibility and modern motherboard compatibility with 5.0, I don't think it's a mystery why this is the case.

Circling back to the maximum of x8 lanes for an external GPU: iGPUs don't use any more than x8 lanes, and they feed solely from DRAM. Guess what: they're still bottlenecked by DDR5, and these GPUs have small requirements compared to what you'll find on a PCIe 4.0 dGPU, let alone 5.0.
This would also need high enough PCIe bandwidth (and cache-coherency protocols like CXL 3.0) for optimal interoperation between CPU and GPU memory pools. There are now DDR5-8000 (PC5-64000) dual-channel memory kits that offer 120 Gbyte/s of real-world bandwidth, just as was expected 4 years ago - but unfortunately matching PCIe 6.0 x16 links are still a few years away...
Now find me the real-world non-server application that will saturate the bandwidth of both that memory and PCIe lanes, let alone for any noteworthy amount of time. Even if you can, now tell me what the hardware configuration looks like where that would make sense; for example, people aren't going to spend good money on PC5-64000 to compensate for a GPU with only 8GB of VRAM. It doesn't matter if you can theoretically utilize PCIe 5.0 x16 on a consumer-grade PC if in reality, you won't even use half. Why do you think DirectStorage came to exist when DDR5 was right around the corner?
schmidtbag:

I said by 5.0, x16 slots weren't necessary for consumer boards, but I didn't say they were going to go away
Sure, PCIe 5.0 x16 links are unnecessary, but at the same time they are not going away - so maybe they are necessary? Give yourself some time to reflect on it.
you're still limited by DRAM bandwidth
If you can store most of the relevant assets in VRAM, you hardly need any PCIe bandwidth
people aren't going to spend good money on PC5-64000 to compensate for a GPU with only 8GB of VRAM.
Why do you think DirectStorage came to exist when DDR5 was right around the corner?
(Sigh). We've discussed all this in the previous thread, and the equation hasn't really changed four years later: PCIe 6.0 Specification finalized in 2021 and 4 times faster than PCIe 4.0 | Page 5 | guru3D Forums

Gaming PCs still use the same multi-tiered approach, with a ratio of roughly 10:3:1 for disk storage/system memory/video memory capacity - that is, a few hundred GBytes of disk storage, up to 48 or 64 GBytes of system memory, and 16 or 24 GBytes of local video memory. Considering available bandwidth, the ratio is even worse, from 1:8:40 to 1:8:60 for disk storage/system memory/video memory: SSD storage has 16 Gbyte/s of bandwidth (for sequential reads), which is still an order of magnitude slower than system DRAM, let alone VRAM; system memory goes up to 120 Gbyte/s for dual-channel DDR5-8000 kits; local VRAM is typically 640-960 GBytes/s for GDDR6 20 Gbps with 256 to 384 bit interface.

Thus game developers will stream all game assets from disk into system memory, then either move some graphics assets to local video memory or let the GPU access them from system memory. Your proposed solution is what, exactly - to hold all graphics assets in video memory, and stream these directly from disk storage, over a 16 Gbyte/s PCIe 5.0 x4 link? Therefore instead of spending $200 on an additional 32 Gbytes of system memory, we will have to buy expensive US$2000 video cards with 64 Gbytes of high-speed VRAM, and let motherboard and video card manufacturers save $15 by not implementing x16 physical links. Tempting, but no, thank you.
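The 1:8:40-to-1:8:60 bandwidth ratio quoted above is easy to reproduce from the same round numbers (16 GB/s SSD, 120 GB/s dual-channel DDR5-8000, GDDR6 at 20 Gbps per pin on 256- and 384-bit buses); a minimal sketch:

```python
# Bandwidth tiers as quoted in the post: disk vs. system RAM vs. VRAM.

def gddr6_gbps(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak GDDR6 bandwidth in GB/s for a given per-pin rate and bus width."""
    return gbps_per_pin * bus_width_bits / 8

ssd = 16.0                      # PCIe 5.0 x4 NVMe, sequential reads
dram = 120.0                    # dual-channel DDR5-8000, real-world
vram_lo = gddr6_gbps(20, 256)   # 640 GB/s
vram_hi = gddr6_gbps(20, 384)   # 960 GB/s

print(f"disk : DRAM : VRAM = 1 : {dram / ssd:.0f} : "
      f"{vram_lo / ssd:.0f}-{vram_hi / ssd:.0f}")
# → disk : DRAM : VRAM = 1 : 8 : 40-60
```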
External GPUs aren't going to use any more than x8 lanes. Cutting a hole in your case to slip through an x16 riser doesn't count as an external GPU.
External to the CPU, i.e. not integrated onto the same die/package and connected with proprietary silicon links like Infinity Fabric. Not external enclosures with USB4/Thunderbolt4+ connection, that's only PCIe x4.
Venix:

PCIe 3 x16 might be faster than PCIe 5 x4 because it has 16 smaller lanes rather than 4 bigger lanes... with a normal network... what happens when packets smack into each other depends on the protocol that is used
PCIe is a direct point-to-point connection between endpoints on the same PCIe root complex, implemented by electrical switching of the physical links. Not a local network topology with broadcasting from multiple devices and frame switching over multiple hops.
DmitryKo:

Sure, PCIe 5.0 x16 links are unnecessary, but at the same time they are not going away - so maybe they are necessary? Give yourself some time to reflect on it.
They're not going away because people like you are going to gripe about losing potential in theoretical situations. It's not good marketing to have a product that seems worse when in reality it isn't.
(Sigh). We've discussed all this in the previous thread, and the equation hasn't really changed four years later.
Right - 4 years later, PCIe 5.0 is now accessible, and things still haven't changed much.
Thus game developers will stream all game assets from disk into system memory, then either move some graphics assets to local video memory or let the GPU access them from the system memory.
Yes, I'm aware - everything you just described supports my point: games (like most consumer-grade software) don't need that much bandwidth for a single slot. There is a demand for more bandwidth, but it doesn't matter how many lanes you have if you're bottlenecked by a component with 1/4 the width. You can argue "assets will be cached in system memory and therefore you have more PCIe lanes you can tap into", but assets will also be cached in VRAM, so you're still minimizing how much you need the extra bandwidth over PCIe. If you have a GPU that actually demands more than x8 5.0 lanes, either the software is crap or the GPU is woefully underspecced.
Your proposed solution is what, exactly - to hold all graphics assets in video memory, and stream these directly from disk storage, over 16 Gbyte/s PCIe 5.0 x4 link?
No...? My proposed solution is to have x8 links on the motherboard and then have GPUs with their own dedicated x4 storage. Therefore, you effectively get double the bandwidth while reducing CPU overhead, reducing DRAM consumption, reducing latency, and simplifying the motherboard traces (and therefore reduce the cost). You'd pay more for the GPU storage but if it functions like a cache, 32GB ought to be plenty. But even then, once you get 16GB of VRAM, I don't get the impression most (if any) games are bottlenecked by PCIe. That's obviously not enough to store all game assets in VRAM but there's clearly no need to do that where people can play open-world games on such GPUs without stuttering caused by PCIe bottlenecks.
External to the CPU, i.e. not integrated onto the same die/package and connected with proprietary silicon links like Infinity Fabric. Not external enclosures with USB4/Thunderbolt4+ connection, that's only PCIe x4.
What you're referring to is a discrete GPU. I'm a little surprised you weren't aware of the terminology differences.
DmitryKo:

PCIe is a direct point-to-point connection between endpoints on the same PCIe root complex, implemented by electrical switching of the physical links. Not a local network topology with broadcasting from multiple devices and frame switching over multiple hops.
Well yeah, but those direct connections, if they are bidirectional, have the potential for crashes like any bidirectional communication, no? Well, at least that was my thought. I know about networks, but as for PCIe communication, I just know its speed and that it exists :P
Venix:

if they are bidirectional, have the potential for crashes like any bidirectional communication
Collision is not really about bidirectional communication - it's rather about several devices broadcasting on a shared frequency carrier in half-duplex media access mode, as it was in early Ethernet protocols; modern Gigabit Ethernet uses network switching with point-to-point connections to eliminate collisions even in half-duplex mode. PCIe is a full-duplex bus, with separate serial links in each direction that connect the endpoint (PCIe device) to a switch in the PCIe Root Complex (host CPU).
schmidtbag:

They're not going away because people like you are going to gripe about losing potential in theoretical situations. It's not good marketing to have a product that seems worse when in reality it isn't.
I'm humbled by your assumption that I have any influence on product decisions made by Intel, AMD, and other PCI SIG members. But could they rather be motivated by, well... common sense? Like a typical scenario when you need to install your old PCIe 3.0 x16 card into a PCIe 5.0 x16 slot on your brand new motherboard? 🙄
it doesn't matter how many lanes you have if you're bottlenecked by a component with 1/4 the width.
GPUs with their own dedicated x4 storage.
(Sigh). Here we go again. Dedicated NVMe M.2 slots on the video card will not offer any gain compared to regular NVMe M.2 slots on the motherboard, and will require a separate US$50 PCIe switch chip - when there is already a PCIe switch in the PCIe Root Complex within your CPU, which can facilitate peer-to-peer transfers between the NVMe disk and the GPU.

Even then, NVMe is a block storage protocol that works with LBA disk sector addressing, not a byte-addressable memory interface like video memory. If you want to bypass the main CPU, the video card will have to include a dedicated host CPU with an embedded operating system to handle block I/O requests, address translation, volume management, file systems, and so on. Should you decide to include a multi-channel ONFI flash memory interface controller and flash memory chips directly on the video card, that would require a lot of additional pins and traces to go above the bandwidth offered by standard NVMe x4 disks, would still perform at just a fraction of the currently available system memory bandwidth of 120 GBytes/s, and would still require an embedded host CPU and OS/firmware to manage block read/erase/write and perform trimming and garbage collection - which is essentially implementing a dedicated proprietary SSD onboard the video card, with all the associated development and production costs.
simplifying the motherboard traces (and therefore reduce the cost)
...and increasing the cost of the video card by US$50 (for a separate PCIe Switch chip) up to US$200 (for a dedicated onboard SSD) - all for a $15 motherboard cost reduction.
everything you just described supports my point
No, you just keep a pretence of not understanding how it is contrary to your point.
What you're referring to is a discrete GPU. I'm a little surprised you weren't aware of the terminology differences.
Well, I thought 'discrete' has a specific meaning which is not really applicable to GPUs as VLSI graphics chips - unless you're using GPU as a colloquial term for "graphics card": (electrical engineering) Having separate electronic components, such as individual diodes, transistors and resistors, as opposed to integrated circuitry. Though I guess I could have said 'discrete GPU' as in "separate; distinct; individual".
Are we even saturating PCIe 3.0 yet...?
DmitryKo:

Collision is not really about bidirectional communication - it's rather about several devices broadcasting on a shared frequency carrier in half-duplex media access mode, as it was in early Ethernet protocols; modern Gigabit Ethernet uses network switching with point-to-point connections to eliminate collisions even in half-duplex mode. PCIe is a full-duplex bus, with separate serial links in each direction that connect the endpoint (PCIe device) to a switch in the PCIe Root Complex (host CPU).
For the life of me, I could not remember the word collision! (Not a native English speaker.) I see, so it does have separate serial links; OK, then my whole reasoning was moot! Thanks for the explanation!
Moh powah baby
BLEH!:

Are we even saturating PCIe 3.0 yet...?
Yes, but barely, only on an RTX 4090, and it depends on the game. The Witcher 3 is one of the better examples, where at 1440p it went from 343 fps (PCIe 3) to 353 (PCIe 4).
BLEH!:

Are we even saturating PCIe 3.0 yet...?
What does it even mean?
Ricepudding:

The Witcher 3 is one of the better examples, where at 1440p it went from 343 fps (PCIe 3) to 353 (PCIe 4).
Any sane developer would never try to "saturate" a local bus when it's log10(60) ≈ 1.78 orders of magnitude slower than onboard video memory - unless what they want is a nice-looking slide show of a game that rivals Unreal Engine 5 with Nanite and Lumen enabled. FYI, PCIe 3.0 x16 has 16 GByte/s of bandwidth - the same as recent PCIe 5.0 NVMe SSDs, 8 times slower than dual-channel DDR5-8000, and 60 times slower than 384-bit GDDR6 20 Gbps:
SSD storage has 16 Gbyte/s of bandwidth (for sequential reads)...; system memory goes up to 120 Gbyte/s for dual-channel DDR5-8000 kits; local VRAM is typically 640-960 GBytes/s for GDDR6 20 Gbps with 256 to 384 bit interface.
DmitryKo:

Any sane developer would never try to "saturate" a local bus when it's log10(60) ≈ 1.78 orders of magnitude slower than onboard video memory - unless what they want is a nice-looking slide show of a game that rivals Unreal Engine 5 with Nanite and Lumen enabled.
It's all well and good saying what they should do; the fact is, some games today do saturate it, although to different degrees. From what I could see, it's normally an fps or two, with a few outliers like Witcher being far more.
schmidtbag:

I agree; I think for consumer-grade hardware, PCIe 4.0 ought to be the last generation where x16 slots are necessary. By the time such hardware saturates 4.0 @ x16 or 5.0 @ x8, gen 6.0 will be available. More lanes just increase the overall cost of the system, and for what? The only consumer-grade hardware that benefits from newer generations is basically video capture cards and SSDs, neither of which requires more than x4 lanes. In the server market, however, I understand there is a much greater need for more bandwidth. A lot more.
I'm sure you know the PCIe bandwidth is bidirectional, so PCIe 4.0 x16 provides 32GB/s of bandwidth each way, whereas PCIe 5.0 x16 would provide 64GB/s each way, which really isn't that much compared to what current DDR5 offers in a dual-channel configuration. PCIe 6.0 would be overkill on today's systems of course, much less PCIe 7.0, but by the time PCIe 6 is available on consumer hardware, we should be using DDR6 memory. At any rate, faster PCIe bus speeds would enable much faster RAM-to-VRAM reads and copies, leading to faster data streaming between RAM and GPUs and more consistent frametimes.
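As a rough illustration of that last point, here is a toy calculation of how long it takes to stream a batch of assets over each link. The 2 GB batch size is a made-up example, and these are one-way theoretical peaks with no protocol overhead, so real transfers would be slower:

```python
# Toy RAM-to-VRAM streaming times over PCIe links of different generations.

def transfer_ms(size_gb: float, link_gbps: float) -> float:
    """Time in milliseconds to move size_gb gigabytes over a link of
    link_gbps GB/s, assuming the full theoretical bandwidth is available."""
    return size_gb / link_gbps * 1000

# (link, approximate one-way bandwidth in GB/s)
links = [("4.0 x16", 32.0), ("5.0 x16", 64.0), ("6.0 x16", 128.0)]

for name, bw in links:
    print(f"PCIe {name}: {transfer_ms(2.0, bw):.1f} ms to stream 2 GB")
# → 62.5 ms, 31.2 ms, and 15.6 ms respectively
```

Each generation halves the streaming time, which is the "more consistent frametimes" argument in miniature: the shorter the burst transfer, the less it intrudes on a 16 ms frame budget.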