Intel Core i9 10980XE processor review



In this article, we review the all-new 18-core Core i9 10980XE from Intel. It is the highest-core-count HEDT processor Intel offers the consumer market: the $979 Core i9 10980XE sits as the flagship of the new Cascade Lake-X line of processors.
Tagged as:
intel,
review Core i9 10980XE
Carfax
Senior Member
Posts: 3413
Posted on: 12/08/2019 06:36 AM
Maybe it wasn't fair to say the entire suite is synthetic, because there are a few tests in it that I would agree are representative of real-world tasks, or at least potentially can be. But chunks of it either are synthetic, or might as well be, considering how much they tampered with the application. Take the C benchmarks for example, where they don't seem to tell you what it is they're compiling, and they admit to using "many of its optimization flags enabled", yet they also don't tell you which flags. Well, those flags make a massive difference. Those flags can make the difference between a CPU being slower than its competition and being 30% faster.
According to Spec themselves, the vast majority of the benchmark is derived from real world applications, including open source projects.
Compared to Spec 2006, what's new in Spec 2017?
Total source code has increased, as shown in the graph [larger version], because most benchmarks are derived from real applications (including various open source projects).
During benchmark development, SPEC spends substantial effort working to improve portability, using language standards to assist in the process. For CPU 2017, the standards referenced were C99, Fortran-2003, and C++2003.
Caution: The benchmarks do not comply perfectly with ISO/ANSI language standards, because their source code is derived from real applications. The rules allow optimizers to assume standards *only* where that does not prevent validation.
Source
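To give a sense of how much the optimization flags mentioned above can matter (purely an illustration with a made-up kernel, not anything SPEC actually compiles), whether a loop like the one below gets auto-vectorized at all comes down to the flags you pass:

/* flags_demo.c -- illustrative only; not a SPEC workload.
 * gcc -O0 flags_demo.c                 -> unoptimized scalar code
 * gcc -O2 flags_demo.c                 -> optimized scalar code
 * gcc -O3 -march=native flags_demo.c   -> auto-vectorized with SSE/AVX if available
 */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 24)

int main(void)
{
    float *a = malloc(N * sizeof *a);
    float *b = malloc(N * sizeof *b);
    if (!a || !b)
        return 1;

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 0.5f;
    }

    /* A trivially vectorizable multiply-add; how fast this runs is
     * dictated almost entirely by the optimization flags chosen. */
    for (int i = 0; i < N; i++)
        a[i] = a[i] * b[i] + 1.0f;

    printf("%f\n", a[N - 1]);
    free(a);
    free(b);
    return 0;
}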
Regardless, after comparing the specs of the 1065G7 to the 8550U, and looking at benchmarks from other sites, I'm no longer surprised that Ice Lake is winning. The difference in memory bandwidth is very substantial, and that alone should be yielding at least a 10% improvement.
Some of the tests are no doubt bandwidth sensitive, but some are also computationally intensive and don't care about bandwidth at all.
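As a rough sketch of that distinction (illustrative, not one of SPEC's tests): a loop that streams a huge array is limited by DRAM bandwidth, while a loop that keeps re-reading a tiny array is limited by the core itself.

/* bw_vs_compute.c -- illustrative only.
 * Build: gcc -O2 bw_vs_compute.c */
#include <stdio.h>
#include <stdlib.h>

#define BIG   (1 << 26)   /* 64M floats, ~256 MB: far bigger than any cache */
#define SMALL 4096        /* 16 KB: sits comfortably in L1 */

int main(void)
{
    float *a = malloc((size_t)BIG * sizeof *a);
    float *b = malloc((size_t)BIG * sizeof *b);
    static float s[SMALL];
    if (!a || !b)
        return 1;

    for (int i = 0; i < BIG; i++) b[i] = 1.0f;
    for (int i = 0; i < SMALL; i++) s[i] = 1.0f;

    /* Bandwidth-bound: barely any math per element, but every element
     * has to be fetched from and written back to DRAM. */
    for (int i = 0; i < BIG; i++)
        a[i] = 2.0f * b[i];

    /* Compute-bound: plenty of math on data that never leaves L1,
     * so memory bandwidth is irrelevant here. */
    float acc = 0.0f;
    for (int rep = 0; rep < 20000; rep++)
        for (int i = 0; i < SMALL; i++)
            acc += s[i] * s[i] + 0.5f;

    printf("%f %f\n", a[BIG - 1], acc);
    free(a);
    free(b);
    return 0;
}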
Going to 10nm is definitely a bonus too, though since these are mobile CPUs, the real benefit of going to 10nm is sustaining boost clocks for longer durations. There's a lot of room to improve on mobile, which is why I said in another post that I think ARM has a longer way to go until it reaches a dead-end. Intel is clearly taking advantage of that extra room, as they should be.
The main benefit of smaller nodes is more transistors per mm2. Ice Lake parts have more cache than Sky Lake and lots of microarchitectural enhancements.
What's your definition of "plenty"? Because I would argue that's a bit of a stretch. Today, there are enough applications out there that use AVX to have a few real-world benchmarks showcasing its performance, but it is still uncommon enough that I wouldn't take it too seriously. That said, I would like to stress that it is very much worth looking at.
I think up till maybe two years ago you would have been correct as to the relative paucity of the newer SIMD instructions like AVX/AVX2, but nowadays they are used in plenty of applications ranging from browsers to graphics drivers to physics engines to encoders and decoders to cutting edge games and the list goes on.
Intel has become quite aggressive at getting developers to optimize their software for these instructions, and also developing their own in house software to showcase the performance gains you can get when the software is properly optimized. A good example of this is Intel's SVT line of codecs. These codecs are exceptionally well optimized for AVX2 and AVX-512 and so they run ridiculously fast.
Also a lot of games use AVX/AVX2 for physics calculations and particle effects. Nvidia's PhysX uses AVX for cloth simulation, for instance. Also, BF5 definitely uses AVX/AVX2, because if you have an AVX offset enabled in the UEFI BIOS, your CPU will downclock while playing it.
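For a sense of what those hand-optimized code paths look like under the hood (a minimal sketch, not actual code from SVT, PhysX, or BF5), this is AVX processing eight floats per instruction:

/* avx_add.c -- minimal AVX illustration, not taken from any real codec or engine.
 * Build: gcc -O2 -mavx avx_add.c */
#include <immintrin.h>
#include <stdio.h>

/* Adds n floats; n is assumed to be a multiple of 8 to keep the sketch short. */
static void add_avx(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);               /* load 8 floats */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb)); /* 8 additions in one go */
    }
}

int main(void)
{
    float a[16], b[16], c[16];
    for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 2.0f * i; }
    add_avx(a, b, c, 16);
    printf("%f %f\n", c[0], c[15]);   /* expect 0.0 and 45.0 */
    return 0;
}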
I understand that, but the "side effect" of SIMD instructions is improved int and FP performance.
If the code can be vectorized then yes. But not all code can be vectorized or parallelized.
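A quick illustration of why (hypothetical loops, nothing from a real application): the first loop below has independent iterations and vectorizes trivially, while the second carries a result from one iteration into the next, so extra SIMD width does it no good.

/* Independent iterations: a compiler can happily do 8 of these per AVX instruction. */
void scale(float *x, int n)
{
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 3.0f + 1.0f;
}

/* Loop-carried dependency: each step needs the previous result first,
 * so the work stays effectively serial no matter how wide the SIMD units are. */
float smooth(const float *x, int n)
{
    float state = 0.0f;
    for (int i = 0; i < n; i++)
        state = 0.5f * state + x[i];
    return state;
}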
Adoption of such instructions taking a while is kinda the gist of my whole point - Intel can and will add instructions to further improve performance, even way beyond 18%, but it doesn't matter if it isn't used. It's taking too long.
The instruction sets are being utilized. Windows 10 also uses AVX/AVX2, and I'm sure Linux does as well.
I don't get why you're still not seeing my point. Those many years for that paradigm shift are part of my point: AMD was banking on people to write code that would run fast on their CPUs. That's stupid. The reason I brought that up is because any uarch engineer can come up with a design that's theoretically faster than everything else, but theory doesn't matter when the software people actually want to run on your product is slower. Run something designed for FX and it ran faster than it did on Intel. But 99.9% of everything else was slower, so it's no wonder why that was such a failure.
Even if AMD had gotten their wish and the industry had shifted over to multithreaded programming "instantly," the FX series would still have struggled mightily against Intel because of their poor performance per watt. Intel just about slaughtered AMD in that regard. I mean, just look at the 4770K and its performance in that graph compared to the FX-9590, and then look at the TDP: 84W vs 220W!

Things will get more interesting once we get to desktop hardware, where there isn't as much room for improvement. You can't cram in more memory channels, and faster memory is already available. There won't be as much of a thermal constraint either. Desktops will show the true potential of the architecture.
Holy sh!t we actually agree on something!
Yeah, the "Cove" CPUs from Intel should be very impressive if they ever come to desktop. Personally I'm hoping they release the Willow Cove aka Tiger Lake CPUs to desktop. Those CPUs have a totally redesigned cache subsystem with WAY more cache.

schmidtbag
Senior Member
Posts: 7422
Posted on: 12/08/2019 05:06 PM
According to Spec themselves, the vast majority of the benchmark is derived from real world applications, including open source projects.
Yes, but my point is a lot of them are artificially prepared in a way that a large amount of people (possibly even most, but I'm not confident in that) wouldn't do, which is why I brought up the C benchmark. Y'know another word to describe something that is artificial? Synthetic.
The term "synthetic" when it comes to computer benchmarks is a bit muddy, not just because there are no "natural" tests, but also because even benchmarks officially recognized as synthetic are still telling a real story and use real code that you'll find in real applications. So really, the definition of synthetic in this context is best described as "a benchmark that is not representative of common/everyday usage". So, Spec's H.264 benchmark I would say is 100% legit, because it's a common real-world application that they didn't appear to tamper with, and they tell you what it is they're transcoding (Big Buck Bunny). I would say their C benchmark is synthetic, because they use an un-specified arrangement of optimizations on an un-specified file. The only thing they tell you about is which compiler they use.
Some of the tests are no doubt bandwidth sensitive, but some are also computationally intensive and don't care about bandwidth at all.
Absolutely, but those tests also don't show much performance gain. On other sites (like Phoronix), both CPUs yielded very similar results in a lot of tests. The 1065G7 almost always comes out on top (which is to be expected), but only in memory-intensive tests did it surge ahead.
Anandtech's H.264 benchmark showed one of the greatest improvements in Ice Lake, and that is a memory-intensive benchmark.
The main benefit of smaller nodes is more transistors per mm2. Ice Lake parts have more cache than Sky Lake and lots of microarchitectural enhancements.
Why are you only comparing to Sky Lake? All of the tests so far are comparing to Kaby Lake R, which has the same amount of cache, despite being on the same node as Sky Lake.
I think up till maybe two years ago you would have been correct as to the relative paucity of the newer SIMD instructions like AVX/AVX2, but nowadays they are used in plenty of applications ranging from browsers to graphics drivers to physics engines to encoders and decoders to cutting edge games and the list goes on.
I chose my phrasing carefully. I'm well aware it has grown more common, but, not in a way that's very representative of its benefits. Take Firefox for example, which uses AVX. But, as far as I'm aware, it didn't really make a big difference.
As for games, it's very difficult to measure how much of an impact AVX has, because new APIs like Vulkan and DX12 have other performance-related enhancements that don't necessarily depend on AVX, and, GPUs are the bottleneck in a lot of cases.
Transcoders and 3D rendering is where AVX really shines (which I knew of prior to the link you provided below).
So yes, it is being used more often, but I guess what I'm getting at is there aren't enough applications out there using it in a way that shows its true potential.
If the code can be vectorized then yes. But not all code can be vectorized or parallelized.
I'm aware. All the more reason why there's some truth behind me saying "x86 is nearing a dead-end". Making tasks multi-threaded is another way to hugely improve performance, but, not everything can so easily be made multi-threaded, and not everything benefits from it either. Many tasks are likely to never see any major improvements.
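That limit is basically Amdahl's law: if only a fraction p of a task can be spread across n cores, the best possible speedup is 1 / ((1 - p) + p/n). A quick illustration with made-up fractions:

/* amdahl.c -- illustrative numbers only, not measurements. */
#include <stdio.h>

static double amdahl(double p, double n)   /* p = parallelizable fraction */
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    /* Even with 16 cores, a half-serial task doesn't even double in speed. */
    printf("p=0.50, 16 cores: %.2fx\n", amdahl(0.50, 16));   /* ~1.88x */
    printf("p=0.95, 16 cores: %.2fx\n", amdahl(0.95, 16));   /* ~9.14x */
    return 0;
}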
The instruction sets are being utilized. Windows 10 also uses AVX/AVX2, and I'm sure Linux does as well.
Windows 10 has AVX support by a technicality: it comes with DX12, which uses AVX. There may be a few things here and there in Windows that use it too, but nothing that offers noteworthy performance gains, since the OS doesn't really run any faster than Windows 7, which, to my knowledge, doesn't claim to use it.
I'm quite positive none of the mainstream Linux kernel binaries are AVX-optimized (wouldn't be necessary anyway). In regards to a complete Linux distro (rather than just the kernel), I don't think most of the applications are optimized. Most of them aren't Linux-specific, either. Clear Linux is the greatest outlier, and that is more of an experiment than a real OS.
Generally, advanced instructions like AVX and SSE only yield a noticeable difference in tasks that are CPU bottlenecked and take longer than a couple of seconds to process.
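For what it's worth, applications that do ship AVX paths on Linux or Windows generally don't need the whole binary (or the distro) to be built for AVX; they pick the fast path at runtime, roughly like this (GCC/Clang builtins, illustrative sketch):

/* runtime_dispatch.c -- sketch of how an app gates its AVX2 code path.
 * Build: gcc -O2 runtime_dispatch.c */
#include <stdio.h>

static float sum_scalar(const float *x, int n)   /* baseline path, always available */
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

int main(void)
{
    __builtin_cpu_init();   /* initialize CPU feature detection (GCC/Clang builtin) */

    if (__builtin_cpu_supports("avx2"))
        puts("AVX2 detected: a real app would dispatch to its AVX2 kernels here");
    else
        puts("No AVX2: staying on the scalar/SSE fallback");

    float data[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    printf("sum = %f\n", sum_scalar(data, 4));
    return 0;
}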
Even if AMD had gotten their wish and the industry had shifted over to multithreaded programming "instantly," the FX series would still have struggled mightily against Intel because of their poor performance per watt. Intel just about slaughtered AMD in that regard. I mean, just look at the 4770K and its performance in that graph compared to the FX-9590, and then look at the TDP: 84W vs 220W!

For the 9590, yes, Intel would still win in regards to performance per watt. But much like a car engine, CPUs have a "sweet spot" for efficiency. If you go too far above or below the "sweet spot" clock speed (or RPM), you lose efficiency. The 9590 was pushed way beyond its peak efficiency point, but the Opterons based on the same architecture weren't. Case in point: some of the 16-core Opterons operated at 115W. Double the cores yet half the wattage, on the same architecture.
Regardless, my goal here isn't to defend Bulldozer, but rather to explain that, as CPU makers, Intel and AMD can't depend on developers to take advantage of their approach to things. This is probably why Intel has twice just focused on increasing clock speed rather than anything else: it's the easiest way to get an all-around improvement.
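The efficiency sweet spot above falls out of how CMOS dynamic power scales, roughly P ∝ C · V² · f: chasing clocks past the knee also drags the voltage up, so power climbs far faster than performance. With purely hypothetical voltage/clock numbers:

/* dynamic_power.c -- P ~ C*V^2*f, with made-up voltage/clock points for illustration. */
#include <stdio.h>

static double rel_power(double volts, double ghz)
{
    return volts * volts * ghz;   /* the capacitance term cancels when taking a ratio */
}

int main(void)
{
    /* Two hypothetical operating points on the same silicon's V/f curve. */
    double efficient = rel_power(0.95, 2.3);   /* low clocks, low voltage (Opteron-style) */
    double pushed    = rel_power(1.40, 4.7);   /* clocks pushed far past the sweet spot  */

    printf("~%.1fx clock for ~%.1fx the power per core\n",
           4.7 / 2.3, pushed / efficient);     /* ~2.0x clock, ~4.4x power */
    return 0;
}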
schmidtbag
Senior Member
Posts: 7422
Maybe it wasn't fair to say the entire suite is synthetic, because there are a few tests in it that I would agree are representative of real-world tasks, or at least potentially can be. But chunks of it either are synthetic, or might as well be, considering how much they tampered with the application. Take the C benchmarks for example, where they don't seem to tell you what it is they're compiling, and they admit to using "many of its optimization flags enabled", yet they also don't tell you which flags. Well, those flags make a massive difference. Those flags can make the difference between a CPU being slower than its competition and being 30% faster.
Regardless, after comparing the specs of the 1065G7 to the 8550U, and looking at benchmarks from other sites, I'm no longer surprised that Ice Lake is winning. The difference in memory bandwidth is very substantial, and that alone should be yielding at least a 10% improvement. Going to 10nm is definitely a bonus too, though since these are mobile CPUs, the real benefit of going to 10nm is sustaining boost clocks for longer durations. There's a lot of room to improve on mobile, which is why I said in another post that I think ARM has a longer way to go until it reaches a dead-end. Intel is clearly taking advantage of that extra room, as they should be.
What's your definition of "plenty"? Because I would argue that's a bit of a stretch. Today, there are enough applications out there that use AVX to have a few real-world benchmarks showcasing its performance, but it is still uncommon enough that I wouldn't take it too seriously. That said, I would like to stress that it is very much worth looking at.
I understand that, but the "side effect" of SIMD instructions is improved int and FP performance.
Adoption of such instructions taking a while is kinda the gist of my whole point - Intel can and will add instructions to further improve performance, even way beyond 18%, but it doesn't matter if it isn't used. It's taking too long.
If they keep the L3 cache the same size then sure. But looking at AMD's trends, they just keep making it bigger. But because of the whole CCX and chiplet idea, this is really the only option they have.
I don't get why you're still not seeing my point. Those many years for that paradigm shift are part of my point: AMD was banking on people to write code that would run fast on their CPUs. That's stupid. The reason I brought that up is because any uarch engineer can come up with a design that's theoretically faster than everything else, but theory doesn't matter when the software people actually want to run on your product is slower. Run something designed for FX and it ran faster than it did on Intel. But 99.9% of everything else was slower, so it's no wonder why that was such a failure.
Anyway, when comparing results on 3 other websites that use tests I find to be less suspicious, the 1065G7 is consistently 10-15% faster than the 8550U, which I find totally reasonable given the improvements made.
Things will get more interesting once we get to desktop hardware, where there isn't as much room for improvement. You can't cram in more memory channels, and faster memory is already available. There won't be as much of a thermal constraint either. Desktops will show the true potential of the architecture.