AVX-512 Is an Intel Gimmick To Win Benchmarks and Should Die a Painful Death

mbk1969:

You will have a hard time convincing me that the OS kernel needs AVX (at all). And outside the kernel any app can call any (non-privileged) instruction - not a concern for OS developers. PS And if the OS kernel depends on AVX and OS developers have trouble with it, then it is a sign of bad design/architecture.
The OS kernel does not have to deal with the instruction set per se, but it has to deal with its registers on the CPU. One of the primary roles of the OS kernel is scheduling threads and processes. When there is a context switch between threads or processes, the OS has to save the current context. If a given set of instructions is to be fully supported, then its registers have to be saved and restored during context switches. This is why new processors with new registers and/or instruction sets need additional support from the OS.

From a developer's perspective, which I am, these kinds of things are usually non-issues because most of us target the path of widest compatibility. I know I do, generally. Since AVX-512 is available on only a subset of available CPUs, I would not target it. If we were targeting specific machines for reasons of performance, then we might consider using AVX (of any flavor). Right now we have to maintain compatibility with the laptops we use, so we do not consider AVX or anything like it.

In an odd twist to that, I work with CUDA, and that is definitely a targeted, high-performance, special case. We deal with its compatibility issue by giving everyone gaming laptops and desktop machines with Nvidia graphics cards. If we were to also target AVX-512, then we would require machines to have CPUs that support it, and I don't see the performance benefit being worth the PITA in hardware and development time. Requiring Nvidia GPUs is pretty easy to work around because they are easily available at every platform level. I would be more than happy to use AMD GPUs if they would support CUDA. Hopefully, they will one day - directly. That kind of competition would really help lower the price of Nvidia's high-end GPU cards. They are in the 9K range now for a V100, and Amperes will probably be over 11K. If AMD can be competitive there it would really help bring those prices down, and I really hope they will.

As an example of what I mean, I use an MSI GL75 for remote work and it has a 2070, so CUDA works pretty well on it. It has a 10750H CPU that supports AVX2 but not AVX-512, so I would be SOL trying to use it.
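As a concrete illustration of that "widest compatibility" approach, here is a minimal sketch (GCC/Clang on x86-64; the sum_* function names are hypothetical) of runtime dispatch, where a binary only takes the AVX-512 path on CPUs that actually report support for it and falls back everywhere else:

    /* Minimal runtime-dispatch sketch. The sum_* functions stand in for
       real SIMD kernels built for each instruction set. */
    #include <stdio.h>

    static void sum_scalar(void) { puts("scalar fallback"); }
    static void sum_avx2(void)   { puts("AVX2 path"); }
    static void sum_avx512(void) { puts("AVX-512 path"); }

    int main(void)
    {
        __builtin_cpu_init();                    /* populate CPU feature flags */
        if (__builtin_cpu_supports("avx512f"))   /* AVX-512 foundation subset  */
            sum_avx512();
        else if (__builtin_cpu_supports("avx2"))
            sum_avx2();
        else
            sum_scalar();
        return 0;
    }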
schmidtbag:

I didn't say apps... We're talking about instructions at the kernel level. The kernel has the ability to disable instructions if necessary.
How? There are CPU instructions called privileged - these instructions can be executed only from ring 0 (https://en.wikipedia.org/wiki/Protection_ring). Kernel code being executed in ring 0 can call any instruction. User code being executed outside ring 0 can execute only unprivileged instructions. That's my understanding. Do you imply that the kernel can switch access to CPU instructions on the fly?
schmidtbag:

Right, hence my point earlier: he doesn't rant about things he isn't involved in. Why would he, a kernel developer, spend so much time complaining about something he ostensibly doesn't work with? Every time he's had a complaint, it's because the subject of his rant interfered with his workflow. AVX-512 does more than FP ops. Regardless, this here is one example of its involvement in the kernel: https://www.phoronix.com/scan.php?page=news_item&px=MTI1Njc
If you make cryptography a part of the OS kernel - that's bad design in my eyes. But of course, I am not Torvalds. Also, it can be that the term "OS kernel" is overused and even misused in the link you provided. Because the OS kernel is the core of the OS, and cryptography doesn't look like part of the OS core: part of the OS, but not part of the kernel.
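For what it's worth, there is a real on/off switch behind the earlier question: the kernel enables extended register state through the XCR0 control register (written with the privileged XSETBV instruction), and AVX/AVX-512 instructions fault with #UD if that state is not enabled. A minimal sketch, assuming GCC or Clang on x86-64 and the -mxsave compile flag, of how user code can read that setting back with XGETBV:

    /* Reads XCR0 from user mode to see which extended register states the
       OS has enabled (and therefore saves/restores on context switches). */
    #include <stdio.h>
    #include <stdint.h>
    #include <cpuid.h>
    #include <immintrin.h>

    int main(void)
    {
        unsigned a, b, c, d;
        __cpuid(1, a, b, c, d);
        if (!(c & (1u << 27))) {                 /* CPUID.1:ECX bit 27 = OSXSAVE */
            puts("OS has not enabled XSAVE at all");
            return 0;
        }
        uint64_t xcr0 = _xgetbv(0);              /* XCR0: OS-enabled state bits  */
        printf("AVX state (XMM+YMM) enabled:   %s\n",
               (xcr0 & 0x06) == 0x06 ? "yes" : "no");
        printf("AVX-512 state (opmask/ZMM) on: %s\n",
               (xcr0 & 0xE6) == 0xE6 ? "yes" : "no");
        (void)a; (void)b; (void)d;               /* unused here */
        return 0;
    }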
Gomez Addams:

The OS kernel does not have to deal with the instruction set per se, but it has to deal with its registers on the CPU. One of the primary roles of the OS kernel is scheduling threads and processes. When there is a context switch between threads or processes, the OS has to save the current context. If a given set of instructions is to be fully supported, then its registers have to be saved and restored during context switches. This is why new processors with new registers and/or instruction sets need additional support from the OS.
So if AVX-512 introduced new registers that need to be saved during context switching, then in a Windows build which is unaware of these registers, apps which use AVX-512 will not work or could even crash the OS kernel? I mean, has AVX-512 introduced such registers?
mbk1969:

@TieSKey And where does Linus Torvalds stand in your analogy? How does anything you wrote relate to Linux (or any other OS)?
I'm not a kernel developer, but to add to what others already said: a kernel needs to support the hardware, and CPU drivers are built in and closely integrated with an OS kernel. There are additional registers to deal with in context switches and security checks, efficient cache and virtual memory management for the needs of those instructions, suspend/restart states and thread scheduling. My guess is that last item is a bitch, since using AVX-512 makes the CPU core downclock a lot, affecting its two threads (and maybe even more parts of the CPU). So, as an OS, you don't want to schedule a software thread onto a hyper/virtual thread whose sibling (the other virtual thread of the same physical core) is running AVX-512 instructions.
BeamNG.drive uses floating point numbers to run the entire game, from the constant 2000 Hz physics on each vehicle (which is hundreds or thousands of different points or nodes/beams), to the rendering engine and even object placement. AVX can save time IF the processor you're using has it. Floating point has been used for games ever since Doom; heck, even Tank Wars might use it - though I'm not 100% sure (it was a free 'worms' / Scorched Earth clone). Yes, I still play Tank Wars (and Doom) in DOS on at least a monthly basis.

Thus far VERY FEW processors except some Xeon workstation and server processors and HEDT (79xx ~ 10xxx X and XE series on X299) have it. So if you don't have market penetration to a significant degree - because Intel artificially segmented the market due to GREED, never mind the excess heat it generates (you almost NEED liquid cooling to really use it for extended periods) - no one will use it, because almost no one has it. So while AVX2 is a godsend if you use a lot of floating point operations and can use AVX to accelerate things further, if there's only 1~3% (max!) market penetration of AVX-512, and even less than that with the ability to cool the chip properly during extended computation runs, there's never going to be a use for it. That'd be like owning a flying car but never being allowed to get it off the ground legally.

It's not like the MMX, SSE, or 3DNow! extensions, which actually helped because they started putting them on everything after a certain date. It looks great on paper, but until it goes mainstream, it's unlikely anyone will write code to feed it properly/efficiently. If Intel doesn't see fit to include it on mainstream chips, it's never going to go anywhere - same with AMD. Look how many years it took quad cores to get to be the sweet spot for gaming!
mbk1969:

So if AVX-512 introduced new registers that need to be saved during context switching, then in a Windows build which is unaware of these registers, apps which use AVX-512 will not work or could even crash the OS kernel? I mean, has AVX-512 introduced such registers?
AFAIK you can tell a CPU to save all its context (registers) starting at a given memory address. But if you want to do it right, you will want to know exactly how much space a context switch takes in advance, probably even how many CPU cycles. Adding more instructions with new registers (IIRC AVX-512 does need new registers; some of the lower variants just combine two existing ones) means more cases you have to know about and handle if you don't want the OS hurting performance.
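On the "know how much space a context takes in advance" point, the CPU can be asked directly. A minimal sketch (GCC/Clang on x86-64) using CPUID leaf 0xD, which reports the XSAVE save-area size both for the state components the OS currently has enabled and for everything the CPU supports:

    /* Queries the extended-state (XSAVE) save-area sizes via CPUID leaf 0xD. */
    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned eax, ebx, ecx, edx;
        __cpuid_count(0xD, 0, eax, ebx, ecx, edx);
        printf("XSAVE area for currently enabled state: %u bytes\n", ebx);
        printf("XSAVE area for all supported state:     %u bytes\n", ecx);
        (void)eax; (void)edx;   /* unused here */
        return 0;
    }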
I found the answer:
AVX-512 instructions are encoded with the new EVEX prefix. It allows 4 operands, 8 new 64-bit opmask registers, scalar memory mode with automatic broadcast, explicit rounding control, and a compressed displacement memory addressing mode. The width of the register file is increased to 512 bits, and the total register count is increased to 32 (registers ZMM0-ZMM31) in x86-64 mode.
And before that:
AVX uses sixteen YMM registers to perform a Single Instruction on Multiple pieces of Data (see SIMD). Each YMM register can hold and do simultaneous operations (math) on: eight 32-bit single-precision floating point numbers or four 64-bit double-precision floating point numbers. The width of the SIMD registers is increased from 128 bits to 256 bits, and they are renamed from XMM0-XMM7 to YMM0-YMM7 (in x86-64 mode, from XMM0-XMM15 to YMM0-YMM15). The legacy SSE instructions can still be utilized via the VEX prefix to operate on the lower 128 bits of the YMM registers.
But still - that's one place in the kernel scheduler. I mean, it should not spread across the whole OS kernel source code. PS The context-saving code should be in the HAL (behind the HAL?), most probably...
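As a small, hedged illustration of those wider registers and the new opmask registers, here is what the same addition looks like with compiler intrinsics - 8 floats in a YMM register, 16 floats in a ZMM register, and once more with a write mask (compile with -mavx512f; the values are arbitrary):

    #include <immintrin.h>
    #include <stdio.h>

    int main(void)
    {
        float a[16], b[16], out[16];
        for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 1.0f; }

        /* AVX/AVX2: 256-bit YMM register, 8 single-precision lanes */
        __m256 y = _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
        _mm256_storeu_ps(out, y);

        /* AVX-512: 512-bit ZMM register, 16 lanes */
        __m512 z = _mm512_add_ps(_mm512_loadu_ps(a), _mm512_loadu_ps(b));
        _mm512_storeu_ps(out, z);

        /* Opmask register: only the low 8 lanes are written, the rest zeroed */
        __mmask16 m = 0x00FF;
        __m512 zm = _mm512_maskz_add_ps(m, _mm512_loadu_ps(a), _mm512_loadu_ps(b));
        _mm512_storeu_ps(out, zm);

        printf("%f\n", out[0]);   /* 0.0 + 1.0 */
        return 0;
    }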
mbk1969:

There are CPU instructions called privileged - these instructions can be executed only from ring 0 (https://en.wikipedia.org/wiki/Protection_ring). Kernel code being executed in ring 0 can call any instruction. User code being executed outside ring 0 can execute only unprivileged instructions. That's my understanding. Do you imply that the kernel can switch access to CPU instructions on the fly?
Not sure - this is a bit beyond my scope of knowledge. All I know is there are people using AVX* software on an AVX-compatible CPU and the instructions weren't working or being detected. Some distros might restrict userland access to instructions; it definitely isn't a default behavior to do so.
If you make cryptography a part of the OS kernel - that's bad design in my eyes. But of course, I am not Torvalds. Also, it can be that the term "OS kernel" is overused and even misused in the link you provided. Because the OS kernel is the core of the OS, and cryptography doesn't look like part of the OS core: part of the OS, but not part of the kernel.
How is it bad design when the use of it is optional? If you don't like it, you can remove it or simply not use it. If you don't have AVX, the cryptography will still work, just slower. Linux isn't an OS and its kernel isn't like Windows' - it's monolithic, so the kernel is responsible for more than the Windows kernel is. Storage is controlled at the kernel level, drivers are controlled at the kernel level, and the drivers can determine if cryptography is used. Therefore, it makes sense for it to run at the kernel level. In the Linux world, FUSE is often frowned upon.
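As an aside on SIMD inside the kernel itself: on x86, Linux kernel code that touches XMM/YMM/ZMM registers has to wrap that work in kernel_fpu_begin()/kernel_fpu_end(), because the kernel does not save the extended register state on every kernel entry. A hedged skeleton of what that looks like in a module (illustrative only, not the actual in-tree crypto code):

    #include <linux/module.h>
    #include <linux/init.h>
    #include <asm/fpu/api.h>

    static int __init demo_init(void)
    {
        kernel_fpu_begin();   /* save the interrupted task's extended state */
        /* ... SIMD/AVX work would go here ... */
        kernel_fpu_end();     /* restore it; SIMD not allowed after this    */
        return 0;
    }

    static void __exit demo_exit(void) { }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");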
JamesSneed:

He does have a point. AVX is such a fringe thing, especially in 256- and 512-bit usage, yet it gets into many reviews. Of course, as long as AMD keeps innovating, silly things Intel does will disappear as they have actual competition.
Are you serious? AVX/AVX2 are definitely not fringe. Lots of games and physics engines use AVX, and lots of encoding/decoding/transcoding software uses AVX2 to great effect. When the next-gen consoles launch, AVX2 adoption will be stratospheric since they are both using the Zen 2 core. AVX-512, on the other hand, is geared towards HPC for the most part, so its use in consumer applications is much rarer. That could change in the future though, as Intel is bringing AVX-512 support to mainstream parts.
https://en.wikipedia.org/wiki/AVX-512 Since Linux is also used for latency-sensitive HPC workloads, Linus has no choice but to add support for it; he is probably just venting, since there are quite a number of extensions he has had to cover.
Carfax:

Are you serious? AVX/AVX2 are definitely not fringe. Lots of games and physics engines use AVX, and lots of encoding/decoding/transcoding software uses AVX2 to great effect. When the next-gen consoles launch, AVX2 adoption will be stratospheric since they are both using the Zen 2 core.
Exactly, some of the Assassin's Creed games utilize AVX2, as an example; they had to patch them so that older CPUs could run them, though. For loads of small sets, and where reduction in latency matters, AVX2 is the way to go.
angelgraves13:

It's useless for now, and maybe the next decade. By then...x86 will likely be dead and we'll be on ARM.
It's not useless for the supercomputers where it is requested; why do you think Linus even bothers to support it to begin with?
mbk1969:

So if AVX-512 introduced new registers that need to be saved during context switching, then in a Windows build which is unaware of these registers, apps which use AVX-512 will not work or could even crash the OS kernel? I mean, has AVX-512 introduced such registers?
What would happen is the AVX-512 registers used by the program that was switched out could be overwritten, so when the first program was switched back in as the current context its data would be garbled, giving erroneous results and possibly causing a crash. Actually, that is why registers are saved and restored during a context switch: so they are not overwritten, and are restored to where they were when the thread was switched out of context.
TieSKey:

AFAIK you can tell a CPU to save all its context (registers) starting at a given memory address. But if you want to do it right, you will want to know exactly how much space a context switch takes in advance, probably even how many CPU cycles. Adding more instructions with new registers (IIRC AVX-512 does need new registers; some of the lower variants just combine two existing ones) means more cases you have to know about and handle if you don't want the OS hurting performance.
The way this works is that a context is defined by the state of the registers in the CPU. The registers are saved on the stack in a structure known as a context. Generally, a thread has a smaller context than a process does. You can see what is in these structures by looking at a header file in the Windows SDK: WinNT.h. They are defined for a whole bunch of different processor architectures - in fact, all of the ones supported by Windows. It is quite fascinating if you are into that kind of stuff.

Each process and thread has its own stack, and the context is saved (pushed) on the thread's or process's own stack. When a thread is made active, the kernel first pops its context off the stack and into the CPU registers, and then the CPU executes the instruction whose address is in the instruction pointer register that was just restored. This is the general context switching mechanism used by most CPUs in multitasking operating systems. If the OS does not support multiple tasks or threads then it will never switch contexts. It might deal with interrupt handlers, but they don't switch contexts - those usually act like a function call that pushes only the registers it touches on the stack. At least, the ones I have dealt with did.
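For anyone who wants to poke at the save/restore idea from user space, here is a toy sketch using the (legacy) POSIX ucontext API on Linux/glibc. swapcontext() stores the current register state in one structure and loads another, which is conceptually what the kernel does on a context switch - with the caveat that a real kernel also has to cover extended state such as YMM/ZMM registers, typically via XSAVE:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, work_ctx;
    static char work_stack[64 * 1024];

    static void worker(void)
    {
        puts("worker: running with restored registers");
        swapcontext(&work_ctx, &main_ctx);   /* save worker state, resume main */
    }

    int main(void)
    {
        getcontext(&work_ctx);
        work_ctx.uc_stack.ss_sp   = work_stack;
        work_ctx.uc_stack.ss_size = sizeof work_stack;
        work_ctx.uc_link          = &main_ctx;
        makecontext(&work_ctx, worker, 0);

        puts("main: switching out");
        swapcontext(&main_ctx, &work_ctx);   /* save main state, run worker */
        puts("main: switched back in");
        return 0;
    }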
Carfax:

Are you serious? AVX/AVX2 are definitely not fringe. Lots of games and physics engines use AVX, and lots of encoding/decoding/transcoding software uses AVX2 to great effect. When the next-gen consoles launch, AVX2 adoption will be stratospheric since they are both using the Zen 2 core. AVX-512, on the other hand, is geared towards HPC for the most part, so its use in consumer applications is much rarer. That could change in the future though, as Intel is bringing AVX-512 support to mainstream parts.
I tend to agree with you. AVX2 is rather widespread now. As I wrote, my laptops' CPUs support it and have for several years now. AVX-512 support is coming much more slowly, especially from AMD.
Gomez Addams:

I tend to agree with you. AVX2 is rather widespread now. As I wrote, my laptops' CPUs support it and have for several years now. AVX-512 support is coming much more slowly, especially from AMD.
Besides rendering and physics programs/engines and encoders/decoders, what types of software use AVX2? Probably compression and decompression apps as well, if I had to guess. As for AMD and AVX-512, I would assume they will support it with Zen 4. AMD tends to be more cautious when it comes to that sort of thing, as it's a big investment in die space and power usage. Zen 4 will be using 5nm, which should be much better for an AVX-512 implementation.
Mineria:

Exactly, some of the Assassin's Creed games utilize AVX2, as an example; they had to patch them so that older CPUs could run them, though. For loads of small sets, and where reduction in latency matters, AVX2 is the way to go.
I don't think any games are using AVX2 right now, but many are using AVX for physics (especially cloth simulation) and other things. AVX2 is going to see a huge increase in adoption with the next gen, and even cross-gen titles like Cyberpunk 2077, AC Valhalla, etcetera. From what I understand, developers were reluctant to really make strong use of AVX because the Jaguar core in the PS4 and Xbox One ran AVX at half speed, i.e. 2x128-bit rather than one 256-bit instruction. So many of them ended up targeting SSE4 instead of AVX. I might be wrong, so someone can fact-check me on that, but I remember reading it somewhere on another forum. Luckily, Zen 2 runs AVX2 at full speed, so developers will have a lot of incentive to use it.
schmidtbag:

Linux isn't an OS
What do you mean by that? I always thought that Linux is an OS.
mbk1969:

What do you mean by that? I always thought that Linux is an OS.
A kernel. GNU/Linux is the OS.
Carfax:

Are you serious? AVX/AVX2 are definitely not fringe. Lots of games and physics engines use AVX, and lots of encoding/decoding/transcoding software uses AVX2 to great effect. When the next-gen consoles launch, AVX2 adoption will be stratospheric since they are both using the Zen 2 core. AVX-512, on the other hand, is geared towards HPC for the most part, so its use in consumer applications is much rarer. That could change in the future though, as Intel is bringing AVX-512 support to mainstream parts.
I never said AVX. AVX2 is where we picked up 256-bit support, and AVX-512 is 512-bit support. Anyhow, those have been pretty fringe use cases to date, especially AVX-512. You are right that AVX2 should catch on more, especially with Zen 2 and Zen 3, but right now the list of software is really short. Once Intel can get on a new node, that should help a lot as well, since Intel chips use a ton of power doing mixed workloads with AVX instructions in the mix, so there is a pretty large cost offset.
JamesSneed:

I never said AVX. AVX2 is where we picked up 256-bit support, and AVX-512 is 512-bit support. Anyhow, those have been pretty fringe use cases to date, especially AVX-512. You are right that AVX2 should catch on more, especially with Zen 2 and Zen 3, but right now the list of software is really short. Once Intel can get on a new node, that should help a lot as well, since Intel chips use a ton of power doing mixed workloads with AVX instructions in the mix, so there is a pretty large cost offset.
AVX has 256-bit support, but it's restricted to floating point only. AVX2 added 256-bit support for integers, and FMA (introduced on the same CPUs) doubled peak floating-point throughput. Source
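A minimal sketch of the FMA point with compiler intrinsics (compile with -mfma): the multiply and the add below collapse into one fused instruction per 8 floats.

    #include <immintrin.h>
    #include <stdio.h>

    int main(void)
    {
        __m256 a = _mm256_set1_ps(2.0f);
        __m256 b = _mm256_set1_ps(3.0f);
        __m256 c = _mm256_set1_ps(1.0f);

        __m256 r = _mm256_fmadd_ps(a, b, c);   /* r = a*b + c in one instruction */

        float out[8];
        _mm256_storeu_ps(out, r);
        printf("%f\n", out[0]);                /* 7.0 */
        return 0;
    }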
Noisiv:

A kernel. GNU/Linux is the OS.
So all people in the world always say/write "GNU/Linux", never shortening it to "Linux"? And when I read "Linux", people are always talking about the kernel of "GNU/Linux"?