AVX512VL and AVX10 support

Michael Kuron requested to merge avx10 into master

AVX512VL provides 128-bit and 256-bit versions of the AVX512F instructions. The 256-bit form is primarily useful on processors that have only one AVX512 vector unit and drastically reduce their clock frequency when executing 512-bit instructions. For pystencils, this mostly means scatter/gather support (up to 30% improvement as per !241 (merged)) without the reduced clock frequency (up to 45% improvement on Xeon Bronze 31xx/32xx, Silver 41xx/42xx, Gold 51xx/52xx). I suppose we never bothered implementing it because it offers no advantage on Xeon Gold 61xx/62xx and Platinum with their two AVX512 units, nor on the newer x3xx and x4xx (or the Ice Lake/Tiger Lake/Rocket Lake desktop/laptop processors), which don't clock down anymore.

Many of Intel's future processors, however, will use AVX10-256, which is in a sense halfway between AVX2 and AVX512, instead of AVX512. AVX10.1-128 and AVX10.1-256 are essentially a rebranded AVX512VL that can be enabled without AVX512F. Supporting them just needs a few changed ifdefs and CPU detection that is aware of them. The /proc/cpuinfo flag name is just a guess, but a very likely one.
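To illustrate the detection logic, here is a minimal sketch, not the code from this merge request. The function names, the returned labels, the `prefer_256` switch and especially the `avx10_flag` default are hypothetical; the real /proc/cpuinfo flag name for AVX10 is unconfirmed, so it is passed in as a parameter.

```python
def read_cpuinfo_flags(path="/proc/cpuinfo"):
    """Return the CPU flag list from the first 'flags' line, or [] if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    # Line looks like "flags : fpu vme ... avx2 avx512f ..."
                    return line.split(":", 1)[1].split()
    except OSError:
        pass  # not Linux, or file not readable
    return []


def pick_instruction_set(flags, prefer_256=False, avx10_flag="avx10_1"):
    """Pick an illustrative instruction-set label from CPU flags.

    avx10_flag is a placeholder: the actual kernel flag name is an assumption.
    prefer_256 models CPUs that clock down on 512-bit instructions.
    """
    flags = set(flags)
    has_512 = "avx512f" in flags
    # 256-bit EVEX instructions: AVX512VL, or AVX10.1-256 without AVX512F
    has_256_evex = "avx512vl" in flags or avx10_flag in flags

    if has_512 and not (prefer_256 and has_256_evex):
        return "avx512"
    if has_256_evex:
        return "avx512vl"
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "scalar"
```

Calling `pick_instruction_set(read_cpuinfo_flags())` would then select the widest set by default, while `prefer_256=True` would fall back to the 256-bit variant on processors where 512-bit instructions reduce the clock frequency.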

AVX10.1-512 is the same as AVX512F, and enabling one always enables the other. I've adapted the ifdefs nonetheless, just in case.

All information is based on https://www.phoronix.com/news/GCC-Lands-Initial-AVX10.1.

Edited by Michael Kuron