
AVX512VL and AVX10 support

Michael Kuron requested to merge avx10 into master

AVX512VL provides 256-bit (and 128-bit) versions of the AVX512F instructions. It is primarily useful on processors that have only one AVX512 vector unit and drastically reduce their clock frequency when executing 512-bit instructions. For pystencils, this mostly means scatter/gather support (up to 30% improvement as per !241 (merged)) and no reduced clock frequencies (up to 45% improvement on Xeon Bronze 31xx/32xx, Silver 41xx/42xx, Gold 51xx/52xx). I suppose we never bothered implementing it because it offers no advantage on Xeon Gold 61xx/62xx and Platinum with their two AVX512 units, nor on the newer x3xx and x4xx (or the Ice Lake/Tiger Lake/Rocket Lake desktop/laptop processors), which don't clock down anymore.

Many of Intel's future processors, however, will use AVX10-256 instead of AVX512. AVX10-256 is, in a sense, halfway between AVX2 and AVX512. AVX10.1-128 and AVX10.1-256 are essentially a rebranded AVX512VL that can be enabled without AVX512F. Supporting them just needs a few changed ifdefs and a matching update to the CPU detection. The /proc/cpuinfo flag used for AVX10 is just a guess, but a very likely one.

AVX10.1-512 is the same as AVX512F and enabling one always enables the other. I've adapted the ifdefs nonetheless just in case.
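
To illustrate the detection logic described above, here is a minimal sketch (not the code in this merge request) of how the usable vector widths could be derived from the `flags` line of /proc/cpuinfo. The function name `vector_widths_from_cpuinfo` is hypothetical, and the AVX10 flag name `avx10_1` is only a placeholder, since the actual flag is a guess; `avx512f` and `avx512vl` are the existing Linux flag names.

```python
def vector_widths_from_cpuinfo(cpuinfo_text, avx10_flag="avx10_1"):
    """Return the vector widths (in bits) usable with AVX512-style instructions.

    avx10_flag is an ASSUMED flag name; adjust it to whatever the kernel
    actually reports for AVX10.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())

    widths = set()
    # AVX512F (equivalently AVX10.1-512) provides the 512-bit instructions.
    if "avx512f" in flags:
        widths.add(512)
    # AVX512VL, or AVX10 without AVX512F, provides the 128/256-bit versions.
    if "avx512vl" in flags or avx10_flag in flags:
        widths.update({128, 256})
    return widths
```

On Linux this could be called as `vector_widths_from_cpuinfo(open("/proc/cpuinfo").read())`. A CPU that reports only the AVX10 flag would yield `{128, 256}`, matching the AVX10.1-256 case where AVX512F is not available.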

All information is based on

Edited by Michael Kuron
