Properly detect and enable vectorization on ARM
!320 (merged) has the side effect of breaking both the detection of SVE support and the enablement of SVE in the compiler. My patch should properly fix the underlying problem.
py-cpuinfo is supported on ARM64 and can be used to detect Neon and SVE. However, there was indeed a bug here -- Neon is identified as asimd in /proc/cpuinfo, so we should check for asimd instead of neon.
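For illustration, the corrected check could look roughly like the sketch below. The helper names detect_arm_simd and host_flags are hypothetical and not taken from this patch; the sketch assumes that py-cpuinfo's get_cpu_info() returns a dict whose "flags" entry lists the feature names from /proc/cpuinfo.

```python
def detect_arm_simd(flags):
    """Return the SIMD extensions advertised by a list of CPU feature flags.

    Hypothetical helper for illustration; not the code from this patch.
    """
    present = {flag.lower() for flag in flags}
    found = []
    # On AArch64 Linux, Neon (Advanced SIMD) is reported as "asimd", not "neon".
    if "asimd" in present:
        found.append("neon")
    if "sve" in present:
        found.append("sve")
    return found


def host_flags():
    """Read the feature flags of the running machine via py-cpuinfo."""
    import cpuinfo  # third-party package: py-cpuinfo

    # Assumption: the "flags" key may be absent, so fall back to an empty list.
    return cpuinfo.get_cpu_info().get("flags", [])
```

On an ARM64 host this would be called as detect_arm_simd(host_flags()).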
While -march=native was not supported by Clang before 15, -mcpu=native is supported by GCC 6+ and Clang 7+. Let's use that instead of not adding a flag at all -- otherwise SVE support is not enabled in the compiler even if the hardware supports it.
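A minimal sketch of the flag selection described above, assuming the build script already knows the compiler family and its major version. The function name arm_native_flags and the "gcc"/"clang" labels are hypothetical; the version thresholds are the ones stated in this description.

```python
def arm_native_flags(compiler, major_version):
    """Return extra compile flags for targeting the build host on ARM64.

    Hypothetical helper for illustration; not the code from this patch.
    """
    # Minimum major versions that accept -mcpu=native, per the description.
    minimum = {"gcc": 6, "clang": 7}
    if compiler in minimum and major_version >= minimum[compiler]:
        # -mcpu=native lets the compiler enable what the host CPU supports,
        # including SVE when the hardware has it.
        return ["-mcpu=native"]
    # Unknown or older compiler: add no flag rather than risk a build error.
    return []
```

For example, arm_native_flags("clang", 14) yields ["-mcpu=native"], whereas -march=native would be rejected by that Clang version.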