Add SVE nontemporal stores and scatters, including masked variants
@holzer pointed out that SVE has nontemporal stores, which I overlooked when I implemented SVE support three years ago. So there are actually eight different kinds of stores we have to support (store, stream, mask-store, mask-stream, scatter, stream-scatter, mask-scatter, mask-stream-scatter).
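The eight kinds fall out of three independent yes/no choices: contiguous or scatter, masked or unmasked, regular or nontemporal. A minimal sketch of that enumeration follows; `store_kinds` and the `needs_sve2` flag are hypothetical names used only for illustration, not identifiers from the code base.

```python
from itertools import product


def store_kinds():
    """List (name, needs_sve2) for every combination of the three flags."""
    kinds = []
    for scatter, masked, stream in product((False, True), repeat=3):
        parts = []
        if masked:
            parts.append("mask")
        if stream:
            parts.append("stream")
        if scatter:
            parts.append("scatter")
        elif not stream:
            # A plain or masked contiguous store carries no other qualifier.
            parts.append("store")
        # Per the description, only the streaming scatters need SVE2.
        needs_sve2 = scatter and stream
        kinds.append(("-".join(parts), needs_sve2))
    return kinds
```

Calling `store_kinds()` yields eight entries, two of which (the streaming scatters) are flagged as needing SVE2.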
(Mask-)stream-scatter requires SVE2; adding support for that introduced changes to a number of unrelated files. SVE2 is a superset of SVE, so the automatic detection won't return SVE if SVE2 is also available.
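The detection behaviour can be pictured as a preference list that is searched from most to least capable. This is only a sketch under that assumption; `PREFERENCE` and `detect_instruction_set` are hypothetical and do not reflect how the actual detection is implemented.

```python
# Hypothetical preference order: the most capable instruction set comes first.
PREFERENCE = ("sve2", "sve")


def detect_instruction_set(supported):
    """Return the first preferred instruction set the CPU reports, or None."""
    for name in PREFERENCE:
        if name in supported:
            return name
    return None
```

With this ordering, a CPU reporting both `sve` and `sve2` is detected as `sve2`, and `sve` is only returned when SVE2 is absent.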
The added test coverage revealed a few bugs on master that I also fixed:
- We used an incorrect argument order for `maskStoreS` on RISC-V-V. This meant that `maskStoreS` just wouldn't compile on RISC-V-V.
- Our emulation of `maskStore` (via `blendv`) on ARM Neon and POWER VSX was zeroing the masked-out elements when nontemporal mode was selected (which, due to the lack of real nontemporal stores, maps to an emulation via cacheline zeroing). This led to incorrect results.
- Our emulation of `maskStore` on POWER VSX was ignoring the mask for the last vector of each cacheline. This led to even more incorrect results when nontemporal mode was selected.
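To illustrate why cacheline zeroing and a blend-based masked store interact badly, here is a simplified scalar model. It assumes the emulation loads the old vector, blends it with the new values, and writes the full vector back; all function names are hypothetical and the real code path may differ in detail.

```python
def blend(mask, new, old):
    """Per-lane select: take `new` where the mask is set, else keep `old`."""
    return [n if m else o for m, n, o in zip(mask, new, old)]


def mask_store(memory, offset, values, mask):
    """Emulated masked store: load the old vector, blend, store it back."""
    width = len(values)
    old = memory[offset:offset + width]
    memory[offset:offset + width] = blend(mask, values, old)


def zeroing_mask_store(memory, offset, values, mask, cacheline):
    """Failure mode: zero the cacheline first, then do the blend-based store."""
    if offset % cacheline == 0:
        memory[offset:offset + cacheline] = [0] * cacheline
    # The blend now reads zeros, so masked-out lanes lose their old contents.
    mask_store(memory, offset, values, mask)


def demo():
    """Apply both variants to identical memory and return the two results."""
    values, mask = [1, 2], [True, False]
    expected = [9] * 8
    mask_store(expected, 0, values, mask)
    broken = [9] * 8
    zeroing_mask_store(broken, 0, values, mask, cacheline=4)
    return expected, broken
```

In this model the masked-out lane keeps its old value only when the cacheline is not zeroed beforehand, which is the behaviour a masked store has to preserve regardless of the nontemporal setting.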
Edited by Michael Kuron