
Add SVE nontemporal stores and scatters, including masked variants

Michael Kuron requested to merge sve into master

@holzer pointed out that SVE has nontemporal stores, which I overlooked when I implemented SVE support three years ago. So there are actually eight different kinds of stores we have to support (store, stream, mask-store, mask-stream, scatter, stream-scatter, mask-scatter, mask-stream-scatter).

(Mask-)stream-scatter requires SVE2; adding SVE2 support touched a number of otherwise unrelated files. SVE2 is a superset of SVE, so the automatic detection won't return SVE if SVE2 is also available.
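
To make the eight combinations concrete, here is a minimal Python sketch that derives them from three independent flags (masked, nontemporal, scatter) and marks which ones need SVE2 according to the description above. The helper names `store_kind` and `needs_sve2` are made up for illustration and are not identifiers from this merge request.

```python
from itertools import product


def store_kind(mask: bool, stream: bool, scatter: bool) -> str:
    """Name a store variant the way the description does (hypothetical helper)."""
    parts = []
    if mask:
        parts.append("mask")
    if stream:
        parts.append("stream")
    if scatter:
        parts.append("scatter")
    elif not stream:
        # Neither nontemporal nor scatter: a plain contiguous store.
        parts.append("store")
    return "-".join(parts)


def needs_sve2(mask: bool, stream: bool, scatter: bool) -> bool:
    """Per the description, only the (mask-)stream-scatter variants need SVE2."""
    return stream and scatter


# Map each of the 2**3 flag combinations to its name and SVE2 requirement.
kinds = {
    store_kind(*flags): needs_sve2(*flags)
    for flags in product([False, True], repeat=3)
}

for name, sve2 in kinds.items():
    print(f"{name:22s} requires {'SVE2' if sve2 else 'SVE'}")
```

The point of the sketch is only that the variants are the full cross product of three yes/no choices, which is why a single overlooked axis (nontemporal) doubles the number of cases to cover.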

The added test coverage revealed a few bugs on master that I also fixed:

  • We used an incorrect argument order for maskStoreS on RISC-V-V. This meant that maskStoreS just wouldn't compile on RISC-V-V.
  • Our emulation of maskStore (via blendv) on ARM Neon and POWER VSX was zeroing the masked-out elements when nontemporal mode was selected (which, due to the lack of real nontemporal stores, maps to an emulation via cacheline zeroing). This led to incorrect results.
  • Our emulation of maskStore on POWER VSX was ignoring the mask for the last vector of each cacheline. This led to even more incorrect results when nontemporal mode was selected.
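
The interaction between a blend-based maskStore and cacheline zeroing can be illustrated with a toy model. This is a sketch of one plausible way such a bug arises, not the actual generated code: it assumes the emulation reads the old destination values, blends them with the new ones, and writes the result back, and that the nontemporal path zeroes the destination first. The function names are hypothetical, and for simplicity a "cacheline" here is exactly one vector wide.

```python
def blend(mask, new, old):
    """Per lane: take the new value where the mask is set, else keep the old one."""
    return [n if m else o for m, n, o in zip(mask, new, old)]


def mask_store_emulated(memory, offset, values, mask, zero_cacheline_first):
    """Toy model of a maskStore emulated as load + blend + full store."""
    width = len(values)
    if zero_cacheline_first:
        # Assumed behaviour of the nontemporal emulation: clear the
        # destination before writing, so the old contents are lost.
        memory[offset:offset + width] = [0.0] * width
    old = memory[offset:offset + width]
    memory[offset:offset + width] = blend(mask, values, old)


values = [10.0, 20.0, 30.0, 40.0]
mask = [True, False, True, False]

regular = [1.0, 2.0, 3.0, 4.0]
mask_store_emulated(regular, 0, values, mask, zero_cacheline_first=False)
print(regular)      # masked-out lanes keep their old values

nontemporal = [1.0, 2.0, 3.0, 4.0]
mask_store_emulated(nontemporal, 0, values, mask, zero_cacheline_first=True)
print(nontemporal)  # masked-out lanes were overwritten with zeros
```

In this model the masked-out lanes end up as zeros because the blend reads memory only after it has been cleared, which matches the symptom described in the second bullet.
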
Edited by Michael Kuron
