Little follow-up to !233 (merged) after I thought about it again.
- fix the aligned version (it was using `maskStore` in some instruction sets and
- make sure the test case is incommensurate with the vector width (previously it couldn't distinguish `storeMask` on 128-bit vector instruction sets)
- implement a fallback for instruction sets that don't support it natively (turns out this is really easy using a load-blend-store combination)
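
For reference, the load-blend-store idea behind the fallback can be sketched roughly as below. This is only an illustration, not the code in this MR: the 4-lane `Vec`/`Mask` types and the names `load`, `store`, `blend` and `maskStoreFallback` are hypothetical stand-ins for the real vector types and intrinsics.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane "register"; stands in for a real SIMD vector type.
constexpr std::size_t kLanes = 4;
using Vec  = std::array<float, kLanes>;
using Mask = std::array<bool, kLanes>;

// Full-width load: the only kind of load assumed to be available.
inline Vec load(const float* p) {
    Vec v{};
    for (std::size_t i = 0; i < kLanes; ++i) v[i] = p[i];
    return v;
}

// Full-width store: the only kind of store assumed to be available.
inline void store(float* p, const Vec& v) {
    for (std::size_t i = 0; i < kLanes; ++i) p[i] = v[i];
}

// Lane-wise select: take b where the mask is set, otherwise keep a.
inline Vec blend(const Vec& a, const Vec& b, const Mask& m) {
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i) r[i] = m[i] ? b[i] : a[i];
    return r;
}

// Masked-store fallback: load what is currently in memory, blend the new
// values in under the mask, then write the whole vector back.
inline void maskStoreFallback(float* p, const Vec& v, const Mask& m) {
    const Vec old = load(p);
    store(p, blend(old, v, m));
}
```

One thing to keep in mind with this approach: unlike a native masked store, it reads and rewrites all lanes, so the full vector width at the destination has to be valid memory, and the masked-off lanes are written back with their old values.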