maskStore improvements

- fix the aligned version
- make sure the test case size is not a multiple of the vector width
- implement a fallback for instruction sets that don't support it natively
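
For reference, a minimal sketch of what a scalar fallback and a tail-exercising test might look like. This is not the code from this change: `mask_store_fallback`, `copy_masked`, and `tail_case_ok` are hypothetical names, and the mask is modelled as a plain `std::array<bool, W>` rather than whatever mask type the library actually uses.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical scalar fallback for a masked store: lane i of `value` is
// written to dst[i] only if mask[i] is set; other destinations are untouched.
template <typename T, std::size_t W>
void mask_store_fallback(T* dst, const std::array<T, W>& value,
                         const std::array<bool, W>& mask)
{
    for (std::size_t lane = 0; lane < W; ++lane) {
        if (mask[lane]) {
            dst[lane] = value[lane];
        }
    }
}

// Copies n elements in chunks of W lanes. When n is not a multiple of W,
// the last chunk has only its first n % W lanes enabled in the mask.
template <std::size_t W, typename T>
void copy_masked(T* dst, const T* src, std::size_t n)
{
    for (std::size_t base = 0; base < n; base += W) {
        std::array<T, W> value{};
        std::array<bool, W> mask{};
        for (std::size_t lane = 0; lane < W; ++lane) {
            mask[lane] = (base + lane) < n;
            if (mask[lane]) {
                value[lane] = src[base + lane];
            }
        }
        mask_store_fallback(dst + base, value, mask);
    }
}

// Test with a length that is deliberately not a multiple of W. The
// destination carries W sentinel elements past the end so that a store
// ignoring the mask in the tail chunk would be detected.
inline bool tail_case_ok()
{
    constexpr std::size_t W = 4;          // assumed vector width for the sketch
    constexpr std::size_t n = 2 * W + 3;  // 11 elements: two full chunks plus a tail
    std::vector<int> src(n);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = static_cast<int>(i) + 1;
    }
    std::vector<int> dst(n + W, -1);      // -1 marks untouched elements
    copy_masked<W>(dst.data(), src.data(), n);
    for (std::size_t i = 0; i < n; ++i) {
        if (dst[i] != src[i]) return false;
    }
    for (std::size_t i = n; i < n + W; ++i) {
        if (dst[i] != -1) return false;   // masked-off lanes must stay untouched
    }
    return true;
}
```

The sentinel padding is the point of the odd length: with a size that is a multiple of the vector width, the masked tail path would never run.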