Disallow OpenMP + blocking + cacheline-zero
The loop over the blocks is OpenMP-collapsed, so different threads may work on neighboring blocks at the same time. If the innermost block size does not align with cache-line boundaries, a single cache line can span two blocks. With non-temporal stores enabled on architectures that only do cacheline-zeroing (!230 (merged)), a thread zeroing such a shared line would erase data that another thread has written to it. So we disallow the problematic combination.
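
A minimal sketch of what such a guard could look like, in C. All names here (`config_is_allowed` and its parameters) are hypothetical and not taken from this merge request. The sketch also assumes that the array base address is cache-line aligned and that only misaligned block sizes are rejected; the actual change may simply refuse the combination outright.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical guard, not the code from this merge request.
 * Returns false when the configuration could let one thread zero a
 * cache line that also holds part of another thread's block.
 * Assumes the array base address itself is cache-line aligned.
 */
static bool config_is_allowed(bool use_openmp,
                              bool use_blocking,
                              bool use_cacheline_zero,
                              size_t inner_block_elems,
                              size_t elem_bytes,
                              size_t cacheline_bytes)
{
    /* Only the combination of all three features is problematic. */
    if (!(use_openmp && use_blocking && use_cacheline_zero))
        return true;

    /* Without a valid cache-line size we cannot prove safety. */
    if (cacheline_bytes == 0)
        return false;

    /* Safe only if every inner block covers whole cache lines. */
    return (inner_block_elems * elem_bytes) % cacheline_bytes == 0;
}
```

Under these assumptions, an inner block of 8 doubles on a 64-byte cache line would pass the check, while an inner block of 5 doubles (40 bytes) would be rejected whenever OpenMP, blocking, and cacheline-zeroing are all enabled.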