Non-temporal stores are emitted without a trailing fence
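When vectorization is enabled, non-temporal store intrinsics such as _mm(|256|512)_stream_p[sd] (the MOVNTPS/MOVNTPD family) are generated. However, no fence is ever emitted after them: neither _mm_sfence, the fence normally paired with non-temporal stores, nor the stronger _mm_mfence.
Non-temporal stores are weakly ordered with respect to other stores, so without a fence another thread can observe a later ordinary store (for example, a "done" flag) before the streamed data is visible. In practice this rarely causes problems, because the write-combining buffers have usually been flushed long before the data is next read, but that is a matter of timing, not a guarantee. An explicit fence should be added after the last non-temporal store to guarantee correctness.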
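A minimal sketch in C of where the fence belongs, assuming x86-64 with SSE2 and C11 atomics. The function stream_copy_and_publish and the ready flag are hypothetical illustrations, not the code the vectorizer actually produces. The sketch uses the 128-bit _mm_stream_pd, and the same placement applies to the 256- and 512-bit variants.

```c
#include <assert.h>
#include <immintrin.h>   /* _mm_loadu_pd, _mm_stream_pd, _mm_sfence */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n doubles from src to dst using non-temporal
   stores, then publish the result by setting *ready to 1.
   Assumption: dst is 16-byte aligned, as _mm_stream_pd requires. */
void stream_copy_and_publish(double *dst, const double *src, size_t n,
                             atomic_int *ready)
{
    assert(((uintptr_t)dst & 15u) == 0); /* alignment precondition */

    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(src + i); /* unaligned load of 2 doubles */
        _mm_stream_pd(dst + i, v);         /* MOVNTPD: weakly ordered store */
    }
    for (; i < n; ++i)
        dst[i] = src[i];                   /* scalar tail: ordinary store */

    /* Order every preceding non-temporal store before any later store.
       Without it, another thread could read *ready == 1 while the
       streamed data is still in the write-combining buffers. */
    _mm_sfence();

    atomic_store_explicit(ready, 1, memory_order_release);
}
```

The release store on its own is not a substitute for the fence: on mainstream x86 compilers it is typically a plain MOV, which does not wait for pending non-temporal stores. A consumer thread would pair it with an acquire load of ready before reading dst.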