Non-temporal stores do not use fences
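When vectorization is enabled, non-temporal store instructions like _mm(|256|512)_stream_p[sd]
are generated. However, the corresponding fence (_mm_mfence, or the lighter _mm_sfence, which is sufficient to order stores)
is never generated.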
is never generated. This is not a problem in practice as enough time will have passed by the time the data is next read. However, an explicit fence should be added to guarantee safety.