Use AVX512 masked intrinsics

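AVX512 provides masked intrinsics such as _mm512_mask_add_pd, which behaves like _mm512_add_pd but takes a write mask: lanes whose mask bit is clear keep the value of a pass-through operand instead of receiving the sum. This could be used to efficiently filter out writes to non-fluid cells. It might also help optimize constructs like sp.Piecewise. The same approach should also work with SVE vectorization on future ARM processors, since SVE instructions are predicated as well.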
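
A rough sketch of what the generated code could look like for the non-fluid-cell case. The function name `add_where_fluid`, the byte-per-cell `is_fluid` flag array, and the update `dst += src` are assumptions made for illustration only and are not taken from the existing code base. When the compiler does not target AVX512F, the sketch falls back to the plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical kernel: dst[i] += src[i] only for cells with is_fluid[i] != 0.
 * The flag layout (one byte per cell) is an assumption for this sketch. */
void add_where_fluid(double *dst, const double *src,
                     const uint8_t *is_fluid, size_t n)
{
    size_t i = 0;
#if defined(__AVX512F__)
    for (; i + 8 <= n; i += 8) {
        /* Pack eight cell flags into one 8-bit write mask. */
        __mmask8 m = 0;
        for (int k = 0; k < 8; ++k)
            m |= (__mmask8)((is_fluid[i + k] != 0) << k);

        __m512d a = _mm512_loadu_pd(dst + i);
        __m512d b = _mm512_loadu_pd(src + i);
        /* Lanes with a clear mask bit keep the value of the first argument. */
        __m512d r = _mm512_mask_add_pd(a, m, a, b);
        /* Masked store: non-fluid cells are not written at all. */
        _mm512_mask_storeu_pd(dst + i, m, r);
    }
#endif
    /* Scalar remainder, and the complete loop when AVX512F is unavailable. */
    for (; i < n; ++i)
        if (is_fluid[i])
            dst[i] += src[i];
}
```

In a real kernel the mask would more likely come from a vector compare on the flag field (for example one of the `_mm512_cmp*_mask` intrinsics) than from the bit-packing loop used here. For a two-branch sp.Piecewise, `_mm512_mask_blend_pd` with a compare mask might be the natural building block.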

Edited by Michael Kuron