Bug in Vectorization with GCC < 8.1.0 and Intel C++ Compiler
When compiled with -Ofast -march=native (the default in lbmpy and pystencils), the following channel flow test case is subject to numerical instabilities in the PDF field.
This was observed with various GCC versions below 8.1.0 and with Intel C++ compiler versions 17 to 19 (version 20 was not tested). It was not observed with LLVM compiler versions 7 to 10.
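To check whether a local GCC falls into the version range reported above, a small helper along the following lines may be useful. It is only a sketch: the names parse_version, gcc_is_in_affected_range and local_gcc_version are hypothetical and not part of lbmpy or pystencils, and the 8.1.0 threshold is taken directly from this report.

```python
import re
import subprocess

# Threshold taken from this report: GCC versions below 8.1.0 showed the instability.
AFFECTED_BELOW = (8, 1, 0)


def parse_version(text):
    """Return the first dotted version number in `text` as a 3-tuple of ints.

    Missing components are filled with 0, so "8" becomes (8, 0, 0).
    """
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def gcc_is_in_affected_range(version_text):
    """True if the given GCC version string is below 8.1.0."""
    return parse_version(version_text) < AFFECTED_BELOW


def local_gcc_version(command="gcc"):
    """Ask the compiler for its version string (assumes `command` is a GCC driver).

    Passing both flags is a common idiom: newer GCC releases answer
    -dumpfullversion, older ones fall back to -dumpversion.
    """
    result = subprocess.run(
        [command, "-dumpfullversion", "-dumpversion"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = local_gcc_version()
    print(version, "affected range" if gcc_is_in_affected_range(version) else "outside affected range")
```

A result outside the affected range only means this particular test case was not observed to fail with that version. A version string without minor and patch components (for example "8") is treated as 8.0.0 and therefore reported as affected.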
The bug is presumably present in all versions of lbmpy and pystencils, since it can be reproduced with lbmpy and pystencils versions 0.2.9, 0.2.13 and 0.2.14.
The minimal compile flags required to reproduce the instabilities with GCC are -O3 -fno-signed-zeros -fno-trapping-math -fassociative-math -mavx.
The GCC commit after which the issue no longer appears was identified as 7c080ad. That commit changes the cost estimation for vectorization, so the auto-vectorization behavior differs. We tried -fvect-cost-model=unlimited to force vectorization irrespective of costs, but that did not make a difference. Since we do not know which commit, if any, fixes the actual bug, it may still be present in GCC 8 or even the current GCC 10; the specific code sample below may simply no longer trigger it because it is no longer auto-vectorized.
from lbmpy.session import *
from lbmpy.moments import *
ch = create_channel(domain_size=(300, 100), force=5e-5, initial_velocity=(0.5, 0),
                    relaxation_rate=1.8)
ch.run(6900)
print(ch.velocity[0.5, :, 0])
While the flow field initially seems to be stable, it becomes unstable between time steps 6800 and 6900 and ends up with NaNs.
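To narrow down the time step at which the instability sets in, the simulation can be advanced in chunks with a stability check after each chunk. The following is a minimal sketch under stated assumptions: first_unstable_step is a hypothetical helper, not an lbmpy function, and it deliberately takes two callables so that it does not depend on any particular scenario API.

```python
def first_unstable_step(advance, is_stable, total_steps, chunk=100):
    """Advance a simulation in chunks and report when it first becomes unstable.

    advance(n)   -- runs n further time steps
    is_stable()  -- returns True while the field contains only finite values
    Returns the number of steps run when is_stable() first fails, or None if
    the simulation stays stable for all total_steps.
    """
    steps_done = 0
    while steps_done < total_steps:
        n = min(chunk, total_steps - steps_done)
        advance(n)
        steps_done += n
        if not is_stable():
            return steps_done
    return None


# Stand-in simulation for demonstration only: it "blows up" after step 6850.
state = {"t": 0}

def fake_advance(n):
    state["t"] += n

def fake_is_stable():
    return state["t"] < 6850

result = first_unstable_step(fake_advance, fake_is_stable, total_steps=7000)
print(result)

# Possible hookup to the channel scenario above (assumes numpy is installed and
# that ch.velocity[:, :] returns an array, as in the print statement above):
#   import numpy as np
#   first_unstable_step(ch.run, lambda: bool(np.isfinite(ch.velocity[:, :]).all()), 6900)
```

The chunk size sets the resolution of the result; a smaller chunk pins down the onset more precisely at the cost of more frequent checks.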