Numeric instability
In !665 (merged) I introduced some very small changes.
As a result, the P2P1StokesElementwiseAnnulusBlendingTest
fails but only when compiled with the icx compiler in single precision (e.g. https://i10git.cs.fau.de/hyteg/hyteg/-/jobs/1139667).
Nevertheless, once it succeeded with the same code and configuration (https://i10git.cs.fau.de/hyteg/hyteg/-/jobs/1139562).
Other compilers do not seem to be affected by the changes at all.
After playing around with this compiler on one of our workstations, I have the suspicion that numerical instability is at play here.
The test solves the stokes system on a 2D annulus with GMG (Uzawa smoother) and also the HyTeG MinRes solver. After solving with MinRes, the discrete L2 error in the pressure is reduced to 3.6e-2 in both single and double precision. Let us assume that this is the discretization error. The GMG reduces the error to almost the same number. Only when compiling with icx and using single precision the error (GMG) is not reduced below 1.1e-1.
I can reproduce these exact numbers on one of our workstations.
The ci uses the compile flag -fp-model=precise
to force the compiler to not reorder floating point operations (WALBERLA_BUILD_WITH_FASTMATH=OFF
).
If I change the compiler options to either of -O3 -xHost -fp-model=precise
or -O3
, the test succeeds.
With -O3 -fp-model=precise
the test fails.
I think that either of these combinations of options allows the compiler to insert FMA instructions.
As I learnt today, those are not only faster but also more accurate because the result is only rounded once, after performing both FP operations.
Without the explicit -fp-model=precise
, the compiler uses
[...] more aggressive optimizations when implementing floating-point calculations. These optimizations increase speed, but may affect the accuracy or reproducibility of floating-point computations.
Although I am not sure that this is actually the explanation, I am very concerned that convergence is destroyed by setting -fp-model=precise
.
To me this suggests that our implementation is numerically unstable.