Fluctuating MRT collision kernel codegen breaks for single-precision AVX

Code generation for a fluctuating MRT collision kernel with AVX vectorization works in double precision but fails in single precision.

MWE using config.json:

cd tests/lbm/codegen
python3 FluctuatingMRT.py "$(cat config.json)"

Output:

Traceback (most recent call last):
  File "FluctuatingMRT.py", line 58, in <module>
  ...
  File "/work/jgrad/walberla_deps/devel/pystencils/pystencils/backends/cbackend.py", line 663, in _print_CastFunc
    raise NotImplementedError('Vectorizer cannot cast between different datatypes')
NotImplementedError: Vectorizer cannot cast between different datatypes

The expression that fails is:

(xi_104*(CastFunc(-1, __m256)*(CastFunc(-1, __m256)*CastFunc(omega_even, __m256) + CastFunc(1.0, __m256))**2 + CastFunc(1.0, __m256)))**CastFunc(0.5, float)

The constants and omega_even in the base are cast to the AVX vector type __m256, whereas the exponent 0.5 is cast to the scalar type float. Code generation succeeds when WALBERLA_DOUBLE_ACCURACY=ON is set in the config file. The workstation has an AMD Ryzen Threadripper 1950X CPU.
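To check both precisions side by side, the MWE can be run twice from tests/lbm/codegen with only the accuracy setting changed. The helper below is a hypothetical sketch, not part of waLBerla. It assumes config.json contains a key named WALBERLA_DOUBLE_ACCURACY (at any nesting depth, since the exact layout is not shown here) and that its value is the string "ON" or "OFF". The script name compare_precision.py is likewise made up for illustration.

```python
"""Hypothetical helper: run FluctuatingMRT.py once per precision.

Assumes config.json holds a WALBERLA_DOUBLE_ACCURACY key (any nesting depth)
whose value is the string "ON" or "OFF".
"""
import copy
import json
import subprocess
import sys


def with_accuracy(config, value):
    """Return a deep copy of `config` with every WALBERLA_DOUBLE_ACCURACY key set to `value`."""
    result = copy.deepcopy(config)

    def visit(node):
        # Walk dicts and lists recursively, since the key's location is not assumed.
        if isinstance(node, dict):
            for key, child in node.items():
                if key == "WALBERLA_DOUBLE_ACCURACY":
                    node[key] = value  # replacing an existing key's value is safe during iteration
                else:
                    visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(result)
    return result


def run_generator(config, script="FluctuatingMRT.py"):
    """Invoke the generator the same way as the MWE and return its exit code."""
    completed = subprocess.run([sys.executable, script, json.dumps(config)])
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        base_config = json.load(f)
    for value in ("ON", "OFF"):
        code = run_generator(with_accuracy(base_config, value))
        print(f"WALBERLA_DOUBLE_ACCURACY={value}: exit code {code}")
```

Usage: python3 compare_precision.py config.json. Based on the behaviour described above, the "ON" run would be expected to exit with code 0 and the "OFF" run to fail with the NotImplementedError traceback.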

Reproducible with the development version of waLBerla and with both the devel branch and the 1.0 release of pystencils/lbmpy, using sympy 1.8.