x86 half precision (float16) support
This issue describes the process of adding experimental support for half precision (float16) on x86.
A nice description of what can be expected: https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
Some compilers provide support for the IEEE 754-2008 half precision (binary16) floating-point format on x86. Most current systems do not implement the necessary arithmetic instructions to work directly on float16; instead, values are promoted to float32 (single precision) before computations and quantized back to float16 afterwards. However, storage and interchange are performed in the 16-bit format, so memory-bound code can benefit if half precision is numerically sufficient.
Commit 238a069c adds experimental support for clang >= 15. New type aliases `half` and `float16` are defined in `DataTypes.h`.
Some caveats / remaining issues:

- `float16` variables have to be cast to `float` or `double` before formatting (i.e. when writing to `stdout`)
- almost no walberla feature has been tested with `float16` yet
Other compilers might also support such features, but have not been evaluated yet.
A small app that checks `float16` support has been implemented in `CheckFP16.cpp`.
Also, a new CMake option `WALBERLA_BUILD_WITH_HALF_PRECISION_SUPPORT` has been added. It does not set `real_t` to `float16`; it simply enables `float16` and the corresponding type aliases. Note that some intrinsics usually have to be enabled on x86, so `WALBERLA_OPTIMIZE_FOR_LOCALHOST` generally has to be enabled as well; otherwise you will likely encounter linker errors.
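A configure invocation combining the two options mentioned above might look as follows (the option names are taken from this issue; paths and generators are placeholders):

```shell
# Hypothetical configure line for a build tree; adjust source path as needed.
cmake -DWALBERLA_BUILD_WITH_HALF_PRECISION_SUPPORT=ON \
      -DWALBERLA_OPTIMIZE_FOR_LOCALHOST=ON \
      /path/to/walberla
```

`WALBERLA_OPTIMIZE_FOR_LOCALHOST` enables `-march=native`-style flags, which is what brings in the F16C/AVX intrinsics needed to avoid the linker errors mentioned above.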
First benchmarks with LIKWID show that autovectorization using single-precision AVX intrinsics seems to work nicely. Run `likwid-perfctr` on the `CheckFP16.cpp` app to see if that also works on your machine.
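For reference, a wrapper-mode invocation could look like the line below. The binary name, core pinning, and event group are assumptions; the set of available groups depends on your CPU (`likwid-perfctr -a` lists them).

```shell
# Pin the app to core 0 and measure the single-precision FLOP group.
likwid-perfctr -C 0 -g FLOPS_SP ./CheckFP16
```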