Unification of GPU/Device Usage
The GPU/CUDA backend of waLBerla is very thin at the moment. The following list of suggestions/observations should serve as a guideline for integrating device usage.
- Like `MPI`, device functions should get a wrapper with an "environment", as done with `WALBERLA_MPI_SECTION`. That way users would no longer need to work with `#if defined` guards, as is for example done here: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/apps/benchmarks/FlowAroundSphereCodeGen/FlowAroundSphereCodeGen.cpp (see the first sketch after this list).
- The `GPUField` (https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/GPUField.h) works differently in the sense that its interface differs (the f-size is not a template parameter), but it is also very cumbersome to have explicit GPU data with its own explicit `addToStorage` functions. It would be simpler if the `Field` itself could synchronise its data and hand back a device or host pointer depending on the situation (see the second sketch after this list). This goes along with #109, which suggests using `mdspan` for the `Field` data structure; `mdspan` provides this functionality right away.
- Some device-specific implementations do not need to be specific. A good example is the communication scheme: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/UniformGPUScheme.h In the end the same `MPI` function is called, just with a device pointer instead of a host pointer (or with a host pointer again if GPUDirect is not available). Thus there is no need for this parallel world to exist here (see the third sketch after this list).
Another good example is `CustomMemoryBuffer` (https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/CustomMemoryBuffer.h), which works essentially the same way as the normal buffers, just with a slightly different API (see the last sketch after this list).
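
A minimal sketch of the first point, modeled on the `WALBERLA_MPI_SECTION` idea; the macro name `WALBERLA_DEVICE_SECTION` is an assumption, not existing waLBerla API:

```cpp
// Hypothetical device "environment" macro, analogous to WALBERLA_MPI_SECTION.
// The body compiles away (dead branch) when no GPU backend is built.
#ifdef WALBERLA_BUILD_WITH_CUDA
#   define WALBERLA_DEVICE_SECTION() if( true )
#else
#   define WALBERLA_DEVICE_SECTION() if( false )
#endif

void setup()
{
   WALBERLA_DEVICE_SECTION()
   {
      // GPU-only setup goes here. Without the GPU build this branch is
      // still compiled but never taken, so GPU-only types would need
      // host-side stubs, just as the MPI wrapper provides stubs without MPI.
   }
}
```

Because the non-GPU branch is still compiled, this removes the need for `#if defined` at every call site, at the cost of providing stub declarations in non-GPU builds.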
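A minimal sketch of the second point, a field that owns both copies and synchronises lazily; all names (`UnifiedField`, `Location`, the dirty flags) are hypothetical and only illustrate the suggested behaviour:

```cpp
#include <cstddef>
#include <vector>

enum class Location { Host, Device };

// Hypothetical unified field: one object owns host and device storage and
// hands out the pointer for the requested location, copying only when the
// other side holds newer data.
template< typename T >
class UnifiedField
{
 public:
   explicit UnifiedField( std::size_t size ) : host_( size ) {}

   T * data( Location where )
   {
      if( where == Location::Host )
      {
         if( deviceDirty_ ) { copyDeviceToHost(); deviceDirty_ = false; }
         hostDirty_ = true;   // caller may write through the returned pointer
         return host_.data();
      }
      if( hostDirty_ ) { copyHostToDevice(); hostDirty_ = false; }
      deviceDirty_ = true;
      return device_;
   }

 private:
   // In a real implementation these would call cudaMemcpy (or the HIP
   // equivalent); device allocation is omitted to keep the sketch short.
   void copyHostToDevice() {}
   void copyDeviceToHost() {}

   std::vector< T > host_;
   T *  device_      = nullptr;
   bool hostDirty_   = false;
   bool deviceDirty_ = false;
};
```

With `mdspan` the same idea becomes even cleaner: the field owns the allocations and simply returns a view over whichever pointer is current, so no separate `addToStorage` path for GPU data is needed.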
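A sketch of the observation about `UniformGPUScheme`: the `MPI` call itself is identical, only the buffer pointer differs. Function and parameter names are illustrative, not the waLBerla API:

```cpp
#include <mpi.h>

// Same MPI_Isend whether the data lives on the device or was staged to the
// host first; only the pointer changes, so one communication scheme suffices.
void sendBuffer( double * devicePtr, double * hostStaging, int count,
                 int destRank, bool gpuDirect, MPI_Request * request )
{
   if( gpuDirect )
   {
      // A CUDA-aware MPI accepts the device pointer directly.
      MPI_Isend( devicePtr, count, MPI_DOUBLE, destRank, 0,
                 MPI_COMM_WORLD, request );
   }
   else
   {
      // Without GPUDirect, copy to a (pinned) host buffer first, e.g.
      // cudaMemcpy( hostStaging, devicePtr, count * sizeof( double ),
      //             cudaMemcpyDeviceToHost );
      MPI_Isend( hostStaging, count, MPI_DOUBLE, destRank, 0,
                 MPI_COMM_WORLD, request );
   }
}
```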
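For the buffer duplication, a single buffer template parameterised over an allocator policy could cover both worlds; `GenericBuffer` and the allocator types below are hypothetical:

```cpp
#include <cstddef>
#include <cstdlib>

// Plain host allocation; a PinnedAllocator using cudaMallocHost/cudaFreeHost
// (or a DeviceAllocator using cudaMalloc/cudaFree) would have the same shape.
struct HostAllocator
{
   static void * allocate( std::size_t bytes ) { return std::malloc( bytes ); }
   static void   deallocate( void * ptr )      { std::free( ptr ); }
};

// One buffer implementation instead of parallel host/device buffer classes;
// only the memory policy differs.
template< typename Allocator >
class GenericBuffer
{
 public:
   explicit GenericBuffer( std::size_t bytes )
      : size_( bytes ),
        data_( static_cast< char * >( Allocator::allocate( bytes ) ) ) {}
   ~GenericBuffer() { Allocator::deallocate( data_ ); }

   GenericBuffer( const GenericBuffer & )             = delete;
   GenericBuffer & operator=( const GenericBuffer & ) = delete;

   char *      data()       { return data_; }
   std::size_t size() const { return size_; }

 private:
   std::size_t size_;
   char *      data_;
};

using HostBuffer = GenericBuffer< HostAllocator >;
```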