Unification of GPU/Device Usage
The GPU/CUDA backend of waLBerla is very thin at the moment. The following list of suggestions/observations should serve as a guideline for integrating device usage.
- Like `MPI`, device functions should get a wrapper with an "environment", as done with `WALBERLA_MPI_SECTION`. That way users would no longer need to work with `#if defined` guards, as is for example done here: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/apps/benchmarks/FlowAroundSphereCodeGen/FlowAroundSphereCodeGen.cpp (see the first sketch after this list).
- The `GPUField` (https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/GPUField.h) works differently in the sense that its interface differs (the f-size is not a template parameter), but it is also very cumbersome to have explicit GPU data with its own explicit `addToStorage` functions. It would be simpler if the `Field` itself could synchronise its data and hand back a device or host pointer depending on the situation (see the second sketch after this list). This goes along with #109, which suggests using `mdspan` for the `Field` data structure; `mdspan` provides this functionality right away.
- Some device-specific implementations do not need to be specific. A good example is the communication scheme: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/UniformGPUScheme.h In the end the same `MPI` function is called, just with a device pointer instead of a host pointer (or with a host pointer again if GPUDirect is not available). Thus there is no need for this parallel world to exist here (see the third sketch after this list).
Another good example is `CustomMemoryBuffer` (https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/CustomMemoryBuffer.h), which works essentially the same way as the normal buffers, just with a slightly different API (see the last sketch after this list).
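
A minimal sketch of the first point, modeled on the `WALBERLA_MPI_SECTION` idea; the macro name `WALBERLA_DEVICE_SECTION` is an assumption, not existing waLBerla API:

```cpp
// Hypothetical device "environment" macro, analogous to WALBERLA_MPI_SECTION.
// The body compiles away (dead branch) when no GPU backend is built.
#ifdef WALBERLA_BUILD_WITH_CUDA
#   define WALBERLA_DEVICE_SECTION() if( true )
#else
#   define WALBERLA_DEVICE_SECTION() if( false )
#endif

void setup()
{
   WALBERLA_DEVICE_SECTION()
   {
      // GPU-only setup goes here. Without the GPU build this branch is
      // still compiled but never taken, so GPU-only types would need
      // host-side stubs, just as the MPI wrapper provides stubs without MPI.
   }
}
```

Because the non-GPU branch is still compiled, this removes the need for `#if defined` at every call site, at the cost of providing stub declarations in non-GPU builds.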
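A minimal sketch of the second point, a field that owns both copies and synchronises lazily; all names (`UnifiedField`, `Location`, the dirty flags) are hypothetical and only illustrate the suggested behaviour:

```cpp
#include <cstddef>
#include <vector>

enum class Location { Host, Device };

// Hypothetical unified field: one object owns host and device storage and
// hands out the pointer for the requested location, copying only when the
// other side holds newer data.
template< typename T >
class UnifiedField
{
 public:
   explicit UnifiedField( std::size_t size ) : host_( size ) {}

   T * data( Location where )
   {
      if( where == Location::Host )
      {
         if( deviceDirty_ ) { copyDeviceToHost(); deviceDirty_ = false; }
         hostDirty_ = true;   // caller may write through the returned pointer
         return host_.data();
      }
      if( hostDirty_ ) { copyHostToDevice(); hostDirty_ = false; }
      deviceDirty_ = true;
      return device_;
   }

 private:
   // In a real implementation these would call cudaMemcpy (or the HIP
   // equivalent); device allocation is omitted to keep the sketch short.
   void copyHostToDevice() {}
   void copyDeviceToHost() {}

   std::vector< T > host_;
   T *  device_      = nullptr;
   bool hostDirty_   = false;
   bool deviceDirty_ = false;
};
```

With `mdspan` the same idea becomes even cleaner: the field owns the allocations and simply returns a view over whichever pointer is current, so no separate `addToStorage` path for GPU data is needed.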
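A sketch of the observation about `UniformGPUScheme`: the `MPI` call itself is identical, only the buffer pointer differs. Function and parameter names are illustrative, not the waLBerla API:

```cpp
#include <mpi.h>

// Same MPI_Isend whether the data lives on the device or was staged to the
// host first; only the pointer changes, so one communication scheme suffices.
void sendBuffer( double * devicePtr, double * hostStaging, int count,
                 int destRank, bool gpuDirect, MPI_Request * request )
{
   if( gpuDirect )
   {
      // A CUDA-aware MPI accepts the device pointer directly.
      MPI_Isend( devicePtr, count, MPI_DOUBLE, destRank, 0,
                 MPI_COMM_WORLD, request );
   }
   else
   {
      // Without GPUDirect, copy to a (pinned) host buffer first, e.g.
      // cudaMemcpy( hostStaging, devicePtr, count * sizeof( double ),
      //             cudaMemcpyDeviceToHost );
      MPI_Isend( hostStaging, count, MPI_DOUBLE, destRank, 0,
                 MPI_COMM_WORLD, request );
   }
}
```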
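For the buffer duplication, a single buffer template parameterised over an allocator policy could cover both worlds; `GenericBuffer` and the allocator types below are hypothetical:

```cpp
#include <cstddef>
#include <cstdlib>

// Plain host allocation; a PinnedAllocator using cudaMallocHost/cudaFreeHost
// (or a DeviceAllocator using cudaMalloc/cudaFree) would have the same shape.
struct HostAllocator
{
   static void * allocate( std::size_t bytes ) { return std::malloc( bytes ); }
   static void   deallocate( void * ptr )      { std::free( ptr ); }
};

// One buffer implementation instead of parallel host/device buffer classes;
// only the memory policy differs.
template< typename Allocator >
class GenericBuffer
{
 public:
   explicit GenericBuffer( std::size_t bytes )
      : size_( bytes ),
        data_( static_cast< char * >( Allocator::allocate( bytes ) ) ) {}
   ~GenericBuffer() { Allocator::deallocate( data_ ); }

   GenericBuffer( const GenericBuffer & )             = delete;
   GenericBuffer & operator=( const GenericBuffer & ) = delete;

   char *      data()       { return data_; }
   std::size_t size() const { return size_; }

 private:
   std::size_t size_;
   char *      data_;
};

using HostBuffer = GenericBuffer< HostAllocator >;
```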