    New GPU communication scheme with GPU kernels for packing · 319909f0
    Martin Bauer authored
       - uses generated pack infos for packing & unpacking directly on GPU
       - can send GPU buffers directly if CUDA-enabled MPI is available;
         otherwise the packed buffers are transferred to the CPU first
       - communication hiding with CUDA streams: communication can run
         asynchronously, which is especially useful when the compute kernel
         is also split into an inner and an outer part
    - added RAII classes for CUDA streams and events
    - equivalence test that checks that the generated CPU and GPU (overlapped)
      versions compute the same result as the normal waLBerla LBM kernel
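
As an illustration of the RAII idea mentioned above, here is a minimal sketch of a stream wrapper. It is not the class added in this commit: `StreamRAII`, `FakeStream`, `fakeStreamCreate`, `fakeStreamDestroy` and `g_liveStreams` are hypothetical names, and the two `fake*` functions are stand-ins so the sketch compiles without the CUDA toolkit. With CUDA available they would presumably be replaced by `cudaStream_t`, `cudaStreamCreate` and `cudaStreamDestroy`.

```cpp
#include <cassert>
#include <utility>

// Stand-ins for the CUDA runtime (assumption, for illustration only).
// Real code would use cudaStream_t, cudaStreamCreate and cudaStreamDestroy.
using FakeStream = int;
static int g_liveStreams = 0;  // number of streams created but not yet destroyed
static int g_nextId = 0;

inline int fakeStreamCreate(FakeStream* s) { *s = ++g_nextId; ++g_liveStreams; return 0; }
inline int fakeStreamDestroy(FakeStream /*s*/) { --g_liveStreams; return 0; }

// Hypothetical RAII wrapper: the stream is created in the constructor and
// destroyed in the destructor, so it cannot leak on early return or exception.
class StreamRAII {
public:
    StreamRAII() { fakeStreamCreate(&stream_); owns_ = true; }
    ~StreamRAII() { if (owns_) fakeStreamDestroy(stream_); }

    // Copying would destroy the same stream twice, so it is disabled.
    StreamRAII(const StreamRAII&) = delete;
    StreamRAII& operator=(const StreamRAII&) = delete;

    // Moving transfers ownership; the moved-from object no longer destroys anything.
    StreamRAII(StreamRAII&& other) noexcept
        : stream_(other.stream_), owns_(other.owns_) { other.owns_ = false; }

    StreamRAII& operator=(StreamRAII&& other) noexcept {
        if (this != &other) {
            if (owns_) fakeStreamDestroy(stream_);
            stream_ = other.stream_;
            owns_ = other.owns_;
            other.owns_ = false;
        }
        return *this;
    }

    // Implicit conversion so the wrapper can be passed where a raw handle is expected.
    operator FakeStream() const { return stream_; }

private:
    FakeStream stream_ = 0;
    bool owns_ = false;
};
```

An event wrapper would follow the same pattern with the corresponding create and destroy calls. Deleting the copy operations and keeping an ownership flag for moves is one possible design; it ensures each handle is destroyed exactly once.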