    GPUPackInfo: add asynchronous (un)packing capabilities · 6bfe8c59
    João Victor Tozatti Risso authored
    Changes introduced in this commit are the following:
    - CUDA streams: Add support for asynchronous (un)packing operations using CUDA
      streams in cuda::communication::GPUPackInfo. Through asynchronous operations
      it is possible to overlap GPU computation and MPI communication in simulations
      (e.g. LBM simulations). Asynchronous copies in CUDA require pinned
      (page-locked) memory on the host, so a staging buffer,
      cuda::communication::PinnedMemoryBuffer, is introduced in the cuda module to
      stage data between the GPU and the MPI buffers.
    - zyxf layout: Add zyxf field layout support in GPUPackInfo through extensions
      of the functions in cuda::GPUCopy.
    - Extended GPUPackInfo test: Add stream and zyxf layout tests to the
      GPUPackInfoTest to test the proposed implementation.
    - Extended Kernel: Add CUDA stream and shared memory configuration support to
      the cuda::Kernel class.
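For readers unfamiliar with the two field layouts mentioned above, the following is a minimal sketch of how a cell value could be addressed in each. It assumes that a layout name lists the coordinates from slowest- to fastest-varying, so that fzyx keeps all values of one f component together and zyxf keeps all f values of one cell together. The type FieldSize and the functions indexFzyx and indexZyxf are hypothetical names used only for illustration; they are not part of cuda::GPUCopy or GPUPackInfo.

```cpp
#include <cstddef>

// Hypothetical helper type: field extents in x, y, z and the
// number of values stored per cell (f). Not part of the commit.
struct FieldSize {
    std::size_t nx, ny, nz, nf;
};

// Assumed fzyx layout: f varies slowest and x fastest, so each
// f component occupies one contiguous block of nx * ny * nz values.
constexpr std::size_t indexFzyx(const FieldSize& s, std::size_t x,
                                std::size_t y, std::size_t z, std::size_t f) {
    return ((f * s.nz + z) * s.ny + y) * s.nx + x;
}

// Assumed zyxf layout: z varies slowest and f fastest, so the
// nf values of a single cell are adjacent in memory.
constexpr std::size_t indexZyxf(const FieldSize& s, std::size_t x,
                                std::size_t y, std::size_t z, std::size_t f) {
    return ((z * s.ny + y) * s.nx + x) * s.nf + f;
}
```

Under these assumptions, a pack routine that walks a ghost-layer slice copies long contiguous runs along x in fzyx, while in zyxf the contiguous unit is the group of f values belonging to one cell, which is why the copy functions need a separate code path per layout.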
    Signed-off-by: João Victor Tozatti Risso <joaovictortr@protonmail.com>