GPUPackInfo: add asynchronous (un)packing capabilities
This commit introduces the following changes:
- CUDA streams: Add support for asynchronous (un)packing operations using CUDA
streams in cuda::communication::GPUPackInfo. Asynchronous operations make it
possible to overlap GPU computation with MPI communication in simulations
(e.g. LBM simulations). In CUDA, copies between host and device are only
truly asynchronous when the host memory is pinned (page-locked), so a staging
buffer, cuda::communication::PinnedMemoryBuffer, is added to the cuda module
to stage data between the GPU and the MPI buffers.
- zyxf layout: Add zyxf field layout support in GPUPackInfo through extensions
of the functions in cuda::GPUCopy.
- Extended GPUPackInfo test: Add stream and zyxf layout tests to
GPUPackInfoTest to cover the new functionality.
- Extended Kernel: Add CUDA stream and shared memory configuration support to
the cuda::Kernel class.
Signed-off-by: João Victor Tozatti Risso <joaovictortr@protonmail.com>
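To make the difference between the two field layouts concrete, the sketch
below computes the linear memory offset of a cell value under each one. It
assumes that a layout name lists the coordinates from slowest-varying to
fastest-varying, so that x is contiguous in fzyx and f is contiguous in zyxf.
The FieldSize struct and both index functions are hypothetical names, and
ghost layers and allocation padding are deliberately ignored.

```cpp
#include <cstddef>

// Hypothetical extents of a 4D field: spatial sizes x, y, z and the number
// of values per cell, f. Ghost layers and padding are not modeled.
struct FieldSize {
    std::size_t x, y, z, f;
};

// fzyx: f varies slowest and x fastest, so all cells of one f component are
// stored together (structure-of-arrays style).
inline std::size_t indexFzyx(const FieldSize& s, std::size_t x, std::size_t y,
                             std::size_t z, std::size_t f)
{
    return ((f * s.z + z) * s.y + y) * s.x + x;
}

// zyxf: z varies slowest and f fastest, so all f components of one cell are
// stored together (array-of-structures style).
inline std::size_t indexZyxf(const FieldSize& s, std::size_t x, std::size_t y,
                             std::size_t z, std::size_t f)
{
    return ((z * s.y + y) * s.x + x) * s.f + f;
}
```

For a 4 x 3 x 2 field with 19 values per cell, stepping x by one moves 1
element in fzyx but 19 elements in zyxf, while stepping f by one moves 24
elements in fzyx but 1 element in zyxf. The strides of a copy between such a
field and a packed buffer therefore depend on the layout.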
Changed files:
- src/cuda/GPUCopy.cpp: 354 additions, 104 deletions
- src/cuda/GPUCopy.h: 74 additions, 137 deletions
- src/cuda/Kernel.h: 9 additions, 6 deletions
- src/cuda/communication/GPUPackInfo.h: 192 additions, 48 deletions
- src/cuda/communication/PinnedMemoryBuffer.h: 123 additions, 0 deletions
- tests/cuda/CMakeLists.txt: 3 additions, 0 deletions
- tests/cuda/communication/GPUPackInfoCommunicationTest.cpp: 187 additions, 0 deletions
- tests/cuda/communication/GPUPackInfoTest.cpp: 46 additions, 25 deletions