New GPU communication scheme with GPU kernels for packing
Features:
- uses generated pack infos for packing and unpacking directly on the GPU
- can send GPU buffers directly if CUDA-enabled MPI is available; otherwise the packed buffers are transferred to the CPU first
- communication hiding with CUDA streams: communication can run asynchronously, which is especially useful when the compute kernel is also split into an inner and an outer part (see the sketch below)
- added RAII classes for CUDA streams and events
- equivalence test that checks whether the generated CPU and GPU (overlapped) versions compute the same result as the normal waLBerla LBM kernel
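A rough usage sketch of the overlap pattern described above, based on the files added in this commit. It assumes `cuda::communication::UniformGPUScheme` is templated on the stencil and exposes `startCommunication()` / `wait()` taking a CUDA stream; `lbKernelInner` / `lbKernelOuter` are placeholders for the generated LBM kernel split into an interior and a boundary part. These names and signatures are assumptions for illustration and may not match the generated code exactly.

```cpp
#include "blockforest/StructuredBlockForest.h"
#include "cuda/communication/UniformGPUScheme.h"
#include "stencil/D3Q19.h"

#include <cuda_runtime.h>

namespace walberla {

// Placeholders (assumed names) for the generated LBM kernel split into an
// interior part that touches no ghost layers and an outer part that needs
// fresh ghost-layer data.
void lbKernelInner( IBlock * block, cudaStream_t stream );
void lbKernelOuter( IBlock * block, cudaStream_t stream );

void timeStepOverlapped( StructuredBlockForest & blocks,
                         cuda::communication::UniformGPUScheme< stencil::D3Q19 > & comm,
                         cudaStream_t innerStream, cudaStream_t outerStream )
{
   // start packing and the (possibly GPU-direct) MPI exchange asynchronously
   comm.startCommunication( outerStream );

   // overlap: interior cells do not depend on ghost-layer data
   for( auto & block : blocks )
      lbKernelInner( &block, innerStream );

   // block until the received buffers have been unpacked on outerStream
   comm.wait( outerStream );

   // outer cells can now safely read the updated ghost layers
   for( auto & block : blocks )
      lbKernelOuter( &block, outerStream );

   cudaStreamSynchronize( innerStream );
   cudaStreamSynchronize( outerStream );
}

} // namespace walberla
```

Running the inner kernel on its own stream while packing, exchange and unpacking proceed on the communication stream is what hides the MPI latency; only the outer kernel has to wait for the exchange to finish.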
Showing 14 changed files with 999 additions and 18 deletions
- src/cuda/CudaRAII.h: 85 additions, 0 deletions (see the RAII sketch after this list)
- src/cuda/ErrorChecking.h: 1 addition, 1 deletion
- src/cuda/GPUField.h: 3 additions, 0 deletions
- src/cuda/GPUField.impl.h: 20 additions, 0 deletions
- src/cuda/communication/CustomMemoryBuffer.h: 143 additions, 0 deletions
- src/cuda/communication/CustomMemoryBuffer.impl.h: 120 additions, 0 deletions
- src/cuda/communication/GPUPackInfo.h: 12 additions, 13 deletions
- src/cuda/communication/GeneratedGPUPackInfo.h: 44 additions, 0 deletions
- src/cuda/communication/UniformGPUScheme.h: 94 additions, 0 deletions
- src/cuda/communication/UniformGPUScheme.impl.h: 248 additions, 0 deletions
- tests/cuda/CMakeLists.txt: 4 additions, 0 deletions
- tests/cuda/codegen/CudaJacobiKernel.py: 7 additions, 4 deletions
- tests/cuda/codegen/EquivalenceTest.cpp: 178 additions, 0 deletions
- tests/cuda/codegen/EquivalenceTest.gen.py: 40 additions, 0 deletions
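The RAII classes added in src/cuda/CudaRAII.h tie the lifetime of a `cudaStream_t` or `cudaEvent_t` to a C++ object, so streams and events are released automatically even on early returns or exceptions. A minimal sketch of the idea follows; the class and member names are illustrative and may differ from the actual header.

```cpp
// Minimal RAII wrappers for CUDA streams and events (illustrative names).
#include <cuda_runtime.h>
#include <stdexcept>

class StreamRAII
{
public:
   StreamRAII()  { if( cudaStreamCreate( &stream_ ) != cudaSuccess ) throw std::runtime_error( "cudaStreamCreate failed" ); }
   ~StreamRAII() { cudaStreamDestroy( stream_ ); }

   // non-copyable: the wrapper owns the stream exclusively
   StreamRAII( const StreamRAII & ) = delete;
   StreamRAII & operator=( const StreamRAII & ) = delete;

   // implicit conversion so the wrapper can be passed wherever a cudaStream_t is expected
   operator cudaStream_t() const { return stream_; }

private:
   cudaStream_t stream_;
};

class EventRAII
{
public:
   EventRAII()  { if( cudaEventCreate( &event_ ) != cudaSuccess ) throw std::runtime_error( "cudaEventCreate failed" ); }
   ~EventRAII() { cudaEventDestroy( event_ ); }

   EventRAII( const EventRAII & ) = delete;
   EventRAII & operator=( const EventRAII & ) = delete;

   operator cudaEvent_t() const { return event_; }

private:
   cudaEvent_t event_;
};
```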