Usage of NCCL
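NCCL (NVIDIA Collective Communications Library) provides collective communication primitives, such as all-reduce, broadcast, and all-gather, for exchanging data between GPUs. It can speed up multi-GPU computations and may outperform CUDA-aware MPI for these collective operations.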
https://developer.nvidia.com/nccl
https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html
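
To make the term "collective" concrete, here is a minimal sketch of what an all-reduce with a sum computes. It is plain Python on lists and does not call NCCL or touch a GPU. The helper name all_reduce_sum and the three "ranks" are hypothetical and exist only for illustration. In NCCL the corresponding operation is ncclAllReduce, which performs the same reduction on device buffers.

```python
# Illustrative only: a plain-Python model of what an all-reduce (sum) computes.
# No GPU and no NCCL call is involved; all_reduce_sum is a hypothetical helper.
from typing import List


def all_reduce_sum(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Return one buffer per rank, each holding the element-wise sum of all inputs."""
    if not rank_buffers:
        return []
    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    # Reduce step: sum the i-th element across all ranks.
    total = [sum(column) for column in zip(*rank_buffers)]
    # Broadcast step: every rank receives its own copy of the result.
    return [list(total) for _ in rank_buffers]


if __name__ == "__main__":
    # Three hypothetical ranks (think: GPUs), each contributing a 2-element buffer.
    result = all_reduce_sum([[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]])
    print(result)
```

After the call, every rank holds the same summed buffer. This is the pattern used, for example, to combine partial results computed on different GPUs.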