Pure GPU communication scheme
Usage example:
PureGPUCommunicationScheme scheme(cudaEnabledMPIAvailable, blocks, fieldID);
scheme();
// or
scheme.startCommunication();
scheme.wait();
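The two call styles above presumably relate as follows: calling the scheme like a function is the blocking form, and the start/wait pair lets other work overlap with the transfer. A minimal sketch of that relationship, assuming operator() simply calls startCommunication() followed by wait(); the class SchemeSketch and its counters are hypothetical and stand in for the real scheme, which would post and complete the sends and receives:

```cpp
#include <cassert>

// Hypothetical stand-in for the communication scheme; it only tracks state.
class SchemeSketch {
public:
    // Begin a communication step (real code would pack buffers and post sends/receives).
    void startCommunication() {
        assert(!inFlight_ && "previous communication not yet waited on");
        inFlight_ = true;
    }

    // Finish the step (real code would wait for completion and unpack).
    // Calling wait() with nothing in flight is a no-op.
    void wait() {
        if (!inFlight_) return;
        inFlight_ = false;
        ++completed_;
    }

    // Blocking convenience form: start and wait in one call (assumed behaviour).
    void operator()() {
        startCommunication();
        wait();
    }

    bool inFlight() const { return inFlight_; }
    int completed() const { return completed_; }

private:
    bool inFlight_ = false;
    int completed_ = 0;
};
```

With the split form, independent computation can run between startCommunication() and wait().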
- pack kernel for every direction (could these be generated? look at code generation first)
- one buffer on the GPU per direction
- then either:
  - copy to the CPU and send from there (if MPI is not CUDA-enabled), or
  - send directly from the GPU buffer
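The plan above can be sketched on the host to make the data flow concrete. This is not the GPU implementation: packDirection is a plain C++ stand-in for a per-direction pack kernel, and Field2D, Direction, SendPath and choosePath are hypothetical names chosen for illustration. A 2D field is assumed to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical; this runs on the host only.
enum class Direction { West, East, South, North };

struct Field2D {
    std::size_t nx, ny;
    std::vector<double> data;  // row-major: index = y * nx + x
    double at(std::size_t x, std::size_t y) const { return data[y * nx + x]; }
};

// Stand-in for one pack kernel per direction: copy the boundary slice
// facing `dir` into a contiguous buffer (one buffer per direction).
std::vector<double> packDirection(const Field2D& f, Direction dir) {
    assert(f.nx > 0 && f.ny > 0 && f.data.size() == f.nx * f.ny);
    std::vector<double> buf;
    switch (dir) {
    case Direction::West:
        for (std::size_t y = 0; y < f.ny; ++y) buf.push_back(f.at(0, y));
        break;
    case Direction::East:
        for (std::size_t y = 0; y < f.ny; ++y) buf.push_back(f.at(f.nx - 1, y));
        break;
    case Direction::South:
        for (std::size_t x = 0; x < f.nx; ++x) buf.push_back(f.at(x, 0));
        break;
    case Direction::North:
        for (std::size_t x = 0; x < f.nx; ++x) buf.push_back(f.at(x, f.ny - 1));
        break;
    }
    return buf;
}

enum class SendPath { Direct, StagedThroughHost };

// Choose how a packed buffer reaches MPI.
// Direct: the device pointer is handed to MPI as-is (needs CUDA-enabled MPI).
// StagedThroughHost: the buffer is first copied to host memory
// (e.g. with cudaMemcpy) and MPI sends from the host copy.
SendPath choosePath(bool cudaEnabledMPIAvailable) {
    return cudaEnabledMPIAvailable ? SendPath::Direct
                                   : SendPath::StagedThroughHost;
}
```

In a GPU version each case of the switch would become its own kernel writing into a preallocated device buffer, which is the part that code generation could take over.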