Use streams for GPU communication

This MR uses streams for the local communication in the Uniform GPU communication scheme.

Furthermore, it fixes bugs in the NonUniform scheme.
