Communication fails when using multiple blocks per process (GPU)
Communication fails when using multiple blocks per process (GPU): the data is communicated to the wrong position. The error only occurs between process-local block 0 and process-local block 1.