Question on GPU affinity on a single-node multi-GPU system

I am looking at running a WalBerla CUDA application on a single-node multi-GPU system (8 GV100 GPUs), and I started by experimenting with the Game of Life tutorial.

The tutorial works when I run with one MPI process. However, I get a runtime exception when running with two processes:

The number of requested processes (1) doesn't match the number of active MPI processes (2)!

I then modified the code to enable the one-block-per-process setting in the createUniformBlockGrid call (https://i10git.cs.fau.de/holzer/walberla/-/blob/master/apps/tutorials/cuda/01_GameOfLife_cuda.cpp#L106). With this change I was able to run on 2 MPI processes, but both processes were mapped to the first GPU.
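For context, this is the kind of rank-to-GPU mapping I expect to need. It is only a sketch based on my own assumptions and is not WalBerla API. The helper names localRankFromEnv and deviceForLocalRank are hypothetical. The sketch reads the node-local rank from environment variables that common launchers export: OMPI_COMM_WORLD_LOCAL_RANK (Open MPI), MV2_COMM_WORLD_LOCAL_RANK (MVAPICH2) and SLURM_LOCALID (Slurm). It then assigns ranks to the GPUs round-robin. On a single node, the global MPI rank would work just as well.

```cpp
#include <cstdlib>

// Hypothetical helper (not WalBerla API): read the node-local rank that
// common MPI launchers export. Returns -1 if none of the variables is set.
int localRankFromEnv() {
    const char* vars[] = {"OMPI_COMM_WORLD_LOCAL_RANK",  // Open MPI
                          "MV2_COMM_WORLD_LOCAL_RANK",   // MVAPICH2
                          "SLURM_LOCALID"};              // Slurm srun
    for (const char* name : vars) {
        if (const char* value = std::getenv(name)) {
            return std::atoi(value);
        }
    }
    return -1;
}

// Hypothetical helper: round-robin mapping of node-local ranks onto the
// visible GPUs. Falls back to device 0 if the inputs are not usable.
int deviceForLocalRank(int localRank, int deviceCount) {
    if (deviceCount <= 0 || localRank < 0) {
        return 0;
    }
    return localRank % deviceCount;
}
```

In the application, deviceCount would come from cudaGetDeviceCount(). The returned index would be passed to cudaSetDevice() before any other CUDA call, so that rank 0 uses GPU 0, rank 1 uses GPU 1, and so on.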

How can I control MPI-CUDA affinity in WalBerla?
Is the change I made to the example correct?
Is there a way in WalBerla to automatically distribute the blocks of a uniform grid across the available MPI processes?
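To make the last question concrete, this is the decomposition I currently do by hand. It is a sketch under my own assumptions and not a WalBerla function. The helpers blocksFor2DGrid and cellsPerBlock are hypothetical. The sketch splits the process count into a 2D block grid whose sides are as close to each other as possible, in the spirit of MPI_Dims_create. It then derives the cells per block for a fixed global domain.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not WalBerla API): split numProcesses into a 2D block
// grid {blocks in x, blocks in y} with sides as close as possible.
std::array<std::size_t, 2> blocksFor2DGrid(std::size_t numProcesses) {
    if (numProcesses == 0) {
        throw std::invalid_argument("need at least one process");
    }
    std::size_t ny = 1;
    for (std::size_t d = 1; d * d <= numProcesses; ++d) {
        if (numProcesses % d == 0) {
            ny = d;  // largest divisor <= sqrt(numProcesses)
        }
    }
    return {numProcesses / ny, ny};
}

// Hypothetical helper: cells per block in each direction for a fixed global
// domain. Requires each domain extent to be divisible by its block count.
std::array<std::size_t, 2> cellsPerBlock(std::array<std::size_t, 2> domainCells,
                                         std::array<std::size_t, 2> blocks) {
    for (std::size_t i = 0; i < 2; ++i) {
        if (blocks[i] == 0 || domainCells[i] % blocks[i] != 0) {
            throw std::invalid_argument("domain not divisible by block count");
        }
    }
    return {domainCells[0] / blocks[0], domainCells[1] / blocks[1]};
}
```

For example, 8 processes give a 4 x 2 block grid, and a 256 x 256 domain then gives 64 x 128 cells per block. My assumption is that these values would go into the block-count and cells-per-block arguments of createUniformBlockGrid, with one block in z and one block per process.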

Thank you. Max