Options to skip pre- and/or post-communication
Currently, many "vector-operators" are composed of scalar operators.
Ultimately, implementing vector function spaces should alleviate that issue by making it possible to fuse all operators into a single kernel.
As an intermediate step, we may at least be able to avoid unnecessary communication by offering an option to skip pre- and/or (additive) post-communication of the generated operators.
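For instance, if we have a 3x3 block operator

$$
A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
$$

and vectors

$$
v = \begin{pmatrix} v_1 & v_2 & v_3 \end{pmatrix}^T, \qquad w = \begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix}^T
$$

and want to compute

$$
w \leftarrow A v,
$$

we do not need to reduce the values of the dst component $w_1$ on the macro-volume boundary after the products $A_{11} v_1$ and $A_{12} v_2$ if we reduce it after $A_{13} v_3$ anyway.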
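One way to see why this is safe, assuming the additive post-communication acts as a linear reduction that sums the partial boundary contributions (the symbol $R$ is introduced here only for illustration and is not part of the proposal):

$$
R(A_{11} v_1) + R(A_{12} v_2) + R(A_{13} v_3) = R(A_{11} v_1 + A_{12} v_2 + A_{13} v_3)
$$

Under that assumption, one reduction per block row is enough, i.e. three reductions instead of nine for the full 3x3 product.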
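A similar argument applies to the pre-communication that updates the boundary data of the src vectors: each src component needs to be synchronized only once, not before every block that reads it.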
This should only require two simple parameters and corresponding conditionals around the synchronization. A wrapper (composite) operator could set them accordingly.
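The two parameters and the wrapper could look roughly like the following toy model. It is only a sketch under strong assumptions: a single DoF shared by two macro-volumes, with one halo copy per macro. All names (`Field`, `Comm`, `ScalarOperator`, `BlockOperator`, `skip_pre_comm`, `skip_post_comm`, `run_demo`) are hypothetical and do not correspond to the generated operators' actual interface.

```python
class Field:
    """Toy function: one DoF shared by two macro-volumes (hypothetical model)."""
    def __init__(self, value=0.0):
        self.iface = value        # value owned by the shared boundary
        self.halo = [0.0, 0.0]    # one copy / contribution buffer per macro


class Comm:
    """Performs and counts the two kinds of synchronization."""
    def __init__(self):
        self.pre = 0
        self.post = 0

    def update_halos(self, f):
        # Pre-communication: push the boundary value into the macro halos.
        f.halo = [f.iface, f.iface]
        self.pre += 1

    def reduce_halos(self, f):
        # Additive post-communication: sum the partial contributions.
        f.iface += sum(f.halo)
        f.halo = [0.0, 0.0]
        self.post += 1


class ScalarOperator:
    """Stand-in for one generated scalar block A_ij."""
    def __init__(self, coeff, comm):
        self.coeff = coeff
        self.comm = comm

    def apply(self, src, dst, add=False,
              skip_pre_comm=False, skip_post_comm=False):
        if not skip_pre_comm:
            self.comm.update_halos(src)
        if not add:
            # "Replace" semantics: discard old dst data first.
            dst.iface = 0.0
            dst.halo = [0.0, 0.0]
        for m in range(2):
            # Each macro adds its partial contribution to its own halo.
            dst.halo[m] += self.coeff * src.halo[m]
        if not skip_post_comm:
            self.comm.reduce_halos(dst)


class BlockOperator:
    """Composite wrapper that sets the skip flags for its scalar blocks."""
    def __init__(self, coeffs, comm, fuse_comm=True):
        self.blocks = [[ScalarOperator(c, comm) for c in row] for row in coeffs]
        self.fuse_comm = fuse_comm

    def apply(self, v, w):
        n = len(self.blocks)
        for i, row in enumerate(self.blocks):
            for j, op in enumerate(row):
                op.apply(
                    v[j], w[i], add=(j > 0),
                    # src halos were already updated while processing row 0
                    skip_pre_comm=self.fuse_comm and i > 0,
                    # reduce w_i only after the last product of the row
                    skip_post_comm=self.fuse_comm and j < n - 1,
                )


def run_demo(fuse_comm):
    comm = Comm()
    coeffs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    v = [Field(1.0), Field(2.0), Field(3.0)]
    w = [Field(), Field(), Field()]
    BlockOperator(coeffs, comm, fuse_comm).apply(v, w)
    return [f.iface for f in w], comm.pre, comm.post
```

In this model, `run_demo(True)` and `run_demo(False)` produce the same boundary values, while the number of synchronizations drops from nine to three for both pre- and post-communication. Note that skipping the post-communication only works here because the kernel accumulates into the dst halos instead of overwriting them; whether the real kernels behave that way would need to be checked.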