Reduction Support
This MR introduces reductions to pystencils for scalar data types and thus covers #55.
User interface
- Adds reduction assignment classes to the sympyextensions module: AddReductionAssignment, SubReductionAssignment, MulReductionAssignment, MinReductionAssignment, MaxReductionAssignment
These can be used as follows:
import pystencils as ps
r = ps.TypedSymbol("r", "double")
x, y = ps.fields("x, y: double[3D]", layout="fzyx")
assign_dot_prod = ps.AddReductionAssignment(r, x.center() * y.center())
- Alternatively, you can use the reduction_assignment or reduction_assignment_from_str functions:
from pystencils.sympyextensions import reduction_assignment, reduction_assignment_from_str
from pystencils.sympyextensions.reduction import ReductionOp
assign_dot_prod = reduction_assignment(r, ReductionOp.Add, x.center() * y.center())
assign_dot_prod = reduction_assignment_from_str(r, "+", x.center() * y.center())
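For orientation, the value that assign_dot_prod is meant to accumulate can be written as a plain-Python reference loop. This is only a sketch of the intended semantics on nested lists, not pystencils code; the helper name dot_product_reference and the sample data are made up for illustration.

```python
from itertools import product


def dot_product_reference(x, y):
    """Sum x[i][j][k] * y[i][j][k] over all cells of two equally shaped 3D grids."""
    nz, ny, nx = len(x), len(x[0]), len(x[0][0])
    r = 0.0  # neutral element of addition
    for i, j, k in product(range(nz), range(ny), range(nx)):
        r += x[i][j][k] * y[i][j][k]
    return r


# Hypothetical 2x2x2 sample data
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
y = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
result = dot_product_reference(x, y)
```

A generated kernel should agree with such a reference up to floating-point reordering effects, which makes it a convenient check in tests.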
Supported Backends
Generic CPUs
- Add reduction support for OpenMP
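As a rough model of what an OpenMP-style reduction does, each thread reduces its share of the iterations into a private copy that starts at the operation's neutral element, and the partial results are then combined with the original value of the reduction variable. The sketch below imitates this in plain Python; the dictionary keys and the function chunked_reduce are illustrative names, not pystencils identifiers, and subtraction is left out because this description does not say how it is combined.

```python
import operator
import sys
from functools import reduce

# (combine function, neutral element) per reduction kind; keys are illustrative labels
REDUCTIONS = {
    "add": (operator.add, 0.0),
    "mul": (operator.mul, 1.0),
    "min": (min, sys.float_info.max),   # largest finite double
    "max": (max, -sys.float_info.max),  # lowest finite double
}


def chunked_reduce(kind, initial, values, num_threads=4):
    combine, neutral = REDUCTIONS[kind]
    # Each simulated thread reduces its own chunk, starting from the neutral element
    chunks = [values[t::num_threads] for t in range(num_threads)]
    partials = [reduce(combine, chunk, neutral) for chunk in chunks]
    # The partial results are combined with the original value of the reduction variable
    return reduce(combine, partials, initial)


values = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0]
total = chunked_reduce("add", 0.0, values)
smallest = chunked_reduce("min", sys.float_info.max, values)
```

Starting every private copy at the neutral element is what makes the result independent of how the iterations are split across threads.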
SIMD: SSE3, AVX2, AVX512
- Include a generated header file with horizontal operations that perform a binary operation between a scalar variable and a SIMD vector. The SIMD vector is first reduced to a scalar, and the binary operation is then applied to that scalar and the other operand.
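The two steps of such a horizontal operation can be sketched in plain Python, with a list standing in for the lanes of a SIMD register. The function name horizontal is hypothetical; the generated header itself uses platform intrinsics and is not reproduced here.

```python
import operator
from functools import reduce


def horizontal(op, scalar, lanes):
    """Reduce all lanes of a vector to one scalar, then apply op with `scalar`."""
    lane_result = reduce(op, lanes)   # step 1: vector -> scalar
    return op(scalar, lane_result)    # step 2: scalar (op) reduced vector


# Four lanes, as in a 256-bit register holding doubles
acc = horizontal(operator.add, 10.0, [1.0, 2.0, 3.0, 4.0])
lowest = horizontal(min, 10.0, [7.0, 2.5, 9.0, 4.0])
```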
CUDA
- Employ atomic reduction operations in all threads when the block size does not align with the warp size
- Optimization when the block size aligns with the warp size: perform warp-level reductions and issue the atomic operation only on the first thread of each warp
- Include a header file with manual implementations for atomic operations that are not directly supported for floating-point numbers: atomicMul, atomicMax, and atomicMin. These functions make use of a compare-and-swap (CAS) mechanism.
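The combination of warp-level reduction and CAS-based atomics can be modelled in plain Python. This is a sketch of the general technique, not the code from the header: AtomicCell, atomic_mul, and warp_reduce are hypothetical names, Python threads stand in for warps, and the comparison is done on values, whereas a CUDA implementation would typically compare the bit patterns of the floating-point numbers.

```python
import threading


class AtomicCell:
    """Stand-in for a word of global memory that supports compare-and-swap."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Store `new` only if the cell still holds `expected`; always return the old value
        with self._lock:
            old = self.value
            if old == expected:
                self.value = new
            return old


def atomic_mul(cell, factor):
    # CAS loop: retry until no other thread modified the cell in between
    while True:
        seen = cell.value
        if cell.compare_and_swap(seen, seen * factor) == seen:
            return seen


def warp_reduce(lanes, op):
    # Tree reduction in the style of shuffle-down: halve the offset each step.
    # Assumes the number of lanes is a power of two (e.g. 32).
    lanes = list(lanes)
    offset = len(lanes) // 2
    while offset > 0:
        for i in range(offset):
            lanes[i] = op(lanes[i], lanes[i + offset])
        offset //= 2
    return lanes[0]  # only lane 0 holds the full result


result = AtomicCell(1.0)
warps = [[1.0] * 31 + [2.0] for _ in range(8)]  # 8 hypothetical warps of 32 lanes


def run_warp(lanes):
    partial = warp_reduce(lanes, lambda a, b: a * b)
    atomic_mul(result, partial)  # one atomic per warp, as issued by its first thread


threads = [threading.Thread(target=run_warp, args=(w,)) for w in warps]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 32 lanes per warp, this pattern replaces 32 atomic operations by a single one, which is the motivation for the warp-aligned path.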
Internal Changes
- Freeze handling for newly introduced ReductionAssignment nodes
- Add PsVecHorizontal vectorization node for conducting a binary operation between a scalar symbol and an extraction of a vector value (obtained by performing a reduction across the lanes of a vector)
- Add dataclass ReductionInfo holding essential information about a reduction (i.e. reduction operation, initial value and the write-back pointer for exporting the reduction result) and create a corresponding lookup table for symbols in KernelCreationContext
- Introduce NumericLimitsFunctions for initializing neutral elements for reductions making use of min/max operations
- Adapt Platform.select_function such that it returns PsExpression | tuple[tuple[PsStructuralNode, ...], PsAstNode]: either a PsExpression that replaces the function call, or a tuple holding the structural nodes that are added before the replacement and the PsAstNode that replaces the function call. The structural nodes allow adding preparatory code for the replacement, as needed for the warp-level reductions on GPU platforms
- Add ReductionFunctions.WriteBackToPtr function that is replaced with platform-dependent code in Platform.select_function
- Slightly adapt CPU/GPU Jit modules to support the handling of write-back pointers used for reductions
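The two return shapes of Platform.select_function described above can be illustrated with a small, self-contained sketch. All names here (Node, Stmt, select_function_sketch, lower_call) and the emitted strings are hypothetical stand-ins chosen for illustration; they are not the pystencils classes or their generated code.

```python
from dataclasses import dataclass


@dataclass
class Node:   # hypothetical stand-in for the node that replaces the call
    code: str


@dataclass
class Stmt:   # hypothetical stand-in for a structural node
    code: str


def select_function_sketch(name, arg, warp_level):
    """Return either a bare replacement or (preparatory statements, replacement)."""
    if not warp_level:
        return Node(f"atomic_{name}(ptr, {arg})")
    prelude = (Stmt(f"partial = warp_reduce_{name}({arg});"),)
    return prelude, Node(f"if (lane == 0) atomic_{name}(ptr, partial)")


def lower_call(name, arg, warp_level):
    result = select_function_sketch(name, arg, warp_level)
    # Normalize both return shapes to (prelude, replacement)
    prelude, replacement = result if isinstance(result, tuple) else ((), result)
    # Preparatory statements are emitted before the replacement
    return [s.code for s in prelude] + [replacement.code]


plain = lower_call("add", "x", warp_level=False)
with_prelude = lower_call("add", "x", warp_level=True)
```

Normalizing at the call site keeps platforms that need no preparatory code free to return a single node.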
Edited by Richard Angersbach