diff --git a/doc/index.rst b/doc/index.rst index 3030d33618303ecaf3c3d3f97e2fa12b73177965..3ab860c326f793b09baf29929fca37882c3d7562 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -10,6 +10,7 @@ pystencils can help you to generate blazingly fast code for image processing, nu sphinx/tutorials.rst sphinx/api.rst + sphinx/nbackend/index.rst .. image:: /img/pystencils_arch_block_diagram.svg diff --git a/doc/sphinx/nbackend/arrays.rst b/doc/sphinx/nbackend/arrays.rst new file mode 100644 index 0000000000000000000000000000000000000000..b668d3c0a41ee7e78b8bc8dab77dd3898929f40d --- /dev/null +++ b/doc/sphinx/nbackend/arrays.rst @@ -0,0 +1,6 @@ +****** +Arrays +****** + +.. automodule:: pystencils.nbackend.arrays + :members: diff --git a/doc/sphinx/nbackend/ast.rst b/doc/sphinx/nbackend/ast.rst new file mode 100644 index 0000000000000000000000000000000000000000..ca0ccf2b9abf0c679eca1eaadc8f59d4212c6dfa --- /dev/null +++ b/doc/sphinx/nbackend/ast.rst @@ -0,0 +1,6 @@ +******************** +Abstract Syntax Tree +******************** + +.. automodule:: pystencils.nbackend.ast + :members: diff --git a/doc/sphinx/nbackend/expressions.rst b/doc/sphinx/nbackend/expressions.rst new file mode 100644 index 0000000000000000000000000000000000000000..8dcf58b61235ecc98f8189e5981de52241b12ffb --- /dev/null +++ b/doc/sphinx/nbackend/expressions.rst @@ -0,0 +1,15 @@ +*********************** +Expression Manipulation +*********************** + +Variables and Constants +======================= + +.. automodule:: pystencils.nbackend.typed_expressions + :members: + +Functions +========= + +.. automodule:: pystencils.nbackend.functions + :members: diff --git a/doc/sphinx/nbackend/index.rst b/doc/sphinx/nbackend/index.rst new file mode 100644 index 0000000000000000000000000000000000000000..0dfe29f80ab51375c7a7e5629dfa139cf5a7cc74 --- /dev/null +++ b/doc/sphinx/nbackend/index.rst @@ -0,0 +1,19 @@ +############################################################# +Developer's Reference: Code Generation Backend (``nbackend``) +############################################################# + + +These pages provide a detailed overview of the next-gen code generation backend ``nbackend`` currently being +developed for *pystencils*. This new backend is intended to consolidate and finally replace +all code generation functionality currently implemented in *pystencils* version 1.x. + +.. toctree:: + :maxdepth: 1 + + rationale + type_system + expressions + arrays + ast + kernelcreation + diff --git a/doc/sphinx/nbackend/kernelcreation.rst b/doc/sphinx/nbackend/kernelcreation.rst new file mode 100644 index 0000000000000000000000000000000000000000..8232a23a1caedec164fdaa0bc0739df795d325f7 --- /dev/null +++ b/doc/sphinx/nbackend/kernelcreation.rst @@ -0,0 +1,6 @@ +*************** +Kernel Creation +*************** + +.. automodule:: pystencils.nbackend.kernelcreation + diff --git a/doc/sphinx/nbackend/rationale.rst b/doc/sphinx/nbackend/rationale.rst new file mode 100644 index 0000000000000000000000000000000000000000..39c4e0d1fc6e2fdcf059c52bfcb482230d52eca6 --- /dev/null +++ b/doc/sphinx/nbackend/rationale.rst @@ -0,0 +1,49 @@ +*********************** +Rationale and Key Ideas +*********************** + +Expression Manipulation +^^^^^^^^^^^^^^^^^^^^^^^ + +The pystencils code generator was originally built based entirely on the computer algebra system SymPy. +SymPy itself is ideal for the front-end representation of kernels using its mathematical language. +In pystencils, however, SymPy was long used to model all mathematical expressions, from the continuous equations +down to the bare C assignments, loop counters, and even pointer arithmetic. +SymPy's unique properties, especially regarding automatic rewriting and simplification of expressions, +while perfect for doing symbolic mathematics, have proven to be very problematic when used as the basis of +an intermediate code representation. + +The primary problems caused by using SymPy for expression manipulation are these: + + - Assigning and checking types on SymPy expressions is not possible in a stable way. While a type checking + pass over the expression trees may validate types early in the code generation process, often SymPy's auto- + rewriting system will be triggered by changes to the AST at a later stage, silently invalidating type + information. + - SymPy will aggressively simplify constant expressions in a strictly mathematical way, which leads to + semantically invalid transformations in contexts with fixed types. This problem especially concerns + integer types, and division in integer contexts. + - SymPy aggressively flattens expressions according to associativity, and freely reorders operands in commutative + operations. While perfectly fine in symbolic mathematics, this behaviour makes it impossible to group + and parenthesize operations for numerical or performance benefits. Another often-observed effect is that + SymPy distributes constant factors across sums, strongly increasing the number of FLOPs. + +To avoid these problems, ``nbackend`` uses the [pymbolic](https://pypi.org/project/pymbolic/) package for expression +manipulation. Pymblic has similar capabilities for writing mathematic expressions as SymPy, however its expression +trees are much simpler, completely static, and easier to extend. + +Structure and Architecture of the Code Generator +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The code generation flow of *pystencils* has grown significantly over the years, to accomodate various different +kinds of kernels, output dialects, and target platforms. Very often, extensions were retroactively integrated with +a system that was not originally designed to support them. As a result, the code generator is now +a very convoluted set of functions and modules, containing large volumes of hard-to-read code, much of it +duplicated for several platforms. + +The design of the ``nbackend`` takes the benefit of hindsight to provide the same (and, in some cases, a broader) set of +functionality through a much better structured software system. While the old code generator was implemented in an almost +entirely imperative manner, the ``nbackend`` makes extensive use of object-oriented programming for knowledge representation, +construction and internal representation of code, as well as analysis, transformation, and code generation tasks. +As a result, the ``nbackend`` is much more modular, concise, easier to extend, and implemented in a much smaller volume of +code. + diff --git a/doc/sphinx/nbackend/type_system.rst b/doc/sphinx/nbackend/type_system.rst new file mode 100644 index 0000000000000000000000000000000000000000..75d126ddbc23033005cddc1337c20135baa0d298 --- /dev/null +++ b/doc/sphinx/nbackend/type_system.rst @@ -0,0 +1,6 @@ +*************** +The Type System +*************** + +.. automodule:: pystencils.nbackend.types + :members: diff --git a/src/pystencils/nbackend/arrays.py b/src/pystencils/nbackend/arrays.py index 0840b26995f5bdc5c51194be04d7296706bef9c3..60343c4d9f211e6408ddf04eacf359e6c082007a 100644 --- a/src/pystencils/nbackend/arrays.py +++ b/src/pystencils/nbackend/arrays.py @@ -1,7 +1,4 @@ """ -Arrays -====== - The pystencils backend models contiguous n-dimensional arrays using a number of classes. Arrays themselves are represented through the `PsLinearizedArray` class. An array has a fixed name, dimensionality, and element type, as well as a number of associated @@ -25,18 +22,16 @@ Constraints are external to the arrays themselves. They are created by the AST p require them and exposed through the `PsKernelFunction` class to the compiler kernel's runtime environment. It is the responsibility of the runtime environment to fulfill all constraints. -For example, if an array `arr` should have both a fixed shape and fixed strides, +For example, if an array ``arr`` should have both a fixed shape and fixed strides, an optimization pass will have to add equality constraints like the following before replacing -all occurences of the shape and stride variables with their constant value: +all occurences of the shape and stride variables with their constant value:: -``` -constraints = ( - [PsKernelConstraint(s.eq(f)) for s, f in zip(arr.shape, fixed_size)] - + [PsKernelConstraint(s.eq(f)) for s, f in zip(arr.strides, fixed_strides)] -) + constraints = ( + [PsKernelConstraint(s.eq(f)) for s, f in zip(arr.shape, fixed_size)] + + [PsKernelConstraint(s.eq(f)) for s, f in zip(arr.strides, fixed_strides)] + ) -kernel_function.add_constraints(*constraints) -``` + kernel_function.add_constraints(*constraints) """ @@ -68,7 +63,7 @@ class PsLinearizedArray: Shape and strides may be specified at construction in the following way. For constant entries, their value must be given as an integer. For variable shape entries and strides, the Ellipsis `...` must be passed instead. - Internally, the passed `index_dtype` will be used to create typed constants (`PsTypedConstant`) + Internally, the passed ``index_dtype`` will be used to create typed constants (`PsTypedConstant`) and variables (`PsArrayShapeVar` and `PsArrayStrideVar`) from the passed values. """ @@ -109,28 +104,34 @@ class PsLinearizedArray: @property def name(self): + """The array's name""" return self._name @property def base_pointer(self) -> PsArrayBasePointer: + """The array's base pointer""" return self._base_ptr @property def shape(self) -> tuple[PsArrayShapeVar | PsTypedConstant, ...]: + """The array's shape, expressed using `PsTypedConstant` and `PsArrayShapeVar`""" return self._shape @property def shape_spec(self) -> tuple[EllipsisType | int, ...]: + """The array's shape, expressed using `int` and `...`""" return tuple( (s.value if isinstance(s, PsTypedConstant) else ...) for s in self._shape ) @property def strides(self) -> tuple[PsArrayStrideVar | PsTypedConstant, ...]: + """The array's strides, expressed using `PsTypedConstant` and `PsArrayStrideVar`""" return self._strides @property def strides_spec(self) -> tuple[EllipsisType | int, ...]: + """The array's strides, expressed using `int` and `...`""" return tuple( (s.value if isinstance(s, PsTypedConstant) else ...) for s in self._strides ) diff --git a/src/pystencils/nbackend/functions.py b/src/pystencils/nbackend/functions.py index 1909843730429b60ce8217b9b9422fa7dfa90d60..22e855dcd53391a1b234f1ddcf3f6cd474889b77 100644 --- a/src/pystencils/nbackend/functions.py +++ b/src/pystencils/nbackend/functions.py @@ -4,7 +4,8 @@ Functions supported by pystencils. Every supported function might require handling logic in the following modules: - In `freeze.FreezeExpressions`, a case in `map_Function` or a separate mapper method to catch its frontend variant - - In each backend platform, a case in `materialize_functions` to map the function onto a concrete C/C++ implementation + - In each backend platform, a case in `Platform.materialize_functions` to map the function onto a concrete + C/C++ implementation - If very special typing rules apply, a case in `typification.Typifier`. In most cases, typification of function applications will require no special handling. diff --git a/src/pystencils/nbackend/kernelcreation/__init__.py b/src/pystencils/nbackend/kernelcreation/__init__.py index 1f7aad5ad1dba93428459cfae364fc762d6e87f8..749493059ee96a73961b24447bdabff05c485315 100644 --- a/src/pystencils/nbackend/kernelcreation/__init__.py +++ b/src/pystencils/nbackend/kernelcreation/__init__.py @@ -1,3 +1,103 @@ +""" +The `kernelcreation` module contains the actual translation logic of the pystencils code generator. +It provides a number of classes and submodules providing the various parts and passes of the code +generation process: + + - Parameterization of the translation process + - Knowledge collection and management + - Kernel analysis and constraints checks + - Expression parsing and AST construction + - Platform-specific code materialization + - General and platform-specific optimizations + +These components are designed to be combined and composed in a variety of ways, depending +on the actual code generation flow required. +The ``nbackend`` currently provides one native code generation driver: +`create_kernel` takes an `AssignmentCollection` and translates it to a simple loop kernel. +The code generator's components are perhaps most easily introduced in the context of that particular driver. + +Exemplary Code Generation Driver: `create_kernel` +------------------------------------------------- + +Generator Arguments +^^^^^^^^^^^^^^^^^^^ + +The driver accepts two parameters: an `AssignmentCollection` whose assignments represent the code of a single +kernel iteration without recurrences or other loop-carried dependencies; and a `CreateKernelConfig` which configures +the translation process. + +Context Creation +^^^^^^^^^^^^^^^^ + +The primary object that stores all information and knowledge required during the translation process is the +`KernelCreationContext`. It is created in the beginning from the configuration parameter. +It will be responsible for managing all fields and arrays encountered during translation, +the kernel's iteration space, +and any parameter constraints introduced by later transformation passes. + +Analysis Passes +^^^^^^^^^^^^^^^ + +Before the actual translation of the SymPy-based assignment collection to the pymbolic-based expression trees begins, +the kernel's assignments are checked for consistency with the translator's prequesites. +In this case, the `KernelAnalysis` pass +checks the static single assignment-form (SSA) requirement and the absence of loop-carried dependencies. +At the same time, it collects the set of all fields used in the assignments. + +Iteration Space Creation +^^^^^^^^^^^^^^^^^^^^^^^^ + +The kernel's `IterationSpace` is inferred from a combination of configuration parameters and the set of field accesses +encountered in the kernel. Two kinds of iteration spaces are available: A sparse iteration space +(`SparseIterationSpace`) encompasses singular points in the cartesian coordinate space, provided by an index list. +A full iteration space (`FullIterationSpace`), on the other hand, represents a full cuboid cartesian coordinate space, +which may optionally be sliced. + +The iteration space is used during the following translation passes to translate field accesses with respect to +the current iteration. It will only be instantiated in the form of a loop nest or GPU index calculation much later. + +Freeze and Typification +^^^^^^^^^^^^^^^^^^^^^^^ + +The transformation of the SymPy-expressions to the backend's pymbolic-based expression trees is handled by +`FreezeExpressions`. +This class instantiates field accesses according to the iteration space, maps SymPy operators and functions to their +backend instances if supported, and raises an exception if asked to translate something the backend can't handle. + +Constants and untyped symbols in the frozen expressions now need to be assigned a data type, and expression types +need to be checked against the C typing rules. This is the task of the `Typifier`. It assigns a default type to +every untyped symbol, attempts to infer the type of constants from their context in the expression, +and checks expression types using a much stricter +subset of the C typing rules, allowing for no implicit type casts even between closely related types. +After the typification pass, the code generator either has a fully and correctly typed kernel body in hand, +or it has raised an exception. + +Platform-Specific Iteration Space Materialization +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +At this point, most remaining transformations are specific to the target platform. Hardware platforms are modelled +using subclasses of the `Platform` class, which implement all platform-specific transformations. +The platform for the current code generation flow is instantiated from the target specification passed +by the user in `CreateKernelConfig`. +Then, the platform is asked to materialize the iteration space (e.g. by turning it into a loop nest +for CPU code) and to materialize any functions for which it provides specific implementations. + +Platform-Specific Optimizations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Technically, the kernel is now complete, but it may still be optimized. +This is also the task of the platform instance. Potential optimizations include the inclusion of OpenMP, +loop splitting, slicing and blocking on CPUs, +and vectorization on CPU platforms with vector capabilities. + +Finalization +^^^^^^^^^^^^ + +At last, the kernel is packed up as a `PsKernelFunction`. +It is furthermore annotated with constraints collected during the translation, and returned to the user. + +""" + from .config import CreateKernelConfig from .kernelcreation import create_kernel diff --git a/src/pystencils/nbackend/kernelcreation/kernelcreation.py b/src/pystencils/nbackend/kernelcreation/kernelcreation.py index 9d2f2996d425be92149afedcebeb05926b8af8cc..3900585872fab80883c1c18bae00c89b2b706299 100644 --- a/src/pystencils/nbackend/kernelcreation/kernelcreation.py +++ b/src/pystencils/nbackend/kernelcreation/kernelcreation.py @@ -17,9 +17,9 @@ from .transformations import EraseAnonymousStructTypes def create_kernel( assignments: AssignmentCollection, - options: CreateKernelConfig = CreateKernelConfig(), + config: CreateKernelConfig = CreateKernelConfig(), ): - ctx = KernelCreationContext(options) + ctx = KernelCreationContext(config) analysis = KernelAnalysis(ctx) analysis(assignments) @@ -37,7 +37,7 @@ def create_kernel( typify = Typifier(ctx) kernel_body = typify(kernel_body) - match options.target: + match config.target: case Target.CPU: from .platform import BasicCpu @@ -57,7 +57,7 @@ def create_kernel( # - Loop Splitting, Tiling, Blocking kernel_ast = platform.optimize(kernel_ast) - function = PsKernelFunction(kernel_ast, options.target, name=options.function_name) + function = PsKernelFunction(kernel_ast, config.target, name=config.function_name) function.add_constraints(*ctx.constraints) return function diff --git a/src/pystencils/nbackend/typed_expressions.py b/src/pystencils/nbackend/typed_expressions.py index 2b1f3f17d52b23a3e1293091182f17133faf688c..62aaf1f50240ee494391b172c0dc05d88847c91a 100644 --- a/src/pystencils/nbackend/typed_expressions.py +++ b/src/pystencils/nbackend/typed_expressions.py @@ -49,35 +49,35 @@ class PsTypedConstant: The `PsTypedConstant` class overrides the basic arithmetic operations for use during a constant folding pass. Their implementations are very strict regarding types: No implicit conversions take place, and both operands must always have the exact same type. - The only exception to this rule are the values `0`, `1`, and `-1`, which are promoted to `PsTypedConstant` + The only exception to this rule are the values ``0``, ``1``, and ``-1``, which are promoted to `PsTypedConstant` (pymbolic injects those at times). A Note On Divisions ------------------- The semantics of divisions in C and Python differ greatly. - Python has two division operators: `/` (`truediv`) and `//` (`floordiv`). + Python has two division operators: ``/`` (``truediv``) and ``//`` (``floordiv``). `truediv` is pure floating-point division, and so applied to floating-point numbers maps exactly to floating-point division in C, but not when applied to integers. - `floordiv` has no C equivalent: - While `floordiv` performs euclidean division and always rounds its result - downward (`3 // 2 == 1`, and `-3 // 2 = -2`), - the C `/` operator on integers always rounds toward zero (in C, `-3 / 2 = -1`.) - - The same applies to the `%` operator: - In Python, `%` computes the euclidean modulus (e.g. `-3 % 2 = 1`), - while in C, `%` computes the remainder (e.g. `-3 % 2 = -1`). + ``floordiv`` has no C equivalent: + While ``floordiv`` performs euclidean division and always rounds its result + downward (``3 // 2 == 1``, and ``-3 // 2 = -2``), + the C ``/`` operator on integers always rounds toward zero (in C, ``-3 / 2 = -1``.) + + The same applies to the ``%`` operator: + In Python, ``%`` computes the euclidean modulus (e.g. ``-3 % 2 = 1``), + while in C, ``%`` computes the remainder (e.g. ``-3 % 2 = -1``). These two differ whenever negative numbers are involved. - Pymbolic provides `Quotient` to model Python's `/`, `FloorDiv` to model `//`, and `Remainder` to model `%`. - The last one is a misnomer: it should instead be called `Modulus`. + Pymbolic provides ``Quotient`` to model Python's ``/``, ``FloorDiv`` to model ``//``, + and ``Remainder`` to model ``%``. The last one is a misnomer: it should instead be called ``Modulus``. Since the pystencils backend has to accurately capture the behaviour of C, - the behaviour of `/` is overridden in `PsTypedConstant`. + the behaviour of ``/`` is overridden in `PsTypedConstant`. In a floating-point context, it behaves as usual, while in an integer context, it implements C-style integer division. - Similarily, `%` is only legal in integer contexts, where it implements the C-style remainder. - Usage of `//` and the pymbolic `FloorDiv` is illegal. + Similarily, ``%`` is only legal in integer contexts, where it implements the C-style remainder. + Usage of ``//`` and the pymbolic ``FloorDiv`` is illegal. """ __match_args__ = ("value", "dtype") @@ -118,7 +118,7 @@ class PsTypedConstant: def _fix(self, v: Any) -> PsTypedConstant: """In binary operations, checks for type equality and, if necessary, promotes the values - `0`, `1` and `-1` to `PsTypedConstant`.""" + ``0``, ``1`` and ``-1`` to `PsTypedConstant`.""" if not isinstance(v, PsTypedConstant) and v in (0, 1, -1): return PsTypedConstant(v, self._dtype) elif v._dtype != self._dtype: @@ -236,6 +236,6 @@ class PsTypedConstant: pb.register_constant_class(PsTypedConstant) ExprOrConstant: TypeAlias = pb.Expression | PsTypedConstant -"""Required since `PsTypedConstant` does not derive from `pb.Expression`.""" +"""Required since `PsTypedConstant` does not derive from ``pb.Expression``.""" VarOrConstant: TypeAlias = PsTypedVariable | PsTypedConstant diff --git a/src/pystencils/nbackend/types/basic_types.py b/src/pystencils/nbackend/types/basic_types.py index 3f080c313de76f7d1ae1a475f44f56006ba88939..9df6858e90a1aee8ba1640dd2da8f6f4d8547088 100644 --- a/src/pystencils/nbackend/types/basic_types.py +++ b/src/pystencils/nbackend/types/basic_types.py @@ -13,13 +13,12 @@ from ..exceptions import PsInternalCompilerError class PsAbstractType(ABC): """Base class for all pystencils types. - Implementation Notes - -------------------- + **Implementation Notes** - **Type Equality:** Subclasses must implement `__eq__`, but may rely on `_base_equal` to implement + **Type Equality:** Subclasses must implement ``__eq__``, but may rely on ``_base_equal`` to implement type equality checks. - **Hashing:** Each subclass that implements `__eq__` must also implement `__hash__`. + **Hashing:** Each subclass that implements ``__eq__`` must also implement ``__hash__``. """ def __init__(self, const: bool = False): @@ -263,7 +262,7 @@ class PsNumericType(PsAbstractType, ABC): `create_constant` should fail whenever its input cannot safely be interpreted as the given type. As for which interpretations are considered 'safe', it should be as restrictive as possible. - However, `create_constant` must *never* fail for the literals `0`, `1` and `-1`. + However, `create_constant` must *never* fail for the literals ``0``, ``1`` and ``-1``. """ @abstractmethod diff --git a/src/pystencils/nbackend/types/quick.py b/src/pystencils/nbackend/types/quick.py index e5d271cf90218015bd5415a387cc19867228e4f7..65683c80abe25185282b8cc1f7949a110a469047 100644 --- a/src/pystencils/nbackend/types/quick.py +++ b/src/pystencils/nbackend/types/quick.py @@ -29,14 +29,14 @@ def make_type(type_spec: UserTypeSpec) -> PsAbstractType: Possible arguments are: - Strings ('str'): will be parsed as common C types, throwing an exception if that fails. To construct a `PsCustomType` instead, use the constructor of `PsCustomType` or its abbreviation - `types.quick.Custom` instead + ``types.quick.Custom`` instead - Python builtin data types (instances of `type`): Attempts to interpret Python numeric types like so: - `int` becomes a signed 64-bit integer - `float` becomes a double-precision IEEE-754 float - No others are supported at the moment - Supported Numpy scalar data types (see https://numpy.org/doc/stable/reference/arrays.scalars.html) are converted to pystencils scalar data types - - Instances of `np.dtype`: Attempt to interpret scalar types like above, and structured types as structs. + - Instances of `numpy.dtype`: Attempt to interpret scalar types like above, and structured types as structs. - Instances of `PsAbstractType` will be returned as they are """