pystencils merge requests

Nesting of Type Contexts, Type Hints, and Improved Array Typing

2024-10-14T13:52:51+02:00

This MR introduces a few extensions to the typifier, allowing it to infer types for a wider range of more complex expressions. In particular, the typing of array literals and array declarations is improved significantly.

- Allow nested type contexts to be deferred by linking them to their parent contexts via *inference hooks*
- Introduce a system of type hints and their propagation through inference hooks in order to
  - resolve type contexts from incomplete type information
  - propagate fallback-to-default behaviour to nested contexts
  - (possible future applications)
- Use the above to refactor and extend the type inference of array literals and subscripts:
  - Array literals now fall back to the default type if no array type is known
  - Inline array literals can now inherit their type from the enclosing expression's type context
- Apply the default numeric data type to arguments of `PsCast` and relationals if the argument type could not be inferred

Closes #99.
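
To illustrate how a nested type context can be deferred through an inference hook, here is a minimal, self-contained sketch. It does not use the pystencils typifier: `ToyTypeContext`, `hook`, `apply_type`, `apply_default`, `element_type` and the string-based type names are all hypothetical and only mimic the idea described above.

```python
from __future__ import annotations

from typing import Callable, List, Optional


class ToyTypeContext:
    """Hypothetical stand-in for a type context; types are plain strings here."""

    def __init__(self) -> None:
        self.target_type: Optional[str] = None
        self._hooks: List[Callable[[str], None]] = []

    def hook(self, callback: Callable[[str], None]) -> None:
        """Run `callback` with the target type as soon as it is known."""
        if self.target_type is not None:
            callback(self.target_type)
        else:
            self._hooks.append(callback)

    def apply_type(self, dtype: str) -> None:
        """Fix the target type and notify all deferred (nested) contexts."""
        if self.target_type is not None:
            if self.target_type != dtype:
                raise TypeError(f"Type mismatch: {self.target_type} vs. {dtype}")
            return
        self.target_type = dtype
        hooks, self._hooks = self._hooks, []
        for callback in hooks:
            callback(dtype)

    def apply_default(self, default: str) -> None:
        """Fall back to a default type if nothing else could be inferred."""
        if self.target_type is None:
            self.apply_type(default)


def element_type(array_type: str) -> str:
    """'float32[3]' -> 'float32' (toy notation, assumed for this sketch)."""
    return array_type.split("[", 1)[0]


# The outer context types the enclosing expression, the inner one types the
# entries of an inline array literal whose type is not yet known.
outer = ToyTypeContext()
inner = ToyTypeContext()
outer.hook(lambda array_type: inner.apply_type(element_type(array_type)))

deferred = inner.target_type is None   # nothing is known yet
outer.apply_type("float32[3]")         # resolving the parent ...
resolved = inner.target_type           # ... resolves the nested context

# Fallback: no type information at all, so the default is propagated.
outer_fb, inner_fb = ToyTypeContext(), ToyTypeContext()
outer_fb.hook(lambda array_type: inner_fb.apply_type(element_type(array_type)))
outer_fb.apply_default("float64[3]")
```

In this toy version the nested context stays unresolved until its parent receives a type, either from the surrounding expression or from the default fallback.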

Uniqueness of Data Type Instances

2024-04-22T11:44:27+02:00

This MR refactors the type system to avoid creating multiple instances of the same data type by caching them. This speeds up type comparisons, avoids unnecessary copies, and makes data type handling more efficient overall.

- Introduce the metaclass `PsTypeMeta`, which keeps track of all existing instances of subclasses of `PsType` and alters class instantiation to return existing instances instead of creating new ones
- Refactor `__eq__` and `__hash__`, which are now implemented in `PsType` and use an `__args__` method implemented by each instantiable subclass
- Fix `constify`/`deconstify` to no longer copy their arguments, but memoize their return values
- Refactor and extend documentation of the type system to reflect the changes

Some minor additions come along:

- Move `create_type` from `types.quick` to `types.parsing`
- Rename `types.basic_types` to `types.types`
- Move `PsType`, `constify` and `deconstify` to `types.meta`
- Add an inheritance diagram to the type system documentation
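
The instance-caching idea can be sketched with a small metaclass. This is not the `PsTypeMeta` implementation from the MR; `CachingMeta` and `IntType` are hypothetical names, and the cache key used here (class plus constructor arguments) is an assumption made for illustration.

```python
class CachingMeta(type):
    """Metaclass that hands out one shared instance per (class, arguments)."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Arguments must be hashable, since they form the cache key.
        key = (cls, args, tuple(sorted(kwargs.items())))
        try:
            return CachingMeta._instances[key]
        except KeyError:
            obj = super().__call__(*args, **kwargs)
            CachingMeta._instances[key] = obj
            return obj


class IntType(metaclass=CachingMeta):
    """Hypothetical data type class used only to exercise the metaclass."""

    def __init__(self, width, const=False):
        self.width = width
        self.const = const


a = IntType(32)
b = IntType(32)
c = IntType(64)

same_instance = a is b        # both calls return the cached object
distinct_widths = a is not c  # different arguments, different instance
```

With unique instances, comparing two types can reduce to an identity check. Note that this simple key treats `IntType(32)` and `IntType(32, const=False)` as different entries; a real implementation would need to normalize arguments first.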

Fix kernel function parameters

2024-03-28T13:47:06+01:00

This MR implements equality and hashing for `PsSymbol` such that the parameters of `KernelFunction`s are unique. Also improves some error messages.
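
As a rough illustration of why value-based equality and hashing make parameters unique, consider the following sketch. `ToySymbol` is a hypothetical class, not `PsSymbol`, and comparing by name and data type is an assumption about which attributes matter.

```python
class ToySymbol:
    """Hypothetical symbol that compares by name and data type."""

    def __init__(self, name, dtype):
        self.name = name
        self.dtype = dtype

    def __eq__(self, other):
        if not isinstance(other, ToySymbol):
            return NotImplemented
        return (self.name, self.dtype) == (other.name, other.dtype)

    def __hash__(self):
        # Must be consistent with __eq__: equal symbols hash equally.
        return hash((self.name, self.dtype))


# Two separately created but equal symbols collapse into one set entry.
params = {
    ToySymbol("x", "float64"),
    ToySymbol("x", "float64"),
    ToySymbol("y", "float64"),
}
num_params = len(params)
```

Without `__eq__` and `__hash__`, Python falls back to identity, and the set above would hold three entries instead of two.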

Increase supported python version

2024-01-16T11:56:08+01:00

Increase supported python version

Draft: Loop counter dependent kernels: Vector casts and smaller fixes

2024-09-04T19:02:41+02:00

We would like to generate code for kernels which contain spatially dependent expressions, so the expression tree contains loop counter symbols. Unfortunately, vectorizing these kernels does not currently work. The goal of this MR is to change that. It is far from finished, but I would appreciate some feedback early on.

With the current state, I was able to generate one of our kernels with vectorization (AVX) enabled and the code runs on my AVX512 machine (the code contains AVX512 instructions...). This MR includes a very basic test so that you can play around with the new feature if you want:

```
pytest pystencils_tests/test_vectorization.py::test_vectorize_loop_ctr
```

Issues with the current implementation are:

1. The vectorized loop counter uses a different vector width than the remaining code. E.g. for AVX the (f64) vector width is 4 and the loop is vectorized accordingly. However, the vectorized loop counter has 8 lanes.
2. Vector expressions cannot be cast to different types.
3. Integer expressions cannot be properly vectorized.
4. `create_type('int')`/`BasicType('int')` creates an `int64`, but many places assume that `int` is an `int32`. This is relevant here since conversions from `int64` are not available before AVX512 (they require rounding).

The proposed change is to rework the instruction sets slightly. Instead of "dumb" dictionaries, they get promoted to proper classes. Every time an instruction is queried, the data type (base type and number of lanes) must be specified. This way, we can properly handle integers and vectors with less than the maximum bit width supported by the instruction set. The latter is, for example, necessary to work with i32x4 vectors on AVX. Furthermore, type conversion is implemented in the instruction sets.

This obviously changes the interface and has strong implications on how vectorization is handled throughout the code base. Therefore, I would appreciate any feedback on whether these changes are in line with the goals of the project and the thoughts behind #46. Any comments are welcome.
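
The proposed interface, where every instruction query carries the base type and the number of lanes, could look roughly like the sketch below. `VectorType` and `ToyAvxInstructionSet` are hypothetical names and do not reflect the classes in this MR; only the three intrinsic names in the table are real, and the table is deliberately incomplete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    base: str    # scalar base type, e.g. "f64" or "i32"
    lanes: int   # number of vector lanes


class ToyAvxInstructionSet:
    """Hypothetical instruction set class with only a handful of entries."""

    MAX_BITS = 256
    _SCALAR_BITS = {"f32": 32, "f64": 64, "i32": 32}
    _ADD = {
        VectorType("f64", 4): "_mm256_add_pd",
        VectorType("f32", 8): "_mm256_add_ps",
        VectorType("i32", 4): "_mm_add_epi32",  # 128-bit integer vector
    }

    def bit_width(self, vtype):
        return self._SCALAR_BITS[vtype.base] * vtype.lanes

    def add(self, vtype):
        """Return the name of the addition intrinsic for the given vector type."""
        if vtype.base not in self._SCALAR_BITS:
            raise ValueError(f"Unknown base type: {vtype.base}")
        if self.bit_width(vtype) > self.MAX_BITS:
            raise ValueError(f"{vtype} exceeds {self.MAX_BITS} bits")
        try:
            return self._ADD[vtype]
        except KeyError:
            raise ValueError(f"No addition available for {vtype}") from None


isa = ToyAvxInstructionSet()
f64_add = isa.add(VectorType("f64", 4))
i32_add = isa.add(VectorType("i32", 4))  # narrower than the maximum width
```

Keying the lookup on the full vector type is what allows a narrower vector such as i32x4 to coexist with full-width f64x4 vectors in the same kernel.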

Draft: [FIX] Index fields exclusively containing coordinates are dropped by code generator

2024-01-31T11:48:23+01:00

Index fields that exclusively contain coordinate data (members `x`, `y` and `z`) and that are not explicitly accessed in the kernel assignments are dropped by `pystencils.cpu.create_indexed_kernel` in `cpu/kernelcreation.py`, prev. line 119. Then in line 128 the list of index fields is empty, and the code generator finds no field containing the coordinate information. Code generation then aborts. Is there a reason why index fields are first filtered this way?

Draft: Generalise usage of Structs for nested array access

2023-09-28T09:47:11+02:00

In this MR, structs are introduced in a more general form than they are used in the index kernel. The structs here can hold data and pointers to fields. This makes it possible to iterate over a struct and extract field pointers in each loop iteration. The extracted fields are then updated in the normal loop nest. The idea can be illustrated with a small example:

```python
import numpy as np
import pystencils as ps
from pystencils.typing import BasicType, FieldPointerSymbol, PointerType
from pystencils.struct import Struct

dtype = BasicType(np.float64)

f = ps.fields('f(1): double[3d]')
g = ps.fields('g(1): double[3d]')

struct_src = Struct("src")
struct_src.add_member(PointerType(dtype, const=False, restrict=False, double_pointer=True))

struct_dst = Struct("dst")
struct_dst.add_member(PointerType(dtype, const=False, restrict=False, double_pointer=True))

update_rule = [ps.Assignment(FieldPointerSymbol("f", dtype, const=True), struct_src[0]),
               ps.Assignment(FieldPointerSymbol("g", dtype, const=False), struct_dst[0]),
               ps.Assignment(g.center, f.center)]

ast = ps.create_kernel(update_rule)
```

This produces the following C code:

```c++
FUNC_PREFIX void kernel(double ** _data_dst, double ** _data_src, int64_t const _size_dst, int64_t const _size_f_0, int64_t const _size_f_1, int64_t const _size_f_2, int64_t const _stride_f_0, int64_t const _stride_f_1, int64_t const _stride_f_2, int64_t const _stride_g_0, int64_t const _stride_g_1, int64_t const _stride_g_2)
{
   for (int64_t ctr_0 = 0; ctr_0 < _size_dst; ctr_0 += 1)
   {
      double * RESTRICT _data_f = _data_src[ctr_0];
      double * RESTRICT _data_g = _data_dst[ctr_0];
      for (int64_t ctr_1 = 0; ctr_1 < _size_f_0; ctr_1 += 1)
      {
         for (int64_t ctr_2 = 0; ctr_2 < _size_f_1; ctr_2 += 1)
         {
            for (int64_t ctr_3 = 0; ctr_3 < _size_f_2; ctr_3 += 1)
            {
               _data_g[_stride_g_0*ctr_1 + _stride_g_1*ctr_2 + _stride_g_2*ctr_3] = _data_f[_stride_f_0*ctr_1 + _stride_f_1*ctr_2 + _stride_f_2*ctr_3];
            }
         }
      }
   }
}
```

Thus the struct is used as a container for an arbitrary number of subarrays that are all updated at once. Since the struct only holds a single pointer per element in the above example, we can represent it as a double pointer (`**`).

[BugFix] Fix indexing with ghostlayers

2023-09-07T11:10:33+02:00

Block indexing has a bug when it is created with an iteration slice and ghost layers. Since !341, block indexing supports slices more naturally by limiting the iteration space to the sliced size, so the counter index is multiplied by the step size. The same multiplication was also applied to the ghost layer offset, which is wrong. This MR fixes the problem.
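
The difference can be shown with plain index arithmetic. The two functions below are hypothetical and only model the description above; the exact expression used in the block indexing code is an assumption.

```python
def buggy_index(ctr, step, ghost_offset):
    # The ghost layer offset is scaled by the step size as well (wrong).
    return (ctr + ghost_offset) * step


def fixed_index(ctr, step, ghost_offset):
    # Only the counter is scaled; the offset is added afterwards.
    return ctr * step + ghost_offset


# One ghost layer, eight interior cells, every second cell selected.
ghost_layers, interior, step = 1, 8, 2
expected = list(range(ghost_layers, ghost_layers + interior, step))

n_iterations = len(expected)  # iteration space limited to the sliced size
buggy = [buggy_index(c, step, ghost_layers) for c in range(n_iterations)]
fixed = [fixed_index(c, step, ghost_layers) for c in range(n_iterations)]
```

Here `fixed` matches the slice `1:9:2` of the underlying array, whereas `buggy` is shifted by one cell and ends in the ghost layer.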

Draft: Do not reorder accesses in `move_constants_before_loop`

2023-08-18T10:39:05+02:00

Prior to this MR, `move_constants_before_loop` tries to move constants as far to the top as possible. This might reorder read/write accesses to fields. For example:

```python
import pystencils as ps
from pystencils import CreateKernelConfig
from pystencils.astnodes import Block, KernelFunction, LoopOverCoordinate, SympyAssignment
from pystencils.field import Field, FieldType
from sympy.abc import x, y

field = Field.create_generic("field", 1, field_type=FieldType.CUSTOM)
counter = LoopOverCoordinate.get_loop_counter_symbol(0)
load = SympyAssignment(x, field.absolute_access((counter,), (0,)))
store = SympyAssignment(field.absolute_access((counter+1,), (0,)), 2*x)
body = ps.typing.transformations.add_types(Block([load, store]), CreateKernelConfig())
loop = LoopOverCoordinate(body, 0, 0, 42)
block = Block([loop])

ps.transformations.resolve_field_accesses(block)
new_loops = ps.transformations.cut_loop(loop, [41])
ps.transformations.move_constants_before_loop(new_loops.args[1])

kernel = KernelFunction(
    block,
    ps.Target.CPU,
    ps.Backend.C,
    ps.cpu.cpujit.make_python_function,
    None,
)
code = ps.get_code_str(kernel)
print(code)
```

prints

```c
FUNC_PREFIX void kernel(double * RESTRICT _data_field, int64_t const _stride_field_0)
{
   const double x = _data_field[41*_stride_field_0];
   _data_field[42*_stride_field_0] = x*2.0;
   {
      for (int64_t ctr_0 = 0; ctr_0 < 41; ctr_0 += 1)
      {
         const double x = _data_field[_stride_field_0*ctr_0];
         _data_field[_stride_field_0*(ctr_0 + 1)] = x*2.0;
      }
      {
      }
   }
}
```

Note that the last (cut) loop iteration is moved before the primary loop, leading to a wrong load from index 41.

This MR changes `move_constants_before_loop` such that assignments can not be moved before their last modification. Essentially, it replaces `symbols_defined` by `symbols_modified` [here](https://i10git.cs.fau.de/terraneo/pystencils/-/commit/be78ab165339d593869b5c77ef00a590a63ba130#99785d4b53b75ce54c83c3e499248de2a07fb2cd_598_597). This new property is implemented for all AST nodes. Note the implementation of `CustomCCodeNode`. I did not want to introduce breaking changes to the API.

Additionally, declarations are now inserted where the caller requests, instead of pushing them all the way to the top (https://i10git.cs.fau.de/terraneo/pystencils/-/commit/5c65d06216d050c22e28ba0b9487544342fc0926).

Lastly, a test for the new behavior is included.
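
The effect of the reordering can be reproduced without pystencils by simulating the loop on a plain list. This is only a model of the generated code shown above; the function names and the initial field values are made up for the demonstration.

```python
def iteration(field, ctr):
    """One loop iteration: load from ctr, store twice the value to ctr + 1."""
    x = field[ctr]
    field[ctr + 1] = 2 * x


def original_order(field):
    for ctr in range(42):
        iteration(field, ctr)
    return field


def reordered(field):
    # The cut-off last iteration (ctr = 41) runs before the primary loop.
    iteration(field, 41)
    for ctr in range(41):
        iteration(field, ctr)
    return field


initial = [1] + [0] * 42
expected = original_order(list(initial))
broken = reordered(list(initial))
```

In the original order every entry depends on its predecessor, so `expected[42]` is `2**42`. In the reordered version, index 41 is read before the primary loop has written it, so `broken[42]` is computed from the stale initial value.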

[Fix] Update for Docker Images

2023-06-04T16:14:23+02:00

Due to an update of the Docker images, minor changes to the CI are required.

Draft: feat: implement `__cuda_array_interface__`

2023-09-14T10:43:31+02:00

https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html

This is supported by:

- pycuda
- numba
- cupy
- torch
- nvcv https://github.com/CvCuda/CV-CUDA
- maybe by tensorflow in future: https://github.com/tensorflow/tensorflow/issues/29039

Also allow execution with cupy (https://docs.cupy.dev/en/stable/index.html) instead of pycuda.

TODO:

- [ ] check that pointers are in the correct CUDA context and, if not, import them into the current one
- [x] make execution with pycuda aware of `__cuda_array_interface__`
- [ ] what/how to test
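
For orientation, an object exposing the interface provides a `__cuda_array_interface__` attribute that returns a dictionary with the keys `shape`, `typestr`, `data` and `version`, plus optional ones such as `strides`. The sketch below only builds that dictionary. `DeviceBufferView` is a hypothetical class, and the pointer passed to it is a placeholder integer, not a real device allocation.

```python
class DeviceBufferView:
    """Hypothetical wrapper describing an existing device allocation."""

    def __init__(self, ptr, shape, typestr, strides=None, read_only=False):
        self._ptr = int(ptr)  # device pointer as a Python integer
        self._shape = tuple(shape)
        self._typestr = typestr
        self._strides = None if strides is None else tuple(strides)
        self._read_only = bool(read_only)

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self._shape,
            "typestr": self._typestr,  # e.g. "<f8" for little-endian float64
            "data": (self._ptr, self._read_only),
            "version": 3,
            "strides": self._strides,  # None means C-contiguous
        }


# Placeholder pointer value; a real one would come from a GPU allocator.
view = DeviceBufferView(ptr=0x7F0000000000, shape=(4, 4), typestr="<f8")
iface = view.__cuda_array_interface__
```

Libraries that consume the interface read this dictionary instead of depending on a specific array class, which is what lets buffers from different GPU libraries be passed to the same kernel.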

Fix #62

2022-10-21T09:24:20+02:00

Fixes problems around #62

Regression !300

2022-10-10T13:37:53+02:00

In !300, all written field sizes are added to the `SympyAssignment` as unknown parameters. This solves the problem that all field sizes need to be passed as arguments when using NT stores with non-x86 architectures. However, it introduces two problems:

1. In all other cases these parameters are not used. Thus waLBerla fails in some cases when compiled with `-Wall`. Apart from that, passing unused parameters is not nice either.
2. For GPU code generation, problems arose with the usage of `get_parameters` in waLBerla: https://i10git.cs.fau.de/pycodegen/pystencils/-/blob/master/pystencils/astnodes.py#L244

Overall, it seems that the easiest way to fix the problem is to only pass the additional size arguments when needed and in no other cases.

Draft: Remove too many zeros

2023-03-27T10:40:59+02:00

Remove unnecessary trailing zeros from numbers: 1.80000000 --> 1.8
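
A minimal sketch of such a clean-up on the string level is shown below. It is not the implementation from this MR; `strip_trailing_zeros` is a hypothetical helper, and leaving literals in exponent notation untouched is an assumption.

```python
def strip_trailing_zeros(literal: str) -> str:
    """Drop superfluous trailing zeros from a decimal literal."""
    if "." not in literal or "e" in literal.lower():
        return literal  # integers and exponent notation stay as they are
    stripped = literal.rstrip("0")
    if stripped.endswith("."):
        stripped += "0"  # keep one digit so the literal stays a float
    return stripped


shortened = strip_trailing_zeros("1.80000000")
whole = strip_trailing_zeros("2.000")
```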

WIP: Revamp the type system

2022-05-11T14:33:30+02:00

WIP: Revamp the type system

Fixed kernel_decorator with config parameter

2021-11-03T22:23:36+01:00

The current kernel decorator does not work properly with the introduced `CreateKernelConfig`. This MR fixes that.

Switch index type from int32 to int64

2021-06-07T13:38:27+02:00

For large domain sizes, int32 is not sufficient. Thus it is planned for waLBerla to change `cell_index_t` from `int` to `int64`. To make it consistent with pystencils and to prevent conversion warnings, the index type for pystencils is also adapted to int64. Fixes https://i10git.cs.fau.de/pycodegen/lbmpy/-/issues/18
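
A quick calculation shows how easily a linearized index exceeds the int32 range. The domain size below is an arbitrary example, not taken from the issue.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def cell_count(shape):
    """Number of cells in a block, i.e. one more than the largest linear index."""
    total = 1
    for extent in shape:
        total *= extent
    return total


cells = cell_count((2048, 2048, 1024))  # 2**11 * 2**11 * 2**10 = 2**32

overflows_int32 = cells - 1 > INT32_MAX
fits_int64 = cells - 1 <= INT64_MAX
```

For this block, the largest linear index no longer fits into a signed 32-bit integer, while a 64-bit index type leaves ample headroom.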

Fix Sympy pipeline

2021-04-26T16:46:20+02:00

Fix Sympy pipeline

Add type conversion for SP types

2021-04-03T06:01:47+02:00

If assignments are already typed for double precision but the kernel is created for single precision, the assignments should be adapted.

WIP: ARM cache line zeroing

2021-04-01T22:59:29+02:00

ARM has a cache line zero instruction that prevents data that will be overwritten anyway from being loaded from RAM. Kind of a light version of a non-temporal store. Saw this near the end of https://www.youtube.com/watch?v=BP7XD7JHgrI in the context of SVE, but might be relevant on Neon too. Just wanted to keep a note of this here. Integrating this into pystencils is probably not completely straightforward as you first need to check how much would be zeroed (64 bytes on all current chips, not guaranteed to match the cache line size), zero it, and then write the corresponding amount of data. Not sure if there are guarantees as to whether it's a multiple of the vector width. There is not a whole lot of information for ARM, but the exact same thing has existed on IBM's PowerPC architecture (!228) for decades. There, a cache line has 128 bytes (can be queried from the kernel via `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`) and can be zeroed with the `__dcbz` intrinsic.
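
The bookkeeping mentioned above can be sketched in Python. The sketch assumes a platform whose `os.sysconf` knows the glibc-specific name `SC_LEVEL1_DCACHE_LINESIZE` and returns `None` otherwise. Both function names are hypothetical, and no actual zeroing instruction is issued.

```python
import os
from typing import Optional


def l1d_cache_line_size() -> Optional[int]:
    """Query the L1 data cache line size, or None if it is unavailable."""
    sysconf = getattr(os, "sysconf", None)
    name = "SC_LEVEL1_DCACHE_LINESIZE"
    if sysconf is None or name not in getattr(os, "sysconf_names", {}):
        return None
    try:
        size = sysconf(name)
    except (ValueError, OSError):
        return None
    return size if size > 0 else None


def full_lines(start: int, n_bytes: int, line_size: int) -> int:
    """Count the aligned, completely overwritten lines in a byte range."""
    first_aligned = -(-start // line_size) * line_size  # round up
    end = start + n_bytes
    return max(0, (end - first_aligned) // line_size)


line_size = l1d_cache_line_size()
aligned_case = full_lines(start=0, n_bytes=256, line_size=128)
unaligned_case = full_lines(start=32, n_bytes=256, line_size=128)
```

Only lines that are fully covered by the write may be zeroed beforehand. In the unaligned case above, the partially covered lines at both ends of the range must still be written the regular way.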