pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2019-08-06T08:03:25+02:00

Run flynt (https://pypi.org/project/flynt/) on pystencils
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/19
2019-08-06T08:03:25+02:00
Stephan Seitz
This replaces usages of `"" % s` with Python's f-strings.
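For illustration, a minimal sketch of the kind of rewrite meant here. The variable names and the message are made up for this example and are not taken from the pystencils code base.

```python
# Illustrative values only; not taken from the pystencils code base.
name, count = "kernel", 3

# Old style: printf-like formatting with the % operator.
old_style = "%s has %d parameters" % (name, count)

# New style: the equivalent f-string, as a tool like flynt would rewrite it.
new_style = f"{name} has {count} parameters"

# Both spellings produce the same string.
assert old_style == new_style
print(new_style)
```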
Is this a good thing? I don't know. :shrug:

Regression !300
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/302
2022-10-10T13:37:53+02:00
Markus Holzer
In !300 all written field sizes are added to the SympyAssignment as unknown parameters. This solves the problem that all field sizes need to be passed as arguments when using NT stores with non-x86 architectures. However, it introduces two problems.
1. In all other cases these parameters are not used, so waLBerla fails in some cases when compiled with -Wall. Apart from that, passing unused parameters is not nice either.
2. For GPU code generation, problems arose with the use of `get_parameters` in waLBerla:
https://i10git.cs.fau.de/pycodegen/pystencils/-/blob/master/pystencils/astnodes.py#L244
Overall, the easiest fix seems to be to pass the additional size arguments only when they are needed.

Opencl datahandling
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/85
2019-12-05T12:14:33+01:00
Stephan Seitz
Closes #15
OpenCL kernels are now integrated in the normal `create_kernel` workflow. There is also a new `opencljit.init_globally` function that just creates a CL queue/context if you do not want to pass one as a parameter to every kernel.
SerialDatahandling is extended to work with alternative GPU array libraries to PyCuda.
There is now some overlapping code with the `_custom_transfer_functions`, but I suppose they are for certain quantities that have a separate transfer function, as opposed to using a whole different backend.
@kuron can you have a look at it? I think the solution is not as elegant as I thought it would be.
pycuda.gpuarray.GPUArrays are not wrapped. So if you use `dh.gpuarrays['foo']` you get either a pycuda array or an OpenCL array. I thought this step would be too drastic for one PR. Using OpenCL should still be a lot easier now.

Increase supported python version
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/366
2024-01-16T11:56:08+01:00
Markus Holzer
Support for Python 3.12

implemented derivation of gradient weights via rotation
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/13
2019-08-03T14:01:23+02:00
Markus Holzer
derive gradient weights of other direction with
already calculated weights of one direction
via rotation and apply them to a field.

Fixup for DestructuringBindingsForFieldClass
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/11
2019-07-18T10:28:09+02:00
Stephan Seitz
- rename header: Field.h is not a unique name in waLBerla context
- add PyStencilsField.h
- bindings were lacking data type

Fixed kernel_decorator with config parameter
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/270
2021-11-03T22:23:36+01:00
Jan Hönig
The current kernel decorator does not work properly with the introduced `CreateKernelConfig`.
This MR fixes that.

Fix Sympy pipeline
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/237
2021-04-26T16:46:20+02:00
Markus Holzer
Fix #35

Fix Opencl and LLVM GPU tests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/100
2019-12-05T11:02:49+01:00
Stephan Seitz
Fix tests for LLVM GPU and OpenCL
- !96 made it impossible to print functions without names (only important for the LLVM GPU test)
- !87 made it impossible to run OpenCL kernels on CUDA OpenCL: `int(...)` is not a valid cast there
- SymPy moved `sympy.boolalg` to `sympy.logic.boolalg`

Fix kernel function parameters
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/369
2024-03-28T13:47:06+01:00
Daniel Bauer
This MR implements equality and hashing for `PsSymbol` such that the parameters of `KernelFunction`s are unique.
Also improves some error messages.

Fix import: sympy.numbers -> sympy.core.numbers
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/150
2020-03-24T00:57:30+01:00
Stephan Seitz
Apparently `sympy` no longer exports `sympy.numbers` directly.

Fix deprecation warning: `collections.abc` instead of `abc`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/6
2019-07-10T17:04:10+02:00
Stephan Seitz
DeprecationWarning: Using or importing the ABCs from 'collections'
instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

Fix #62
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/305
2022-10-21T09:24:20+02:00
Markus Holzer
Fixes problems around #62

Fix #10: Add jinja2 to pystencils's dependencies
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/17
2019-08-06T08:05:38+02:00
Stephan Seitz
The alternative would be to remove jinja2 (see other PR).
However, I think the dependency on jinja2 is not too heavy.
This could make some implementations more elegant.

Draft: Remove too many zeros
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/283
2023-03-27T10:40:59+02:00
Markus Holzer
Remove unnecessary zeros from numbers: 1.80000000 --> 1.8

Draft: Generalise usage of Structs for nested array access
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/353
2023-09-28T09:47:11+02:00
Markus Holzer
In this MR, structs are introduced in a more general form than they are used in the index kernel. The structs here can hold data and pointers to fields. This makes it possible to iterate over a struct and extract field pointers in each loop iteration. The extracted fields are then updated in the normal loop nest.
The idea can be illustrated in a small example:
```python
import numpy as np
import pystencils as ps
from pystencils.typing import BasicType, FieldPointerSymbol, PointerType
from pystencils.struct import Struct
dtype = BasicType(np.float64)
f = ps.fields(f'f(1): double[3d]')
g = ps.fields(f'g(1): double[3d]')
struct_src = Struct("src")
struct_src.add_member(PointerType(dtype, const=False, restrict=False, double_pointer=True))
struct_dst = Struct("dst")
struct_dst.add_member(PointerType(dtype, const=False, restrict=False, double_pointer=True))
update_rule = [ps.Assignment(FieldPointerSymbol("f", dtype, const=True), struct_src[0]),
ps.Assignment(FieldPointerSymbol("g", dtype, const=False), struct_dst[0]),
ps.Assignment(g.center, f.center)]
ast = ps.create_kernel(update_rule)
```
This produces the following C-Code:
```c++
FUNC_PREFIX void kernel(double ** _data_dst, double ** _data_src, int64_t const _size_dst, int64_t const _size_f_0, int64_t const _size_f_1, int64_t const _size_f_2, int64_t const _stride_f_0, int64_t const _stride_f_1, int64_t const _stride_f_2, int64_t const _stride_g_0, int64_t const _stride_g_1, int64_t const _stride_g_2)
{
   for (int64_t ctr_0 = 0; ctr_0 < _size_dst; ctr_0 += 1)
   {
      double * RESTRICT _data_f = _data_src[ctr_0];
      double * RESTRICT _data_g = _data_dst[ctr_0];
      for (int64_t ctr_1 = 0; ctr_1 < _size_f_0; ctr_1 += 1)
      {
         for (int64_t ctr_2 = 0; ctr_2 < _size_f_1; ctr_2 += 1)
         {
            for (int64_t ctr_3 = 0; ctr_3 < _size_f_2; ctr_3 += 1)
            {
               _data_g[_stride_g_0*ctr_1 + _stride_g_1*ctr_2 + _stride_g_2*ctr_3] = _data_f[_stride_f_0*ctr_1 + _stride_f_1*ctr_2 + _stride_f_2*ctr_3];
            }
         }
      }
   }
}
```
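As a rough mental model, the generated loop nest can be mimicked in plain Python. This is only a sketch: nested lists stand in for the C arrays, the inner loop nest is reduced to one dimension for brevity, and the function name `copy_all` is hypothetical.

```python
# Hypothetical stand-ins: nested lists play the role of the C arrays.
def copy_all(data_src, data_dst):
    """Copy every source sub-array into the destination with the same index."""
    assert len(data_src) == len(data_dst)
    for ctr_0 in range(len(data_dst)):   # outer loop over the struct entries
        f = data_src[ctr_0]              # like `_data_f = _data_src[ctr_0]`
        g = data_dst[ctr_0]              # like `_data_g = _data_dst[ctr_0]`
        for ctr_1 in range(len(f)):      # inner loop nest (1D here, 3D in the C code)
            g[ctr_1] = f[ctr_1]


src = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dst = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
copy_all(src, dst)
print(dst)
```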
Thus the struct is used as a container for an arbitrary number of subarrays that are all updated at once. Since the struct only holds a single pointer per element in the above example, we can represent it as a double pointer (`double **`).

Draft: feat: implement `__cuda_array_interface__`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/316
2023-09-14T10:43:31+02:00
Stephan Seitz
https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
This is supported by:
- pycuda
- numba
- cupy
- torch
- nvcv https://github.com/CvCuda/CV-CUDA
- maybe by tensorflow in future: https://github.com/tensorflow/tensorflow/issues/29039
Also allow executing with cupy (https://docs.cupy.dev/en/stable/index.html)
instead of pycuda
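To make the protocol concrete, here is a minimal sketch of an object exposing `__cuda_array_interface__`. The class `DeviceBufferView` is hypothetical and not part of pystencils, and the pointer value is a placeholder rather than a real device address. The dictionary keys follow the Numba page linked above (version 3 at the time of writing).

```python
class DeviceBufferView:
    """Hypothetical wrapper describing GPU memory owned by some other library."""

    def __init__(self, ptr, shape, typestr="<f8"):
        self.ptr = ptr            # device pointer as a Python int (placeholder here)
        self.shape = tuple(shape)
        self.typestr = typestr    # NumPy-style type string, "<f8" = little-endian float64

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, False),  # (pointer, read-only flag)
            "strides": None,            # None means C-contiguous
            "version": 3,
        }


# ptr=0 is a placeholder; a real producer would pass its device address.
view = DeviceBufferView(ptr=0, shape=(4, 4))
print(view.__cuda_array_interface__)
```

A consumer library only needs to read this dictionary, which is why the same buffer can be handed between the libraries listed above without copying.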
TODO:
- [ ] check that pointers are in the correct CUDA context and, if not, import them into the
current one
- [x] make execution with pycuda aware of `__cuda_array_interface__`
- [ ] what/how to test

Draft: Do not reorder accesses in `move_constants_before_loop`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/342
2023-08-18T10:39:05+02:00
Daniel Bauer
Prior to this MR, `move_constants_before_loop` tried to move constants as far to the top as possible.
This might reorder read/write accesses to fields.
For example:
```python
import pystencils as ps
from pystencils import CreateKernelConfig
from pystencils.astnodes import Block, KernelFunction, LoopOverCoordinate, SympyAssignment
from pystencils.field import Field, FieldType
from sympy.abc import x, y
field = Field.create_generic("field", 1, field_type=FieldType.CUSTOM)
counter = LoopOverCoordinate.get_loop_counter_symbol(0)
load = SympyAssignment(x, field.absolute_access((counter,), (0,)))
store = SympyAssignment(field.absolute_access((counter+1,), (0,)), 2*x)
body = ps.typing.transformations.add_types(Block([load, store]), CreateKernelConfig())
loop = LoopOverCoordinate(body, 0, 0, 42)
block = Block([loop])
ps.transformations.resolve_field_accesses(block)
new_loops = ps.transformations.cut_loop(loop, [41])
ps.transformations.move_constants_before_loop(new_loops.args[1])
kernel = KernelFunction(
block,
ps.Target.CPU,
ps.Backend.C,
ps.cpu.cpujit.make_python_function,
None,
)
code = ps.get_code_str(kernel)
print(code)
```
prints
```c
FUNC_PREFIX void kernel(double * RESTRICT _data_field, int64_t const _stride_field_0)
{
   const double x = _data_field[41*_stride_field_0];
   _data_field[42*_stride_field_0] = x*2.0;
   {
      for (int64_t ctr_0 = 0; ctr_0 < 41; ctr_0 += 1)
      {
         const double x = _data_field[_stride_field_0*ctr_0];
         _data_field[_stride_field_0*(ctr_0 + 1)] = x*2.0;
      }
      {
      }
   }
}
```
Note that the last (cut) loop iteration is moved before the primary loop, leading to a wrong load from index 41.
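The effect of this reordering can be reproduced without pystencils. The following sketch simulates both execution orders on a plain Python list. The function names and the start values are chosen for illustration only.

```python
def sequential(data):
    """Reference order: iterations 0..41, each loading index i and storing 2*x to i + 1."""
    a = list(data)
    for i in range(42):
        x = a[i]
        a[i + 1] = 2 * x
    return a


def reordered(data):
    """Order of the printed code: the cut-off iteration 41 runs before the others."""
    a = list(data)
    x = a[41]             # reads index 41 before iteration 40 has written it
    a[42] = 2 * x
    for i in range(41):
        x = a[i]
        a[i + 1] = 2 * x
    return a


start = [1.0] + [0.0] * 42   # 43 entries, indices 0..42
print(sequential(start)[42], reordered(start)[42])
```

With these start values the sequential order propagates the doubling all the way to index 42, whereas the reordered variant stores twice the stale initial value of index 41 there, so the two results differ.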
This MR changes `move_constants_before_loop` such that assignments cannot be moved before their last modification.
Essentially, it replaces `symbols_defined` by `symbols_modified` [here](https://i10git.cs.fau.de/terraneo/pystencils/-/commit/be78ab165339d593869b5c77ef00a590a63ba130#99785d4b53b75ce54c83c3e499248de2a07fb2cd_598_597).
This new property is implemented for all AST nodes.
Note the implementation of `CustomCCodeNode`. I did not want to introduce breaking changes to the API.
Additionally, declarations are now inserted where the caller requests, instead of pushing them all the way to the top (https://i10git.cs.fau.de/terraneo/pystencils/-/commit/5c65d06216d050c22e28ba0b9487544342fc0926).
Lastly, a test for the new behavior is included.

Draft: Develop
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/224
2023-09-14T11:03:48+02:00
Markus Holzer
This MR adds two features to pystencils. First, the base pointer specification is exposed to the user, which allows producing kernels with less register usage. Second, the summands inside the summation printer are now printed recursively, which allows for more parallelism inside a single core.

Draft: [FIX] Index fields exclusively containing coordinates are dropped by code generator
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/358
2024-01-31T11:48:23+01:00
Frederik Hennig
Index fields that exclusively contain coordinate data (members `x`, `y` and `z`) and that are not explicitly accessed in the kernel assignments are dropped by `pystencils.cpu.create_indexed_kernel` in `cpu/kernelcreation.py`, prev. line 119.
Then in line 128 the list of index fields is empty, and the code generator finds no field containing the coordinate information.
Code generation then aborts.
Is there a reason why index fields are first filtered this way?
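The failure mode described above can be sketched in plain Python. Everything here is a hypothetical stand-in (dictionaries instead of pystencils fields, a made-up `find_coordinate_field` helper) and only models the described behavior, not the actual code in `cpu/kernelcreation.py`.

```python
# Hypothetical stand-ins for index fields: a name plus its struct members.
index_fields = [{"name": "idx", "members": ("x", "y", "z")}]

# No assignment in this kernel accesses the index field explicitly.
accessed_field_names = set()

# Filter step as described: keep only index fields the assignments touch.
remaining = [f for f in index_fields if f["name"] in accessed_field_names]


def find_coordinate_field(fields):
    """Return the first field that carries the coordinate members x, y and z."""
    for field in fields:
        if {"x", "y", "z"} <= set(field["members"]):
            return field
    raise ValueError("no index field contains the coordinate information")


try:
    find_coordinate_field(remaining)
    outcome = "found"
except ValueError as err:
    outcome = f"aborted: {err}"
print(outcome)
```

Running the lookup on the unfiltered `index_fields` list succeeds in this model, which is the behavior the question above is asking about.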