pystencils issues
https://i10git.cs.fau.de/pycodegen/pystencils/-/issues
2023-03-17T11:26:56+01:00

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/58
iteration_slice with step-size for GPU-kernels
2023-03-17T11:26:56+01:00
Alexander Reinauer

Step-size of `iteration_slice` is ignored for GPU-kernels.
```
import pystencils as ps
dh = ps.create_data_handling(domain_size=(5, 5), periodicity=True, default_target="gpu")
iteration_slice = ps.make_slice[1:-1:2, 1:-1:2]
config = ps.CreateKernelConfig(target=dh.default_target, iteration_slice=iteration_slice)
field = dh.add_array("a")
assign = ps.Assignment(field.center, 1.0)
kernel = ps.create_kernel(assign, config=config).compile()
dh.fill(field.name, 0, ghost_layers=True)
if config.target == ps.enums.Target.GPU:
    dh.to_gpu(field.name)
dh.run_kernel(kernel)
if config.target == ps.enums.Target.GPU:
    dh.to_cpu(field.name)
print(dh.gather_array(field.name, ghost_layers=True))
```
results in
```
[[0. 0. 0. 0. 0. 0. 0.]
[0. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 0.]
[0. 0. 0. 0. 0. 0. 0.]]
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/56
Kernelcreation functions take different assignment types
2022-05-11T14:40:56+02:00
Alexander Reinauer

The documentation for the [`create_indexed_kernel(assignments: List[Assignment], *, config: CreateKernelConfig):`](https://i10git.cs.fau.de/pycodegen/pystencils/-/blob/master/pystencils/kernelcreation.py#L286) function has type hints for the assignments that differ from what the docstring says: [`assignments: can be a single assignment, sequence of assignments or an AssignmentCollection`](https://i10git.cs.fau.de/pycodegen/pystencils/-/blob/master/pystencils/kernelcreation.py#L297).
The same is true for [`create_domain_kernel`](https://i10git.cs.fau.de/pycodegen/pystencils/-/blob/master/pystencils/kernelcreation.py#L184), though I did not check every single function.
Independently of that mismatch, it was surprising to me as a user that the different kernel creation functions take different types of assignments. Since I mostly work with `create_kernel`, I was used to passing an `AssignmentCollection`, which doesn't seem to work for the other functions.
<details><summary>The script I tried to use for the indexed kernel</summary>
```
import pystencils as ps
rho = ps.fields("rho :[3D]", layout='zyxf')
collection = ps.AssignmentCollection(main_assignments=[ps.Assignment(rho.center, 1.0)])
config = ps.CreateKernelConfig()
assignments = ps.create_indexed_kernel(assignments=collection, config=config)
```
</details>
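One way to make the accepted input types uniform is to normalise all three documented forms (single assignment, sequence, collection) to a flat list before any further processing. The following is only a sketch: `Assignment` and `AssignmentCollection` are hypothetical stand-ins rather than the pystencils classes, `as_assignment_list` is a made-up helper name, and the sketch assumes the collection exposes its contents through an `all_assignments` attribute.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Assignment:
    """Hypothetical stand-in for an assignment lhs = rhs."""
    lhs: Any
    rhs: Any


@dataclass
class AssignmentCollection:
    """Hypothetical stand-in for a collection of assignments."""
    main_assignments: List[Assignment]
    subexpressions: List[Assignment] = field(default_factory=list)

    @property
    def all_assignments(self) -> List[Assignment]:
        # Subexpressions come first because the main assignments may use them.
        return self.subexpressions + self.main_assignments


def as_assignment_list(assignments) -> List[Assignment]:
    """Normalise a single assignment, a sequence, or a collection to a list."""
    if isinstance(assignments, Assignment):
        return [assignments]
    if hasattr(assignments, "all_assignments"):
        return list(assignments.all_assignments)
    return list(assignments)


single = Assignment("rho", 1.0)
collection = AssignmentCollection(main_assignments=[single])
# All three input forms end up as the same flat list.
same = as_assignment_list(single) == as_assignment_list([single]) == as_assignment_list(collection)
```

If every kernel creation function applied such a normalisation first, the type hints and the docstrings could both state the same three accepted forms.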
If you don't feel like this is an issue, feel free to close it.

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/54
Type defaults
2022-05-11T14:38:17+02:00
Markus Holzer

With !292, two new type specifications will be introduced. These concern the typing of numbers. For users, it might get complicated to set these. Thus, good defaults should be provided so that users need to set the types as rarely as possible.

Release 1.1
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/49
Benchmarking
2021-12-03T16:59:08+01:00
Markus Holzer

Performance is key for pystencils, so there should be nightly benchmarks to check the performance.
Furthermore, it should be possible to easily generate standalone C code that can be used for performance measurements, code investigations, etc.

Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/48
Python 3.10
2024-03-27T18:22:08+01:00
Jan Hönig

We switch to `python3.10` when `conda` and the latest Ubuntu LTS version support that.
Apply pattern matching in important functions.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/44
Kerncraft
2021-11-25T12:34:57+01:00
Markus Holzer

Check the status of kerncraft and check if we should still use it in pystencils.

Release 1.0
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/43
Remove LLVM and OpenCL
2021-11-22T21:09:32+01:00
Markus Holzer

OpenCL and LLVM are not used by anyone in pystencils. We should deprecate them and tag the last pystencils version that supports OpenCL or LLVM.

Release 1.0
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/41
Vectorisation Bug with sqrt
2021-11-02T21:46:11+01:00
Markus Holzer

This code:
```
import sympy as sp
import pystencils as ps

instruction_set = 'avx'
dtype = 'float32'
field_layout = 'fzyx'
# Vectorise with AVX, using unaligned accesses and no non-temporal stores
config = ps.CreateKernelConfig(data_type=dtype,
                               cpu_vectorize_info={'instruction_set': instruction_set,
                                                   'assume_inner_stride_one': True,
                                                   'assume_aligned': False, 'nontemporal': False})
src_field = ps.Field.create_generic('pdfs', 2, dtype, index_dimensions=1, layout=field_layout, index_shape=(9,))
# xi sums all nine entries of a cell; xi_2 multiplies xi with a square root
eq = [ps.Assignment(sp.Symbol("xi"), sum(src_field.center_vector)),
      ps.Assignment(sp.Symbol("xi_2"), sp.Symbol("xi") * sp.sqrt(src_field.center))]
ps.create_kernel(eq, config=config).compile()
```
produces:
```
g++ -c -Ofast -DNDEBUG -fPIC -march=native -fopenmp -std=c++11 -I/home/markus/miniconda3/envs/pystencils/include/python3.9 -I/home/markus/pystencils/pystencils/pystencils/include -o /home/markus/.cache/pystencils/objectcache/tmpxd6uc9oo /home/markus/.cache/pystencils/objectcache/mod_904f8a018b25a7d1c9d288f1189161f438a5e0a7a0cff26c9e1568ae28c4bd2f.cpp
/home/markus/.cache/pystencils/objectcache/mod_904f8a018b25a7d1c9d288f1189161f438a5e0a7a0cff26c9e1568ae28c4bd2f.cpp: In function ‘void kernel_kernel(float*, int64_t, int64_t, int64_t, int64_t)’:
/home/markus/.cache/pystencils/objectcache/mod_904f8a018b25a7d1c9d288f1189161f438a5e0a7a0cff26c9e1568ae28c4bd2f.cpp:33:61: error: cannot convert ‘const __m256’ to ‘float’
33 | const __m256 xi_2 = _mm256_mul_ps(_mm256_set_ps(xi,xi,xi,xi,xi,xi,xi,xi),_mm256_sqrt_ps(_mm256_loadu_ps(& _data_pdfs_20_10[ctr_0])));
| ^~
| |
| const __m256
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/immintrin.h:43,
from /home/markus/.cache/pystencils/objectcache/mod_904f8a018b25a7d1c9d288f1189161f438a5e0a7a0cff26c9e1568ae28c4bd2f.cpp:2:
/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/avxintrin.h:1256:22: note: initializing argument 1 of ‘__m256 _mm256_set_ps(float, float, float, float, float, float, float, float)’
1256 | _mm256_set_ps (float __A, float __B, float __C, float __D,
```

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/40
SymPy 1.10 is broken
2023-06-03T15:05:25+02:00
Markus Holzer

The latest SymPy master is currently not supported by pystencils. Problems occur with `deepcopy`. An example of the failure is shown here:
https://i10git.cs.fau.de/holzer/pystencils/-/jobs/658250.
The problem was introduced with this commit to SymPy:
https://github.com/sympy/sympy/commit/ef0de0c80ab13501076d9bf611a111250bbc88b0

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/39
Use Optimisations
2021-11-19T11:17:21+01:00
Markus Holzer

The optimisations in `math_optimisation.py` are not used in pystencils. They should be enabled by default, and the user should have the option to turn them off.
Furthermore, more optimisations, such as the insertion of constants, should be enabled by default.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/38
Bug in vectorisation
2021-10-26T09:55:31+02:00
Markus Holzer

```
from pystencils.session import *
vectorization_options = {'instruction_set': "neon",
                         'assume_aligned': True,
                         'nontemporal': True,
                         'assume_inner_stride_one': True,
                         'assume_sufficient_line_padding': True}
g = ps.Field.create_generic("g", spatial_dimensions=2,
                            index_shape=(1,), layout="fzyx", dtype="double")
h = g.new_field_with_different_name("h")
config = ps.CreateKernelConfig(cpu_vectorize_info=vectorization_options)
up = ps.Assignment(g.center, h.center)
ast = ps.create_kernel(up, config=config)
ast.compile()
```
gives:
```
clang++ -c -Ofast -DNDEBUG -fPIC -Xclang -fopenmp -std=c++11 /opt/local/lib/libomp/libomp.dylib -I/opt/local/Library/Frameworks/Python.framework/Versions/3.9/include/python3.9 -I/Users/holzer/pystencils/pystencils/pystencils/include -o /Users/holzer/Library/Caches/pystencils/objectcache/tmpkktzz1o_ /Users/holzer/Library/Caches/pystencils/objectcache/mod_71309076e2e843a225a7d8121d22b552f6c3a9eba5755088cdcc2d92cbd92e4a.cpp
clang: warning: /opt/local/lib/libomp/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
/Users/holzer/Library/Caches/pystencils/objectcache/mod_71309076e2e843a225a7d8121d22b552f6c3a9eba5755088cdcc2d92cbd92e4a.cpp:22:74: error: use of undeclared identifier '_stride_g_0'; did you mean '_stride_g_1'?
if (((uintptr_t) &_data_g_20_10[ctr_0] & _clsize_mask) == 0 && (_stride_g_0*ctr_0 + _stride_g_1*ctr_1 + _clsize/8) < _size_g_0*_size_g_1) {
^~~~~~~~~~~
_stride_g_1
/Users/holzer/Library/Caches/pystencils/objectcache/mod_71309076e2e843a225a7d8121d22b552f6c3a9eba5755088cdcc2d92cbd92e4a.cpp:8:156: note: '_stride_g_1' declared here
FUNC_PREFIX void kernel_kernel(double * RESTRICT _data_g, double * RESTRICT const _data_h, int64_t const _size_g_0, int64_t const _size_g_1, int64_t const _stride_g_1, int64_t const _stride_h_1)
^
1 error generated.
```
Vectorisation with a field of undefined size does not work properly at the moment.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/36
Test case for FVM source term and fluctuations
2022-01-10T15:10:05+01:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

Please create a pystencils test case from your reactive electrokinetics code. We currently don't have test coverage for the discretization of the source term. While you're at it, include the fluctuating EK test as well.

Ingo

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/35
sympy master CI job fails inside TypedSymbol
2021-04-26T18:24:04+02:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

Job [#567086](https://i10git.cs.fau.de/pycodegen/pystencils/-/jobs/567086) failed for 059de5fb2898eea1a69172ee74b9d9396cacd60c:
```
File "/builds/pycodegen/pystencils/pystencils/data_types.py", line 219, in __new__
obj = TypedSymbol.__xnew_cached_(cls, *args, **kwds)
File "/opt/conda/lib/python3.8/site-packages/sympy/core/cache.py", line 74, in wrapper
retval = func(*args, **kwargs)
TypeError: __new_stage2__() missing 1 required positional argument: 'dtype'
```

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/34
Use AVX512 scatter/gather
2021-05-02T15:32:28+02:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

AVX512 has `_mm512_i32gather_ps`/`_mm512_i32gather_pd`, which can load strided data into a vector, and `_mm512_i32scatter_ps`/`_mm512_i32scatter_pd`, which can write a vector to strided memory. These can be used to enable vectorization without `assume_inner_stride_one=True`. For LBM, this would permit vectorizing the streaming and collision kernels simultaneously. In general, it would also allow vectorizing with zyxf memory layout. I don't know how big the performance benefits would be, but it's certainly worth trying.
The Fujitsu A64FX also has special hardware support for scatter/gather with SVE instructions and would probably perform even better.

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/32
Check for correct alignment offset in cpujit
2021-03-03T16:41:22+01:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

The following segfaults without a reasonable error message due to incorrect alignment:
```python
from pystencils.session import *
domain_size = (128, 128)
dh = ps.create_data_handling(domain_size, periodicity=(True, True), default_target='cpu')
src = dh.add_array("src", values_per_cell=1, dtype=np.float64, ghost_layers=1, alignment=32)
dh.fill(src.name, 1.0, ghost_layers=True)
dst = dh.add_array("dst", values_per_cell=1, dtype=np.float64, ghost_layers=1, alignment=32)
dh.fill(dst.name, 1.0, ghost_layers=True)
update_rule = ps.Assignment(dst[0, 0], src[0, 0])
opt = {'instruction_set': 'avx', 'assume_aligned': True, 'nontemporal': True, 'assume_inner_stride_one': True}
ast = ps.create_kernel(update_rule, target=dh.default_target, cpu_vectorize_info=opt)
kernel = ast.compile()
ps.show_code(ast)
dh.run_kernel(kernel)
```
This is because the kernel has zero ghost layers but the fields have one, so alignment is inconsistent. It would have been avoided by adding `ghost_layers=1` to the `ps.create_kernel` call, by removing it from the `dh.add_array` calls, or by changing the update rule to an assignment that includes neighbors (e.g. `ps.Assignment(dst[0, 0], src[0, 1])`).
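The inconsistency can be illustrated with a little address arithmetic. The sketch below assumes that an aligned array is allocated so that its first inner cell (the cell right after the ghost layers along the innermost coordinate) sits on an alignment boundary; the function name `first_access_misalignment` is made up for illustration and is not part of pystencils.

```python
def first_access_misalignment(field_ghost_layers, kernel_ghost_layers, item_size, alignment):
    """Bytes by which the first cell the kernel touches misses an alignment boundary.

    Assumption: the cell at index `field_ghost_layers` is the aligned one.
    """
    # Signed distance (in cells) from the aligned cell to the first visited cell
    cells_off = kernel_ghost_layers - field_ghost_layers
    # Python's % always returns a non-negative result for a positive modulus
    return (cells_off * item_size) % alignment


# float64 cells and 32-byte alignment, as in the example above
print(first_access_misalignment(1, 0, 8, 32))  # 24: kernel starts one cell early
print(first_access_misalignment(1, 1, 8, 32))  # 0: ghost layers match
```

Under this assumption, any non-zero result means that an aligned vector load or store at the first visited cell would fault.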
We should show a good error message, probably by adding something like the following to cpujit.py:
```python
if ast_node.instruction_set:
    offset = (ast_node.instruction_set['width'] - ast_node.ghost_layers) * item_size
    offset_cond = "((uintptr_t) buffer_{name}.buf) % buffer_{name}.strides[0] == {offset}".format(name=field.name, offset=str(offset))
    pre_call_code += template_check_array.format(cond=offset_cond, what="offset", name=field.name, expected=str(offset))
```

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/31
Kernel expects wrong shape of array
2021-03-03T16:41:22+01:00
Markus Holzer

The issue can be reproduced with the following code:
```
from pystencils.session import *
domain_size = (132, 128)
dh = ps.create_data_handling(domain_size, periodicity=(True, True), default_target='cpu')
src = dh.add_array("src", values_per_cell=1, dtype=np.float64, ghost_layers=1, alignment=True)
dh.fill(src.name, 1.0, ghost_layers=True)
dst = dh.add_array("dst", values_per_cell=1, dtype=np.float64, ghost_layers=1, alignment=True)
dh.fill(dst.name, 1.0, ghost_layers=True)
update_rule = ps.Assignment(dst[0, 0], src[-1, 0] + src[0, 0])
opt = {'instruction_set': 'avx', 'assume_aligned': True, 'nontemporal': True, 'assume_inner_stride_one': True}
ast = ps.create_kernel(update_rule, target=dh.default_target, cpu_vectorize_info=opt)
kernel = ast.compile()
dh.run_kernel(kernel)
```
If `alignment` is set to `False` (and thus also `assume_aligned` and `nontemporal`), everything works fine.
The error also does not occur when the alignment of the array is set to `True` and `assume_aligned` is set to `False`.
The error is the following:
```
ValueError Traceback (most recent call last)
<ipython-input-1-f6893d7e30e1> in <module>
17 kernel = ast.compile()
18
---> 19 dh.run_kernel(kernel)
~/pystencils/pystencils/pystencils/datahandling/serial_datahandling.py in run_kernel(self, kernel_function, **kwargs)
241 def run_kernel(self, kernel_function, **kwargs):
242 arrays = self.gpu_arrays if kernel_function.ast.backend in self._GPU_LIKE_BACKENDS else self.cpu_arrays
--> 243 kernel_function(**{**arrays, **kwargs})
244
245 def get_kernel_kwargs(self, kernel_function, **kwargs):
~/pystencils/pystencils/pystencils/kernel_wrapper.py in __call__(self, **kwargs)
16
17 def __call__(self, **kwargs):
---> 18 return self.kernel(**kwargs)
19
20 @property
ValueError: Wrong shape of array dst. Expected (133, 130)
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/30
Vectorization warning issues
2021-03-02T19:50:51+01:00
Jan Hönig

There seem to be some vectorization issues.
First, we have a very simple 1D-Kernel [example.py](/uploads/b0bf171fcabe7894ba8c38cb6f526436/test.py), which should not have any issues. And yet I receive the warning:
```
pystencils/cpu/vectorization.py:127: UserWarning: Could not vectorize loop because of non-consecutive memory access
  warnings.warn("Could not vectorize loop because of non-consecutive memory access")
```
The second [example2.py](/uploads/92f7f12c046dc87449b910262861ff7e/test1.py) has the field `y` on both the lhs and the rhs of the assignment. However, the stencil access pattern is the same, and thus no non-consecutive memory access should be present. Yet I receive the warning.
The third [example3.py](/uploads/92a997c2a76293d8742dc65cb8903d68/test2.py) has non-consecutive memory access and the warning is correct.
It is possible that I misunderstood something about the vectorization.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/28
Get Assembly output of kernel
2021-12-10T12:23:39+01:00
Markus Holzer

I think it would sometimes be helpful to have something like `ps.show_assembly`, which directly prints the assembly code of a kernel.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/27
Use pylikwid in pystencils
2021-11-19T11:44:42+01:00
Markus Holzer

I think it would be a good idea to use pylikwid in pystencils. That way, pystencils could make sure that memory allocation, initialization, and execution are done on the same process etc.
Especially when it comes to reliable performance benchmarks, pystencils could profit greatly from the pylikwid functionality.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/26
Use AVX512 masked intrinsics
2021-11-19T15:43:59+01:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

AVX512 provides intrinsics like `_mm512_mask_add_pd`, which is like `_mm512_add_pd` with a write mask. This can be used to efficiently filter out writes to non-fluid cells. It might also be useful to optimize things like `sp.Piecewise`. This would also work with SVE vectorization on future ARM processors.

Jan Hönig
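
The write-mask behaviour described for `_mm512_mask_add_pd` can be emulated lane by lane in plain Python. This is only an illustration of the semantics, not generated kernel code: `mask_add` is a hypothetical helper, and the 4-lane vectors and the fluid flags are made-up example data.

```python
def mask_add(src, mask, a, b):
    """Lane-wise emulation of a masked vector add.

    Lanes whose mask bit is set receive a[i] + b[i]; all other lanes keep src[i].
    """
    return [x + y if m else s for s, m, x, y in zip(src, mask, a, b)]


# Hypothetical example: lanes 1 and 3 are non-fluid cells and stay untouched
src = [0.0, 9.0, 0.0, 9.0]
fluid = [True, False, True, False]
result = mask_add(src, fluid, [1.0] * 4, [2.0] * 4)
print(result)  # [3.0, 9.0, 3.0, 9.0]
```

A `sp.Piecewise` with two branches could be mapped in the same spirit: compute one branch into `src` and apply the other branch only where the condition mask is set.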