pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2019-12-05T11:02:49+01:00

Fix Opencl and LLVM GPU tests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/100
2019-12-05T11:02:49+01:00, Stephan Seitz

Fix tests for LLVM GPU and OpenCL
- !96 made it impossible to print functions without names (only important for LLVM GPU test)
- !87 made it impossible to run OpenCL kernels on CUDA OpenCL: `int(...)` is not a valid cast for it
- SymPy moved `sympy.boolalg` to `sympy.logic.boolalg`

WIP: Add csqrt, cpow to cuda_complex.hpp
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/101
2021-11-22T15:41:05+01:00, Stephan Seitz

Apparently, I'm using a feature of a more recent C++ version here.
Specializing `cpow(T)` to `cpow(complex<T>)`

WIP: Cuda autotune
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/106
2020-10-07T13:04:35+02:00, Stephan Seitz

This PR introduces ~~two~~ one change~~s~~:
- ~~rotate (32,1,1) depending on field strides to fastest dimension. So (1,1,32) for c-layout and (32,1,1) for fortran layout. So pystencils will be fast also for c-layout (this will always be performed)~~
- auto-tune the block dimensions to whatever is fastest for a specific kernel on localhost. On the first kernel call, different layouts are tried, and the kernel is henceforth called with the fastest configuration (disk-cached). This could be interesting for OpenCL, where we don't know which launch configuration is the fastest (on OpenCL, the runtime can alternatively give a hint on that).
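
The auto-tuning idea in the last bullet can be sketched roughly as follows. This is a minimal illustration and not the MR's implementation: `autotune_block_size`, `launch`, and `candidates` are hypothetical names, and an in-memory dict stands in for the disk cache.

```python
import time

# Hypothetical in-memory stand-in for the disk cache mentioned above.
_best_config_cache = {}


def autotune_block_size(kernel_name, launch, candidates, repeats=3):
    """Return the fastest block size for a kernel, timing the candidates only once.

    `launch(block)` is assumed to run the kernel with the given block size.
    """
    if kernel_name in _best_config_cache:
        return _best_config_cache[kernel_name]

    timings = {}
    for block in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            launch(block)
        timings[block] = (time.perf_counter() - start) / repeats

    best = min(timings, key=timings.get)
    _best_config_cache[kernel_name] = best
    return best


# Dummy "kernel" whose run time depends on the block shape (made-up numbers).
_fake_cost = {(32, 1, 1): 0.05, (1, 1, 32): 0.001, (8, 8, 1): 0.02}


def fake_launch(block):
    time.sleep(_fake_cost[block])


best = autotune_block_size("demo", fake_launch, list(_fake_cost))
```

Every candidate really executes the kernel several times during the trial phase, so a kernel that reads and writes the same field would see its data modified by the trial runs.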
One drawback: the test calls are only correct if input and output fields do not overlap (so no in-place kernels).

Test pystencils_autodiff in integration test
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/111
2019-12-17T18:56:19+01:00, Stephan Seitz

Add CI minimal CI test for old sympy
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/112
2019-12-17T18:51:26+01:00, Stephan Seitz

The minimal test cannot catch everything but it's something.

Test pystencils_autodiff in integration test
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/113
2020-01-08T13:49:44+01:00, Stephan Seitz

Throw error when trying to sympify `pystencils.Field` (e.g. using it in an...
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/116
2020-01-03T13:24:14+01:00, Stephan Seitz

Throw error when trying to sympify `pystencils.Field` (e.g. using it in an Assignment without indexing)
This is a typical error when using pystencils: you forget the index and use a field directly in an Assignment.
Edit: apparently, this error is only triggered on recent versions of SymPy that can sympify using `__sympy__` (not on CI).

WIP: Add InterpolatorAccess.__getnewargs__
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/117
2020-01-28T14:23:15+01:00, Stephan Seitz

It was missing, and `TypedSymbol.__getnewargs__` was used instead.

Add TypedMatrixSymbol (for usage of `MatrixSymbol` in kernels)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/144
2020-02-21T15:16:29+01:00, Stephan Seitz

I don't know whether this is a good idea, but SymPy supports assigning MatrixSymbols. Like
```python
>>> from sympy import MatrixSymbol
>>> A = MatrixSymbol('A', 3, 3)
>>> B = MatrixSymbol('B', 3, 3)
In [12]: pystencils.Assignment(A, B)
Out[12]: A := B
```
With this hack I can generate code like this:
```cpp
// Variant with the functor passed as a std::function
#define FUNC_PREFIX static
FUNC_PREFIX void kernel(float * RESTRICT _data_y,
                        int64_t const _size_y_0, int64_t const _size_y_1, int64_t const _size_y_2,
                        int64_t const _stride_y_0, int64_t const _stride_y_1, int64_t const _stride_y_2,
                        std::function< Vector3 < double >(int, int, int) > my_fun)
{
   for (int ctr_0 = 0; ctr_0 < _size_y_0; ctr_0 += 1)
   {
      float * RESTRICT _data_y_00 = _data_y + _stride_y_0*ctr_0;
      for (int ctr_1 = 0; ctr_1 < _size_y_1; ctr_1 += 1)
      {
         float * RESTRICT _data_y_00_10 = _stride_y_1*ctr_1 + _data_y_00;
         for (int ctr_2 = 0; ctr_2 < _size_y_2; ctr_2 += 1)
         {
            const Vector3<double> A = my_fun(ctr_0, ctr_1, ctr_2);
            _data_y_00_10[_stride_y_2*ctr_2] = A[0] + A[1] + A[2];
         }
      }
   }
}

// Variant with the functor type as a template parameter
#define FUNC_PREFIX static
template <class Functor_T>
FUNC_PREFIX void kernel(float * RESTRICT _data_y,
                        int64_t const _size_y_0, int64_t const _size_y_1, int64_t const _size_y_2,
                        int64_t const _stride_y_0, int64_t const _stride_y_1, int64_t const _stride_y_2,
                        Functor_T my_fun)
{
   for (int ctr_0 = 0; ctr_0 < _size_y_0; ctr_0 += 1)
   {
      float * RESTRICT _data_y_00 = _data_y + _stride_y_0*ctr_0;
      for (int ctr_1 = 0; ctr_1 < _size_y_1; ctr_1 += 1)
      {
         float * RESTRICT _data_y_00_10 = _stride_y_1*ctr_1 + _data_y_00;
         for (int ctr_2 = 0; ctr_2 < _size_y_2; ctr_2 += 1)
         {
            const Vector3<double> A = my_fun(ctr_0, ctr_1, ctr_2);
            _data_y_00_10[_stride_y_2*ctr_2] = A[0] + A[1] + A[2];
         }
      }
   }
}
```
from
```python
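import pystencils
from pystencils.data_types import TypedMatrixSymbol, TypedSymbol, create_type
# DynamicFunction and FrameworkIntegrationPrinter are not imported in the original
# snippet; they presumably come from pystencils_autodiff (assumption).

x, y = pystencils.fields('x, y: float32[3d]')

# 3x1 matrix symbol of doubles, declared with the C++ type Vector3<double>
A = TypedMatrixSymbol('A', 3, 1, create_type('double'), 'Vector3<double>')
# Call of the user-supplied functor my_fun at the current cell coordinates
my_fun_call = DynamicFunction(TypedSymbol('my_fun',
                                          'std::function< Vector3 < double >(int, int, int) >'),
                              A.dtype,
                              *pystencils.x_vector(3))

assignments = pystencils.AssignmentCollection({
    A: my_fun_call,
    y.center: A[0] + A[1] + A[2]
})
ast = pystencils.create_kernel(assignments)
pystencils.show_code(ast, custom_backend=FrameworkIntegrationPrinter())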
```

Fix import: sympy.numbers -> sympy.core.numbers
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/150
2020-03-24T00:57:30+01:00, Stephan Seitz

Apparently `sympy` no longer exports `sympy.numbers` directly.

Use dark mode for code preview if user prefers `prefers-color-scheme: dark`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/151
2020-04-23T07:59:41+02:00, Stephan Seitz

pystencils currently does not look good in dark mode :/

WIP: Opencl to SPIR-V ahead-of-time compilation
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/175
2023-03-16T12:42:10+01:00, Stephan Seitz

This does not yet use the pystencils' cache folder or disk caching of the compilation.
This can be used to embed compiled bytecode into waLBerla executables as I do with my Vulkan wrapper. Not sure if this is a good way to go but at least we can experiment with it.
A good way to proceed with this MR would also be a comparison between HIP/SYCL/OpenCL/Vulkan in order to identify a suitable backend for pystencils.

Updated Kerncraft Coupling
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/183
2020-11-06T15:45:24+01:00, Julian Hammer

WIP: ARM NEON vectorization
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/187
2020-11-18T14:59:55+01:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

With Apple's new laptops having ARM processors, I thought it might be time to add ARM NEON vectorization to pystencils. I don't currently have hardware to test on, but a bunch of test cases from both pystencils and lbmpy at least compile successfully. A Raspberry Pi 4 might actually be a useful and cheap device to add to CI for this purpose.
This may also become useful once ARM HPC clusters actually get deployed, though these might end up using SVE instead of NEON -- while I have added a few `if`s for that case, additional work is needed because SVE's vector width is determined at runtime.
Assignee: Markus Holzer

WIP: Assembly
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/210
2021-03-26T20:17:13+01:00, Markus Holzer

Adds the functionality to directly show the assembly output of the generated code.
Further, the base pointer specification is exposed to the user, which is helpful to minimize register spilling in some cases.
Assignee: Markus Holzer

Draft: Develop
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/224
2023-09-14T11:03:48+02:00, Markus Holzer

This MR adds two features to pystencils. First, the base pointer specification is exposed to the user, which allows producing kernels with less register usage. Second, the summands inside the summation printer are now printed recursively, which allows for more parallelism inside a single core.
Assignee: Markus Holzer

WIP: ARM cache line zeroing
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/225
2021-04-01T22:59:29+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

ARM has a cache line zero instruction that prevents data that will be overwritten anyway from being loaded from RAM. Kind of a light version of a non-temporal store. Saw this near the end of https://www.youtube.com/watch?v=BP7XD7JHgrI in the context of SVE, but might be relevant on Neon too. Just wanted to keep a note of this here.
Integrating this into pystencils is probably not completely straightforward, as you first need to check how much would be zeroed (64 bytes on all current chips, not guaranteed to match the cache line size), zero it, and then write the corresponding amount of data. Not sure if there are guarantees as to whether it's a multiple of the vector width.
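
The bookkeeping described above (find out how much one instruction zeroes, zero only what will be fully overwritten, then write the data) comes down to plain address arithmetic. The sketch below is only an illustration: `zeroable_blocks` and its parameters are hypothetical, and the default of 64 bytes is just the figure quoted above, not a value queried from hardware.

```python
def zeroable_blocks(start, length, block_size=64):
    """Return start addresses of blocks lying entirely inside [start, start + length).

    Only these blocks may be zeroed without destroying neighbouring data that the
    subsequent write does not cover; partially covered blocks at either end are skipped.
    """
    if block_size <= 0 or length < 0:
        raise ValueError("block_size must be positive and length non-negative")
    end = start + length
    first = -(-start // block_size) * block_size  # round start up to a block boundary
    last = (end // block_size) * block_size       # round end down to a block boundary
    return list(range(first, last, block_size))


# Write range [100, 300): the fully covered 64-byte blocks begin at 128 and 192.
blocks = zeroable_blocks(start=100, length=200)
```

A write that does not span a whole block yields an empty list, so in that case nothing would be zeroed.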
There is not a whole lot of information for ARM, but the exact same thing has existed on IBM's PowerPC architecture (!228) for decades. There, a cache line has 128 bytes (can be queried from the kernel via `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`) and can be zeroed with the `__dcbz` intrinsic.

Add type conversion for SP types
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/229
2021-04-03T06:01:47+02:00, Markus Holzer

If Assignments are already typed for double precision but the kernel is created for single precision, the assignments should be adapted.
Assignee: Markus Holzer

Fix Sympy pipeline
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/237
2021-04-26T16:46:20+02:00, Markus Holzer

Fix #35
Assignee: Markus Holzer

Switch index type from int32 to int64
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/250
2021-06-07T13:38:27+02:00, Markus Holzer

For large domain sizes, int32 is not sufficient. Thus it is planned for waLBerla to change `cell_index_t` from `int` to `int64`. To make it consistent with pystencils and to prevent conversion warnings, the index type for pystencils is also adapted to int64.
Fixes https://i10git.cs.fau.de/pycodegen/lbmpy/-/issues/18
Assignee: Markus Holzer
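
To see why int32 runs out for large domains, one can compare the number of cells with the largest value a signed 32-bit integer can hold. This is only a sketch: the domain sizes are made-up examples and `fits_in_int32` is a hypothetical helper, not part of pystencils or waLBerla.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def fits_in_int32(shape):
    """Check whether the largest linear cell index of a domain fits into int32."""
    cells = 1
    for extent in shape:
        cells *= extent
    return cells - 1 <= INT32_MAX


small = fits_in_int32((1024, 1024, 1024))  # 2**30 cells
large = fits_in_int32((2048, 2048, 2048))  # 2**33 cells
```

Here `small` is `True` and `large` is `False`. Fields that store several values per cell would reach the limit at even smaller domain sizes.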