pystencils merge requestshttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests2020-10-03T16:26:28+02:00https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/172Warning fixes in setup.py2020-10-03T16:26:28+02:00Markus HolzerWarning fixes in setup.pyThis MR fixes some small warnings in setup.py. Additionally, the accuracy for the timeloop test case is lowered due to its often failureThis MR fixes some small warnings in setup.py. Additionally, the accuracy for the timeloop test case is lowered due to its often failurehttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/163Volume-of-Fluid: better tests and make it actually work2020-07-10T21:18:57+02:00Michael Kuronmkuron@icp.uni-stuttgart.deVolume-of-Fluid: better tests and make it actually workThe scheme corresponds to calculating the overlap volume of a cell shifted by the flow velocity with its neighbor cell. The new test does that explicitly, but generates convoluted expressions which should not be used directly because the...The scheme corresponds to calculating the overlap volume of a cell shifted by the flow velocity with its neighbor cell. The new test does that explicitly, but generates convoluted expressions which should not be used directly because they are so slow to evaluate. Thanks to this test, an incorrect sign in the implementation was found and fixed, as well as improper scaling of the fluxes.
@alexander.reinauer, this could be important to you because you are the only other person using volume-of-fluid advection.Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/238Versioneer2021-04-27T11:02:20+02:00Markus HolzerVersioneerThis MR enables Versioneer to have a consistent way for pystencils to get a version stringThis MR enables Versioneer to have a consistent way for pystencils to get a version stringMarkus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344Vectorize all scalar symbols in vector expressions2023-09-04T14:59:27+02:00Daniel BauerVectorize all scalar symbols in vector expressionsPystencils fails to vectorize very simple kernels:
```python
import numpy as np
import pystencils as ps
from pystencils.astnodes import SympyAssignment, TypedSymbol
f = ps.fields("f: [1D]")
x = TypedSymbol("x", np.float64)
kernel = p...Pystencils fails to vectorize very simple kernels:
```python
import numpy as np
import pystencils as ps
from pystencils.astnodes import SympyAssignment, TypedSymbol
f = ps.fields("f: [1D]")
x = TypedSymbol("x", np.float64)
kernel = ps.create_kernel(
[SympyAssignment(x, 2.0), SympyAssignment(f[0], x)],
cpu_vectorize_info={"assume_inner_stride_one": True},
)
ps.show_code(kernel)
```
This example throws an exception in `show_code`, complaining that the printer can not vectorize type casts.
The problem is that `x = 2.0` is moved out of the loop (since it is constant).
What remains in the loop is `f[i] = x`.
While the left-hand-side of this expression is vectorized, the right-hand-side is left scalar, leading to the exception.
The issue comes from the `insert_vector_casts` function.
It traverses each expression from the leafs to the root, leaving scalars scalar [^1] and collating mixed expressions to vectors.
However, it handles the rhs of assignments separate from the lhs, leading to above issue.
Moreover, expressions like `a (vec) + (b (scalar) * c (scalar))` are converted to `a (vec) + CastToVec(b (scalar) * c (scalar))`, which leads to the same exception.
The correct way is to directly cast `b` and `c` to vectors, not their product.
Therefore, `insert_vector_casts` must know beforehand, whether an expression appears inside a vectorized expression.
This MR fixes that for SympyAssignments.
To that end, it first checks whether either side contains a vectorized expression, and if so, casts all symbols to vectors.
Since I am not really sure how to handle the cases for `VectorMemoryAccess` ([line 370/386](https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344/diffs#2d8e7266b6ec295a8ceac4159fcd9cea9ede6ca8_370_386)) and `ast.Conditional` ([line 374/390](https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344/diffs#2d8e7266b6ec295a8ceac4159fcd9cea9ede6ca8_374_390)), I left those untouched.
[^1]: The exception is that CastFunctions are always replaced by vector casts. I do not know whether this is intentional.https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/212vectorization: improve treatment of unary minus2021-02-19T16:41:53+01:00Michael Kuronmkuron@icp.uni-stuttgart.devectorization: improve treatment of unary minusChanges in !48 caused test failures in https://i10git.cs.fau.de/pycodegen/lbmpy/-/jobs/534348.
Also add a test for vectorized unary minus and `sp.Abs`, suppress a warning with older versions of randomgen and make double-precision vector...Changes in !48 caused test failures in https://i10git.cs.fau.de/pycodegen/lbmpy/-/jobs/534348.
Also add a test for vectorized unary minus and `sp.Abs`, suppress a warning with older versions of randomgen and make double-precision vector RNG accurate on Clang in fast-math mode.
The errors in test_resting_fluid and test_point_force are fixed by https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/63. The error in test_phi_staggered_equivalence_on_random is fixed by https://i10git.cs.fau.de/pycodegen/pygrandchem/-/merge_requests/4.Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/233Vectorization: improve test coverage2021-04-21T19:55:08+02:00Michael Kuronmkuron@icp.uni-stuttgart.deVectorization: improve test coverageSome things were not tested with all available vectorizations. `maskStore` was previously untested and only worked with AVX512 float/double and AVX double. `maskLoad` was clearly broken, unused, and is a pretty useless instruction anyway...Some things were not tested with all available vectorizations. `maskStore` was previously untested and only worked with AVX512 float/double and AVX double. `maskLoad` was clearly broken, unused, and is a pretty useless instruction anyway, so I removed it.Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/228Vectorization improvements2021-03-29T22:31:22+02:00Michael Kuronmkuron@icp.uni-stuttgart.deVectorization improvementsAfter we cleaned up vectorization support as part of our ARM Neon experiments a few weeks ago (!188, !220, !222), I did the same thing with AltiVec/VSX intrinsics for POWER processors. Adding a new SIMD instruction set to pystencils real...After we cleaned up vectorization support as part of our ARM Neon experiments a few weeks ago (!188, !220, !222), I did the same thing with AltiVec/VSX intrinsics for POWER processors. Adding a new SIMD instruction set to pystencils really is just a matter of some quick find-and-replace now. I had test access to a POWER8 machine today, ran in both little-endian and big-endian mode, and all tests passed. So pystencils now actually supports _all_ SIMD instruction sets out there (ignoring MIPS and SPARC processors, which are essentially dead).
This pull request also contains some minor unrelated changes:
- switches the AES RNG to aligned stores
- adds a missing `pytest.importorskip`
- fixes the `vec_any`/`vec_all` operations (which used to only work on 256 bit doubles)
- removes the `q_registers` argument from `get_vector_instruction_set` because there is no point in using half-width vectors
- fix the AES-NI RNG on Ice Lake/Tiger Lake processorsMarkus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/266Vectorisation bug2021-11-09T11:21:00+01:00Markus HolzerVectorisation bugFixes #41Fixes #41Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/241Vector scatter/gather support2023-08-18T20:43:16+02:00Michael Kuronmkuron@icp.uni-stuttgart.deVector scatter/gather supportSome modern processors support scatter/gather operations in hardware to be able to vectorize even when the stride between consecutive elements is not 1. Supporting this in pystencils turned out to be surprisingly easy. On a Core i7-7820X...Some modern processors support scatter/gather operations in hardware to be able to vectorize even when the stride between consecutive elements is not 1. Supporting this in pystencils turned out to be surprisingly easy. On a Core i7-7820X, the D3Q19 TRT benchmark shows an appreciable performance benefit for cases with nonideal memory layout:
- 15% for fzyx without assume_inner_stride_one and with split
- 20% for fzyx without assume_inner_stride_one
- 30% for zyxf
AVX2 only supported gather, so this requires AVX512. The internet says it was quite slow on AVX2 processors anyway. Even on AVX512 the latency is quite high and the throughput is quite low, but it's still better than not vectorizing. SVE also supports it, so all future ARM processors will benefit too, and they will probably have better hardware support for higher throughput.
Fixes #34Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/405Various GPU-related and some general fixes.2024-07-23T12:49:16+02:00Frederik HennigVarious GPU-related and some general fixes. - Recombine KernelWrapper APIs of CPU and GPU JIT
- Clean up JIT module
- Fix kernel constraints analysis
- Fix handling of slices with negative start index
- Fix sparse iteration spaces on GPUs
- Reintroduce GPU periodicity module... - Recombine KernelWrapper APIs of CPU and GPU JIT
- Clean up JIT module
- Fix kernel constraints analysis
- Fix handling of slices with negative start index
- Fix sparse iteration spaces on GPUs
- Reintroduce GPU periodicity module
- Extend GPU test cases
- Reintroduce kwargs for create_kernel
- Restrict parsing of structured data types to aligned types
- Expose GPU block size selection through `CreateKernelConfig`Frederik HennigFrederik Hennighttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/371Various fixes to constants2024-03-28T16:03:38+01:00Frederik HennigVarious fixes to constantsVarious fixes to constants:
- Make `PsConstant` immutable
- Add `interpret_as`/`reinterpret_as` to apply data types to constants
- Add docstrings to `PsConstant`
- Add out-of-bounds checking to constant creation
- Add test cas...Various fixes to constants:
- Make `PsConstant` immutable
- Add `interpret_as`/`reinterpret_as` to apply data types to constants
- Add docstrings to `PsConstant`
- Add out-of-bounds checking to constant creation
- Add test cases
- Remove long-obsolete and completely commented-out test filesFrederik HennigFrederik Hennighttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/80Use reproducible hashlib for representing TextureCachedField2019-10-28T12:38:04+01:00Stephan SeitzUse reproducible hashlib for representing TextureCachedFieldTextureCachedField was using `hash(...)` to disambiguate its instances.
However, `hash` is randomized and will hinder reproducible code
generationTextureCachedField was using `hash(...)` to disambiguate its instances.
However, `hash` is randomized and will hinder reproducible code
generationhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/251Use int64 for indexing2021-06-08T08:33:34+02:00Markus HolzerUse int64 for indexingFor indexed kernels, int32 is too small for large domain sizes. Thus the coordinates are cast to int64 in this MR to allow huge domain sizes.
As an example of the adaption the generated code for a Neumann boundary is shown. Before:
```...For indexed kernels, int32 is too small for large domain sizes. Thus the coordinates are cast to int64 in this MR to allow huge domain sizes.
As an example of the adaption the generated code for a Neumann boundary is shown. Before:
```cpp
FUNC_PREFIX void kernel(double * RESTRICT _data_C, uint8_t * RESTRICT const _data_indexField, int64_t const _size_indexField_0, int64_t const _stride_indexField_0)
{
#pragma omp parallel
{
#pragma omp for schedule(static)
for (int64_t ctr_0 = 0; ctr_0 < _size_indexField_0; ctr_0 += 1)
{
const int32_t x = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0]));
const int32_t y = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 4]));
const int64_t cx [] = { 0, 0, 0, -1, 1, -1, 1, -1, 1 };
const int64_t cy [] = { 0, 1, -1, 0, 0, 1, 1, -1, -1 };
const int invdir [] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };
const int32_t dir = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 8]));
_data_C[x + 11*y] = _data_C[x + 11*y + cx[dir] + 11*cy[dir]];
}
}
}
```
After:
```cpp
FUNC_PREFIX void kernel(double * RESTRICT _data_C, uint8_t * RESTRICT const _data_indexField, int64_t const _size_indexField_0, int64_t const _stride_indexField_0)
{
#pragma omp parallel
{
#pragma omp for schedule(static)
for (int64_t ctr_0 = 0; ctr_0 < _size_indexField_0; ctr_0 += 1)
{
const int64_t x = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0]));
const int64_t y = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 4]));
const int64_t cx [] = { 0, 0, 0, -1, 1, -1, 1, -1, 1 };
const int64_t cy [] = { 0, 1, -1, 0, 0, 1, 1, -1, -1 };
const int64_t invdir [] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };
const int64_t dir = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 8]));
_data_C[x + 11*y] = _data_C[x + 11*y + cx[dir] + 11*cy[dir]];
}
}
}
```Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/43Use get_type_of_expression in typing_form_sympy_inspection to infer types2019-09-23T16:16:50+02:00Stephan SeitzUse get_type_of_expression in typing_form_sympy_inspection to infer typesDANGER ZONE: this changes something in the core behavior of pystencils. Be careful before merging!
In summary, when `typing_form_sympy_inspection` reaches the point where it would just use `default_type`, we try to use `get_type_of_ex...DANGER ZONE: this changes something in the core behavior of pystencils. Be careful before merging!
In summary, when `typing_form_sympy_inspection` reaches the point where it would just use `default_type`, we try to use `get_type_of_expression` to infer the actual type.
We use information of previously defined variables in current scope.
Another approach would be to just type all the intermediate variable with `auto`.
```python
x = pystencils.fields('x: float32[3d]')
assignments = pystencils.AssignmentCollection({
a: cast_func(10, create_type('float64')),
b: cast_func(10, create_type('uint16')),
e: 11,
c: b,
f: c + b,
d: c + b + x.center + e,
x.center: c + b + x.center
})
```
Before:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1, int64_t const _stri
de_x_2)
{
const double a = 10.0;
const double b = 10;
const double e = 11.0;
const double c = b;
const double f = b + c;
for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
{
float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
{
float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
{
const double d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
_data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
}
}
}
}
```
After:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1, int64_t const _stri
de_x_2)
{
const double a = 10.0;
const uint16_t b = 10;
const int64_t e = 11.0;
const uint16_t c = b;
const uint16_t f = b + c;
for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
{
float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
{
float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
{
const float d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
_data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
}
}
}
}
```https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/312Use common shape to resolve buffer access2022-12-22T09:41:41+01:00Markus HolzerUse common shape to resolve buffer accessPystencils assume that all fields have the same spatial shape. Thus the field access should also be resolved by one common field shape. This was violated in the GPU kernel creation and should be fixed with this MRPystencils assume that all fields have the same spatial shape. Thus the field access should also be resolved by one common field shape. This was violated in the GPU kernel creation and should be fixed with this MRMarkus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/253Use closest normal for boundary index list with single_link2021-06-18T12:07:29+02:00Markus HolzerUse closest normal for boundary index list with single_linkFor creating the index list just the first stencil entry was taken which is a neighbour of the investigated cell if `single_link=True`. With this MR the discrete normal is calculated and the neighbouring cell in the normal direction is t...For creating the index list just the first stencil entry was taken which is a neighbour of the investigated cell if `single_link=True`. With this MR the discrete normal is calculated and the neighbouring cell in the normal direction is taken to build up the index array.
Furthermore, the computational cost of the python versions for `create_boundary_index_list` is reduced drastically because the iteration space is now restricted to the boundary cells and not the entire domain anymore.Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/176Use C11CodePrinter for sympy 1.72020-10-07T16:42:49+02:00Stephan SeitzUse C11CodePrinter for sympy 1.7C++ may cause problems for CUDA/OpenCL (e.g. print `std::log`)C++ may cause problems for CUDA/OpenCL (e.g. print `std::log`)Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/139Use `rich` for syntax highlighting of `show_code` also in terminal2020-01-30T10:18:28+01:00Stephan SeitzUse `rich` for syntax highlighting of `show_code` also in terminalStill works as usual in Jupyter notebooks and in terminal if `rich` is not installed.![Screenshot_20200130_100924](/uploads/5876a0aab841c4a2d736777cac4535f6/Screenshot_20200130_100924.png)Still works as usual in Jupyter notebooks and in terminal if `rich` is not installed.![Screenshot_20200130_100924](/uploads/5876a0aab841c4a2d736777cac4535f6/Screenshot_20200130_100924.png)https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/198Usage of custom boundary functor if given2020-12-20T16:11:13+01:00Sebastian Bindgen Usage of custom boundary functor if givenThis is needed to implement the Lees Edwards boundary conditions submitted in https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/49.
Custom boundary functors can now be created by users.This is needed to implement the Lees Edwards boundary conditions submitted in https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/49.
Custom boundary functors can now be created by users.Markus HolzerMarkus Holzerhttps://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/304Upgrade maximum supported SymPy version to 1.11.12022-10-10T22:32:05+02:00Markus HolzerUpgrade maximum supported SymPy version to 1.11.1Markus HolzerMarkus Holzer