pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2021-11-09T11:21:00+01:00

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/266
Vectorisation bug
2021-11-09T11:21:00+01:00
Author: Markus Holzer
Fixes #41
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/265
Fix deepcopy issue with Sympy 1.9
2023-06-01T22:17:18+02:00
Author: Michael Kuron (mkuron@icp.uni-stuttgart.de)
Fixes https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/40. Caused by https://github.com/sympy/sympy/pull/21260.
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/264
Clean up and Bug Fixes
2021-10-26T09:55:30+02:00
Author: Markus Holzer
Fixes #38
Further changes:
- Third-party warnings are now suppressed because they only polluted the warning summary in pystencils.
- Removing the math optimisations as they are not used in pystencils. An issue has been created to reintroduce them in a clean way in the future.
- Removing the joblib workaround as joblib is a hard dependency of pystencils
- Fixing SymPy 1.9 as the maximum version
- Removing sympy.multiply_elementwise due to https://github.com/sympy/sympy/issues/22353. A workaround with NumPy is introduced, which seems to be more robust because NumPy has a more stable API
- Introducing more extra requires, which are needed to have all features of pystencils
- Minor cleanup of some test cases and the datahandling
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/263
Fixed wrong type hints. Updated setup.py
2021-09-16T22:47:01+02:00
Author: Jan Hönig
Added authors and changed the package's email to a more general solution.
Assignee: Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/262
fixed create_kernel parameter data_type="float" to produce single precision
2021-09-14T18:35:29+02:00
Author: Christoph Alt
Currently, if create_kernel(assignments, data_type="float") is used, the untyped symbols are typed as float64, because np.dtype("float") yields float64 during the construction of a new TypedSymbol.
Since data_type (or type_info, as it is called in cpu.create_kernel) can be a string naming a C type, at least according to the [documentation of cpu.create_kernel](https://i10git.cs.fau.de/pycodegen/pystencils/-/blob/master/pystencils/cpu/kernelcreation.py#L31), this behavior is a bit confusing, because the C type specifier "float" typically means single precision.
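The mismatch can be checked with NumPy alone. The following is a minimal sketch, not the pystencils code: `c_type_to_numpy_name` is a hypothetical helper name chosen for illustration, and it only shows how a C-style name could be mapped before it reaches `np.dtype`.

```python
import numpy as np


def c_type_to_numpy_name(type_name):
    """Hypothetical helper: map the C specifier "float" to NumPy's "single"."""
    return "single" if type_name == "float" else type_name


# NumPy interprets the string "float" as Python's float, which is 64 bits wide.
default_dtype = np.dtype("float")

# After the mapping, the same request resolves to a 32-bit type.
mapped_dtype = np.dtype(c_type_to_numpy_name("float"))

# Other type names pass through the helper unchanged.
double_dtype = np.dtype(c_type_to_numpy_name("double"))

print(default_dtype, mapped_dtype, double_dtype)
```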
So I added a small function that replaces "float" with "single" in the symbol_to_type dict, so that the untyped symbols get the single-precision type.
Assignee: Christoph Alt

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/261
`create_kernel` API Update
2021-11-21T21:58:15+01:00
Author: Jan Hönig
To reduce the `kwargs` hell, we introduced dataclasses, which handle the settings of `create_kernel` and similar functions.
In addition, we introduced type hints for the API functions to increase usability and simplify development.
Assignee: Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/260
Fix pipeline
2021-08-19T07:56:55+02:00
Author: Markus Holzer
Due to an update of the docker containers, some parts of pystencils need to be adapted.
- The test `test_field_layouts` in `test_buffer_gpu` fails because, since 2021.1, the `zeros` function of a gpu_array can only deal with contiguous data. Thus the test case now fails with the `zyxf` field layout. This was fixed by replacing `zeros_like` with `empty_like`.
- matplotlib has dropped deprecated functionality. In particular, it is now necessary to distinguish between Axes3D and Axes. This was fixed by using Axes3D.
- randomgen 1.20.0 [changed](https://github.com/bashtage/randomgen/pull/258) the behavior of `Philox`'s `advance` method. This was fixed by directly seeding to the correct counter value.
- kerncraft 0.8.10 made some internal changes that now produce floats. The problem was introduced in line 516 [in this commit](https://github.com/RRZE-HPC/kerncraft/commit/74930334b46e12194484594b8834bf091665c9ec#diff-7211fed2c4fe4632c06646fa9d1ec6fbd8b74886efb3ebb96ee645f9183c9fdf). In particular, the `counter` variable is now determined differently and yields floats instead of integers. This is probably a kerncraft bug because loop counters should not be floats. Later in the code they are passed to `range`, which causes the failure.
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/259
Increasing the minimal Python version
2021-09-10T11:50:35+02:00
Author: Jan Hönig
Increasing the minimal Python version to 3.8 to enable features from 3.7 and 3.8. Also, the official support for 3.6 will cease this December.
Closes #21
Assignee: Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/258
Advanced Subexpression Insertion
2021-07-28T22:10:01+02:00
Author: Frederik Hennig
Moved a few methods for elimination of selected subexpressions from lbmpy to pystencils. Helpful to control the granularity of common subexpression elimination, expression tree cleanup, and potentially to simplify equations by substituting constant or zero subexpressions.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/257
Bit Flag Conditional
2021-07-04T16:57:51+02:00
Author: Frederik Hennig
This MR introduces a bit flag conditional that acts like a ternary operator for a single bit flag. It takes at least three arguments:
- `flag_bit` specifies which bit of the mask is examined
- `mask_expression` is an integer-typed expression which acts as a bit mask
- `then_expression` is a SymPy expression of arbitrary type, and
- optionally `else_expression` is a SymPy expression of arbitrary type
`flag_cond` examines the given bit of the given mask and takes the value of `then_expression` if the bit is set to 1. If not, it becomes either `0` or `else_expression`:
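The selection rule described above can be modelled on plain Python integers. This is only a sketch of the semantics, not the pystencils implementation: `flag_cond_model` is a hypothetical name, and treating `flag_bit` as a zero-based bit index is an assumption made for this illustration.

```python
def flag_cond_model(flag_bit, mask, then_value, else_value=0):
    """Plain-integer model of the bit flag conditional (illustrative only).

    Assumption: flag_bit is the zero-based index of the bit to examine.
    """
    bit_is_set = (mask >> flag_bit) & 1
    return then_value if bit_is_set else else_value


# Bit 2 is set in 0b0100, so the "then" value is selected.
selected = flag_cond_model(2, 0b0100, 7)

# Bit 1 is not set; the three-argument form falls back to 0.
fallback_zero = flag_cond_model(1, 0b0100, 7)

# The four-argument form falls back to the given "else" value instead.
fallback_else = flag_cond_model(1, 0b0100, 7, -1)
```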
Three argument version:
```
flag_cond(flag_bit, mask, expr) = expr if (flag_bit is set in mask) else 0
```
Four argument version:
```
flag_cond(flag_bit, mask, expr_then, expr_else) = expr_then if (flag_bit is set in mask) else expr_else
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/256
Add citations to ReadMe
2021-07-05T11:55:34+02:00
Author: Markus Holzer
pystencils citations have been added to the ReadMe. Furthermore, the authors list has been reordered and the GitLab CI has been adapted to include a pre-test stage.
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/255
Remove gmpy workaround
2021-06-26T15:43:54+02:00
Author: Markus Holzer
The gmpy workaround is not needed anymore because this problem is fixed in SymPy with https://github.com/sympy/sympy/pull/13530
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/254
FVM: Choose better stencil for derivative in flux for D3Q27
2021-06-15T07:26:29+02:00
Author: Michael Kuron (mkuron@icp.uni-stuttgart.de)
As reported by @Tischler, the FVM discretization does not use the correct stencils for fluxes with derivatives in D3Q27. The result is not wrong, but uses more neighbors than necessary.
This merge request adds a test case and improves the stencil-choosing heuristic. It now produces the same two-point finite differences that a human would choose by optimizing the free weights later in the process.
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/253
Use closest normal for boundary index list with single_link
2021-06-18T12:07:29+02:00
Author: Markus Holzer
Previously, when `single_link=True`, the index list was built from the first stencil entry that is a neighbour of the investigated cell. With this MR, the discrete normal is calculated and the neighbouring cell in the normal direction is used to build the index array.
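How the discrete normal is computed is not spelled out here, so the following is only a rough sketch under stated assumptions: the normal is taken as the sum of the stencil directions that point to fluid cells, and the stencil direction with the largest cosine to that sum is chosen. The function name, the D2Q9 ordering, and the `fluid_mask` layout are all hypothetical and not taken from pystencils.

```python
import numpy as np

# Hypothetical D2Q9 stencil; the ordering is chosen for illustration only.
D2Q9 = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0),
        (-1, 1), (1, 1), (-1, -1), (1, -1)]


def closest_normal_direction(fluid_mask, cell, stencil=D2Q9):
    """Return the stencil direction closest to the approximated normal.

    Assumption: the normal is approximated by summing all stencil
    directions whose neighbouring cell is a fluid cell.
    """
    x, y = cell
    normal = np.zeros(2)
    for dx, dy in stencil:
        nx, ny = x + dx, y + dy
        inside = 0 <= nx < fluid_mask.shape[0] and 0 <= ny < fluid_mask.shape[1]
        if (dx, dy) != (0, 0) and inside and fluid_mask[nx, ny]:
            normal += (dx, dy)
    if not normal.any():
        return None  # no fluid neighbour, or the contributions cancel out

    # Pick the non-zero direction with the largest cosine to the normal.
    candidates = [d for d in stencil if d != (0, 0)]
    cosines = [np.dot(d, normal) / (np.linalg.norm(d) * np.linalg.norm(normal))
               for d in candidates]
    return candidates[int(np.argmax(cosines))]


# A boundary cell at (1, 1) with fluid cells only in the column x == 2.
fluid_mask = np.zeros((3, 3), dtype=bool)
fluid_mask[2, :] = True
direction = closest_normal_direction(fluid_mask, (1, 1))
```

In this example three fluid neighbours contribute, but only the single direction along the x axis is selected, which is the point of using one link per boundary cell.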
Furthermore, the computational cost of the Python versions of `create_boundary_index_list` is reduced drastically because the iteration space is now restricted to the boundary cells instead of the entire domain.
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/252
Fixes for buffers in loops with step size > 1
2021-06-08T08:54:17+02:00
Author: Frederik Hennig
This MR introduces some additions and fixes for generating CPU loops with step sizes > 1:
- The CPU `create_kernel` function now exposes a flag to disable the double field write check
- Rewrote `get_base_buffer_index` to use pure integer arithmetic, and corrected the computation of the buffer base index to correctly incorporate loop step sizes. Added test case to check correctness.
- Added rudimentary `evalf` functionality to integer division sympy function `int_div` (its absence led to an infinite recursion during code generation).
- Added correct printing of integer-typed expressions in `CustomSympyPrinter._typed_number`.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/251
Use int64 for indexing
2021-06-08T08:33:34+02:00
Author: Markus Holzer
For indexed kernels, int32 is too small for large domain sizes. Thus the coordinates are cast to int64 in this MR to allow huge domain sizes.
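To illustrate why 32-bit coordinates are too small, the sketch below computes a linear index of the form `x + row_stride*y` with NumPy arrays of a given integer type. The function name and the domain size of 50000 x 50000 cells are made up for this example; the wrap-around relies on NumPy's fixed-width integer array arithmetic.

```python
import numpy as np


def linear_index(x, y, row_stride, dtype):
    """Compute x + row_stride * y using fixed-width integer arithmetic."""
    x_arr = np.asarray([x], dtype=dtype)
    y_arr = np.asarray([y], dtype=dtype)
    stride_arr = np.asarray([row_stride], dtype=dtype)
    # The product is evaluated in `dtype`, so it can exceed the type's range.
    return int((x_arr + stride_arr * y_arr)[0])


# Last cell of a hypothetical 50000 x 50000 domain.
exact = 49999 + 50000 * 49999  # Python integers do not overflow

index_64 = linear_index(49999, 49999, 50000, np.int64)
index_32 = linear_index(49999, 49999, 50000, np.int32)

print(exact, index_64, index_32)
```

The 64-bit result matches the exact value, while the 32-bit result wraps around because the index exceeds `np.iinfo(np.int32).max`.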
As an example of the change, the generated code for a Neumann boundary is shown. Before:
```cpp
FUNC_PREFIX void kernel(double * RESTRICT _data_C, uint8_t * RESTRICT const _data_indexField, int64_t const _size_indexField_0, int64_t const _stride_indexField_0)
{
#pragma omp parallel
{
#pragma omp for schedule(static)
for (int64_t ctr_0 = 0; ctr_0 < _size_indexField_0; ctr_0 += 1)
{
const int32_t x = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0]));
const int32_t y = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 4]));
const int64_t cx [] = { 0, 0, 0, -1, 1, -1, 1, -1, 1 };
const int64_t cy [] = { 0, 1, -1, 0, 0, 1, 1, -1, -1 };
const int invdir [] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };
const int32_t dir = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 8]));
_data_C[x + 11*y] = _data_C[x + 11*y + cx[dir] + 11*cy[dir]];
}
}
}
```
After:
```cpp
FUNC_PREFIX void kernel(double * RESTRICT _data_C, uint8_t * RESTRICT const _data_indexField, int64_t const _size_indexField_0, int64_t const _stride_indexField_0)
{
#pragma omp parallel
{
#pragma omp for schedule(static)
for (int64_t ctr_0 = 0; ctr_0 < _size_indexField_0; ctr_0 += 1)
{
const int64_t x = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0]));
const int64_t y = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 4]));
const int64_t cx [] = { 0, 0, 0, -1, 1, -1, 1, -1, 1 };
const int64_t cy [] = { 0, 1, -1, 0, 0, 1, 1, -1, -1 };
const int64_t invdir [] = { 0, 2, 1, 4, 3, 8, 7, 6, 5 };
const int64_t dir = *((int32_t *)(& _data_indexField[12*_stride_indexField_0*ctr_0 + 8]));
_data_C[x + 11*y] = _data_C[x + 11*y + cx[dir] + 11*cy[dir]];
}
}
}
```
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/249
Undo some changes from !248 that are no longer needed
2021-05-27T19:41:41+02:00
Author: Michael Kuron (mkuron@icp.uni-stuttgart.de)
It turns out these were only needed before I moved the vectorization of the `RNGBase` objects to the right place. The vectorized C printer does actually print scalar code when it is passed scalar variables and field accesses.
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/248
Fix RNG vectorization for LB
2021-05-27T19:41:42+02:00
Author: Michael Kuron (mkuron@icp.uni-stuttgart.de)
Reported by @RudolfWeeber. Not sure why it wasn't caught by the pycodegen nightly job. Added new tests to lbmpy in https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/82.
Forgotten type check in `_print_Function`:
```
File "pystencils/pystencils/backends/cbackend.py", line 673, in _print_Function
(isinstance(arg, TypedSymbol) and arg.dtype.is_int())
AttributeError: 'VectorType' object has no attribute 'is_int'
```
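The traceback above shows `is_int()` being called on an object that does not provide it. The sketch below reproduces that kind of failure with stand-in classes and shows one possible guard. `BasicType`, `VectorType`, and `is_integer_typed` are simplified, hypothetical stand-ins written for this illustration, not the pystencils classes or the actual fix.

```python
class BasicType:
    """Stand-in for a scalar type that knows whether it is an integer."""

    def __init__(self, name):
        self.name = name

    def is_int(self):
        return self.name.startswith(("int", "uint"))


class VectorType:
    """Stand-in for a SIMD vector type; it deliberately has no is_int()."""

    def __init__(self, base_type, width):
        self.base_type = base_type
        self.width = width


def is_integer_typed(dtype):
    """Check for an integer type, unwrapping vector types first."""
    if isinstance(dtype, VectorType):
        dtype = dtype.base_type
    return dtype.is_int()


vector_of_ints = VectorType(BasicType("int64"), 4)

# Calling is_int() directly on the vector type fails like the traceback.
try:
    vector_of_ints.is_int()
    unguarded_call_failed = False
except AttributeError:
    unguarded_call_failed = True

# The guarded check works for both scalar and vector types.
guarded_result = is_integer_typed(vector_of_ints)
```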
Generation of invalid tail loops when `assume_sufficient_line_padding=False`:
```c++
_data_pdfs_tmp_20_30_10[ctr_0] = _mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_add_pd(_mm256_mul_pd(xi_88,_mm256_set_pd(0.142857142857143,0.142857142857143,0.142857142857143,0.142857142857143)),_mm256_mul_pd(xi_89,_mm256_set_pd(0.2,0.2,0.2,0.2))),_mm256_mul_pd(xi_91,_mm256_set_pd(-1.0,-1.0,-1.0,-1.0))),_mm256_mul_pd(xi_92,_mm256_set_pd(0.0857142857142857,0.0857142857142857,0.0857142857142857,0.0857142857142857))),_mm256_set_pd(xi_108*-0.5,xi_108*-0.5,xi_108*-0.5,xi_108*-0.5)),_mm256_set_pd(xi_112*0.0238095238095238,xi_112*0.0238095238095238,xi_112*0.0238095238095238,xi_112*0.0238095238095238)),_mm256_set_pd(xi_95*0.1,xi_95*0.1,xi_95*0.1,xi_95*0.1)),_mm256_set_pd(xi_98*0.0428571428571429,xi_98*0.0428571428571429,xi_98*0.0428571428571429,xi_98*0.0428571428571429)),_mm256_set_pd(_data_pdfs_20_30_10[ctr_0],_data_pdfs_20_30_10[ctr_0],_data_pdfs_20_30_10[ctr_0],_data_pdfs_20_30_10[ctr_0])),_mm256_set_pd(forceTerm_0,forceTerm_0,forceTerm_0,forceTerm_0));
```
Generation of partially-vectorized code with `assume_inner_stride_is_one=False` in conjunction with the random number generator:
```c++
const double xi_22 = -xi_64 + xi_66;
const float64x2_t xi_26 = vaddq_f64(random_1_1,vdupq_n_f64(-0.5));
```
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/247
OpenCL RNG
2021-05-26T13:44:59+02:00
Author: Michael Kuron (mkuron@icp.uni-stuttgart.de)
Unfortunately most OpenCL implementations don't support C++
Assignee: Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/246
ship C-file
2021-05-11T09:31:17+02:00
Author: Markus Holzer
Shipping the generated C files to PyPI is a good idea since it is less error-prone. A new Cython version might deal with the provided pyx file in a way we did not intend.
The best practice is described in more detail here:
http://blog.behnel.de/posts/ship-generated-c-code-or-not.html
Assignee: Markus Holzer