pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2019-09-04T15:06:57+02:00

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/49
Actually increment counter inside random_symbol
2019-09-04T15:06:57+02:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

@rudolfweeber is currently looking at the statistical mechanics of the fluctuating LB and found a velocity bias. It turned out that this is due to all generated random numbers using the same key. Instead, the key should be incremented when generating multiple random numbers in the same kernel.
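To see why a shared key is a problem, here is a minimal sketch of a counter-based generator. It uses SHA-256 from the Python standard library as a stand-in for Philox, and the function name `counter_rng` and its argument layout are hypothetical, chosen only to mirror the shape of the generated calls. The point is that such a generator is a pure function of its counters and key, so two calls with identical arguments return identical numbers.

```python
import hashlib
import struct


def counter_rng(time_step, cell, key):
    """Stand-in for a counter-based RNG (NOT Philox): maps
    (time_step, cell index, key) deterministically to four floats in [0, 1)."""
    # Pack the five 32-bit inputs: time step, three cell counters, key.
    data = struct.pack("<5I", time_step, *cell, key)
    digest = hashlib.sha256(data).digest()
    # Use the first 16 bytes of the hash as four 32-bit words.
    words = struct.unpack("<4I", digest[:16])
    return tuple(w / 2**32 for w in words)


cell = (3, 4, 5)

# Same key twice: both "independent" draws are in fact the same numbers.
first = counter_rng(10, cell, key=0)
second = counter_rng(10, cell, key=0)

# Incrementing the key per call yields distinct streams.
third = counter_rng(10, cell, key=1)
```

With the key held fixed, `first` and `second` are identical, which is the kind of correlation that can show up as a bias; `third` differs because its key differs.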
So in the generated code,
```c++
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_38, Dummy_39, Dummy_40, Dummy_41);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_34, Dummy_35, Dummy_36, Dummy_37);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_30, Dummy_31, Dummy_32, Dummy_33);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_26, Dummy_27, Dummy_28, Dummy_29);
```
becomes
```c++
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 3, 2, Dummy_38, Dummy_39, Dummy_40, Dummy_41);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 2, 2, Dummy_34, Dummy_35, Dummy_36, Dummy_37);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 1, 2, Dummy_30, Dummy_31, Dummy_32, Dummy_33);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_26, Dummy_27, Dummy_28, Dummy_29);
```
Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/48
RNG SIMD
2021-02-12T22:31:36+01:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

I've vectorized the Philox and AES-NI RNGs, fixes #12. I had to add a very minimal integer vectorization that only supports `int32`, `makeVec`, `+`, and loop counters. Also, the `RNGNode` now needs to know already during construction how it's vectorized, which is ugly, but could only be resolved by a better type system (#20). For the same reason, it is not possible to use a vectorized float RNG with double fields or vice versa. Also, we essentially discard half the random numbers in double precision mode because otherwise the number of variables we return would change between the vectorized and non-vectorized version, which is incompatible with the interface.
For the tests, we need to add `pip3 install randomgen` to the Dockerfile.
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/46
AES-NI vectorization improvements
2019-09-17T09:08:05+02:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

!30 didn't implement an SSE-vectorized `_mm_cvtepu64_pd` equivalent because the [stackoverflow](https://stackoverflow.com/a/41148578) solution didn't work. That turned out to be due to a bad optimization in GCC 5+ in fast-math mode. None of the other compilers (Clang, Intel, MSVC) have that issue, so we just disable fast-math for that function.
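As background, one widely used bit trick for unsigned-integer-to-double conversion can be sketched in plain Python. This is an illustration under assumptions, not the code from this MR: it covers only inputs below 2^52, and the helper name `uint52_to_double` is hypothetical. The idea is to OR the integer into the mantissa of the double 2^52 and then subtract 2^52 again. Tricks of this kind rely on the floating-point operations being carried out exactly as written, which fast-math does not guarantee.

```python
import struct

# Bit pattern of the double 2**52: biased exponent 0x433, mantissa all zero.
MAGIC_BITS = 0x4330000000000000


def uint52_to_double(x):
    """Convert an integer 0 <= x < 2**52 to a double using only a bitwise OR,
    a reinterpretation of the bits, and one floating-point subtraction."""
    if not 0 <= x < 2**52:
        raise ValueError("this sketch only handles values below 2**52")
    bits = MAGIC_BITS | x  # x now occupies the mantissa of 2**52
    (as_double,) = struct.unpack("<d", struct.pack("<Q", bits))
    # as_double equals 2**52 + x exactly, so the subtraction recovers x.
    return as_double - 2.0**52
```

Extending this to the full 64-bit range needs the value split into two halves that are converted separately and then combined, which adds further floating-point steps whose order matters.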
Also, we now use fused multiply-add if available.
Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/45
Add PyPI badge
2019-09-02T13:40:44+02:00
Stephan Seitz

Badge with current PyPI version and link to the PyPI page.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/44
Fix typo in "pre-push"
2019-09-02T13:41:09+02:00
Stephan Seitz

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/43
Use get_type_of_expression in typing_form_sympy_inspection to infer types
2019-09-23T16:16:50+02:00
Stephan Seitz

DANGER ZONE: this changes something in the core behavior of pystencils. Be careful before merging!
In summary, when `typing_form_sympy_inspection` reaches the point where it would just use `default_type`, we try to use `get_type_of_expression` to infer the actual type.
We use information about variables previously defined in the current scope.
Another approach would be to just type all the intermediate variables with `auto`.
```python
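import sympy as sp
import pystencils
# Import path assumed for the pystencils version of this MR
from pystencils.data_types import cast_func, create_type

# Left-hand-side symbols used in the assignments below
a, b, c, d, e, f = sp.symbols('a b c d e f')
x = pystencils.fields('x: float32[3d]')
assignments = pystencils.AssignmentCollection({
    a: cast_func(10, create_type('float64')),
    b: cast_func(10, create_type('uint16')),
    e: 11,
    c: b,
    f: c + b,
    d: c + b + x.center + e,
    x.center: c + b + x.center
})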
```
Before:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
                        int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1,
                        int64_t const _stride_x_2)
{
   const double a = 10.0;
   const double b = 10;
   const double e = 11.0;
   const double c = b;
   const double f = b + c;
   for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
   {
      float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
      for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
      {
         float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
         for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
         {
            const double d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
            _data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
         }
      }
   }
}
```
After:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
                        int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1,
                        int64_t const _stride_x_2)
{
   const double a = 10.0;
   const uint16_t b = 10;
   const int64_t e = 11.0;
   const uint16_t c = b;
   const uint16_t f = b + c;
   for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
   {
      float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
      for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
      {
         float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
         for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
         {
            const float d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
            _data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
         }
      }
   }
}
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/41
Add pyhtml to tests and artifacts
2019-09-02T13:42:18+02:00
Stephan Seitz

I think this should suffice to produce the artifacts.
But `pytest-html` and `ansi2html` need to be added to the docker images.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/40
Contest: ignore two more files if waLBerla is not available
2019-08-21T18:43:43+02:00
Stephan Seitz

- Contest: ignore two more files if waLBerla is not available (need when executing
- Skip collection of `pystencils.autodiff` always (not only if `'CI' in os.environ`)

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/39
Conftest: waLBerla, kerncraft
2019-08-20T16:26:56+02:00
Stephan Seitz

- Add `waLBerla` to conftest
- Add missing file to conftest for `kerncraft`

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/38
Implement sp.Sum, sp.Product
2019-08-21T18:45:35+02:00
Stephan Seitz

Sum and Product have an indexing variable which is an Atom but not a free
symbol. So the logic that determines the undefined symbols in a `SympyAssignment` should not be
`atoms(sp.Symbol)` but `free_symbols`. `sp.Indexed` from the `ResolvedFieldAccess`es forms an edge case.
So we could also use `atoms(sp.Symbol).intersection(...free_symbols)`.
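The difference between the two queries can be checked directly in SymPy. The snippet below is only an illustration with made-up symbol names; it does not touch `SympyAssignment` itself.

```python
import sympy as sp

i, n, x = sp.symbols("i n x")

# i is the summation index: it appears in the expression as an atom,
# but it is bound by the Sum and therefore not a free symbol.
expr = sp.Sum(x * i, (i, 0, n))

all_symbols = expr.atoms(sp.Symbol)  # contains i, n and x
free = expr.free_symbols             # contains only n and x
bound = all_symbols - free           # the index variable i
```

Only the symbols in `free` have to be defined outside the expression, which is why `free_symbols` is the better basis for finding undefined symbols.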
I hope I extracted all the necessary code from my fork to implement this feature.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/37
Remove main methods from tests (sorry for adding them)
2019-08-20T16:27:21+02:00
Stephan Seitz

... or code will be executed when pytest is collecting the tests.
I found out that I can use "-s" to convince vim-test to show me test
output.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/36
Pre-push hook
2019-08-20T16:28:40+02:00
Stephan Seitz

This prevents me from pushing stuff that either fails in quicktest or flake8.
Has to be copied manually to `.git/hooks` and `python3` has to be adapted to your Python executable.
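For illustration, a hook of this kind could look roughly like the following sketch. It is not the script from this MR: the two commands in `CHECKS` are assumptions about how flake8 and the quick tests might be invoked, and `run_checks` is a hypothetical helper.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumed commands; the MR only states that the hook runs quicktest and flake8.
CHECKS = [
    [sys.executable, "-m", "flake8", "."],
    [sys.executable, "setup.py", "quicktest"],
]


def run_checks(checks):
    """Run each command in order; return 1 at the first failure, else 0."""
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            return 1
    return 0
```

A hook script would end with `sys.exit(run_checks(CHECKS))`, since git aborts the push when a `pre-push` hook exits with a non-zero status.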
~~Is there an update in flake8 that `.flake8` is not recognized automatically anymore and that we need to append C901?~~
Probably I just had a different linter installed on my PC at home. flake8 can use different linters.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/35
Fix get_type_of_expression for constants like sympy.pi
2019-08-22T08:31:17+02:00
Stephan Seitz

Problem: some constant expressions are neither `Float`, `Integer`, nor `Rational` and
don't have arguments.
```python
>>> from sympy import *
>>> isinstance(pi, Integer)
False
>>> isinstance(pi, Float)
False
>>> isinstance(pi, Rational)
False
>>> pi.args
()
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/34
Address #13: Use sympy.codegen.rewriting.optimize
2019-09-23T10:55:13+02:00
Stephan Seitz

It's really comfortable to write optimizations in terms of `sympy.codegen.rewriting.ReplaceOptim`:
```python
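from sympy.codegen.rewriting import ReplaceOptim

# Evaluates all constant terms.
# is_constant is a method on sympy expressions, so it has to be called.
evaluate_constant_terms = ReplaceOptim(
    lambda e: hasattr(e, 'is_constant') and e.is_constant(),
    lambda p: p.evalf()
)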
```
This PR adds a parameter `sympy_optimizations` to the `create_*_kernel` functions that applies the list of optimizations to the assignments before creating the AST.
`sympy.codegen.rewriting` already has some optimizations, some of which are similar to the optimizations of pystencils.
For example `create_expand_pow_optimization(limit)` is really similar to the logic in `CustomSympyPrinter._print_Pow`.
See #13
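As a small illustration of the SymPy side only (it does not use the `sympy_optimizations` parameter, and the expression is made up), `optimize` applies a list of such optimizations to an expression:

```python
import sympy as sp
from sympy.codegen.rewriting import create_expand_pow_optimization, optimize

x = sp.Symbol("x")
expr = x**3 + x**5

# Expand integer powers of symbols up to exponent 3 into plain products.
expand_pow = create_expand_pow_optimization(3)
result = optimize(expr, [expand_pow])

# x**3 is rewritten as an unevaluated product x*x*x, while x**5 is
# above the limit and stays a power.
print(result)
```

The expanded product is wrapped in `UnevaluatedExpr` so that SymPy does not immediately collapse it back into a power.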
Problem: old versions of sympy (e.g. from ubuntu CI) don't have `sympy.codegen.rewriting`. The optimizations are skipped in that case. `test_and_coverage` applies all optimizations.
We could also try to implement an FMA optimization (fused multiply-add) with that and `sympy.Wild`.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/33
Add KernelFunction.fields_written
2019-08-16T08:59:16+02:00
Stephan Seitz

I found myself needing this convenience wrapper in various places.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/32
Bugfix: Readd __launch_bounds__ for dialect 'cuda'
2019-08-15T09:14:26+02:00
Stephan Seitz

`__launch_bounds__` was deactivated when introducing `CudaBackend`

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/31
Bugfix: TypedSymbol.is_negative should not be implemented in terms of super().is_positive
2019-08-14T17:03:02+02:00
Stephan Seitz

This can lead to surprising simplifications

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/30
AES-NI Random Number Generator
2019-09-02T10:21:21+02:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

I was looking at how to vectorize the Philox RNG yesterday. Before I knew it, I had implemented a working RNG using AES-NI instructions :nerd: ... Not entirely what I had intended to do, but it might still be useful to someone and should be about as fast as a vectorized Philox.
There is one place that could be optimized because I fall back to scalar instructions: I failed to reimplement `_mm_cvtepu64_pd` (the solution from https://stackoverflow.com/a/41148578 produces incorrect results in the least-significant half of the mantissa). Perhaps someone else can try to fix that.
I did not integrate this with the `vector_instruction_set` parameter of the code generation. Perhaps you can do that, @bauer. It needs support for SSE2 and AES instructions (which look like SSE2 instructions, but their availability is determined by a separate CPUID flag). It will also make use of `_mm_cvtepu32_ps` and `_mm_cvtepu64_pd` from AVX512 if available (these are 128-bit instructions that actually look like SSE2 instructions).
Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/29
Basic support for OpenCL (experimental)
2019-08-22T08:37:37+02:00
Stephan Seitz

Basic support for OpenCL
Problem: OpenCL cannot import `stdint.h`. Temporary fix: define custom `opencl_stdint.h` (~~defines currently only `int64_t`~~)
TODO:
- ~~implement `opencl_stdint.h`~~
- implement shared memory, textures, built-in functions
- ~~avoid CUDA intrinsics (`fast_div`)~~

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/28
Philox tests and clean up
2019-08-13T14:17:37+02:00
Michael Kuron (mkuron@icp.uni-stuttgart.de)

Test the Philox against reference data and clean up duplicated code in the code generation. The latter will make it easier to later add a vectorized Philox.
Martin Bauer