pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2019-09-17T09:08:05+02:00

AES-NI vectorization improvements
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/46
2019-09-17T09:08:05+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

!30 didn't implement an SSE-vectorized `_mm_cvtepu64_pd` equivalent because the [stackoverflow](https://stackoverflow.com/a/41148578) solution didn't work. That turned out to be due to a bad optimization in GCC 5+ in fast-math mode. None of the other compilers (Clang, Intel, MSVC) have that issue, so we just disable fast-math for that function.
Also, we now use fused multiply-add if available.

Martin Bauer

Add PyPI badge
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/45
2019-09-02T13:40:44+02:00, Stephan Seitz

Badge with current PyPI version and link to the PyPI page.

Fix typo in "pre-push"
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/44
2019-09-02T13:41:09+02:00, Stephan Seitz

Use get_type_of_expression in typing_form_sympy_inspection to infer types
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/43
2019-09-23T16:16:50+02:00, Stephan Seitz

DANGER ZONE: this changes something in the core behavior of pystencils. Be careful before merging!
In summary, when `typing_form_sympy_inspection` reaches the point where it would just use `default_type`, we try to use `get_type_of_expression` to infer the actual type.
We use information about previously defined variables in the current scope.
Another approach would be to just type all the intermediate variables with `auto`.
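The idea of inferring a type from previously typed variables can be sketched in plain SymPy. This is a toy illustration and not the pystencils implementation: `infer_types`, the `RANK` promotion order and the string type names are all assumptions made for this sketch.

```python
import sympy as sp

# Simplified promotion order (an assumption of this sketch, loosely C-like):
# a later entry wins over an earlier one.
RANK = ['uint16', 'int64', 'float32', 'float64']


def infer_types(assignments, known_types, default_type='float64'):
    """Walk the assignments in order and type each left-hand side
    from the operands of its right-hand side."""
    types = dict(known_types)
    for lhs, rhs in assignments:
        operands = []
        for atom in sp.sympify(rhs).atoms(sp.Symbol, sp.Number):
            if atom in types:            # symbol typed earlier in this scope
                operands.append(types[atom])
            elif atom.is_Integer:        # integer literal
                operands.append('int64')
            elif atom.is_Number:         # any other numeric literal
                operands.append('float64')
        # Fall back to the default type only when nothing could be inferred.
        types[lhs] = max(operands, key=RANK.index) if operands else default_type
    return types


b, c, d, e, f, x = sp.symbols('b c d e f x')
inferred = infer_types(
    [(e, 11), (c, b), (f, c + b), (d, c + b + x + e)],
    known_types={b: 'uint16', x: 'float32'},  # x stands in for a float32 field access
)
```

With these assumptions, `c` and `f` inherit `uint16` from `b`, `e` becomes `int64`, and `d` is promoted to `float32` because of the field operand.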
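```python
import sympy
import pystencils
from pystencils.data_types import cast_func, create_type

# Plain SymPy symbols used as intermediate variables
a, b, c, d, e, f = sympy.symbols('a b c d e f')

x = pystencils.fields('x: float32[3d]')
assignments = pystencils.AssignmentCollection({
    a: cast_func(10, create_type('float64')),
    b: cast_func(10, create_type('uint16')),
    e: 11,
    c: b,
    f: c + b,
    d: c + b + x.center + e,
    x.center: c + b + x.center
})
```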
Before:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
                        int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1,
                        int64_t const _stride_x_2)
{
    const double a = 10.0;
    const double b = 10;
    const double e = 11.0;
    const double c = b;
    const double f = b + c;
    for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
    {
        float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
        for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
        {
            float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
            for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
            {
                const double d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
                _data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
            }
        }
    }
}
```
After:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
                        int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1,
                        int64_t const _stride_x_2)
{
    const double a = 10.0;
    const uint16_t b = 10;
    const int64_t e = 11.0;
    const uint16_t c = b;
    const uint16_t f = b + c;
    for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
    {
        float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
        for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
        {
            float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
            for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
            {
                const float d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
                _data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
            }
        }
    }
}
```

Add pyhtml to tests and artifacts
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/41
2019-09-02T13:42:18+02:00, Stephan Seitz

I think this should suffice to produce the artifacts.
But `pytest-html` and `ansi2html` need to be added to the docker images.

Conftest: ignore two more files if waLBerla is not available
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/40
2019-08-21T18:43:43+02:00, Stephan Seitz

- Conftest: ignore two more files if waLBerla is not available (need when executing
- Always skip collection of `pystencils.autodiff` (not only if `'CI' in os.environ`)

Conftest: waLBerla, kerncraft
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/39
2019-08-20T16:26:56+02:00, Stephan Seitz

- Add `waLBerla` to conftest
- Add missing file to conftest for `kerncraft`

Implement sp.Sum, sp.Product
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/38
2019-08-21T18:45:35+02:00, Stephan Seitz

Sum and Product have an indexing variable which is an Atom but not a free symbol. So the logic that determines the undefined symbols in a `SympyAssignment` should not use `atoms(sp.Symbol)` but `free_symbols`. `sp.Indexed` from the `ResolvedFieldAccess`es forms an edge case.
So we could also use `atoms(sp.Symbol).intersection(...free_symbols)`.
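The difference between the two symbol sets can be seen on a small SymPy expression. This is a minimal sketch using only SymPy; the symbol names are chosen for illustration.

```python
import sympy as sp

i, n, x = sp.symbols('i n x')
expr = sp.Sum(x**i, (i, 0, n))

# atoms(sp.Symbol) collects every Symbol in the expression tree,
# including the summation index i.
all_symbols = expr.atoms(sp.Symbol)

# free_symbols leaves out i, because it is bound by the Sum.
free = expr.free_symbols

# Symbols that appear in the tree but must not be treated as undefined.
bound_only = all_symbols - free
```

Here `bound_only` contains just the index `i`, which is the symbol that should not be reported as undefined.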
I hope I extracted from my fork all the necessary code to implement this feature.

Remove main methods from tests (sorry for adding them)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/37
2019-08-20T16:27:21+02:00, Stephan Seitz

... or code will be executed when pytest is collecting the tests.
I found out that I can use "-s" to convince vim-test to show me test output.

Pre-push hook
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/36
2019-08-20T16:28:40+02:00, Stephan Seitz

This prevents me from pushing stuff that fails in either quicktest or flake8.
It has to be copied manually to `.git/hooks`, and `python3` has to be adapted to your Python executable.
~~Is there an update in flake8 that `.flake8` is not recognized automatically anymore and that we need to append C901?~~
Probably I just had a different linter installed on my PC at home; flake8 can use different linters.

Fix get_type_of_expression for constants like sympy.pi
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/35
2019-08-22T08:31:17+02:00, Stephan Seitz

Problem: some constant expressions are neither `Float`, `Integer` nor `Rational`, and don't have arguments.
```python
>>> from sympy import *
>>> isinstance(pi, Integer)
False
>>> isinstance(pi, Float)
False
>>> isinstance(pi, Rational)
False
>>> pi.args
()
```

Address #13: Use sympy.codegen.rewriting.optimize
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/34
2019-09-23T10:55:13+02:00, Stephan Seitz

It's really comfortable to write optimizations in terms of `sympy.codegen.rewriting.ReplaceOptim`:
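```python
from sympy.codegen.rewriting import ReplaceOptim

# Evaluates all constant terms
evaluate_constant_terms = ReplaceOptim(
    lambda e: hasattr(e, 'is_constant') and e.is_constant,  # query: which sub-expressions to rewrite
    lambda p: p.evalf()  # value: replacement for each match
)
```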
This PR adds a parameter `sympy_optimizations` to the `create_*_kernel` functions that applies the list of optimizations to the assignments before creating the AST.
`sympy.codegen.rewriting` already has some optimizations, some of them similar to the optimizations of pystencils.
For example, `create_expand_pow_optimization(limit)` is really similar to the logic in `CustomSympyPrinter._print_Pow`.
See #13
Problem: old versions of sympy (e.g. from the Ubuntu CI) don't have `sympy.codegen.rewriting`. The optimizations are skipped in that case. `test_and_coverage` applies all optimizations.
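Applying a list of optimizations only when the module is available could look roughly like the following. This is a sketch, not the code from the MR: `apply_optimizations` and `HAVE_REWRITING` are hypothetical names, while `optimize` and `create_expand_pow_optimization` are the SymPy functions mentioned above.

```python
import sympy as sp

try:
    from sympy.codegen.rewriting import optimize, create_expand_pow_optimization
    HAVE_REWRITING = True
except ImportError:
    # Older SymPy releases do not ship sympy.codegen.rewriting.
    HAVE_REWRITING = False


def apply_optimizations(expr, optimizations):
    """Apply the optimizations if SymPy supports them, else return expr unchanged."""
    if not HAVE_REWRITING:
        return expr
    return optimize(expr, optimizations)


x = sp.Symbol('x')
# Expand integer powers up to exponent 3 into repeated multiplication.
optimizations = [create_expand_pow_optimization(3)] if HAVE_REWRITING else []
result = apply_optimizations(x**3 + 1, optimizations)
print(result)
```

The rewritten expression is mathematically unchanged, so calling `result.doit()` gives back `x**3 + 1`.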
We could also try to implement an FMA optimization (fused multiply-add) with that and `sympy.Wild`.

Add KernelFunction.fields_written
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/33
2019-08-16T08:59:16+02:00, Stephan Seitz

I found myself needing this convenience wrapper in various places.

Bugfix: Readd __launch_bounds__ for dialect 'cuda'
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/32
2019-08-15T09:14:26+02:00, Stephan Seitz

`__launch_bounds__` was deactivated when introducing `CudaBackend`.

Bugfix: TypedSymbol.is_negative should not be implemented in terms of super().is_positive
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/31
2019-08-14T17:03:02+02:00, Stephan Seitz

This can lead to surprising simplifications.

AES-NI Random Number Generator
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/30
2019-09-02T10:21:21+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

I was looking at how to vectorize the Philox RNG yesterday. Before I knew it, I had implemented a working RNG using AES-NI instructions :nerd: ... Not entirely what I had intended to do, but it might still be useful to someone and should be about as fast as a vectorized Philox.
There is one place that could be optimized because I fall back to scalar instructions: I failed to reimplement `_mm_cvtepu64_pd` (the solution from https://stackoverflow.com/a/41148578 produces incorrect results in the least-significant half of the mantissa). Perhaps someone else can try to fix that.
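For readers unfamiliar with this kind of conversion, the general magic-constant idea behind an unsigned 64-bit to double conversion can be sketched in scalar Python. This illustrates the technique as I understand it and is not the intrinsics code from the MR; `bits_to_double` and `uint64_to_double` are names made up for this sketch.

```python
import struct

MAGIC_LO = 0x4330000000000000  # bit pattern of 2**52 as a double
MAGIC_HI = 0x4530000000000000  # bit pattern of 2**84 as a double


def bits_to_double(bits):
    """Reinterpret a 64-bit integer bit pattern as an IEEE 754 double."""
    return struct.unpack('<d', struct.pack('<Q', bits))[0]


def uint64_to_double(v):
    """Convert an unsigned 64-bit integer to double using only bit
    operations and two floating-point operations."""
    # Low 32 bits placed in the mantissa of 2**52: value is 2**52 + lo.
    lo = bits_to_double((v & 0xFFFFFFFF) | MAGIC_LO)
    # High 32 bits placed in the mantissa of 2**84: value is 2**84 + hi * 2**32.
    hi = bits_to_double((v >> 32) | MAGIC_HI)
    # The subtraction is exact; the final addition rounds once.
    # The order of these two operations matters for the result.
    return (hi - (2.0**84 + 2.0**52)) + lo


samples = [0, 1, 2**32, 2**53 + 1, 12345678901234567890, 2**64 - 1]
converted = [uint64_to_double(v) for v in samples]
```

Because the result relies on the subtraction being evaluated before the addition, a compiler that is allowed to reassociate floating-point operations could plausibly change the rounding of the low mantissa bits.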
I did not integrate this with the `vector_instruction_set` parameter of the code generation. Perhaps you can do that, @bauer. It needs support for SSE2 and AES instructions (which look like SSE2 instructions, but their availability is determined by a separate CPUID flag). It will also make use of `_mm_cvtepu32_ps` and `_mm_cvtepu64_pd` from AVX512 if available (these are 128-bit instructions that actually look like SSE2 instructions).

Martin Bauer

Basic support for OpenCL (experimental)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/29
2019-08-22T08:37:37+02:00, Stephan Seitz

Basic support for OpenCL
Problem: OpenCL cannot import `stdint.h`. Temporary fix: define custom `opencl_stdint.h` (~~defines currently only `int64_t`~~)
TODO:
- ~~implement `opencl_stdint.h`~~
- implement shared memory, textures, built-in functions
- ~~avoid CUDA intrinsics (`fast_div`)~~

Philox tests and clean up
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/28
2019-08-13T14:17:37+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Test the Philox against reference data and clean up duplicated code in the code generation. The latter will make it easier to later add a vectorized Philox.

Martin Bauer

Fix error message of CBackend for unsupported nodes
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/27
2019-08-15T09:15:02+02:00, Stephan Seitz

Concatenating `__class__` and `str` is not supported. It should be `str(type(self))` (full type path) or `self.__class__.__name__` (just the class name).

Make generate_c also work if astnode does not have member `instruction_set`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/25
2019-08-06T22:07:34+02:00, Stephan Seitz

generate_c currently only works for KernelFunctions, since member `instruction_set` is required.
generate_c can generate code for any astnode if this requirement is dropped.
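One common way to drop such a requirement is to read the attribute with a default. The following is a sketch with hypothetical stand-in classes, not the pystencils AST or the actual change in this MR.

```python
class KernelFunction:
    """Hypothetical stand-in for a node that carries an instruction set."""

    def __init__(self, instruction_set=None):
        self.instruction_set = instruction_set


class Block:
    """Hypothetical stand-in for a node without that member."""


def pick_instruction_set(ast_node):
    # getattr with a default avoids an AttributeError for nodes
    # that do not define instruction_set at all.
    return getattr(ast_node, 'instruction_set', None)


with_member = pick_instruction_set(KernelFunction(instruction_set='avx'))
without_member = pick_instruction_set(Block())
```

Nodes without the member then behave like kernels that have no vector instruction set selected.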