should be considered like on the CPU version

Bugfix fields accessed for interpolator access
2019-09-30T14:10:31+02:00
Stephan Seitz
Bugfix fields accessed for interpolator access

Bugfix avoid east and west const
2019-09-30T14:10:05+02:00
Stephan Seitz
Bugfix avoid east and west const
Here's the printing logic for `SympyAssignment`:
```python
if node.is_declaration:
    if node.is_const:  # <<< and 'const' not in self._print(node.lhs.dtype)
        prefix = 'const '
    else:
        prefix = ''
    data_type = prefix + self._print(node.lhs.dtype) + " "
    return "%s%s = %s;" % (data_type, self.sympy_printer.doprint(node.lhs),
                           self.sympy_printer.doprint(node.rhs))
else:
    lhs_type = get_type_of_expression(node.lhs)
    if type(lhs_type) is VectorType and isinstance(node.lhs, cast_func):
        ...  # excerpt ends here
```
It always prefixes `const` on a const declaration. This does not work if the dtype is also const, since its `__str__` appends a postfix `const`:
```python
def __str__(self):
    result = BasicType.numpy_name_to_c(str(self._dtype))
    if self.const:
        result += " const"
    return result
```
So we get something like `const int64_t const`.
I deleted the postfix const to have everything nicely aligned.
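
To make the clash concrete, here is a minimal sketch that reproduces the doubled qualifier outside of pystencils. `ToyType` and `print_declaration` are hypothetical stand-ins, not the real classes; the `guard` flag models the check marked with `# <<<` in the excerpt above.

```python
class ToyType:
    """Hypothetical stand-in for a type object that carries a const flag."""

    def __init__(self, c_name, const=False):
        self.c_name = c_name
        self.const = const

    def __str__(self):
        # Mirrors the postfix style shown above: "int64_t const"
        result = self.c_name
        if self.const:
            result += " const"
        return result


def print_declaration(name, rhs, dtype, is_const, guard):
    """Hypothetical printer: prefixes 'const' unless the guard detects one already."""
    type_str = str(dtype)
    if is_const and not (guard and 'const' in type_str):
        prefix = 'const '
    else:
        prefix = ''
    return "%s%s %s = %s;" % (prefix, type_str, name, rhs)


const_int = ToyType("int64_t", const=True)
unguarded = print_declaration("n", "10", const_int, is_const=True, guard=False)
guarded = print_declaration("n", "10", const_int, is_const=True, guard=True)
print(unguarded)  # the qualifier appears twice
print(guarded)    # the qualifier appears once
```

Either the guard or dropping the postfix from `__str__` removes the duplicate; the sketch only illustrates the first option.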

Kernel wrapper
2019-09-26T17:14:29+02:00
Stephan Seitz
Kernel wrapper
`KernelWrapper` is cool. Let's also use it for the `gpucuda` backend.

Also:

- make `show_code(kernel_wrapper)` possible
- fix `DeprecationWarning` for import of `Hashable`
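
The `Hashable` warning is presumably the one Python emits when abstract base classes are imported from `collections` instead of `collections.abc` (an assumption, the MR text does not quote the warning). A minimal sketch of the usual fix:

```python
try:
    # Location of the abstract base classes since Python 3.3
    from collections.abc import Hashable
except ImportError:
    # Fallback for very old interpreters only
    from collections import Hashable

tuple_is_hashable = isinstance((1, 2), Hashable)
list_is_hashable = isinstance([1, 2], Hashable)
print(tuple_is_hashable, list_is_hashable)
```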
Eliminate usages of old name 'equation collection' for `AssignmentCollection`
2019-09-26T17:12:44+02:00
Stephan Seitz
Eliminate usages of old name 'equation collection' for `AssignmentCollection`
We should avoid the old name equation collection.

This is another rebased PR for integrating interpolated accesses.
Interpolation accesses work like `absolute_access` except they can be safely applied on all fields (i.e. with boundary checks).
More info here: !20
This PR contains some dead code that uses https://github.com/theHamsta/CubicInterpolationCUDA . I have not included it as a submodule in pystencils in this PR.
This PR breaks the hash of these two tests:
```
[gw11] [ 14%] FAILED lbmpy_tests/test_code_hashequivalence.py::test_hash_equivalence_llvm
lbmpy_tests/test_conserved_quantity_relaxation_invariance.py::test_srt
[gw8] [ 15%] FAILED lbmpy_tests/test_code_hashequivalence.py::test_hash_equivalence
```

Extra asserts sympy issue
2019-09-25T15:38:17+02:00
Stephan Seitz
Extra asserts sympy issue
Add extra assertions to be super sure.

Use get_type_of_expression in typing_form_sympy_inspection to infer types
2019-09-23T16:16:50+02:00
Stephan Seitz
Use get_type_of_expression in typing_form_sympy_inspection to infer types
DANGER ZONE: this changes something in the core behavior of pystencils. Be careful before merging!

In summary, when `typing_form_sympy_inspection` reaches the point where it would just use `default_type`, we try to use `get_type_of_expression` to infer the actual type.

We use information about previously defined variables in the current scope.

Another approach would be to just type all the intermediate variables with `auto`.
```python
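from sympy.abc import a, b, c, d, e, f

import pystencils
# Module path as of the pystencils version this MR targets (assumption)
from pystencils.data_types import cast_func, create_type

x = pystencils.fields('x: float32[3d]')
assignments = pystencils.AssignmentCollection({
    a: cast_func(10, create_type('float64')),
    b: cast_func(10, create_type('uint16')),
    e: 11,
    c: b,
    f: c + b,
    d: c + b + x.center + e,
    x.center: c + b + x.center
})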
```
Before:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
                        int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1,
                        int64_t const _stride_x_2)
{
   const double a = 10.0;
   const double b = 10;
   const double e = 11.0;
   const double c = b;
   const double f = b + c;
   for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
   {
      float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
      for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
      {
         float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
         for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
         {
            const double d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
            _data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
         }
      }
   }
}
```
After:
```cpp
FUNC_PREFIX void kernel(float * RESTRICT _data_x, int64_t const _size_x_0, int64_t const _size_x_1,
                        int64_t const _size_x_2, int64_t const _stride_x_0, int64_t const _stride_x_1,
                        int64_t const _stride_x_2)
{
   const double a = 10.0;
   const uint16_t b = 10;
   const int64_t e = 11.0;
   const uint16_t c = b;
   const uint16_t f = b + c;
   for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
   {
      float * RESTRICT _data_x_00 = _data_x + _stride_x_0*ctr_0;
      for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
      {
         float * RESTRICT _data_x_00_10 = _stride_x_1*ctr_1 + _data_x_00;
         for (int ctr_2 = 0; ctr_2 < _size_x_2; ctr_2 += 1)
         {
            const float d = b + c + e + _data_x_00_10[_stride_x_2*ctr_2];
            _data_x_00_10[_stride_x_2*ctr_2] = b + c + _data_x_00_10[_stride_x_2*ctr_2];
         }
      }
   }
}
```

Compile CUDA using the LLVM backend
2019-09-23T12:49:30+02:00
Stephan Seitz
Compile CUDA using the LLVM backend
We can compile CUDA to PTX using the LLVM backend :wink:

`llc` produces PTX files without complaining.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/52
Sort headers/global definitions to enable reproducible code generation
2019-09-23T11:03:53+02:00
Stephan Seitz
Sort headers/global definitions to enable reproducible code generation
headers and global_declarations are generated by methods that return
sets. So even with the same inputs it is not guaranteed that the same
source code is generated since sets do not guarantee a specific order
when iterating over them.

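
A minimal sketch of the idea, assuming headers and global declarations can be ordered by their string form. `emit_preamble` and the sample entries are hypothetical and only illustrate why sorting makes the output independent of set iteration order.

```python
def emit_preamble(headers, global_declarations):
    """Hypothetical helper: render sets of headers/declarations in a stable order."""
    lines = ['#include %s' % header for header in sorted(headers)]
    lines.extend(sorted(global_declarations))
    return '\n'.join(lines)


# The same content, built in a different insertion order
first = emit_preamble({'<cmath>', '<cstdint>', '"philox_rand.h"'},
                      {'#define RESTRICT __restrict__', '#define FUNC_PREFIX'})
second = emit_preamble({'"philox_rand.h"', '<cstdint>', '<cmath>'},
                       {'#define FUNC_PREFIX', '#define RESTRICT __restrict__'})
print(first == second)
```

Without the `sorted` calls the two strings could differ between runs, which is enough to change a hash computed over the generated source.
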
Address #13: Use sympy.codegen.rewriting.optimize
2019-09-23T10:55:13+02:00
Stephan Seitz
Address #13: Use sympy.codegen.rewriting.optimize
It's really comfortable to write optimizations in terms of `sympy.codegen.rewriting.ReplaceOptim`:
```python
from sympy.codegen.rewriting import ReplaceOptim

# Evaluates all constant terms
evaluate_constant_terms = ReplaceOptim(
    lambda e: hasattr(e, 'is_constant') and e.is_constant,
    lambda p: p.evalf()
)
```
This PR adds a parameter `sympy_optimizations` to the `create_*_kernel` functions that applies the list of optimizations to the assignments before creating the AST.
`sympy.codegen.rewriting` already ships some optimizations, some of them similar to the optimizations of pystencils.
For example, `create_expand_pow_optimization(limit)` is really similar to the logic in `CustomSympyPrinter._print_Pow`.
See #13
Problem: old versions of SymPy (e.g. the one from the Ubuntu CI) don't have `sympy.codegen.rewriting`. The optimizations are skipped in that case. `test_and_coverage` applies all optimizations.
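
As a rough illustration of what applying such a list to assignments could look like, here is a sketch built only on SymPy. `apply_sympy_optimizations` is a hypothetical helper working on plain `(lhs, rhs)` tuples, not the pystencils implementation; the import guard mirrors the "skip on old SymPy" behaviour described above.

```python
import sympy

try:
    from sympy.codegen.rewriting import create_expand_pow_optimization, optimize
    HAVE_REWRITING = True
except ImportError:  # old SymPy without sympy.codegen.rewriting
    HAVE_REWRITING = False


def apply_sympy_optimizations(assignments, optimizations):
    """Hypothetical helper: rewrite every right-hand side with the given optimizations."""
    if not HAVE_REWRITING:
        return list(assignments)  # optimizations are skipped
    return [(lhs, optimize(rhs, optimizations)) for lhs, rhs in assignments]


x, y = sympy.symbols('x y')
assignments = [(y, x**3 + 1)]
optimizations = [create_expand_pow_optimization(3)] if HAVE_REWRITING else []
optimized = apply_sympy_optimizations(assignments, optimizations)
print(optimized)
```

With a recent SymPy the power is rewritten into an unevaluated product `x*x*x`, which is the same kind of expansion `_print_Pow` performs at printing time.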
AES-NI vectorization improvements
2019-09-17T09:08:05+02:00
Michael Kuron
mkuron@icp.uni-stuttgart.de
AES-NI vectorization improvements
!30 didn't implement an SSE-vectorized `_mm_cvtepu64_pd` equivalent because the [stackoverflow](https://stackoverflow.com/a/41148578) solution didn't work. That turned out to be due to a bad optimization in GCC 5+ in fast-math mode. None of the other compilers (Clang, Intel, MSVC) have that issue, so we just disable fast-math for that function.

Also, we now use fused multiply-add if available.

Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/50
CI: Add minimal-sympy-master
2019-09-17T09:06:54+02:00
Stephan Seitz
CI: Add minimal-sympy-master
Test for #11

This could warn us when SymPy introduces breaking changes.

The test with SymPy master is allowed to fail.
Close pystencils' config file after writing
2019-09-17T09:06:15+02:00
Stephan Seitz
Close pystencils' config file after writing
I got a warning that this file remains unclosed.
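
A minimal sketch of the usual remedy, assuming the config is written as JSON; `write_config` and the sample keys are hypothetical. The `with` block closes the file handle deterministically, so Python has no reason to report an unclosed file.

```python
import json
import os
import tempfile


def write_config(path, config):
    """Hypothetical helper: write the config and close the file right away."""
    with open(path, 'w') as f:
        json.dump(config, f, indent=4)


config = {'cache': {'clear_cache_on_start': False}}  # made-up example content
config_path = os.path.join(tempfile.mkdtemp(), 'config.json')
write_config(config_path, config)

with open(config_path) as f:
    loaded = json.load(f)
print(loaded == config)
```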

Actually increment counter inside random_symbol
2019-09-04T15:06:57+02:00
Michael Kuron
mkuron@icp.uni-stuttgart.de
Actually increment counter inside random_symbol
@rudolfweeber is currently looking at the statistical mechanics of the fluctuating LB and found a velocity bias. It turned out that this is due to all generated random numbers using the same key. Instead it should be incremented when generating multiple random numbers in the same kernel.
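
The effect can be sketched with a toy counter-based generator. `toy_counter_rng` below is a stand-in built on SHA-256, not the Philox algorithm, and its argument layout is an assumption; it only shares the relevant property that identical inputs give identical outputs.

```python
import hashlib
import struct


def toy_counter_rng(time_step, cell, key):
    """Stand-in for a counter-based RNG: the output depends only on the inputs."""
    data = struct.pack('<5q', time_step, cell[0], cell[1], cell[2], key)
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], 'little') / 2**64


cell = (3, 4, 5)
# Bug: every call in the kernel uses the same key, so all draws coincide
same_key = [toy_counter_rng(0, cell, 0) for _ in range(4)]
# Fix: the key is incremented for each call in the same kernel
distinct_keys = [toy_counter_rng(0, cell, key) for key in range(4)]

print(len(set(same_key)), len(set(distinct_keys)))
```

Perfectly correlated draws within a cell are the kind of defect that can show up as a bias in the fluctuating LB statistics.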

So in the generated code,
```c++
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_38, Dummy_39, Dummy_40, Dummy_41);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_34, Dummy_35, Dummy_36, Dummy_37);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_30, Dummy_31, Dummy_32, Dummy_33);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_26, Dummy_27, Dummy_28, Dummy_29);
```
becomes
```c++
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 3, 2, Dummy_38, Dummy_39, Dummy_40, Dummy_41);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 2, 2, Dummy_34, Dummy_35, Dummy_36, Dummy_37);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 1, 2, Dummy_30, Dummy_31, Dummy_32, Dummy_33);
philox_float4(time_step, ctr_0, ctr_1, ctr_2, 0, 2, Dummy_26, Dummy_27, Dummy_28, Dummy_29);
```
Martin Bauer

Add pyhtml to tests and artifacts
2019-09-02T13:42:18+02:00
Stephan Seitz
Add pyhtml to tests and artifacts
I think this should suffice to produce the artifacts.

But `pytest-html` and `ansi2html` need to be added to the docker images.
Fix typo in "pre-push"
2019-09-02T13:41:09+02:00
Stephan Seitz
Fix typo in "pre-push"