pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2019-07-10T16:14:26+02:00

Address of SymPy-Function `address_of`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/1
2019-07-10T16:14:26+02:00, Stephan Seitz

Some CUDA functions (like `atomic_add`) require pointers to data. This PR adds a SymPy function representing the C address-of operator (`&`).
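
As a rough illustration of the idea (not the code from this PR), the sketch below defines a SymPy function and a C printer that renders it with `&`. The names `AddressOf` and `PointerPrinter` are hypothetical stand-ins, and the import path assumes SymPy 1.7 or newer, where the C printers live in `sympy.printing.c`.

```python
import sympy as sp
from sympy.printing.c import C99CodePrinter


class AddressOf(sp.Function):
    """Hypothetical stand-in for `address_of`: wraps a single expression."""


class PointerPrinter(C99CodePrinter):
    """Hypothetical printer that renders AddressOf as the C operator `&`."""

    def _print_AddressOf(self, expr):
        # Print the wrapped expression first, then take its address.
        return "&(%s)" % self._print(expr.args[0])


x = sp.Symbol("x")
code = PointerPrinter().doprint(AddressOf(x))
print(code)
```

The printer dispatches on the class name, so adding a `_print_AddressOf` method is enough to control how the function appears in generated C code.
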
I tried to trigger CSE to show a problem related to this function (dummy variables were not typed correctly as pointers). I'll include the fix in a follow-up PR.

Make subexpressions optional for constructing an AssignmentCollection
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/3
2019-07-10T16:18:05+02:00, Stephan Seitz

When introducing new people to pystencils it's often simpler not to
differentiate between `main_assignments` and `subexpressions` in the
beginning.
Also for simple kernels subexpressions are often not needed, since
intermediate symbols can also be set in main_assignments.
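
The following sketch shows what an optional `subexpressions` argument can look like. `SimpleAssignmentCollection` is a hypothetical stand-in that stores assignments as plain string pairs; it is not the pystencils class and only illustrates the default-to-empty design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (left-hand side, right-hand side), kept as plain strings for illustration
Assignment = Tuple[str, str]


@dataclass
class SimpleAssignmentCollection:
    """Hypothetical stand-in, not the pystencils AssignmentCollection."""
    main_assignments: List[Assignment]
    # Optional: defaults to an empty list, so beginners can ignore it.
    subexpressions: List[Assignment] = field(default_factory=list)

    @property
    def all_assignments(self) -> List[Assignment]:
        # Subexpressions are listed before the main assignments.
        return self.subexpressions + self.main_assignments


# Beginner style: the intermediate symbol `tmp` lives in main_assignments.
simple = SimpleAssignmentCollection([("tmp", "a + b"), ("out", "2 * tmp")])

# Expert style: the same computation with an explicit subexpression.
expert = SimpleAssignmentCollection(
    main_assignments=[("out", "2 * tmp")],
    subexpressions=[("tmp", "a + b")],
)
```

Both objects describe the same sequence of assignments; only the second one separates the intermediate result from the main output.
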
Subexpressions should be kept for expert users.

Destructuring field binding
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/4
2019-07-10T17:24:07+02:00, Stephan Seitz

Add DestructuringBindingsForFieldClass to use pystencils kernels in a more C++-ish way
`DestructuringBindingsForFieldClass` defines all field-related variables
in its subordinate block.
However, it leaves one `TypedSymbol` of type `Field` undefined for each
field.
With this trick we can generate kernels that accept structs as
kernel parameters,
either to pass in a pystencils-specific `Field` struct with the following
definition:
```cpp
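#include <array>
#include <cstddef>
#include <cstdint>

// One struct bundles the data pointer with the shape and stride of a field.
template<typename DTYPE_T, std::size_t DIMENSION>
struct Field
{
    DTYPE_T* data;
    std::array<int64_t, DIMENSION> shape;
    std::array<int64_t, DIMENSION> stride;
};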
```
or to destructure user-defined types such as `pybind11::array`,
`at::Tensor`, or `tensorflow::Tensor`.
The test generates a kernel like this:
```cpp
FUNC_PREFIX void kernel(Field<double, 2>& x, Field<double, 2>& y, Field<double, 2>& z)
{
    _stride_z_1 = z.stride[1];
    _size_x_0 = x.shape[0];
    _stride_x_1 = x.stride[1];
    _stride_z_0 = z.stride[0];
    _size_x_1 = x.shape[1];
    _stride_y_1 = y.stride[1];
    _data_x = x.data;
    _stride_x_0 = x.stride[0];
    _data_z = z.data;
    _stride_y_0 = y.stride[0];
    _data_y = y.data;
    {
        for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
        {
            double * RESTRICT _data_z_00 = _data_z + _stride_z_0*ctr_0;
            double * RESTRICT const _data_y_00 = _data_y + _stride_y_0*ctr_0;
            double * RESTRICT const _data_x_00 = _data_x + _stride_x_0*ctr_0;
            for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
            {
                _data_z_00[_stride_z_1*ctr_1] = log(_data_x_00[_stride_x_1*ctr_1]*_data_y_00[_stride_y_1*ctr_1])*_data_y_00[_stride_y_1*ctr_1];
            }
        }
    }
}
```

Add global_declarations to cbackend
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/5
2019-07-10T16:20:44+02:00, Stephan Seitz

This enables `astnodes.Node` objects to have a member `required_global_declarations`
by which they can specify a global declaration required for their usage.
In the test, I added an AST node `Bogus` which requires a global declaration. The global declaration can define symbols required in the kernel, which then do not appear in the kernel parameters:
```cpp
// Declaration would go here
FUNC_PREFIX void kernel(double * RESTRICT const _data_x, double * RESTRICT const _data_y, double * RESTRICT _data_z, int64_t const _size_1, int64_t const _stride_z_0, int64_t const _stride_z_1)
{
    for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
    {
        double * RESTRICT _data_z_00 = _data_z + _stride_z_0*ctr_0;
        double * RESTRICT const _data_y_00 = _data_y + _stride_y_0*ctr_0;
        double * RESTRICT const _data_x_00 = _data_x + _stride_x_0*ctr_0;
        for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
        {
            _data_z_00[_stride_z_1*ctr_1] = log(_data_x_00[_stride_x_1*ctr_1]*_data_y_00[_stride_y_1*ctr_1])*_data_y_00[_stride_y_1*ctr_1];
        }
    }
    // Bogus would go here
}
```
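
One way to picture how a backend could honor `required_global_declarations` is sketched below. The classes `Node` and `Bogus` and the helpers `collect_global_declarations` and `generate` are simplified, hypothetical stand-ins written for this illustration; they are not the pystencils implementation.

```python
class Node:
    """Hypothetical minimal AST node, not the pystencils class."""
    required_global_declarations = ()

    def __init__(self, code, children=()):
        self.code = code
        self.children = list(children)


class Bogus(Node):
    """Node that needs a declaration outside of the kernel body."""
    required_global_declarations = ("// Declaration would go here",)


def collect_global_declarations(node):
    """Gather declarations from the whole tree, keeping first-seen order."""
    found = list(node.required_global_declarations)
    for child in node.children:
        for decl in collect_global_declarations(child):
            if decl not in found:  # emit each declaration only once
                found.append(decl)
    return found


def generate(kernel):
    """Emit global declarations first, then each node's code in tree order."""
    def body(node):
        return [node.code] + [line for c in node.children for line in body(c)]
    return "\n".join(collect_global_declarations(kernel) + body(kernel))


kernel = Node("void kernel() { ... }",
              [Node("// loop nest"), Bogus("// Bogus would go here")])
output = generate(kernel)
print(output)
```

The sketch only demonstrates the ordering: declarations requested anywhere in the tree are printed once, ahead of the kernel itself.
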
I used this mechanism for my CudaBackend (instead of CBackend) to enable the forward declaration of textures and constant memory.

Add CudaBackend, CudaSympyPrinter
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/9
2019-07-18T10:04:27+02:00, Stephan Seitz

Add CudaBackend, CudaSympyPrinter to extract CUDA-specific code from CBackend, CustomSympyPrinter
CUDA built-ins are added to `CudaSympyPrinter.known_functions` so that they can be used as `sympy.Function`

fix compiler options for macOS
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/12
2019-07-31T09:14:52+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de), assignee: Martin Bauer

Remove floor, ceiling for integer symbols
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/14
2019-08-02T22:26:37+02:00, Stephan Seitz

# Original Intent
Allow optimizations by SymPy when we know that a `TypedSymbol` `is_integer` or `is_real`
(e.g. drop rounding functions).
We can deduce some of those properties with NumPy's type system (https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.scalars.html).
We have to be careful since all the `is_*` queries use ternary logic (`True`, `False`, or `None`, meaning we don't know).
`Field.Access` can take advantage of those optimizations if it is made a subclass of `TypedSymbol`.
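
The effect can be reproduced with plain SymPy symbols. The snippet below is a small sketch using `sympy.Symbol` rather than `TypedSymbol`, so it only demonstrates the SymPy behaviour that the change relies on.

```python
import sympy as sp

n = sp.Symbol("n", integer=True)   # known to be an integer
x = sp.Symbol("x")                 # nothing is known about x

# The assumption queries use three-valued logic.
assert n.is_integer is True
assert x.is_integer is None        # None means "unknown", not False
assert sp.Rational(1, 2).is_integer is False

# Rounding an integer is a no-op, so SymPy drops the function calls.
simplified = sp.floor(n) + sp.ceiling(n)

# Without the assumption the rounding function has to stay.
unsimplified = sp.floor(x)

print(simplified, unsimplified)
```

Because `None` is falsy, a check such as `if not symbol.is_integer` would treat "unknown" like "not an integer", which is why the three cases need to be distinguished explicitly with `is True`, `is False`, and `is None`.
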
# Extended Changes
While writing a test I realized that it would be handy to compare `AssignmentCollection`s and to use SymPy's functions `find`, `match`, `subs`, and `replace`.

implemented derivation of gradient weights via rotation
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/15
2020-11-25T13:23:50+01:00, Markus Holzer

Derive the gradient weights of the other direction from the
already calculated weights of one direction
via rotation and apply them to a field.

Declare FieldShapeSymbol and FieldStrideSymbol as strictly positive
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/16
2019-08-06T08:06:27+02:00, Stephan Seitz

We can assume that FieldShapeSymbol and FieldStrideSymbol are always positive.
`TypedSymbol` should forward kwargs to `sympy.Symbol`.

Fix #10: Avoid jinja2 dependency
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/18
2019-08-06T08:05:02+02:00, Stephan Seitz

This commit avoids a dependency of core pystencils on jinja2.
However, this could make the printing of some AST nodes less elegant (see https://i10git.cs.fau.de/pycodegen/pystencils/merge_requests/17).

Add RELEASE-VERSION to .gitignore
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/21
2019-08-06T22:04:11+02:00, Stephan Seitz

Remove deprecation warning ('cachedir' parameter has been deprecated)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/24
2019-08-08T08:57:45+02:00, Stephan Seitz

Warning was:
```
/localhome/seitz_local/projects/pystencils/pystencils/cache.py:15: DeprecationWarning: The 'cachedir' parameter has been deprecated in
version 0.12 and will be removed in version 0.14.
You provided "cachedir='/localhome/seitz_local/.cache/pystencils'", use "location='/localhome/seitz_local/.cache/pystencils'" instead.
disk_cache = Memory(cachedir=cache_dir, verbose=False).cache
```

Make generate_c also work if astnode does not have member `instruction_set`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/25
2019-08-06T22:07:34+02:00, Stephan Seitz

generate_c currently only works for KernelFunctions, since member `instruction_set` is required.
generate_c can generate code for any astnode if this requirement is dropped.

Fix error message of CBackend for unsupported nodes
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/27
2019-08-15T09:15:02+02:00, Stephan Seitz

Concatenating `__class__` and `str` is not supported. It should be `str(type(self))` (full type path) or `self.__class__.__name__` (just the class name).

Philox tests and clean up
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/28
2019-08-13T14:17:37+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de), assignee: Martin Bauer

Test the Philox against reference data and clean up duplicated code in the code generation. The latter will make it easier to later add a vectorized Philox.

Basic support for OpenCL (experimental)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/29
2019-08-22T08:37:37+02:00, Stephan Seitz

Basic support for OpenCL
Problem: OpenCL cannot include `stdint.h`. Temporary fix: define a custom `opencl_stdint.h` (~~defines currently only `int64_t`~~)
TODO:
- ~~implement `opencl_stdint.h`~~
- implement shared memory, textures, built-in functions
- ~~avoid CUDA intrinsics (`fast_div`)~~

AES-NI Random Number Generator
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/30
2019-09-02T10:21:21+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de), assignee: Martin Bauer

I was looking at how to vectorize the Philox RNG yesterday. Before I knew it, I had implemented a working RNG using AES-NI instructions :nerd: ... Not entirely what I had intended to do, but it might still be useful to someone and should be about as fast as a vectorized Philox.
There is one place that could be optimized because I fall back to scalar instructions: I failed to reimplement `_mm_cvtepu64_pd` (the solution from https://stackoverflow.com/a/41148578 produces incorrect results in the least-significant half of the mantissa). Perhaps someone else can try to fix that.
I did not integrate this with the `vector_instruction_set` parameter of the code generation. Perhaps you can do that, @bauer. It needs support for SSE2 and AES instructions (which look like SSE2 instructions, but their availability is determined by a separate CPUID flag). It will also make use of `_mm_cvtepu32_ps` and `_mm_cvtepu64_pd` from AVX512 if available (these are 128-bit instructions that actually look like SSE2 instructions).

Bugfix: TypedSymbol.is_negative should not be implemented in terms of super().is_positive
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/31
2019-08-14T17:03:02+02:00, Stephan Seitz

This can lead to surprising simplifications

Bugfix: Readd __launch_bounds__ for dialect 'cuda'
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/32
2019-08-15T09:14:26+02:00, Stephan Seitz

`__launch_bounds__` was deactivated when introducing `CudaBackend`

Add KernelFunction.fields_written
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/33
2019-08-16T08:59:16+02:00, Stephan Seitz

I found myself needing this convenience wrapper in various places.