pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests

!1 Address of SymPy-Function `address_of` (Stephan Seitz, 2019-07-10)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/1
Some CUDA functions (like `atomic_add`) require pointers to data. This PR adds a SymPy function representing the C address-of operator (`&`).
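The idea can be sketched with a plain SymPy function. The class name `AddressOf` and the `_ccode` printer hook below are illustrative assumptions, not the MR's actual implementation:

```python
import sympy as sp

class AddressOf(sp.Function):
    """Illustrative model of C's unary address-of operator `&`.

    Hypothetical sketch; the function added by this MR may differ.
    """
    nargs = 1

    def _ccode(self, printer, **kwargs):
        # SymPy's C code printers look for a `_ccode` method on expressions,
        # so we can emit `&(arg)` directly.
        return f"&({printer._print(self.args[0])})"

x = sp.Symbol("x")
print(sp.ccode(AddressOf(x)))  # -> &(x)
```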
I tried to trigger cse to show a problem related to this function (dummy variables were not typed correctly as pointers). I'll include the fix in a follow-up PR.

!2 Add autodiff (Stephan Seitz, 2019-07-11)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/2
Draft for a minimal integration of automatic differentiation. The TensorFlow and Torch backends were removed (apart from `AutoDiffOp.create_tensorflow()`).
Only tests without LBM or tf/torch dependencies have been added. This implies that numeric gradient checking is also missing (it depends on either tf or torch). Should we really move the backends into separate modules?
Apart from adding auto-differentiation functionality, I added two more changes. Would be happy to AutoDiffOp.create_tensorflow() after feedback.

!3 Make subexpressions optional for constructing an AssignmentCollection (Stephan Seitz, 2019-07-10)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/3
When introducing new people to pystencils, it's often simpler not to differentiate between `main_assignments` and `subexpressions` in the beginning. Also, for simple kernels, subexpressions are often not needed, since intermediate symbols can also be set in `main_assignments`. Subexpressions should be kept for expert users.

!4 Destructuring field binding (Stephan Seitz, 2019-07-10)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/4
Add DestructuringBindingsForFieldClass to use pystencils kernels in a more C++-ish way.
DestructuringBindingsForFieldClass defines all field-related variables in its subordinated block. However, it leaves a TypedSymbol of type `Field` undefined for each field. By that trick we can generate kernels that accept structs as kernel parameters, either to include a pystencils-specific Field struct of the following definition:
```cpp
template<typename DTYPE_T, int DIMENSION>
struct Field
{
    DTYPE_T* data;
    std::array<int64_t, DIMENSION> shape;
    std::array<int64_t, DIMENSION> stride;
};
```
or to be able to destructure user-defined types like `pybind11::array`,
`at::Tensor`, `tensorflow::Tensor`.
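The destructuring idea can be sketched in plain Python. The emitted variable naming below is modelled on the generated kernel shown in this MR; the generator function itself is a hypothetical stand-in, not the MR's code:

```python
def destructuring_lines(field_names, dim=2):
    """Emit C statements that unpack a Field struct into the flat symbols
    (_data_*, _size_*, _stride_*) that pystencils kernels use internally.
    Hypothetical sketch of what DestructuringBindingsForFieldClass emits."""
    lines = []
    for name in field_names:
        lines.append(f"_data_{name} = {name}.data;")
        for d in range(dim):
            lines.append(f"_size_{name}_{d} = {name}.shape[{d}];")
            lines.append(f"_stride_{name}_{d} = {name}.stride[{d}];")
    return "\n".join(lines)

# e.g. "_data_x = x.data;" followed by shape/stride unpacking for each axis
print(destructuring_lines(["x"]))
```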
The test generates a kernel like this:
```cpp
FUNC_PREFIX void kernel(Field<double, 2>& x, Field<double, 2>& y, Field<double, 2>& z)
{
_stride_z_1 = z.stride[1];
_size_x_0 = x.shape[0];
_stride_x_1 = x.stride[1];
_stride_z_0 = z.stride[0];
_size_x_1 = x.shape[1];
_stride_y_1 = y.stride[1];
_data_x = x.data;
_stride_x_0 = x.stride[0];
_data_z = z.data;
_stride_y_0 = y.stride[0];
_data_y = y.data;
{
for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
{
double * RESTRICT _data_z_00 = _data_z + _stride_z_0*ctr_0;
double * RESTRICT const _data_y_00 = _data_y + _stride_y_0*ctr_0;
double * RESTRICT const _data_x_00 = _data_x + _stride_x_0*ctr_0;
for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
{
_data_z_00[_stride_z_1*ctr_1] = log(_data_x_00[_stride_x_1*ctr_1]*_data_y_00[_stride_y_1*ctr_1])*_data_y_00[_stride_y_1*ctr_1];
}
}
}
}
```

!5 Add global_declarations to cbackend (Stephan Seitz, 2019-07-10)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/5
This enables `astnodes.Node`s to have a member `required_global_declarations` by which they can specify a global declaration required for their usage.
In the test, I added an AST node `Bogus` which requires a global declaration. The global declaration can define symbols required in the kernel that will then not appear in the kernel parameters:
```cpp
// Declaration would go here
FUNC_PREFIX void kernel(double * RESTRICT const _data_x, double * RESTRICT const _data_y, double * RESTRICT _data_z, int64_t const _size_1, int64_t const _stride_z_0, int64_t const _stride_z_1)
{
for (int ctr_0 = 0; ctr_0 < _size_x_0; ctr_0 += 1)
{
double * RESTRICT _data_z_00 = _data_z + _stride_z_0*ctr_0;
double * RESTRICT const _data_y_00 = _data_y + _stride_y_0*ctr_0;
double * RESTRICT const _data_x_00 = _data_x + _stride_x_0*ctr_0;
for (int ctr_1 = 0; ctr_1 < _size_x_1; ctr_1 += 1)
{
_data_z_00[_stride_z_1*ctr_1] = log(_data_x_00[_stride_x_1*ctr_1]*_data_y_00[_stride_y_1*ctr_1])*_data_y_00[_stride_y_1*ctr_1];
}
}
// Bogus would go here
}
```
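The collection mechanism can be sketched like this. The `Node`/`Bogus` classes and the walker function are minimal stand-ins under assumed names, not the pystencils implementation:

```python
class Node:
    """Minimal stand-in for a pystencils AST node (illustrative only)."""
    required_global_declarations = []
    children = []

def collect_global_declarations(node):
    # Walk the AST and gather every declaration any node requires;
    # the backend would print these once, before the kernel function.
    decls = list(getattr(node, "required_global_declarations", []))
    for child in getattr(node, "children", []):
        decls += collect_global_declarations(child)
    return decls

class Bogus(Node):
    required_global_declarations = ["__constant__ double c[4];"]

kernel = Node()
kernel.children = [Bogus()]
print(collect_global_declarations(kernel))  # -> ['__constant__ double c[4];']
```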
I used this code for my CudaBackend (instead of CBackend) to enable the forward declaration of textures and constant memory.

!6 Fix deprecation warning: `collections.abc` instead of `abc` (Stephan Seitz, 2019-07-10)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/6
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

!7 Add `pystencils.make_python_function` used for KernelFunction.compile (Stephan Seitz, 2019-07-18)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/7
`KernelFunction.compile` (defaulting to `None`) is currently set by the
`create_kernel` function of each respective backend, as a partial application of `<backend>.make_python_function`.
The code would be clearer with a unified `make_python_function`.
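A unified entry point could dispatch on the kernel's backend, roughly like this. The backend names, the registry dict, and the per-backend functions are assumptions for illustration:

```python
# Hypothetical sketch: one public entry point dispatching to per-backend
# implementations, instead of each create_kernel wiring up `compile` itself.
def cpu_make_python_function(ast):
    return f"compiled {ast['name']} for cpu"

def gpu_make_python_function(ast):
    return f"compiled {ast['name']} for gpu"

BACKENDS = {"cpu": cpu_make_python_function, "gpu": gpu_make_python_function}

def make_python_function(ast):
    # KernelFunction.compile could then simply call this with its backend.
    return BACKENDS[ast["backend"]](ast)

print(make_python_function({"name": "kernel", "backend": "gpu"}))
```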
`KernelFunction.compile` can then be implemented as a call to this function with the respective backend.

!8 Auto-format pystencils/rng.py (trailing whitespace) (Stephan Seitz, 2019-07-18)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/8
My editor feels better if that whitespace is not there.

!9 Add CudaBackend, CudaSympyPrinter (Stephan Seitz, 2019-07-18)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/9
Add CudaBackend and CudaSympyPrinter to extract CUDA-specific code from CBackend and CustomSympyPrinter.
CUDA built-ins are added to `CudaSympyPrinter.known_functions` so they can be used as `sympy.Function`s.

!10 Add CustomSympyPrinter._print_Sum (Stephan Seitz, 2019-08-05)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/10
This makes sympy.Sum printable as an immediately invoked lambda (attention: C++-only, but it works in CUDA).
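Printing a sum as an immediately invoked C++ lambda, as `_print_Sum` above does, could look roughly like this. This is a string-level sketch assuming a `double` accumulator; the real printer works on SymPy expressions and may differ in details:

```python
def print_sum_as_lambda(var, start, stop, body):
    """Render sympy.Sum-style bounds as an immediately invoked C++ lambda.
    Illustrative sketch; CustomSympyPrinter._print_Sum may differ."""
    return (f"[&]() {{ double sum = 0.0; "
            f"for (long long {var} = {start}; {var} <= {stop}; ++{var}) "
            f"{{ sum += {body}; }} return sum; }}()")

print(print_sum_as_lambda("i", "0", "n - 1", "x[i]"))
```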
!11 Fixup for DestructuringBindingsForFieldClass (Stephan Seitz, 2019-07-18)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/11
- rename the header: Field.h is not a unique name in the waLBerla context
- add PyStencilsField.h
- the bindings were lacking a data type

!12 fix compiler options for macOS (Michael Kuron <mkuron@icp.uni-stuttgart.de>, 2019-07-31; assignee: Martin Bauer)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/12

!13 implemented derivation of gradient weights via rotation (Markus Holzer, 2019-08-03)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/13
Derive the gradient weights of other directions from the already calculated weights of one direction via rotation, and apply them to a field.

!14 Remove floor, ceiling for integer symbols (Stephan Seitz, 2019-08-02)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/14
# Original Intent
Allow optimizations by SymPy when we know that a `TypedSymbol` `is_integer` or `is_real`
(e.g. drop rounding functions).
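With SymPy's assumptions this already works for plain symbols; the MR gives `TypedSymbol`s the same benefit. A minimal demonstration with ordinary symbols (the `integer=True` kwarg stands in for what an integer-dtype `TypedSymbol` would declare):

```python
import sympy as sp

i = sp.Symbol("i", integer=True)  # what an integer TypedSymbol could declare
x = sp.Symbol("x")                # no assumptions: rounding must stay

print(sp.floor(i))    # -> i         (rounding dropped)
print(sp.ceiling(i))  # -> i
print(sp.floor(x))    # -> floor(x)  (kept)
```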
We can deduce some of those properties with Numpy's type system (https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.scalars.html).
We have to be careful since all the `is_*` methods have ternary logic (`True`, `False`, or `None` meaning "we don't know").
Field.Access can take advantage of those optimizations by making it a subclass of `TypedSymbol`.
# Extended Changes
By writing a test I realized that it would be handy to compare `AssignmentCollection`s and to use SymPy's `find`, `match`, `subs`, and `replace` functions.

!15 implemented derivation of gradient weights via rotation (Markus Holzer, 2020-11-25)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/15
Derive the gradient weights of other directions from the already calculated weights of one direction via rotation, and apply them to a field.

!16 Declare FieldShapeSymbol and FieldStrideSymbol as strictly positive (Stephan Seitz, 2019-08-06)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/16
We can assume that FieldShapeSymbol and FieldStrideSymbol are always positive.
`TypedSymbol` should forward kwargs to `sympy.Symbol`.
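The effect of the `positive=True` assumption, shown with a plain SymPy symbol (the MR forwards such kwargs through `TypedSymbol`; the symbol name is modelled on generated kernel code):

```python
import sympy as sp

shape = sp.Symbol("_size_x_0", positive=True, integer=True)

print(sp.Abs(shape))      # -> _size_x_0  (Abs dropped for positive symbols)
print(shape > 0)          # -> True       (decided from the assumption)
print(sp.sqrt(shape**2))  # -> _size_x_0  (no need to keep sqrt(.**2))
```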
!17 Fix #10: Add jinja2 to pystencils's dependencies (Stephan Seitz, 2019-08-06)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/17
The alternative would be to remove jinja2 (see the other PR).
However, I think the dependency on jinja2 is not too heavy. This could make some implementations more elegant.

!18 Fix #10: Avoid jinja2 dependency (Stephan Seitz, 2019-08-06)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/18
This commit avoids a dependency of core pystencils on jinja2.
However, this could make the printing of some AST nodes less elegant (see https://i10git.cs.fau.de/pycodegen/pystencils/merge_requests/17).

!19 Run flynt (https://pypi.org/project/flynt/) on pystencils (Stephan Seitz, 2019-08-06)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/19
This replaces usages of `%`-style string formatting by Python's f-strings.
Is this a good thing? I don't know. :shrug:

!20 WIP: Astnodes for interpolation (Stephan Seitz, 2019-09-24)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/20
This PR maybe still needs some clean-up.
However, it would be good to already receive some feedback.
What works:
- Using CUDA textures
- Using HW accelerated interpolation for float32 textures
- Implementing linear interpolation either in software (CPU, GPU) or via texture accesses without HW interpolation but with HW boundary handling
- Adding transformed coordinate systems to fields
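Software linear interpolation with clamped boundary handling (the mode CUDA calls `cudaBoundaryModeClamp`) can be sketched in 1D like this. This is purely illustrative, not the MR's code:

```python
import math

def lerp_clamped(data, pos):
    """1D linear interpolation at fractional position `pos`, clamping
    out-of-range indices to the edge (like cudaBoundaryModeClamp)."""
    lo = math.floor(pos)
    frac = pos - lo
    clamp = lambda i: min(max(i, 0), len(data) - 1)
    return (1 - frac) * data[clamp(lo)] + frac * data[clamp(lo + 1)]

print(lerp_clamped([0.0, 10.0, 20.0], 0.5))   # -> 5.0
print(lerp_clamped([0.0, 10.0, 20.0], -3.0))  # -> 0.0 (clamped to the edge)
```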
What does not work:
- HW boundary handling for CUDA textures with the boundary modes `mirror` and `wrap` (apparently these have been removed from CUDA's API but are still present in pycuda). Now there's only:
```
cudaBoundaryModeZero = 0
Zero boundary mode
cudaBoundaryModeClamp = 1
Clamp boundary mode
cudaBoundaryModeTrap = 2
Trap boundary mode
```
What is trap boundary mode? Nothing is documented, so we can only experiment.
What kind of works:
- B-spline interpolation on GPU, using this repo as a submodule (http://www.dannyruijters.nl/cubicinterpolation/); too lazy for tests, and I don't know how to prove correctness.
- Textures for dtypes with itemsize > 4. PyCUDA has a helper header (https://github.com/inducer/pycuda/blob/master/pycuda/cuda/pycuda-helpers.hpp) that loads doubles via two int fetches. However, this hack only seems to work if we add a 0.5 offset and make all functions in this header accept float.