pystencils issues
https://i10git.cs.fau.de/pycodegen/pystencils/-/issues
2021-11-19T15:43:59+01:00

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/26
Use AVX512 masked intrinsics
2021-11-19T15:43:59+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

AVX512 provides intrinsics like `_mm512_mask_add_pd`, which is like `_mm512_add_pd` with a write mask. This can be used to efficiently filter out writes to non-fluid cells. It might also be useful to optimize things like `sp.Piecewise`. This would also work with SVE vectorization on future ARM processors.

Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/25
Non-temporal stores do not use fences
2021-04-12T10:40:22+02:00
Michael Kuron mkuron@icp.uni-stuttgart.de

When vectorization is enabled, instructions like `_mm(|256|512)_stream_p[sd]` are generated. However, the corresponding fence `_mm_mfence` is never generated. This is not a problem in practice as enough time will have passed by the time the data is next read. However, an explicit fence should be added to guarantee safety.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/24
CBackend uses aligned_alloc, which requires C++17
2021-08-19T11:13:32+02:00
Michael Kuron mkuron@icp.uni-stuttgart.de

backends/cbackend.py generates code that contains `aligned_alloc`. This is incompatible with our default compiler flags, which include `-std=c++11`.
It is also incompatible with waLBerla, which defaults to C++14. I guess GCC doesn't care, but I've seen the issue come up with the latest Apple Clang, which interprets the standard more strictly than it used to.
We need to fall back to `posix_memalign` on POSIX and `_aligned_malloc` on Windows.

Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/23
Cannot simplify piecewise function with field access in condition
2021-01-05T17:44:27+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

This fails:
```python
from pystencils.session import *
dh = ps.create_data_handling((20,20))
ρ = dh.add_array('rho')
pw = sp.Piecewise((0, 1 < sp.Max(-0.5, ρ.center+0.5)), (1, True))
sp.simplify(pw)
```
with the following error:
```
./pystencils/pystencils/field.py in __iter__(self)
760 """This is necessary to work with parts of sympy that test if an object is iterable (e.g. simplify).
761 The __getitem__ would make it iterable"""
--> 762 raise TypeError("Field access is not iterable")
763
764 @property
TypeError: Field access is not iterable
```
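The docstring visible in the traceback refers to Python's fallback iteration protocol: an object that only defines `__getitem__` is still accepted by `iter()`, so a class has to raise `TypeError` from `__iter__` to opt out. The sketch below illustrates that mechanism only. The class names are hypothetical and not part of pystencils, and it does not reproduce the SymPy code path that triggers the failure above.

```python
class OnlyGetitem:
    """Defines __getitem__ but no __iter__ (hypothetical example class)."""

    def __getitem__(self, index):
        if index < 3:
            return index * 10
        raise IndexError(index)


class ExplicitlyNotIterable(OnlyGetitem):
    """Same indexing, but iteration is refused explicitly."""

    def __iter__(self):
        raise TypeError("not iterable")


def is_iterable(obj):
    """Duck-typed iterability check: try iter() and treat TypeError as 'no'."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
items = list(OnlyGetitem())
print(items)
print(is_iterable(OnlyGetitem()))
print(is_iterable(ExplicitlyNotIterable()))
```

Code that guards the `iter()` call this way treats the second class as a scalar. Code that loops over the object without a guard lets the `TypeError` propagate, which is the kind of error reported here.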
Here are two similar examples that do not produce such an error:
```python
s = sp.Symbol("s")
pw = sp.Piecewise((0, 1 < sp.Max(-0.5, s+0.5)), (1, True))
sp.simplify(pw)
pw = sp.Piecewise((0, 1 < ρ.center+0.5), (1, True))
sp.simplify(pw)
```

Stephan Seitz

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/22
2nd order finite volume discretizer
2021-04-18T00:05:56+02:00
Michael Kuron mkuron@icp.uni-stuttgart.de

We currently have a 1st order FVM and 1st and 2nd order FDM discretizer. 2nd order FVM would be nice to have too. Some problems decompose into two subgrids with 1st order, causing stability problems unless extreme resolutions are used.

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/21
Increase Python minimum version to 3.8
2021-09-10T11:50:36+02:00
Michael Kuron mkuron@icp.uni-stuttgart.de

Once Ubuntu 20.04 has been out for a year and Anaconda supports it, let's update to Python 3.8.

Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/19
Need test job against older SymPy version
2020-01-24T12:15:59+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

My desktop computer runs Ubuntu 18.04, which includes SymPy 1.1.1. pystencils generally works fine on that version, but it is sufficiently different that we have occasionally broken support of it in the past, e.g. https://i10git.cs.fau.de/pycodegen/pystencils/merge_requests/105.
Furthermore, a slightly different simplification engine in newer SymPy versions has previously masked actual bugs (https://i10git.cs.fau.de/pycodegen/lbmpy/merge_requests/14 / https://i10git.cs.fau.de/pycodegen/pystencils/commit/721fdf454c024bc1ed8db65b31a91cf802b2dae7). To solve this properly, we should define a minimum SymPy version and perform CI tests using that specific version. Currently we only test the latest release and the latest master.

Michael Kuron mkuron@icp.uni-stuttgart.de

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/18
Checking for double/int instead of np.floating np.integer in LLVM printer
2021-11-19T15:43:26+01:00
Stephan Seitz

Those checks could get problematic when compiling a kernel with float32/int32 instead of the default double/int types.
https://i10git.cs.fau.de/seitz/pystencils/blob/79a6e728e789a890ca36a7231baf4274487ddfdb/pystencils/llvm/llvm.py#L130

Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/17
Allow ps.Assignment to sp.Matrix
2020-01-23T15:22:33+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

This currently doesn't work:
```python
import pystencils as ps
import sympy as sp
a, b, c = sp.symbols("a b c")
ps.Assignment(sp.Matrix([a,b,c]), sp.Matrix([1,2,3]))
```
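For illustration, the component-wise form can be written out by hand. This is a minimal sketch using only the standard library: `ComponentAssignment` and `split_assignment` are hypothetical names, and the tuple is a stand-in for `ps.Assignment`, not the pystencils class itself.

```python
from typing import Any, Iterable, List, NamedTuple


class ComponentAssignment(NamedTuple):
    """Hypothetical stand-in for ps.Assignment: one lhs, one rhs."""
    lhs: Any
    rhs: Any


def split_assignment(lhs: Iterable[Any], rhs: Iterable[Any]) -> List[ComponentAssignment]:
    """Expand a vector-valued assignment into one assignment per component."""
    lhs_items, rhs_items = list(lhs), list(rhs)
    if len(lhs_items) != len(rhs_items):
        raise ValueError("lhs and rhs must have the same number of components")
    return [ComponentAssignment(l, r) for l, r in zip(lhs_items, rhs_items)]


# Plain strings and integers stand in for the symbols a, b, c and the values 1, 2, 3.
assignments = split_assignment(["a", "b", "c"], [1, 2, 3])
for assignment in assignments:
    print(f"{assignment.lhs} <- {assignment.rhs}")
```

Iterating a `sp.Matrix` yields its entries one by one, so the two matrices from the snippet above could be zipped in the same way.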
The assignment should be automatically transformed into assignments for each component.

Stephan Seitz

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/16
Staggered grids: allow more than just face neighbors
2019-12-01T09:44:31+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

When defining a kernel on a staggered grid, one can currently only store values on the faces/edges (3D/2D) of a cell. This is sufficient for common finite volume schemes. However, when one has a specific discretization in mind (in my case, Capuani's electrokinetics solver), one may also need to calculate fluxes for the edges/corners (3D/2D). For a volume-of-fluid-like scheme, one also needs the corners in 3D.
So currently we only support D3Q6/D2Q4 staggered grids and I need D3Q26/D2Q8. This was already discussed with @bauer and is probably not a whole lot of work to implement.

Michael Kuron mkuron@icp.uni-stuttgart.de

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/15
OpenCL backend still requires pycuda to be installed
2020-01-18T15:51:37+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

I wanted to try out the OpenCL backend on a machine that does not have the CUDA SDK or pycuda installed. Unfortunately, pystencils imports stuff from pycuda in multiple places throughout the code, so I cannot use the OpenCL backend on this machine.

Stephan Seitz

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/14
Minor installation issue if only cloning last commit
2021-03-15T10:42:15+01:00
Nils Kohl

If I clone pystencils with `--depth 1`, the function `version_number_from_git()` in `pystencils/doc/`
will crash during `python setup.py` since `git tag` will return nothing:
```
$ git clone --depth 1 --branch hyteg git@i10git.cs.fau.de:pycodegen/pystencils.git
$ git tag
$ python3 setup.py develop
Traceback (most recent call last):
  File "setup.py", line 52, in <module>
    version=version_number_from_git(),
  File "/builds/terraneo/pystencils/doc/version_from_git.py", line 18, in version_number_from_git
    latest_release = get_released_versions()[-1]
IndexError: list index out of range
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/12
Vectorized Philox RNG
2021-02-11T14:49:03+01:00
Michael Kuron mkuron@icp.uni-stuttgart.de

Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/11
FieldPointerSymbol is rendered as a sympy.Symbol instead of a TypedSymbol
2020-01-20T11:58:48+01:00
Stephan Seitz

`python3 setup.py quicktest` fails on current SymPy master.
I don't know whether this is a temporary issue or a change of SymPy's behavior.

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/10
replace jinja2 with default python features
2019-08-08T12:21:54+02:00
Dominik Thoennes dominik.thoennes@fau.de

jinja2 should not be used in base pystencils since it introduces an additional dependency.
It is used, for example, in:
- https://i10git.cs.fau.de/pycodegen/pystencils/blob/master/pystencils/astnodes.py#L676

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/9
Explore usage of `fma` for optimization on CUDA
2019-08-21T18:39:09+02:00
Stephan Seitz

I don't know whether nvcc automatically uses `fma` instructions (fused multiply-add) when compiling with the `--use_fast_math` flag.
If not, it could be easy to use `fma` whenever possible to accelerate compute-bound kernels.
https://devblogs.nvidia.com/lerp-faster-cuda/

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/8
Simplification of derivation of gradient
2020-11-25T13:23:51+01:00
Markus Holzer

Once the weights of one direction are computed, the weights of the other directions could be obtained by rotating the previously calculated ones. This functionality could be added to [derivation.py](https://i10git.cs.fau.de/pycodegen/pystencils/blob/master/pystencils/fd/derivation.py).

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/7
In compile_and_load C-code is generated twice
2021-03-03T16:41:23+01:00
Stephan Seitz

When I was reviewing my old code I saw that I placed a TODO here.
`generate_c` is just called to generate the hash and then again for the real code (if not loaded from shared object on disk).
Maybe the ast can be hashed directly instead (danger point: generate_c may have changed since last generation).
I remember that I directly hashed the assignments for generation of torch/tensorflow code.
This had the disadvantage that I had to deactivate caching when developing/changing code affecting `generate_c`.
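One way to avoid the second call without hashing the AST is to generate the source once and reuse that string for both the hash and the module. The sketch below shows only this idea. `hash_and_source` and `fake_generate_c` are hypothetical names, and whether `ExtensionModuleCode` can accept pre-generated source is an assumption not confirmed here.

```python
import hashlib
from typing import Callable, Dict, Tuple


def hash_and_source(ast: object, generate: Callable[[object], str]) -> Tuple[str, str]:
    """Generate the source once and derive the module name from that same string."""
    source = generate(ast)
    module_name = "mod_" + hashlib.sha256(source.encode()).hexdigest()
    return module_name, source


# Hypothetical stand-in for generate_c that counts how often it is invoked.
calls: Dict[str, int] = {"count": 0}


def fake_generate_c(ast: object) -> str:
    calls["count"] += 1
    return f"void kernel() {{ /* {ast} */ }}"


module_name, source = hash_and_source("a = b + c", fake_generate_c)
print(module_name[:12], calls["count"])
```

Because the hash is still computed from the generated code, changes to the code generator keep invalidating the cache, unlike hashing the assignments directly. The function in question currently reads: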
```python
import hashlib
from tempfile import TemporaryDirectory

# get_cache_config, generate_c, ExtensionModuleCode, compile_module,
# load_kernel_from_file and KernelWrapper are defined elsewhere in pystencils.


def compile_and_load(ast):
    cache_config = get_cache_config()
    # TODO: inefficient to generate_c just for hash? could reuse it
    code_hash_str = "mod_" + hashlib.sha256(generate_c(ast).encode()).hexdigest()
    code = ExtensionModuleCode(module_name=code_hash_str)
    code.add_function(ast, ast.function_name)

    if cache_config['object_cache'] is False:
        # No object cache configured: build in a throwaway directory
        with TemporaryDirectory() as base_dir:
            lib_file = compile_module(code, code_hash_str, base_dir)
            result = load_kernel_from_file(code_hash_str, ast.function_name, lib_file)
    else:
        lib_file = compile_module(code, code_hash_str, base_dir=cache_config['object_cache'])
        result = load_kernel_from_file(code_hash_str, ast.function_name, lib_file)

    rtn = KernelWrapper(result, ast.get_parameters(), ast)
    rtn.code = code.code
    return rtn
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/6
show_code does not show code
2020-01-23T15:22:33+01:00
Stephan Seitz

`show_code` does not show code but returns an object. Maybe rename it to `get_code`?
`show_code` would still be useful to implement `print(show_code(ast))`
When I first saw the function I dropped the return value.

https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/5
Compare pystencils and loopy
2020-06-08T13:30:29+02:00
Jan Hönig

Exclusive Features loopy
- more general indexing
- loop transformations
- unrolling
- tiling
- blocking
- OpenCL
Exclusive Features pystencils
- automatic compiling
- struct support??
- LLVM
- CUDA mapping strategies

Jan Hönig