pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2022-05-23T17:28:23+02:00

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/295
Revision3
2022-05-23T17:28:23+02:00
Markus Holzer
More revisions due to !292
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/293
Revisions
2022-05-25T10:19:33+02:00
Markus Holzer
Revisions for !292
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/48
RNG SIMD
2021-02-12T22:31:36+01:00
Michael Kuron <mkuron@icp.uni-stuttgart.de>
I've vectorized the Philox and AES-NI RNGs, fixes #12. I had to add a very minimal integer vectorization that only supports `int32`, `makeVec`, `+`, and loop counters. Also, the `RNGNode` now needs to know already during construction how it's vectorized, which is ugly, but could only be resolved by a better type system (#20). For the same reason, it is not possible to use a vectorized float RNG with double fields or vice versa. Also, we essentially discard half the random numbers in double precision mode because otherwise the number of variables we return would change between the vectorized and non-vectorized version, which is incompatible with the interface.
For the tests, we need to add `pip3 install randomgen` to the Dockerfile.
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/77
Run opencl without pycuda
2019-10-21T14:06:35+02:00
Stephan Seitz
Fix #15
This includes !76.
If anyone wants to use textures on OpenCL, we need to decouple `TextureInterpolatedField` from CUDA.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/307
Sane Defaults for CreateKernelConfig
2022-10-25T11:16:04+02:00
Markus Holzer
By default, the number type of float numbers should be the same as the default type.
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/122
SerialDataHandling: synchronization_function for tensor fields
2020-01-16T10:57:16+01:00
Michael Kuron <mkuron@icp.uni-stuttgart.de>
It already worked on the CPU, just needed to remove the check. On the GPU, we use `itertools.product` to create the nested loop needed.
Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/66
Set assumptions for TypedSymbol/cast_func/IntegerFunctionTwoArgsMixIn the SymPy way
2019-09-30T14:11:05+02:00
Stephan Seitz
After having a nearly week long discussion on assumptions in my SymPy PR, I got some idea of how the assumptions in SymPy are working.
It's interesting that you can use `Function.__new__(cls, integer=True)` for `UndefinedFunction`s like `Function('f', integer=True)` but not for subclasses of `Function`.
Now things like `(2*f.shape[0]).is_integer` are working.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/285
sharedmethodcache
2022-03-16T10:20:00+01:00
Frederik Hennig
Added a per-instance method cache decorator, which can be shared among multiple methods with the same signature.
`sharedmethodcache` allows memoization similarly to `functools.cache`, but for instance methods of classes. The cache dictionary is added as a member to the method's owning instance. Further, multiple methods with the same signature (up to kwargs) may use the same cache dict by specifying the same `cache_id`. This makes sense for methods that produce the same results on identical inputs, but via different computational paths.
This decorator is currently employed in pycodegen/lbmpy!113 by [abstract_equilibrium.py](https://i10git.cs.fau.de/da15siwa/lbmpy/-/blob/zero_centered_storage/lbmpy/equilibrium/abstract_equilibrium.py), but surely, more use cases will follow.
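To illustrate the mechanism described above, here is a minimal sketch of how a decorator of this kind could be written. It is not the pystencils implementation: the name `shared_method_cache`, the way the cache key is built, and the `Squares` class are all assumptions made for illustration.

```python
from functools import wraps


def shared_method_cache(cache_id):
    """Hypothetical sketch of a per-instance cache shared via `cache_id`."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            # The cache dict is stored on the instance under the name `cache_id`,
            # so every method decorated with the same id sees the same dict.
            cache = getattr(self, cache_id, None)
            if cache is None:
                cache = {}
                setattr(self, cache_id, cache)
            # Assumed key layout: positional args plus sorted keyword items.
            key = args + tuple(sorted(kwargs.items()))
            if key not in cache:
                cache[key] = method(self, *args, **kwargs)
            return cache[key]
        return wrapper
    return decorator


class Squares:
    def __init__(self):
        self.calls = 0  # counts how often a method body actually runs

    @shared_method_cache("square_cache")
    def by_multiplication(self, n):
        self.calls += 1
        return n * n

    @shared_method_cache("square_cache")
    def by_summation(self, n):
        self.calls += 1
        # The sum of the first n odd numbers equals n squared.
        return sum(2 * k + 1 for k in range(n))


s = Squares()
first = s.by_multiplication(12)
second = s.by_summation(12)  # answered from the shared cache
```

In this sketch the second call never runs its method body, because both methods look up the same dictionary on the instance.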
Example:
```
from pystencils.cache import sharedmethodcache


class Fib:
    def __init__(self):
        # Count how often each method body actually runs (cache misses).
        self.fib_rec_called = 0
        self.fib_iter_called = 0

    @sharedmethodcache("fib_cache")
    def fib_rec(self, n):
        self.fib_rec_called += 1
        return 1 if n <= 1 else self.fib_rec(n-1) + self.fib_rec(n-2)

    # Same cache_id as fib_rec, so both methods share one cache dict.
    @sharedmethodcache("fib_cache")
    def fib_iter(self, n):
        self.fib_iter_called += 1
        f1, f2 = 0, 1
        for i in range(n):
            f2 = f1 + f2
            f1 = f2 - f1
        return f2

>>> fib = Fib()
>>> fib.fib_rec(13)
377
>>> fib.fib_cache
{(1,): 1,
(0,): 1,
(2,): 2,
(3,): 3,
(4,): 5,
(5,): 8,
(6,): 13,
(7,): 21,
(8,): 34,
(9,): 55,
(10,): 89,
(11,): 144,
(12,): 233,
(13,): 377}
>>> fib.fib_rec_called
14
>>> fib.fib_iter(11)
144
>>> fib.fib_iter_called
0
```

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/246
ship C-file
2021-05-11T09:31:17+02:00
Markus Holzer
Shipping the generated C-files to PyPI is a good idea since it is less error-prone. A new Cython version might deal with the provided `.pyx` file in a way we did not intend.
The best practice is described in more detail here:
http://blog.behnel.de/posts/ship-generated-c-code-or-not.html
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/221
Ship c-file with pypi
2021-02-22T22:59:51+01:00
Markus Holzer
Ship the generated C-File for boundary creation with pypi
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/234
Sizeless vectorization
2021-05-21T10:11:44+02:00
Michael Kuron <mkuron@icp.uni-stuttgart.de>
Surprisingly easy follow-up to !232 to support sizeless ARM SVE and RISC-V V. It uses some ugly hacks to sneak C functions like `svcntb()` into places that expect Python integers. Python duck-typing and SymPy made it possible. Not sure whether this should be merged as-is, but making it nicer would require re-writing `CBackend`. At least I couldn't think of a better way to obtain the innermost loop counter and loop stop.
Jan Hönig

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/160
Skip 01_tutorial_getting_started if graphviz is not installed
2020-07-09T15:29:03+02:00
Stephan Seitz
Cell 19 of Tutorial 1 requires graphviz. Skip if not installed.
An alternative would be to execute this only if graphviz is installed.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/152
Skip llvm tests if llvmlite is not installed
2020-06-03T09:18:40+02:00
Stephan Seitz

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/267
Small clean up
2021-11-02T21:51:26+01:00
Markus Holzer
small clean up
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/69
Small fixes
2019-10-01T15:12:52+02:00
Stephan Seitz

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/216
some fixes for lbmpy vectorization
2021-02-21T12:58:19+01:00
Michael Kuron <mkuron@icp.uni-stuttgart.de>
Follow-up to !212 and new feature for https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/65.
Markus Holzer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/52
Sort headers/global definitions to enable reproducible code generation
2019-09-23T11:03:53+02:00
Stephan Seitz
headers and global_declarations are generated by methods that return
sets. So even with the same inputs it is not guaranteed that the same
source code is generated since sets do not guarantee a specific order
when iterating over them.
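The ordering issue can be sketched in a few lines. This is a hypothetical illustration rather than the pystencils code: the helper `render_includes`, the SHA-256 based `cache_key`, and the header names are assumptions chosen for the example.

```python
import hashlib


def render_includes(headers):
    """Hypothetical helper: emit one #include line per header, in sorted order."""
    # Iterating over a set of strings directly can yield a different order
    # from one interpreter run to the next, so sort before emitting.
    return "\n".join(f"#include {h}" for h in sorted(headers))


def cache_key(source):
    """Assumed caching scheme: key the code cache by a hash of the source text."""
    return hashlib.sha256(source.encode()).hexdigest()


# The same headers, supplied once as a set and once as a differently ordered list.
source_a = render_includes({"<cmath>", '"philox_rand.h"', "<cstdint>"})
source_b = render_includes(["<cstdint>", "<cmath>", '"philox_rand.h"'])

# Sorting makes the generated text, and therefore the cache key, identical.
same_key = cache_key(source_a) == cache_key(source_b)
```

Under these assumptions, any cache keyed on the generated source only gets hits when the emission order is deterministic, which is what sorting provides.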
I was surprised that my generated code could often not be reused from the cache. The problem was that the included headers appeared in random order.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/86
Staggered field access and staggered fields with fluxes to edges/faces
2019-12-17T14:52:00+01:00
Michael Kuron <mkuron@icp.uni-stuttgart.de>
The first index dimension is always used to identify the staggered point, any further ones can be used to store vectors/tensors at these points. `f.staggered_access("N")` or `f.staggered_access((0, sp.Rational(1, 2)))` is now supported. The string representation of the resulting accessor is $`f_{(0,\frac{1}{2})}`$. Furthermore, staggered fields can now have more staggered points than spatial dimensions, i.e. to store fluxes to edge/face neighbors (e.g. `f.staggered_access("NE")`).
Martin Bauer

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/72
Support complex numbers
2023-01-02T23:44:24+01:00
Stephan Seitz
The only downside at the moment is that `complex<double>` and `complex<float>` must never be mixed in a kernel (real scalars of the other type are mostly OK due to manually implemented templates).
Should work on CPU and GPU.
Another thing that this PR changes is that the `headers` attribute of SymPy expressions is now also checked to determine the necessary headers.

https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/328
Support Windows on ARM64
2023-06-04T18:19:41+02:00
Michael Kuron <mkuron@icp.uni-stuttgart.de>
When I was working on !321, it occurred to me that Windows also runs on ARM64 nowadays. So here is a patch to make pystencils run there. It only required some minor workarounds, including one for the lack of inline assembly in MSVC on ARM64 (which makes cacheline clearing impossible). ARM64 implies Neon, and MSVC does not support SVE -- this makes the CPU capability detection as easy as on macOS on ARM64.
Markus Holzer