pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2024-01-15T14:20:27+01:00

Refactor packaging, part I
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/364
2024-01-15T14:20:27+01:00, Frederik Hennig

Relates to pycodegen/pystencils#75.
Move all project info to `pyproject.toml`.
Tasks:
- [x] Static Project Info
- [x] Register and run versioneer
- [x] Register Cython extension modules
- [x] Clean up package data files & manifest
- [x] Remove `quicktest` action from `setup.py` and realize quicktests some other way -> introduced `quicktest.py`
Notes:
- The `quicktest` command in `setup.py` uses the `distutils` command interface, but `distutils` has been deprecated since Python 3.10. We must find some other way to realize quicktests.

Assignee: Frederik Hennig

Refactor gpu indexing
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/341
2023-09-04T16:53:09+02:00, Markus Holzer

To map an iteration space to GPU threads, indexing classes are used. These indexing classes receive a field and an iteration slice to determine the iteration space. This MR refactors the indexing classes to directly receive an iteration space. With this, the indexing classes are more general and no longer depend on pystencils Fields.
Further improvements/fixes:
- Line indexing now works with iteration slices. Previously it did not work at all.
- Both indexing schemes calculate a correct block and grid size for iteration slices. For example, if only every second element is touched (due to a given iteration slice), the number of threads is halved. This removes the modulo calculation that was needed before.
- Both indexing schemes now support up to 4 dimensions.

Assignee: Markus Holzer

Recursively convert dictionary in DotDict
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/162
2020-07-10T16:56:17+02:00, Stephan Seitz

Everybody loves `DotDict` because we are lazy. This PR recursively converts all dicts into `DotDict`, not only the top level.
So you can do:
`my_super.nested.data.scheme = 42`

Rebase of pystencils Type System
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/292
2022-12-06T10:23:14+01:00, Markus Holzer

Fixes #20
Complex numbers are no longer supported.

Milestone: Release 1.0
Assignee: Markus Holzer

Re-enable test_loop_cutting.py::test_staggered_iteration
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/334
2023-06-30T08:55:41+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

It passes on current master, so don't xfail it.

Assignee: Markus Holzer

Properly detect and enable vectorization on ARM
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/321
2023-06-07T09:26:30+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

!320 has the side effect of breaking detection of SVE vectorization support and enablement of SVE in the compiler. My patch should properly fix the underlying problem.
py-cpuinfo is supported on ARM64 and can be used to detect Neon and SVE. However, there was indeed a bug here -- Neon is identified as `asimd` in /proc/cpuinfo, so we should check for `asimd` instead of `neon`.
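
As a rough illustration of the flag check described above, the sketch below picks an ARM vector instruction set from a list of CPU feature flags, such as the list py-cpuinfo reports. The function name `detect_arm_instruction_set` and the returned labels are hypothetical and not taken from pystencils. The `asimd` name comes from the text above, and listing SVE as `sve` is an assumption about how Linux reports it.

```python
from typing import Iterable, Optional


def detect_arm_instruction_set(flags: Iterable[str]) -> Optional[str]:
    """Return 'sve', 'neon', or None for a list of CPU feature flags.

    Hypothetical helper: on ARM64 Linux, Neon is reported as 'asimd',
    so checking for the literal string 'neon' would miss it.
    """
    flag_set = {flag.lower() for flag in flags}
    if "sve" in flag_set:      # assumption: SVE is listed as 'sve'
        return "sve"
    if "asimd" in flag_set:    # Neon shows up under this name
        return "neon"
    return None
```

Preferring SVE over Neon when both flags are present is a design choice of this sketch, not something the MR states.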
While `-march=native` was not supported by [Clang before 15](https://github.com/llvm/llvm-project/commit/955cff803e081640e149fed0742f57ae1b84db7d), `-mcpu=native` is supported by [GCC 6+](https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/AArch64-Options.html) and [Clang 7+](https://github.com/llvm-mirror/clang/commit/86c991513001535af6b82bcb1f7c45ab60d2adf0). Let's use that instead of not adding a flag at all; otherwise SVE support is not enabled in the compiler even if the hardware supports it.

Assignee: Helen Schottenhamml

Print small integer powers as divisions/multiplications
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/357
2023-10-13T08:33:59+02:00, Daniel Bauer

Fixes #72.
After a longer discussion we decided to reintroduce the special logic into the printer after it was removed in 939241f2.
The problem is that it is simply impossible to keep Muls unevaluated in cut_loops.
Neither deep_copy nor func(*args) works.
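
To make the kind of printer logic discussed here concrete, the following is a minimal sketch of printing small integer powers as repeated multiplications or a division. It works on plain strings rather than SymPy expressions. The function name `print_small_power` and the cutoff of 8 are assumptions for illustration, not the pystencils printer.

```python
def print_small_power(base: str, exponent: int, max_exponent: int = 8) -> str:
    """Format base**exponent as C-like code, avoiding pow() for small exponents."""
    if exponent == 0:
        return "1.0"
    if abs(exponent) > max_exponent:
        # Large exponents fall back to a library call.
        return f"pow({base}, {exponent})"
    # Parenthesize the base so compound expressions keep their meaning.
    product = " * ".join([f"({base})"] * abs(exponent))
    if exponent > 0:
        return product
    return f"1.0 / ({product})"
```

Keeping this decision in the printer means the expression tree can still hold an ordinary power, which sidesteps the problem of keeping multiplications unevaluated.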
A logic-free printer simply goes against how SymPy works.
We can have it, but first we must switch to a SymPy-free AST data structure.

Assignee: Daniel Bauer

Print actual block contents in Conditional.{__repr__,__str__}
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/131
2020-01-17T16:55:16+01:00, Stephan Seitz

Pre-push hook
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/36
2019-08-20T16:28:40+02:00, Stephan Seitz

This prevents me from pushing stuff that fails either quicktest or flake8.
Has to be copied manually to `.git/hooks` and `python3` has to be adapted to your Python executable.
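
A hook of this kind can be sketched in Python as follows. This is not the hook from the MR: the helper names `run_checks` and `main` and the two commands are assumptions, so substitute whatever your checkout uses for quicktests and linting.

```python
#!/usr/bin/env python3
"""Sketch of a git pre-push hook (would live at .git/hooks/pre-push)."""
import subprocess
import sys
from typing import List, Sequence


def run_checks(commands: Sequence[List[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"pre-push check failed: {' '.join(command)}", file=sys.stderr)
            return result.returncode
    return 0


def main() -> int:
    # Assumed commands: flake8 as a module and the setup.py quicktest action.
    checks = [
        [sys.executable, "-m", "flake8", "."],
        [sys.executable, "setup.py", "quicktest"],
    ]
    return run_checks(checks)

# In the actual hook file, finish with: sys.exit(main())
# git aborts the push whenever the hook exits with a non-zero status.
```

Using `sys.executable` instead of a hard-coded `python3` runs the checks with the same interpreter that runs the hook, though the shebang line still has to point at the right Python.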
~~Is there an update in flake8 that `.flake8` is not recognized automatically anymore and that we need to append C901?~~
Probably I just had a different linter installed on my PC at home. flake8 can use different linters.

Philox tests and clean up
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/28
2019-08-13T14:17:37+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Test the Philox against reference data and clean up duplicated code in the code generation. The latter will make it easier to later add a vectorized Philox.

Assignee: Martin Bauer

OpenCL RNG
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/247
2021-05-26T13:44:59+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Unfortunately most OpenCL implementations don't support C++

Assignee: Markus Holzer

OpenCL macOS support
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/87
2019-12-03T13:38:45+01:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Either my laptop's GPU (Intel Iris Graphics 550) or Apple's OpenCL implementation does not support double precision. This patch checks all kernel arguments for double precision types. There is probably an easier way to check the entire AST, but I couldn't figure out how.
Also, `get_local_id` et al. return `size_t` per the OpenCL specification, while CUDA's `threadIdx` et al. return an `int`, so a cast is needed to silence a conversion warning.

Assignee: Stephan Seitz

Opencl fixes
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/148
2020-03-12T20:21:59+01:00, Stephan Seitz

This resolves https://i10git.cs.fau.de/pycodegen/lbmpy/issues/9
`SerialDataHandling.swap` was not aware of OpenCL, and neither was a `to_cpu` method in `BoundaryHandling`.

Opencl datahandling
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/119
2020-01-10T10:22:59+01:00, Stephan Seitz

Since the `target_dh_refactoring` was never merged, we decided to do the refactoring in smaller steps and recover the original OpenCL datahandling PR.
Refactoring to merge `target` and `backend` can be done later.

Opencl datahandling
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/85
2019-12-05T12:14:33+01:00, Stephan Seitz

Closes #15
OpenCL kernels are now integrated in the normal `create_kernel` workflow. There is also an `opencljit.init_globally` function that creates a CL queue/context if you do not want to pass it as a parameter to every kernel.
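
The idea behind a global initializer can be sketched without any OpenCL dependency. The snippet below is a hypothetical stand-in, not the `opencljit` code: the signature of `init_globally`, the helper `run_kernel`, and the string placeholders for queue and context are illustrative assumptions. It shows how a module-level default saves passing the objects to every kernel.

```python
from typing import Optional, Tuple

# Module-level defaults; strings stand in for real queue/context objects.
_global_queue: Optional[str] = None
_global_context: Optional[str] = None


def init_globally(queue: str = "default-queue", context: str = "default-context") -> None:
    """Store a queue/context pair that later kernel calls fall back to."""
    global _global_queue, _global_context
    _global_queue, _global_context = queue, context


def run_kernel(name: str, queue: Optional[str] = None,
               context: Optional[str] = None) -> Tuple[str, str, str]:
    """Pretend to launch a kernel; explicit arguments win over the globals."""
    queue = queue if queue is not None else _global_queue
    context = context if context is not None else _global_context
    if queue is None or context is None:
        raise RuntimeError("pass queue/context or call init_globally() first")
    return name, queue, context
```

After one call to `init_globally()`, `run_kernel("my_kernel")` works without extra arguments, while an explicitly passed queue still takes precedence.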
SerialDatahandling is extended to work with alternative GPU array libraries to PyCuda.
There is now some code overlapping with the `_custom_transfer_functions`, but I suppose they are for certain quantities that have a separate transfer function, as opposed to using a whole different backend.
@kuron can you have a look at it? I think the solution is not as elegant as I thought it would be.
pycuda.gpuarray.GPUArrays are not wrapped. So if you use `dh.gpuarrays['foo']` you get either a pycuda array or an OpenCL array. I thought this step would be too drastic for one PR. Using OpenCL should still be a lot easier now.

Oops, forgot a return in TextureCachedField.reproducible_hash
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/81
2019-10-28T13:26:22+01:00, Stephan Seitz

No cuda required for linting pystencils
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/161
2020-07-10T16:55:16+02:00, Stephan Seitz

Neon intrinsics
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/188
2021-03-16T20:29:52+01:00, Markus Holzer

This MR implements Neon intrinsics to enable vectorization for the ARM architecture.
This may also become useful once ARM HPC clusters actually get deployed, though these might end up using SVE instead of NEON. For that case, additional work is needed because SVE's vector width is determined at runtime.

Assignee: Michael Kuron (mkuron@icp.uni-stuttgart.de)

More staggered grid improvements
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/92
2019-12-17T14:52:00+01:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

- Fix access to directions with mixed signs (`NW`, `(-1/2, 1/2)` and the like). They were previously mapped to the wrong cell.
- When storing fluxes on a staggered grid, the usual sign convention is that fluxes point outward from the cell. Previously, we did not consistently respect that (`staggered_access("E")` would return the same thing as `staggered_access("W")` did when called from the eastern-next cell). Now, when a field is declared as `STAGGERED_FLUX`, it returns an accessor with a prefactor of -1 in that case. The previous behavior, where the sign is not reversed, is still useful when e.g. storing sums (e.g. mean values) instead of differences (e.g. finite difference fluxes) on the staggered grid.
- `staggered_vector_access` returns a vector/tensor full of `Access` objects corresponding to the shape (`index_shape[1:]`) of the staggered field.

Assignee: Martin Bauer

Minor improvements to FiniteDifferenceStaggeredStencilDerivation
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/104
2019-12-06T20:20:51+01:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Assignee: Martin Bauer