pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
Updated: 2023-06-24T08:23:36+02:00

Implement Pinned GPU memory
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/331
Updated: 2023-06-24T08:23:36+02:00
Author: Markus Holzer

CPU arrays with an equivalent GPU array should be pinned. Further, this MR fixes non-aligned strides between CPU and GPU arrays.

Assignee: Markus Holzer

Replace PyCuda with CuPy
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/330
Updated: 2023-06-23T08:31:06+02:00
Author: Markus Holzer

Replaces [PyCuda](https://documen.tician.de/pycuda/) with [CuPy](https://cupy.dev/).

Advantages of [CuPy](https://cupy.dev/):
- AMD support
- probably better maintained thanks to NVIDIA support
- SciPy compatible

Fixes #70
Fixes #69

Assignee: Markus Holzer

Fix deepcopying on Python 3.11
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/329
Updated: 2023-06-04T16:14:23+02:00
Author: Michael Kuron <mkuron@icp.uni-stuttgart.de>

Python 3.11 added `object.__getstate__` (https://github.com/python/cpython/commit/884eba3c76916889fd6bff3b37b8552bfb4f9566), which breaks our fix for #40, but only on Sympy version 1.11 because that version added `sympy.Basic.__setstate__` (https://github.com/sympy/sympy/commit/24e1e1a2ea4e3952c577d36138f31780b7548512). The updated fix is actually more logical (because it actually does what the comment says), so hopefully it will survive future version updates better.
This fixes both the `AttributeError: 'tuple' object has no attribute 'items'` issue which we already have a fix for in !327, and the `sp.Pow` issue that came up next.
Tested with Python 3.10 and 3.11 and Sympy 1.9 and 1.11.
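
The `'tuple' object has no attribute 'items'` message can be reproduced in isolation. The sketch below is not the pystencils or SymPy code; `SlottedNode`, `TolerantNode` and `try_deepcopy` are hypothetical names. It relies only on documented standard-library behaviour: for a class that uses `__slots__`, the default copy/pickle state is a `(dict_state, slots_state)` tuple, so a `__setstate__` that expects a plain dict fails inside `copy.deepcopy`.

```python
import copy


class SlottedNode:
    """Hypothetical class: uses __slots__ and a dict-only __setstate__."""

    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name

    def __setstate__(self, state):
        # Assumes `state` is a dict. For slotted classes the default state
        # is the tuple (None, {"name": ...}) instead, so .items() fails.
        for key, value in state.items():
            setattr(self, key, value)


class TolerantNode(SlottedNode):
    """Hypothetical fix: accept both the dict and the tuple form."""

    __slots__ = ()

    def __setstate__(self, state):
        dict_state, slots_state = state if isinstance(state, tuple) else (state, None)
        for part in (dict_state, slots_state):
            for key, value in (part or {}).items():
                setattr(self, key, value)


def try_deepcopy(obj):
    """Return the deep copy, or the error message if deepcopy fails."""
    try:
        return copy.deepcopy(obj)
    except AttributeError as exc:
        return str(exc)


print(try_deepcopy(SlottedNode("a")))        # error message as a string
print(try_deepcopy(TolerantNode("a")).name)  # copy succeeds
```

Handling both state shapes in `__setstate__` is only one possible design; the fix in this MR may work differently.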
@holzer, you'll need to merge manually because of builds failing due to !327.

Assignee: Markus Holzer

Support Windows on ARM64
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/328
Updated: 2023-06-04T18:19:41+02:00
Author: Michael Kuron <mkuron@icp.uni-stuttgart.de>

When I was working on !321, it occurred to me that Windows also runs on ARM64 nowadays. So here is a patch to make pystencils run there. It only required some minor workarounds, including one for the lack of inline assembly in MSVC on ARM64 (which makes cacheline clearing impossible). ARM64 implies Neon, and MSVC does not support SVE, which makes the CPU capability detection as easy as on macOS on ARM64.

Assignee: Markus Holzer

[Fix] Update for Docker Images
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/327
Updated: 2023-06-04T16:14:23+02:00
Author: Markus Holzer

Due to an update of the Docker images, minor changes are required for the CI.

Assignee: Markus Holzer

RISC-V cacheline zero
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/326
Updated: 2023-09-12T08:00:03+02:00
Author: Michael Kuron <mkuron@icp.uni-stuttgart.de>

The `cbo.zero` instruction was added to RISC-V a year ago as part of the "Zicboz" extension (https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf). I assume it's going to be available on any forthcoming RISC-V HPC processor (e.g. the [Ventana Veyron V1](https://www.ventanamicro.com/technology/risc-v-cpu-ip/)). It is supported by Clang 15+ and GCC 11+.
However, we still need to wait for the QEMU 8 release (https://github.com/qemu/qemu/commit/a939c500793ae7672defe5e3dc83220576a7b202) before we can test it in CI. The multiarch Docker images (https://github.com/multiarch/qemu-user-static) sometimes take a few months after the corresponding QEMU release. If they go straight to QEMU 8.1, we can also switch on SIMD autodetection (https://github.com/qemu/qemu/commit/4333f0924c2f2ca8efaebaed8c24f55f77d8b013).

Assignee: Michael Kuron <mkuron@icp.uni-stuttgart.de>

Remove support for non-power-of-2 SVE vector widths
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/325
Updated: 2023-04-11T08:01:08+02:00
Author: Michael Kuron <mkuron@icp.uni-stuttgart.de>

ARM retroactively removed non-power-of-2 vector widths from SVE [last year](https://documentation-service.arm.com/static/62ff4928e95b0a633aff8a6c?token=#page30). This allows us to remove some code. While we're at it, also enable CI testing for 128-bit width (which is what the ARM Neoverse N2 and V1 have).

pystencils in principle also supports 1024 and 2048 bits, which are the other two sizes that the SVE spec allows (for a total of five sizes, not 16 as before the non-power-of-2 sizes were removed), but Linux does not really support them. Both qemu-user and the Linux kernel will cap the width at 512 bits for this reason. Setting it to something higher at runtime will break any user-space executable that is not aware of it; in our case the Docker container will just hang on startup. For details, see https://github.com/torvalds/linux/commit/4ffa09a939ab6d95655b3aee6ff79de48df95be7, https://github.com/torvalds/linux/blob/v6.2/Documentation/arm64/sve.rst#9--system-runtime-configuration and https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4671/original/plumbers-dm-2017.pdf. There is no hardware available (or even announced) that will have more than 512 bits, so it doesn't actually matter right now. There will probably one day be Linux distributions available that have more than 512 bits supported by their entire userspace (at which point these widths should be added to the CI job). glibc [2.34](https://github.com/bminor/glibc/commit/57fb02b2cf26847380352fa06e6c711eff5faae9) is a requirement for that, but we already have it in our Docker image.

Assignee: Markus Holzer

[Fix] Absolut field access
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/324
Updated: 2023-04-03T15:18:43+02:00
Author: Markus Holzer

In !319 we started to make sure that the `Field.Access` keeps the property `is_absolut_access` even when SymPy's simplifications or substitutions kick in. However, a few parts were missed, which are added in this MR.

Assignee: Markus Holzer

Update pystencils project links
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/323
Updated: 2023-04-09T12:29:52+02:00
Author: Markus Holzer

As the name says.

Assignee: Markus Holzer

[Fix] Matplotlib arrow rendering
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/322
Updated: 2023-03-28T09:23:11+02:00
Author: Markus Holzer

Starting from `matplotlib = 3.5`, pystencils' definition for 3D arrows is deprecated. Therefore, 3D stencils cannot be rendered anymore.
The exact issue is described, e.g., here: https://github.com/matplotlib/matplotlib/issues/21688

Fixes #67

Assignee: Markus Holzer

Properly detect and enable vectorization on ARM
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/321
Updated: 2023-06-07T09:26:30+02:00
Author: Michael Kuron <mkuron@icp.uni-stuttgart.de>

!320 has the side effect of breaking detection of SVE vectorization support and enablement of SVE in the compiler. My patch should properly fix the underlying problem.
py-cpuinfo is supported on ARM64 and can be used to detect Neon and SVE. However, there was indeed a bug here -- Neon is identified as `asimd` in /proc/cpuinfo, so we should check for `asimd` instead of `neon`.
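
The flag check described above can be sketched as follows. This is a hedged illustration, not the pystencils implementation: `arm_simd_sets` is a hypothetical helper, and its input is assumed to be the list of CPU feature flags (for example the `flags` list returned by py-cpuinfo's `get_cpu_info()`, or the `Features` line of /proc/cpuinfo split on whitespace).

```python
def arm_simd_sets(flags):
    """Map ARM64 CPU feature flags to SIMD instruction set names.

    `flags` is an iterable of flag strings such as ["fp", "asimd", "sve"].
    """
    flags = {flag.lower() for flag in flags}
    result = []
    # On ARM64 Linux, Neon shows up as "asimd" (Advanced SIMD) rather
    # than "neon", so that is the flag to look for.
    if "asimd" in flags:
        result.append("neon")
    if "sve" in flags:
        result.append("sve")
    return result


# Example with a made-up flag list:
print(arm_simd_sets(["fp", "asimd", "sve"]))
```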
While `-march=native` was not supported by [Clang before 15](https://github.com/llvm/llvm-project/commit/955cff803e081640e149fed0742f57ae1b84db7d), `-mcpu=native` is supported by [GCC 6+](https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/AArch64-Options.html) and [Clang 7+](https://github.com/llvm-mirror/clang/commit/86c991513001535af6b82bcb1f7c45ab60d2adf0). Let's use that instead of not adding a flag at all; otherwise SVE support is not enabled in the compiler even if the hardware supports it.

Assignee: Helen Schottenhamml

ARM for linux
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/320
Updated: 2023-03-27T20:07:49+02:00
Author: Helen Schottenhamml

Until now, ARM architectures were only allowed on Darwin systems. This MR extends their usage to Linux systems.

Assignee: Helen Schottenhamml

Resolve "Absolute access is probably not copied correctly after _eval_subs()"
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/319
Updated: 2023-03-31T17:05:29+02:00
Author: Nils Kohl

Closes #66

Assignee: Markus Holzer

[Fix] GPU Buffer with iteration slices
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/318
Updated: 2023-03-31T09:18:50+02:00
Author: Markus Holzer

GPU buffers did not work in combination with iteration slices before. This is resolved here.

Assignee: Markus Holzer

[FIX] Iteration slices with GPU kernels
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/317
Updated: 2023-03-17T11:26:57+01:00
Author: Markus Holzer

Fixes #58

Assignee: Markus Holzer

Draft: feat: implement `__cuda_array_interface__`
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/316
Updated: 2023-09-14T10:43:31+02:00
Author: Stephan Seitz

https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
This is supported by:
- pycuda
- numba
- cupy
- torch
- nvcv https://github.com/CvCuda/CV-CUDA
- maybe by tensorflow in future: https://github.com/tensorflow/tensorflow/issues/29039
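
As a rough illustration of what the linked protocol asks for, the sketch below builds the dictionary that an object returns from its `__cuda_array_interface__` attribute. `DeviceBufferSketch` and its constructor arguments are hypothetical, the pointer is a placeholder integer rather than a real device allocation, and the keys follow version 3 of the interface as described in the Numba documentation linked above. It is not the code of this MR.

```python
class DeviceBufferSketch:
    """Hypothetical wrapper describing an existing GPU allocation."""

    def __init__(self, ptr, shape, typestr, strides=None, readonly=False):
        self.ptr = ptr            # device pointer as a Python int
        self.shape = tuple(shape)
        self.typestr = typestr    # NumPy-style type string, e.g. "<f8"
        self.strides = strides    # byte strides, or None for C-contiguous
        self.readonly = readonly

    @property
    def __cuda_array_interface__(self):
        # Consumers read this attribute to access the memory without copying.
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, self.readonly),
            "strides": self.strides,
            "version": 3,
        }


# Placeholder pointer value, for illustration only.
buf = DeviceBufferSketch(ptr=0x7F0000000000, shape=(4, 8), typestr="<f8")
print(buf.__cuda_array_interface__)
```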
Also allows execution with cupy (https://docs.cupy.dev/en/stable/index.html) instead of pycuda.
TODO:
- [ ] check that pointers are in the correct CUDA context and, if not, import them into the current one
- [x] make execution with pycuda aware of `__cuda_array_interface__`
- [ ] what/how to test

Fix execution for Python 3.11
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/315
Updated: 2023-03-27T10:39:01+02:00
Author: Stephan Seitz

Fixes execution on Python 3.11.

Prevents the following error:
```
ImportError while loading conftest '/home/stephan/projects/pystencils/conftest.py'.
conftest.py:14: in <module>
    from pystencils.cpu import cpujit
pystencils/__init__.py:10: in <module>
    from .config import CreateKernelConfig
pystencils/config.py:19: in <module>
    @dataclass
/usr/lib/python3.11/dataclasses.py:1220: in dataclass
    return wrap(cls)
/usr/lib/python3.11/dataclasses.py:1210: in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
/usr/lib/python3.11/dataclasses.py:958: in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
/usr/lib/python3.11/dataclasses.py:815: in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
E   ValueError: mutable default <class 'mappingproxy'> for field gpu_indexing_params is not allowed: use default_factory
```
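
The change that the error message asks for can be sketched like this. `KernelConfigSketch`, `_default_gpu_indexing_params` and the default values are hypothetical stand-ins, not the actual `CreateKernelConfig`; only the field name `gpu_indexing_params` and the `mappingproxy` type are taken from the traceback.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


def _default_gpu_indexing_params():
    # Placeholder values for illustration only.
    return MappingProxyType({"block_size": (16, 16, 1)})


@dataclass
class KernelConfigSketch:
    # Writing `gpu_indexing_params = MappingProxyType({...})` directly is what
    # Python 3.11 rejects as a mutable default. A factory is called once per
    # instance instead, so no default object lives on the class itself.
    gpu_indexing_params: MappingProxyType = field(
        default_factory=_default_gpu_indexing_params
    )


config = KernelConfigSketch()
print(config.gpu_indexing_params["block_size"])
```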
I just did as I was told by Python :shrug:

Gpu bufferfield fix
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/314
Updated: 2023-03-11T19:06:31+01:00
Author: Philipp Suffa

Some small changes in the calculation of the field sizes to allow only buffered fields as well as only absolute access fields.
This is needed to allow the AA-pattern and communication hiding for sparse kernels (ListLBM).

Assignee: Philipp Suffa

Add cache clearing function
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/313
Updated: 2023-02-23T17:20:18+01:00
Author: Markus Holzer

For development purposes it is useful to have a cache-clearing function.

Assignee: Markus Holzer

Use common shape to resolve buffer access
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/312
Updated: 2022-12-22T09:41:41+01:00
Author: Markus Holzer

pystencils assumes that all fields have the same spatial shape. Thus the field access should also be resolved by one common field shape. This was violated in the GPU kernel creation and should be fixed with this MR.

Assignee: Markus Holzer