pystencils merge requests
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests
2023-08-18T12:15:30+02:00

Do not reorder accesses in `move_constants_before_loop` (quickly)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/343
2023-08-18T12:15:30+02:00, Daniel Bauer

Reimplementation of https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/342.
While playing around with the old MR, I realized that the changes proposed there have a significant impact on the execution time of `move_constants_before_loop` (for some kernels).
Before the MR, we would not descend into blocks, loops or conditionals to check whether dependencies are modified in their body.
The MR changed that for the sake of correctness.
However, the implementation was quite inefficient.
Note that for each assignment we must find a block to move the assignment to.
Essentially, the old MR would move up the AST, at each level determining a set of "critical symbols" by *descending* the tree from the current element again.
This means that the AST was traversed a lot, and set objects were created and updated a lot.
This MR changes this behavior.
Now, the AST is only traversed once, from the current assignment up to the block we can move the assignment to.
If we encounter blocks, loops, etc. on the way, we still descend into them.
However, we do this only once.
Moreover, the new implementation does not create a huge set of critical symbols but instead exits early once it finds a dependency.
Overall, my not-sophisticated-at-all tests suggest that the new implementation is even slightly faster than the version from master.
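
To make the traversal described above concrete, here is a minimal sketch on a toy AST. It is not the pystencils implementation: the classes `Node`, `Assignment`, `Block` and `Loop` and the helpers `modifies` and `find_block_to_move_to` are hypothetical stand-ins. It only illustrates walking up once from the assignment, descending into each sibling subtree at most once, and exiting early at the first conflicting dependency.

```python
class Node:
    """Toy AST node (hypothetical, not pystencils' ast.Node)."""
    def __init__(self):
        self.parent = None


class Assignment(Node):
    def __init__(self, lhs, depends_on):
        super().__init__()
        self.lhs = lhs                          # name of the symbol that is written
        self.depends_on = frozenset(depends_on)  # names of the symbols that are read


class Block(Node):
    def __init__(self, children):
        super().__init__()
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Loop(Node):
    def __init__(self, counter, body):
        super().__init__()
        self.counter = counter
        self.body = body
        body.parent = self


def modifies(node, symbols):
    """Return True as soon as `node` or anything below it writes one of `symbols`."""
    if isinstance(node, Assignment):
        return node.lhs in symbols
    if isinstance(node, Loop):
        return node.counter in symbols or modifies(node.body, symbols)
    if isinstance(node, Block):
        # any() stops at the first hit, so no set of critical symbols is built
        return any(modifies(child, symbols) for child in node.children)
    return False


def find_block_to_move_to(assignment):
    """Walk up once from `assignment`; return (target block, child to insert before)."""
    deps = assignment.depends_on
    target, before = assignment.parent, assignment
    prev, element = assignment, assignment.parent
    while element is not None:
        if isinstance(element, Loop) and element.counter in deps:
            break  # the loop counter is a dependency: cannot leave this loop
        if isinstance(element, Block):
            # Only siblings of the subtree we came from are inspected,
            # so every subtree is descended into at most once.
            if any(modifies(c, deps) for c in element.children if c is not prev):
                break
            target, before = element, prev
        prev, element = element, element.parent
    return target, before


# for i: { a = 2*c; b = a + i }
a = Assignment("a", {"c"})
b = Assignment("b", {"a", "i"})
inner = Block([a, b])
loop = Loop("i", inner)
outer = Block([loop])

target_a, before_a = find_block_to_move_to(a)  # `a` can be placed before the loop
target_b, before_b = find_block_to_move_to(b)  # `b` depends on `a`, so it stays
```

In this toy example `a` only reads `c`, which nothing in the tree writes, so it ends up in the outer block directly before the loop. `b` reads `a`, which a sibling writes, so the search stops immediately in the inner block.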
The new implementation also does not change the `ast.Node` interface, which I like quite a lot.

Fix symbol counters
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/340
2023-07-25T11:39:39+02:00, Markus Holzer

When simplifications are applied to an `AssignmentCollection` that is created with assignments coming from another `AssignmentCollection` that was simplified before, the counter for the symbol creation was not respected.

Assignee: Markus Holzer

JSON Serializer for pystencils config
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/338
2023-07-17T14:16:40+02:00, Helen Schottenhamml

This MR adds a custom JSON serializer to allow pystencils configs to be used as parameters in databases. This is useful in parameter studies when using the more modern way of setting up simulations, i.e., using pystencils' `CreateKernelConfig`.
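
As a rough illustration of what such a serializer can look like, here is a minimal sketch built on the standard library's `json.JSONEncoder`. `ToyTarget`, `ToyKernelConfig` and `ConfigEncoder` are hypothetical names invented for this example. They stand in for a dataclass-based config and are not the classes or the encoder added by this MR.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from enum import Enum


class ToyTarget(Enum):
    """Hypothetical stand-in for an enum-valued config option."""
    CPU = "cpu"
    GPU = "gpu"


@dataclass
class ToyKernelConfig:
    """Hypothetical stand-in for a kernel config dataclass."""
    target: ToyTarget = ToyTarget.CPU
    data_type: type = float
    cpu_openmp: bool = False


class ConfigEncoder(json.JSONEncoder):
    """Teach json how to encode objects it cannot handle natively."""

    def default(self, o):
        if is_dataclass(o) and not isinstance(o, type):
            # Return a plain dict; json encodes its values recursively
            # and calls default() again for enums and types.
            return {f.name: getattr(o, f.name) for f in fields(o)}
        if isinstance(o, Enum):
            return o.name
        if isinstance(o, type):
            return o.__name__
        return super().default(o)  # raises TypeError for unknown objects


encoded = json.dumps(ToyKernelConfig(), cls=ConfigEncoder, sort_keys=True)
```

The resulting string can be stored as a database parameter. Supporting another custom class would only need one more branch in `default`.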
Can be extended in the future for other custom classes if needed.

Assignee: Helen Schottenhamml

Remove pystencils.GPU_DEVICE
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/336
2023-07-13T09:58:30+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

- `SerialDataHandling` now performs the device selection upon construction. It can also be constructed with an explicit device number to deviate from the default selection.
- For `ParallelDataHandling`, the assignment of devices to MPI ranks _should_ be handled by Walberla by calling `cudaSetDevice()`. It has [`selectDeviceBasedOnMpiRank`](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/gpu/DeviceSelectMPI.cpp) for this purpose. I am not sure it actually calls it -- I think it should be called from [`MPIManager::initializeMPI`](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/mpi/MPIManager.cpp). Right now everything probably just ends up on the first GPU.
- The kernel wrapper now determines the correct device by inspecting the fields.
- `gpu_indexing_params` needs an explicit device number; I don't think any kind of default is reasonable.
- Some tests now iterate over all devices instead of using a default device. This is actually the right thing to do because it tests whether the device selection works correctly.
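
The idea of picking the device by inspecting the fields can be sketched as follows. This is only a toy model under stated assumptions: `FakeGpuArray` and `device_for_fields` are hypothetical names, and the real kernel wrapper works on actual GPU arrays rather than on this stand-in class.

```python
from dataclasses import dataclass


@dataclass
class FakeGpuArray:
    """Hypothetical stand-in for a GPU array that knows which device it lives on."""
    name: str
    device_id: int


def device_for_fields(arrays):
    """Return the single device that all arrays live on (hypothetical helper)."""
    device_ids = {a.device_id for a in arrays}
    if not device_ids:
        raise ValueError("no fields given, so no device can be inferred")
    if len(device_ids) > 1:
        raise ValueError(f"fields are spread over several devices: {sorted(device_ids)}")
    return device_ids.pop()


src = FakeGpuArray("src", device_id=1)
dst = FakeGpuArray("dst", device_id=1)
selected = device_for_fields([src, dst])
```

Raising an error instead of falling back to a default mirrors the reasoning in the list above: a silent default could run the kernel on the wrong GPU.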
lbmpy's test_gpu_block_size_limiting.py::test_gpu_block_size_limiting fails since !335, but that is due to an error in the test, which https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/146 fixes.

Assignee: Markus Holzer

Add adjacent directions to stencil module
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/337
2023-07-12T18:13:30+02:00, Markus Holzer

Assignee: Markus Holzer

Remove windows CI
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/339
2023-07-12T15:37:13+02:00, Markus Holzer

Assignee: Markus Holzer

Fix indexing for AMD GPUs
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/335
2023-07-08T12:43:56+02:00, Markus Holzer

Due to https://github.com/cupy/cupy/issues/7676, `BlockIndexing` did not work correctly on AMD GPUs. This MR fixes it.

Assignee: Markus Holzer

Make AMD GPU support compatible with both hipcc and hiprtc
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/333
2023-06-30T21:53:24+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Please give this a test on your AMD machine, @holzer.
I think it should now work everywhere with both backend=nvcc and backend=nvrtc.

Assignee: Markus Holzer

Re-enable test_loop_cutting.py::test_staggered_iteration
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/334
2023-06-30T08:55:41+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

It passes on current master, so don't xfail it.

Assignee: Markus Holzer

Add experimental half precision support
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/332
2023-06-28T20:35:25+02:00, Markus Holzer

This MR adds experimental half-precision support.

Assignee: Markus Holzer

Implement Pinned GPU memory
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/331
2023-06-24T08:23:36+02:00, Markus Holzer

CPU arrays with an equivalent GPU array should be pinned. Further, this MR fixes non-aligned strides between CPU and GPU arrays.

Assignee: Markus Holzer

Replace PyCuda with CuPy
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/330
2023-06-23T08:31:06+02:00, Markus Holzer

Replaces [PyCuda](https://documen.tician.de/pycuda/) with [CuPy](https://cupy.dev/)
Advantages of [CuPy](https://cupy.dev/):
- AMD support
- probably better maintained thanks to NVIDIA support
- SciPy compatible.
Fixes #70
Fixes #69

Assignee: Markus Holzer

Properly detect and enable vectorization on ARM
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/321
2023-06-07T09:26:30+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

!320 has the side effect of breaking detection of SVE vectorization support and enablement of SVE in the compiler. My patch should properly fix the underlying problem.
py-cpuinfo is supported on ARM64 and can be used to detect Neon and SVE. However, there was indeed a bug here -- Neon is identified as `asimd` in /proc/cpuinfo, so we should check for `asimd` instead of `neon`.
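
The `asimd` check can be sketched as below. The function name `arm_vector_instruction_sets` is hypothetical and the flag list is passed in explicitly so the example runs anywhere. In practice the flags would come from a source such as /proc/cpuinfo or py-cpuinfo.

```python
def arm_vector_instruction_sets(cpu_flags):
    """Map ARM64 CPU feature flags to vector instruction set names (hypothetical helper)."""
    flags = set(cpu_flags)
    instruction_sets = []
    if "asimd" in flags:  # ARM64 reports Neon as `asimd`, not as `neon`
        instruction_sets.append("neon")
    if "sve" in flags:
        instruction_sets.append("sve")
    return instruction_sets


# Example flag lists, made up for illustration
detected = arm_vector_instruction_sets(["fp", "asimd", "aes", "sve"])
neon_only = arm_vector_instruction_sets(["fp", "asimd"])
```

Checking for the literal string `neon` would return nothing for both example lists, which is the bug described above.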
While `-march=native` was not supported by [Clang before 15](https://github.com/llvm/llvm-project/commit/955cff803e081640e149fed0742f57ae1b84db7d), `-mcpu=native` is supported by [GCC 6+](https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/AArch64-Options.html) and [Clang 7+](https://github.com/llvm-mirror/clang/commit/86c991513001535af6b82bcb1f7c45ab60d2adf0). Let's use that instead of not adding a flag at all -- otherwise SVE support is not enabled in the compiler even if the hardware supports it.

Assignee: Helen Schottenhamml

Support Windows on ARM64
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/328
2023-06-04T18:19:41+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

When I was working on !321, it occurred to me that Windows also runs on ARM64 nowadays. So here is a patch to make pystencils run there. It only required some minor workarounds, including one for the lack of inline assembly in MSVC on ARM64 (which makes cacheline clearing impossible).
ARM64 implies Neon, and MSVC does not support SVE -- this makes the CPU capability detection as easy as on macOS on ARM64.

Assignee: Markus Holzer

Fix deepcopying on Python 3.11
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/329
2023-06-04T16:14:23+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Python 3.11 added `object.__getstate__` (https://github.com/python/cpython/commit/884eba3c76916889fd6bff3b37b8552bfb4f9566), which breaks our fix for #40, but only on Sympy version 1.11 because that version added `sympy.Basic.__setstate__` (https://github.com/sympy/sympy/commit/24e1e1a2ea4e3952c577d36138f31780b7548512). The updated fix is actually more logical (because it actually does what the comment says), so hopefully it will survive future version updates better.
This fixes both the `AttributeError: 'tuple' object has no attribute 'items'` issue which we already have a fix for in !327, and the `sp.Pow` issue that came up next.
Tested with Python 3.10 and 3.11 and Sympy 1.9 and 1.11.
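
For readers unfamiliar with this failure mode, here is a small self-contained sketch. It assumes a slots-only class, for which `copy.deepcopy` passes the state to `__setstate__` as a `(dict_state, slot_state)` tuple instead of a plain dict. `SlottedSymbol` is a hypothetical class and this is not the fix applied in pystencils.

```python
import copy


class SlottedSymbol:
    """Hypothetical slots-only class used to illustrate the state format."""
    __slots__ = ("name", "dtype")

    def __init__(self, name, dtype):
        self.name = name
        self.dtype = dtype

    def __setstate__(self, state):
        # A naive `for k, v in state.items()` fails here with
        # "'tuple' object has no attribute 'items'", because the default
        # state of a slots-only object is the tuple (None, {slot: value}).
        if isinstance(state, tuple) and len(state) == 2:
            dict_state, slot_state = state
        else:
            dict_state, slot_state = state, None
        for part in (dict_state, slot_state):
            if part:
                for key, value in part.items():
                    setattr(self, key, value)


original = SlottedSymbol("x", "float64")
duplicate = copy.deepcopy(original)
```

Handling both the plain dict and the tuple form keeps `__setstate__` working no matter which of the two state layouts the copy machinery hands over.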
@holzer, you'll need to merge manually because of builds failing due to !327.

Assignee: Markus Holzer

Fix deepcopy issue with Sympy 1.9
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/265
2023-06-01T22:17:18+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

Fixes https://i10git.cs.fau.de/pycodegen/pystencils/-/issues/40. Caused by https://github.com/sympy/sympy/pull/21260.

Assignee: Markus Holzer

Improve Vectorisation
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/306
2023-05-02T10:07:49+02:00, Markus Holzer

This MR fixes some bugs in the vectorisation.

Assignee: Markus Holzer

Remove support for non-power-of-2 SVE vector widths
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/325
2023-04-11T08:01:08+02:00, Michael Kuron (mkuron@icp.uni-stuttgart.de)

ARM retroactively removed non-power-of-2 vector widths from SVE [last year](https://documentation-service.arm.com/static/62ff4928e95b0a633aff8a6c?token=#page30). This allows us to remove some code. While we're at it, also enable CI testing for 128-bit width (which is what the ARM Neoverse N2 and V1 have).
pystencils in principle also supports 1024 and 2048 bits, which are the other two sizes that the SVE spec allows (for a total of five sizes, not 16 like before the non-power-of-2 sizes were removed), but Linux does not really support them. Both qemu-user and the Linux kernel will cap the width at 512 bits for this reason. Setting it to something higher at runtime will break any user-space executable that is not aware of it -- in our case the Docker container will just hang on startup. For details, see https://github.com/torvalds/linux/commit/4ffa09a939ab6d95655b3aee6ff79de48df95be7, https://github.com/torvalds/linux/blob/v6.2/Documentation/arm64/sve.rst#9--system-runtime-configuration and https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4671/original/plumbers-dm-2017.pdf. There is no hardware available (or even announced) that will have more than 512 bits, so it doesn't actually matter right now. There will probably one day be Linux distributions available that have more than 512 bits supported by their entire userspace (at which point these widths should be added to the CI job) -- glibc [2.34](https://github.com/bminor/glibc/commit/57fb02b2cf26847380352fa06e6c711eff5faae9) is a requirement, but we already have that in our Docker image.

Assignee: Markus Holzer

Update pystencils project links
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/323
2023-04-09T12:29:52+02:00, Markus Holzer

As the name says

Assignee: Markus Holzer

[Fix] Absolute field access
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/324
2023-04-03T15:18:43+02:00, Markus Holzer

In !319 we started to make sure that the `Field.Access` keeps the property `is_absolut_access` even when sympy's simplifications or substitutions kick in.
However, a few parts were missed; they are added in this MR.

Assignee: Markus Holzer