pystencils merge requests (https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests), updated 2024-01-12T12:35:19+01:00

Draft: Loop counter dependent kernels: Vector casts and smaller fixes
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/359
Daniel Bauer, 2024-01-12T12:35:19+01:00

We would like to generate code for kernels that contain spatially dependent expressions.
Thus, the expression tree contains loop counter symbols.
Unfortunately, vectorizing these kernels does not currently work.
The goal of this MR is to change that.
It is still far, far from finished, but I would appreciate some early feedback.
With the MR in its current state, I was able to generate one of our kernels with vectorization (AVX) enabled, and the code runs on my AVX512 machine (note that the code contains AVX512 instructions...).
This MR includes a very basic test so that you can play around with the new feature if you want:
```
pytest pystencils_tests/test_vectorization.py::test_vectorize_loop_ctr
```
Issues with the current implementation are:
1. The vectorized loop counter uses a different vector width than the rest of the code. For example, with AVX the vector width for f64 is 4 and the loop is vectorized accordingly. However, the vectorized loop counter has 8 lanes.
2. Vector expressions cannot be cast to other types.
3. Integer expressions cannot be vectorized properly.
4. `create_type('int')`/`BasicType('int')` creates an `int64`, but many places assume that `int` is an `int32`. This is relevant here because vector conversions from `int64` are not available before AVX512 (they require rounding).
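The lane counts in the list above follow from dividing the register width by the element width. The minimal Python sketch below makes this concrete. The names `REGISTER_BITS`, `ELEMENT_BITS` and `lanes` are illustrative only and are not part of pystencils. The sketch shows that on AVX an f64 vector has 4 lanes. Eight lanes correspond to a full-width 32-bit vector on AVX, or to an f64 vector on AVX512.

```python
# Register widths in bits for the x86 SIMD instruction sets discussed here.
REGISTER_BITS = {"sse": 128, "avx": 256, "avx512": 512}

# Element widths in bits for the scalar types that appear in the discussion.
ELEMENT_BITS = {"f64": 64, "f32": 32, "i64": 64, "i32": 32}


def lanes(instruction_set: str, base_type: str) -> int:
    """Number of `base_type` elements that fit into one full-width register."""
    return REGISTER_BITS[instruction_set] // ELEMENT_BITS[base_type]


if __name__ == "__main__":
    for iset in ("avx", "avx512"):
        print(iset, {t: lanes(iset, t) for t in ELEMENT_BITS})
```

Full-width 256-bit integer arithmetic only arrived with AVX2. Plain AVX performs integer operations on 128-bit registers, which is why a narrower type such as i32x4 is useful there.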
The proposed change is to rework the instruction sets slightly.
Instead of being "dumb" dictionaries, they become proper classes.
Every time an instruction is queried, the data type (base type and number of lanes) must be specified.
This way, we can properly handle integers as well as vectors narrower than the maximum bit width supported by the instruction set.
The latter is necessary, for example, to work with i32x4 vectors on AVX.
Furthermore, type conversion is implemented in the instruction sets.
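Below is a minimal Python sketch of one way class-based instruction sets could look. `VectorType`, `InstructionSet`, `instruction()` and `convert()` are hypothetical names chosen for illustration. They are not the MR's actual interface. Only the intrinsic names are real Intel intrinsics.

Every query takes the full vector type (base type and lane count). This lets the AVX table hold an i32x4 entry and an i32x4-to-f64x4 conversion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    """Hypothetical SIMD vector type: scalar base type plus number of lanes."""
    base: str   # e.g. "f64", "i32"
    lanes: int  # e.g. 4


class InstructionSet:
    """Hypothetical class-based instruction set; every query names the data type."""

    def __init__(self, name, ops, conversions):
        self.name = name
        self._ops = ops                  # {(op, VectorType): intrinsic name}
        self._conversions = conversions  # {(src, dst): intrinsic name}

    def instruction(self, op, vtype):
        try:
            return self._ops[(op, vtype)]
        except KeyError:
            raise NotImplementedError(f"{self.name}: no '{op}' for {vtype}") from None

    def convert(self, src, dst):
        # Design choice of this sketch: conversions keep the lane count fixed.
        if src.lanes != dst.lanes:
            raise ValueError("conversion must preserve the number of lanes")
        try:
            return self._conversions[(src, dst)]
        except KeyError:
            raise NotImplementedError(f"{self.name}: no conversion {src} -> {dst}") from None


F64x4 = VectorType("f64", 4)
I32x4 = VectorType("i32", 4)
I64x4 = VectorType("i64", 4)

# A small excerpt of AVX, using real Intel intrinsic names.
AVX = InstructionSet(
    "avx",
    ops={
        ("add", F64x4): "_mm256_add_pd",
        ("add", I32x4): "_mm_add_epi32",  # 128-bit integer add, usable on AVX
    },
    conversions={
        (I32x4, F64x4): "_mm256_cvtepi32_pd",  # __m128i (i32x4) -> __m256d (f64x4)
        # No (I64x4, F64x4) entry: packed int64 -> double needs AVX512DQ/VL.
    },
)

if __name__ == "__main__":
    print(AVX.instruction("add", I32x4))
    print(AVX.convert(I32x4, F64x4))
```

Keying every entry on the full vector type has a useful consequence. A missing combination, such as an int64-to-double conversion on AVX, fails loudly at code generation time. It cannot silently select an instruction of the wrong width.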
This obviously changes the interface and has strong implications for how vectorization is handled throughout the code base.
Therefore, I would appreciate any feedback on whether these changes are in line with the goals of the project and the thoughts behind #46.
Any comments are welcome.

RISC-V cacheline zero
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/326
Michael Kuron (mkuron@icp.uni-stuttgart.de), 2023-09-12T08:00:03+02:00

The `cbo.zero` instruction was added to RISC-V a year ago as part of the "Zicboz" extension (https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf). I assume it will be available on any forthcoming RISC-V HPC processor (e.g. the [Ventana Veyron V1](https://www.ventanamicro.com/technology/risc-v-cpu-ip/)). It is supported by Clang 15+ and GCC 11+.
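`cbo.zero` zeroes the entire cache block that contains its address operand. A code generator can therefore only emit it for blocks that the following stores will overwrite completely. The minimal Python sketch below illustrates that bookkeeping, under two assumptions:

* `zeroable_blocks` is a hypothetical helper, not pystencils code.
* The 64-byte default block size is assumed; the real size is implementation-defined.

```python
def zeroable_blocks(start: int, nbytes: int, block_size: int = 64) -> list:
    """Base addresses of cache blocks lying entirely inside [start, start + nbytes).

    cbo.zero clears the whole block containing its address, so only blocks that
    the subsequent stores overwrite completely are safe to zero in advance.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    end = start + nbytes
    first = (start + block_size - 1) & ~(block_size - 1)  # round start up to a boundary
    last = end & ~(block_size - 1)                        # round end down to a boundary
    return list(range(first, last, block_size))


if __name__ == "__main__":
    # A 200-byte write starting at 0x1010 fully covers two 64-byte blocks.
    print([hex(a) for a in zeroable_blocks(0x1010, 200)])
```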
We still need to wait for the QEMU 8 release (https://github.com/qemu/qemu/commit/a939c500793ae7672defe5e3dc83220576a7b202) before we can test it in CI. The multiarch Docker images (https://github.com/multiarch/qemu-user-static) sometimes lag a few months behind the corresponding QEMU release. If they go straight to QEMU 8.1, we can also switch on SIMD autodetection (https://github.com/qemu/qemu/commit/4333f0924c2f2ca8efaebaed8c24f55f77d8b013).

WIP: Hyteg
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/149
Dominik Thoennes (dominik.thoennes@fau.de), 2021-01-08T14:04:20+01:00

Integrate changes made to enable code generation for hyteg.