pystencils merge requests (as of 2023-09-19)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests

[FIX] Alignment detection (!351, Markus Holzer, 2023-09-19)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/351

For the SIMD vectorization it needs to be determined whether a memory address points to an aligned address or not. So far, this detection only worked for pointers depending on the inner loop counter.

Reveal base pointer spec (!350, Markus Holzer, 2023-09-14)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/350

This MR reveals the base pointer specification to the user in the config.

[BugFix] Fix indexing with ghost layers (!349, Markus Holzer, 2023-09-07)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/349

The block indexing has a bug when created with an iteration slice and ghost layers. With !341, the block indexing supports slices more naturally by limiting the iteration space to the sliced size. Thus the counter index is multiplied by the step size. This was also done for the offset of the ghost layers, which is wrong.
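Schematically, the slice step must scale only the counter, not the ghost-layer offset. The following is a minimal arithmetic sketch with assumed names (`ctr`, `gl_offset`, `step`); the actual code operates on pystencils' indexing expressions:

```python
def buggy_index(ctr, gl_offset, step):
    # wrong: the ghost-layer offset is scaled by the slice step as well
    return step * (ctr + gl_offset)

def fixed_index(ctr, gl_offset, step):
    # correct: only the counter is scaled; the offset just shifts the result
    return step * ctr + gl_offset

# with one ghost layer and a step of 2, counter 0 must map to cell 1, not 2
print(buggy_index(0, 1, 2), fixed_index(0, 1, 2))  # → 2 1
```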
This MR fixes the problem.

Fix integration pipeline (!348, Daniel Bauer, 2023-09-07)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/348

This MR fixes several issues with the integration pipeline.
1. !349.
2. Fixes an oversight introduced in https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344.
The issue was with expressions like:
```
a: scalar float = CastFunc(b: vector float, float)
```
The mentioned MR used `get_type_of_expression` to determine whether an expression will be vectorized.
That works in most cases due to type collation but fails for `CastFuncs`, which are always vectorized.
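One way to make this check robust is to ask whether an expression is entirely scalar, treating casts as always vectorized. A toy sketch with made-up classes (not pystencils' real AST):

```python
class Expr:
    """Tiny stand-in for an expression tree node."""
    def __init__(self, op, *args, is_cast=False):
        self.op, self.args, self.is_cast = op, args, is_cast

def is_entirely_scalar(expr):
    """True if no sub-expression will be vectorized.

    Unlike a type query on the root, this also treats casts as
    vectorized, since CastFuncs are always turned into vector casts."""
    if isinstance(expr, (int, float, str)):   # symbols/constants are scalar
        return True
    if expr.is_cast:                          # a CastFunc => vectorized
        return False
    return all(is_entirely_scalar(a) for a in expr.args)

scalar_product = Expr("*", "b", "c")
cast = Expr("cast", "b", is_cast=True)
print(is_entirely_scalar(scalar_product), is_entirely_scalar(cast))  # → True False
```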
This MR replaces `get_type_of_expression` by a small helper function that checks whether an expression is entirely scalar or requires additional vector casts.

Distinguish between SymPy and pystencils Assignment better (!347, Markus Holzer, 2023-08-29)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/347

This MR clarifies the usage of pystencils' SympyAssignment in contrast to the SymPy assignment.
In the backend of pystencils, only SympyAssignments are used now; they inherit from pystencils' base Node class.
fixes #61
Additionally, AddAugmentedAssignment is introduced for convenience.

Extension to field read extraction (!346, Markus Holzer, 2023-08-24)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/346

With `add_subexpressions_for_field_reads` it is possible to extract field reads from the kernel and put them in individual assignments. For mixed-precision kernels, however, it is useful if the lhs of this new assignment is of a given type. This isolates casts and prevents calculations in the data type of the stored values.

AVX512VL and AVX10 support (!345, Michael Kuron, 2023-08-23)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/345

AVX512VL is the 256-bit version of all the AVX512F instructions. It is primarily useful on those processors that only have one AVX512 vector unit and drastically reduce their clock frequency when executing 512-bit instructions.
For purposes of pystencils, this mostly means scatter/gather support (up to 30% improvements as per !241) and no reduced clock frequencies ([up](https://cdrdv2-public.intel.com/336065/336065_Intel%20Xeon%20Processor%20Scalable%20Family%20Public%20Specification%20Update_rev17.pdf) [to](https://cdrdv2-public.intel.com/338848/338848_2nd%20Gen%20Intel®%20Xeon®%20Scalable%20Processors%20Specification%20Update_Rev027US.pdf) 45% improvements on Xeon Bronze 31xx/32xx, Silver 41xx/42xx, Gold 51xx/52xx). I suppose we never bothered implementing it because it offers no advantage on Xeon Gold 61xx/62xx and Platinum with their two AVX512 units, nor on the newer x3xx and x4xx (or the Ice Lake/Tiger Lake/Rocket Lake desktop/laptop processors), which [don't clock down](https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html) anymore.
Many of Intel's future processors, however, will be using AVX10-256 instead of AVX512, which is, in a sense, halfway between AVX2 and AVX512. AVX10.1-128 and AVX10.1-256 are essentially a rebranded AVX512VL that can be enabled without AVX512F. This just needs a few changed ifdefs and awareness in the CPU detection. The /proc/cpuinfo flag is just a guess, but a very likely one.
AVX10.1-512 is the same as AVX512F and enabling one always enables the other. I've adapted the ifdefs nonetheless just in case.
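The CPU-detection side could look roughly like this; `avx512vl` is the established /proc/cpuinfo flag, while `avx10_1_256` is the guessed flag name mentioned above, so treat it as an assumption:

```python
def cpu_flags(cpuinfo_text):
    """Parse the flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def supports_256bit_evex(flags):
    # AVX512VL, or AVX10.1 at 256-bit width (flag name is a guess)
    return "avx512vl" in flags or "avx10_1_256" in flags

sample = "flags\t\t: fpu sse avx2 avx512f avx512vl"
print(supports_256bit_evex(cpu_flags(sample)))  # → True
```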
All information is based on https://www.phoronix.com/news/GCC-Lands-Initial-AVX10.1.

Vectorize all scalar symbols in vector expressions (!344, Daniel Bauer, 2023-09-04)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344

Pystencils fails to vectorize very simple kernels:
```python
import numpy as np
import pystencils as ps
from pystencils.astnodes import SympyAssignment, TypedSymbol
f = ps.fields("f: [1D]")
x = TypedSymbol("x", np.float64)
kernel = ps.create_kernel(
    [SympyAssignment(x, 2.0), SympyAssignment(f[0], x)],
    cpu_vectorize_info={"assume_inner_stride_one": True},
)
ps.show_code(kernel)
```
This example throws an exception in `show_code`, complaining that the printer cannot vectorize type casts.
The problem is that `x = 2.0` is moved out of the loop (since it is constant).
What remains in the loop is `f[i] = x`.
While the left-hand side of this expression is vectorized, the right-hand side is left scalar, leading to the exception.
The issue comes from the `insert_vector_casts` function.
It traverses each expression from the leaves to the root, leaving scalars scalar [^1] and collating mixed expressions to vectors.
However, it handles the rhs of assignments separately from the lhs, leading to the above issue.
Moreover, expressions like `a (vec) + (b (scalar) * c (scalar))` are converted to `a (vec) + CastToVec(b (scalar) * c (scalar))`, which leads to the same exception.
The correct way is to directly cast `b` and `c` to vectors, not their product.
Therefore, `insert_vector_casts` must know beforehand whether an expression appears inside a vectorized expression.
This MR fixes that for SympyAssignments.
To that end, it first checks whether either side contains a vectorized expression, and if so, casts all symbols to vectors.
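The idea can be illustrated on a toy expression tree (hypothetical names; the real transformation operates on pystencils AST nodes, with vectors represented by types rather than string suffixes):

```python
def contains_vector(expr):
    """Check whether any leaf of a nested-tuple expression is a vector."""
    if isinstance(expr, tuple):
        return any(contains_vector(a) for a in expr)
    return expr.endswith("(vec)")

def cast_scalars_to_vectors(expr):
    """Cast every scalar leaf individually, instead of wrapping
    whole scalar sub-expressions in a single cast."""
    if isinstance(expr, tuple):
        return tuple(cast_scalars_to_vectors(a) for a in expr)
    return expr if expr.endswith("(vec)") else f"vec_cast({expr})"

lhs, rhs = "f[i](vec)", ("x",)  # f[i] = x, with x left scalar
if contains_vector(lhs) or contains_vector(rhs):
    rhs = cast_scalars_to_vectors(rhs)
print(rhs)  # → ('vec_cast(x)',)
```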
Since I am not really sure how to handle the cases for `VectorMemoryAccess` ([line 370/386](https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344/diffs#2d8e7266b6ec295a8ceac4159fcd9cea9ede6ca8_370_386)) and `ast.Conditional` ([line 374/390](https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/344/diffs#2d8e7266b6ec295a8ceac4159fcd9cea9ede6ca8_374_390)), I left those untouched.
[^1]: The exception is that CastFunctions are always replaced by vector casts. I do not know whether this is intentional.

Do not reorder accesses in `move_constants_before_loop` (quickly) (!343, Daniel Bauer, 2023-08-18)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/343

Reimplementation of https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/342.
While playing around with the old MR, I realized that the changes proposed there have a significant impact on the execution time of `move_constants_before_loop` (for some kernels).
Before the MR, we would not descend into blocks, loops or conditionals to check whether dependencies are modified in their body.
The MR changed that for the sake of correctness.
However, the implementation was quite inefficient.
Note that for each assignment we must find a block to move the assignment to.
Essentially, the old MR would move up the AST, at each level determining a set of "critical symbols" by *descending* the tree from the current element again.
This means that the AST was traversed a lot, and set objects were created and updated a lot.
This MR changes this behavior.
Now, the AST is only traversed once, from the current assignment up to the block we can move the assignment to.
If we encounter blocks, loops, etc. on the way, we still descend into them.
However, we do this only once.
Moreover, the new implementation does not create a huge set of critical symbols but instead exits early once it finds a dependency.
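Schematically, the single upward traversal with early exit might look like this (toy node classes, not the real `ast.Node` interface):

```python
class Node:
    def __init__(self, symbols_modified=(), children=()):
        self.symbols_modified = set(symbols_modified)
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def modifies_dependency(node, dependencies):
    """Early exit: True as soon as any descendant modifies a dependency."""
    if node.symbols_modified & dependencies:
        return True
    return any(modifies_dependency(c, dependencies) for c in node.children)

def find_target_block(assignment, dependencies):
    """Walk up from the assignment; stop at the first ancestor level
    where a sibling subtree modifies one of our dependencies."""
    node = assignment
    while node.parent is not None:
        for sibling in node.parent.children:
            if sibling is not node and modifies_dependency(sibling, dependencies):
                return node.parent
        node = node.parent
    return node

write_x = Node(symbols_modified={"x"})
assign = Node()
inner = Node(children=[assign])
outer = Node(children=[write_x, inner])
root = Node(children=[outer])
print(find_target_block(assign, {"x"}) is outer)  # → True
```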
Overall, my not-sophisticated-at-all tests suggest that the new implementation is even slightly faster than the version from master.
The new implementation also does not change the `ast.Node` interface, which I like quite a lot.

Draft: Do not reorder accesses in `move_constants_before_loop` (!342, Daniel Bauer, 2023-08-18)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/342

Prior to this MR, `move_constants_before_loop` tries to move constants as far to the top as possible.
This might reorder read/write accesses to fields.
For example:
```python
import pystencils as ps
from pystencils import CreateKernelConfig
from pystencils.astnodes import Block, KernelFunction, LoopOverCoordinate, SympyAssignment
from pystencils.field import Field, FieldType
from sympy.abc import x, y
field = Field.create_generic("field", 1, field_type=FieldType.CUSTOM)
counter = LoopOverCoordinate.get_loop_counter_symbol(0)
load = SympyAssignment(x, field.absolute_access((counter,), (0,)))
store = SympyAssignment(field.absolute_access((counter+1,), (0,)), 2*x)
body = ps.typing.transformations.add_types(Block([load, store]), CreateKernelConfig())
loop = LoopOverCoordinate(body, 0, 0, 42)
block = Block([loop])
ps.transformations.resolve_field_accesses(block)
new_loops = ps.transformations.cut_loop(loop, [41])
ps.transformations.move_constants_before_loop(new_loops.args[1])
kernel = KernelFunction(
    block,
    ps.Target.CPU,
    ps.Backend.C,
    ps.cpu.cpujit.make_python_function,
    None,
)
code = ps.get_code_str(kernel)
print(code)
```
prints
```c
FUNC_PREFIX void kernel(double * RESTRICT _data_field, int64_t const _stride_field_0)
{
   const double x = _data_field[41*_stride_field_0];
   _data_field[42*_stride_field_0] = x*2.0;
   {
      for (int64_t ctr_0 = 0; ctr_0 < 41; ctr_0 += 1)
      {
         const double x = _data_field[_stride_field_0*ctr_0];
         _data_field[_stride_field_0*(ctr_0 + 1)] = x*2.0;
      }
      {
      }
   }
}
```
Note that the last (cut) loop iteration is moved before the primary loop, leading to a wrong load from index 41.
This MR changes `move_constants_before_loop` such that assignments cannot be moved before their last modification.
Essentially, it replaces `symbols_defined` by `symbols_modified` [here](https://i10git.cs.fau.de/terraneo/pystencils/-/commit/be78ab165339d593869b5c77ef00a590a63ba130#99785d4b53b75ce54c83c3e499248de2a07fb2cd_598_597).
This new property is implemented for all AST nodes.
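The distinction can be sketched with a toy node class (hypothetical; the real property lives on pystencils' AST classes): `symbols_defined` only covers symbols a node freshly declares, while `symbols_modified` must also include writes to already-resolved field pointers:

```python
class Assignment:
    def __init__(self, lhs, declares):
        self.lhs, self._declares = lhs, declares

    @property
    def symbols_defined(self):
        # only fresh declarations, e.g. `const double x = ...`
        return {self.lhs} if self._declares else set()

    @property
    def symbols_modified(self):
        # every write counts, including stores through resolved field pointers
        return {self.lhs}

load = Assignment("x", declares=True)
store = Assignment("_data_field", declares=False)
print(store.symbols_defined, store.symbols_modified)  # → set() {'_data_field'}
```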
Note the implementation of `CustomCCodeNode`. I did not want to introduce breaking changes to the API.
Additionally, declarations are now inserted where the caller requests, instead of pushing them all the way to the top (https://i10git.cs.fau.de/terraneo/pystencils/-/commit/5c65d06216d050c22e28ba0b9487544342fc0926).
Lastly, a test for the new behavior is included.

Refactor gpu indexing (!341, Markus Holzer, 2023-09-04)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/341

To map an iteration space to GPU threads, indexing classes are used. These indexing classes receive a field and an iteration slice to determine the iteration space. This MR refactors the indexing classes to directly receive an iteration space. With this, the indexing classes are more general and not dependent on pystencils Fields.
Further improvements/fixes:
- Line indexing now works with iteration slices. This did not work at all before.
- Both indexing schemes calculate a correct block and grid size for iteration slices. This means that if, for example, only every second element is touched (due to a given iteration slice), the number of threads will be halved. This removes the modulo calculation that was needed before.
- Both indexing schemes now support up to 4 dimensions.

Fix symbol counters (!340, Markus Holzer, 2023-07-25)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/340

When simplifications are applied to an AssignmentCollection that is created with assignments coming from another, previously simplified AssignmentCollection, the counter for symbol creation was not respected.

Remove windows CI (!339, Markus Holzer, 2023-07-12)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/339

JSON Serializer for pystencils config (!338, Helen Schottenhamml, 2023-07-17)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/338
This MR adds a custom JSON serializer to allow pystencils configs to be used as parameters in databases. This is useful in parameter studies when using the more modern way of setting up simulations, i.e., using pystencils' CreateKernelConfig.
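A serializer of this kind might look roughly like the following sketch, using a plain dataclass as a stand-in for pystencils' actual CreateKernelConfig:

```python
import dataclasses
import json

@dataclasses.dataclass
class KernelConfig:          # stand-in for pystencils' CreateKernelConfig
    target: str = "cpu"
    data_type: str = "float64"

class ConfigEncoder(json.JSONEncoder):
    """JSON encoder that understands dataclass-based configs."""
    def default(self, obj):
        if dataclasses.is_dataclass(obj):
            return dataclasses.asdict(obj)
        return super().default(obj)   # fall back for unknown types

print(json.dumps({"config": KernelConfig()}, cls=ConfigEncoder))
# → {"config": {"target": "cpu", "data_type": "float64"}}
```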
It can be extended in the future for other custom classes if needed.

Add adjacent directions to stencil module (!337, Markus Holzer, 2023-07-12)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/337

Remove pystencils.GPU_DEVICE (!336, Michael Kuron, 2023-07-13)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/336

- `SerialDataHandling` now performs the device selection upon construction. It can also be constructed with an explicit device number to deviate from the default selection.
- For `ParallelDataHandling`, the assignment of devices to MPI ranks _should_ be handled by Walberla by calling `cudaSetDevice()`. It has [`selectDeviceBasedOnMpiRank`](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/gpu/DeviceSelectMPI.cpp) for this purpose. I am not sure it actually calls it -- I think it should be called from [`MPIManager::initializeMPI`](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/mpi/MPIManager.cpp). Right now everything probably just ends up on the first GPU.
- The kernel wrapper now determines the correct device by inspecting the fields.
- `gpu_indexing_params` needs an explicit device number, I don't think any kind of default is reasonable.
- Some tests now iterate over all devices instead of using a default device. This is actually the right thing to do because it tests whether the device selection works correctly.
lbmpy's test_gpu_block_size_limiting.py::test_gpu_block_size_limiting fails since !335, but that is due to an error in the test, which https://i10git.cs.fau.de/pycodegen/lbmpy/-/merge_requests/146 fixes.

Fix indexing for AMD GPUs (!335, Markus Holzer, 2023-07-08)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/335

Due to https://github.com/cupy/cupy/issues/7676, `BlockIndexing` did not work correctly on AMD GPUs. This MR fixes it.

Re-enable test_loop_cutting.py::test_staggered_iteration (!334, Michael Kuron, 2023-06-30)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/334

It passes on current master, so don't xfail it.

Make AMD GPU support compatible with both hipcc and hiprtc (!333, Michael Kuron, 2023-06-30)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/333

Please give this a test on your AMD machine, @holzer. I think it should now work everywhere with both backend=nvcc and backend=nvrtc.

Add experimental half precision support (!332, Markus Holzer, 2023-06-28)
https://i10git.cs.fau.de/pycodegen/pystencils/-/merge_requests/332

With this MR, experimental half-precision support is added.