Iteration Slices: Extended GPU support + bugfixes
This MR slightly extends the support for more general iteration slices on the CUDA platform, greatly extends the test suite for iteration slices on all targets, and fixes some bugs found along the way.
Code Generator Configuration
- Add special value `AUTO` to pystencils.config to model automatic behavior; no longer use `None` to mean "automatic" in specifying ghost layers
- Add configuration option `manual_launch_grid` to disable automatic inference of the GPU launch grid size
Iteration Slices on GPU
- Have the CUDA platform raise a warning if it cannot infer a launch grid because of dependencies between dimensions
- Extend the JIT-compiled kernel object to allow manual specification of the launch grid, and enforce this if no grid size was inferred from the kernel
This allows the iteration limits of faster coordinates to depend on the current counter values of slower coordinates, which enables, for example, triangular iteration patterns and red-black checkerboard iteration (see the test cases).
Documentation for these features will be added in a follow-up MR.
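The following plain-Python sketch shows what "limits of faster coordinates depending on slower coordinates" means for the two patterns mentioned above. It does not use the pystencils API; all function names are hypothetical. The last function emulates a rectangular launch grid with a per-thread guard, which is one assumed way a manually specified grid could cover such an iteration space.

```python
def triangular_cells(n):
    """Cells (y, x) with x <= y: the inner upper limit depends on y."""
    return [(y, x) for y in range(n) for x in range(y + 1)]


def red_black_cells(n, parity):
    """Cells of one checkerboard colour: the inner start depends on y."""
    cells = []
    for y in range(n):
        start = (y + parity) % 2  # first x with (x + y) % 2 == parity
        for x in range(start, n, 2):
            cells.append((y, x))
    return cells


def triangular_via_launch_grid(grid):
    """Emulate a rectangular grid of (ny, nx) threads with a guard.

    Assumption: each thread checks the dimension-dependent limit itself,
    so the grid only needs to be a bounding box of the iteration space.
    """
    ny, nx = grid
    return [(y, x) for y in range(ny) for x in range(nx) if x <= y]
```

Because the extent in `x` differs per row, no single rectangular grid size follows from the loop limits alone, which is why the grid has to be supplied by the user in these cases.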
Bugfixes
- Fix parsing of iteration slices that are negative integers
- Fix a bug in the loop vectorizer where the trailing loop was only executed if the SIMD loop had run for at least one iteration
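The vectorizer bug can be pictured with a small plain-Python model of a vectorized loop and its scalar trailing loop. This is a sketch of the class of bug described above, not the code the vectorizer actually emits; the function name and the `buggy` flag are made up for the example.

```python
def scale(data, factor, width=4, buggy=False):
    """Multiply every element by `factor` using a chunked main loop.

    The main loop handles full chunks of `width` elements (standing in for
    SIMD lanes); the trailing loop handles the remaining elements.
    """
    out = list(data)
    n = len(out)
    simd_end = (n // width) * width  # end of the last full chunk

    ran_simd = False
    for i in range(0, simd_end, width):
        out[i:i + width] = [v * factor for v in out[i:i + width]]
        ran_simd = True

    # Buggy variant: the remainder is skipped when no full chunk was processed
    if ran_simd or not buggy:
        for j in range(simd_end, n):
            out[j] = out[j] * factor
    return out
```

With `width=4`, an input of three elements never enters the main loop, so the buggy variant returns the data unchanged while the fixed variant processes all three elements in the trailing loop.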
Test Suite
- Add pytest fixtures for available codegen targets, to simplify writing tests that should succeed on all hardware
- Add extensive tests for common and more uncommon iteration slices on all targets
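A parametrized pytest fixture is one common way to run the same test on every available target. The sketch below is an assumption about how such a fixture could look, not the fixture added in this MR: the helper `available_targets`, the target names `"cpu"` and `"gpu"`, and the use of CuPy importability as a stand-in for GPU availability are all hypothetical.

```python
import importlib.util

import pytest


def available_targets():
    """Hypothetical helper: list the targets usable on this machine."""
    targets = ["cpu"]
    # Assumption: GPU support is approximated by CuPy being importable
    if importlib.util.find_spec("cupy") is not None:
        targets.append("gpu")
    return targets


@pytest.fixture(params=available_targets())
def target(request):
    """Yield each available target once; tests using it run per target."""
    return request.param


def test_runs_on_every_target(target):
    # A real test would generate and run a kernel for `target` here
    assert target in available_targets()
```

Tests that request the `target` fixture are collected once per entry, so unavailable hardware produces no test cases at all rather than failures.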