Iteration Slices: Extended GPU support + bugfixes

Frederik Hennig requested to merge fhennig/gpu-iteration-spaces into v2.0-dev

This MR slightly extends the support for more general iteration slices on the CUDA platform, greatly extends the test suite for iteration slices on all targets, and fixes some bugs found along the way.

Code Generator Configuration

  • Add the special value AUTO to pystencils.config to model automatic behavior; None is no longer used to mean "automatic" when specifying ghost layers
  • Add configuration option manual_launch_grid to disable automatic inference of the GPU launch grid size

Iteration Slices on GPU

  • Have the CUDA platform raise a warning if it cannot infer a launch grid because of dependencies between dimensions
  • Extend the JIT-compiled kernel object to allow manual specification of the launch grid, and enforce this if no grid size was inferred from the kernel
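The interplay between an inferred and a manually specified launch grid might look roughly like the following sketch. It is written in plain Python with invented names (infer_launch_grid, resolve_launch_grid) and does not reflect the actual kernel object API. It assumes that a grid is obtained by ceiling division of the iteration extents by the block size, and that inference fails (returns None) when the extents are not known independently per dimension.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def infer_launch_grid(extents, block_size):
    """Return the number of blocks per dimension, or None if not inferable.

    `extents` is a tuple of ints, or None when the iteration limits of one
    dimension depend on the counter of another (hypothetical convention).
    """
    if extents is None:
        return None
    return tuple(ceil_div(e, b) for e, b in zip(extents, block_size))


def resolve_launch_grid(inferred, manual=None):
    """Prefer a manual grid; require one if nothing could be inferred."""
    if manual is not None:
        return tuple(manual)
    if inferred is None:
        raise ValueError(
            "no launch grid could be inferred; specify one manually"
        )
    return inferred
```

For example, extents (100, 64) with block size (32, 8) give a grid of (4, 8), whereas extents of None force the caller to pass a grid explicitly.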

Together, these changes allow the iteration limits of faster coordinates to depend on the current counter value of slower coordinates. Examples include triangular iteration patterns and red-black checkerboard iteration (see test cases).
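To make the two named patterns concrete, here is a plain-Python sketch of the index sets they visit. It does not use pystencils; the function names are illustrative only. In both cases x is the faster coordinate, and its range depends on the current value of the slower coordinate y, which is exactly the kind of dependency that prevents a rectangular launch grid from being inferred.

```python
def triangular_points(n):
    """Lower-triangular pattern: the upper limit of x depends on y."""
    return [(y, x) for y in range(n) for x in range(y + 1)]


def red_black_points(ny, nx, color):
    """Checkerboard pattern: the start of x depends on the parity of y.

    color 0 selects cells where (y + x) is even, color 1 where it is odd.
    """
    points = []
    for y in range(ny):
        start = (y + color) % 2
        for x in range(start, nx, 2):
            points.append((y, x))
    return points
```

The two colors of the checkerboard are disjoint and together cover every cell of the ny-by-nx domain.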

Documentation for these features will be added in a follow-up MR.

Bugfixes

  • Fix parsing of iteration slices that are negative integers
  • Fix a bug in the loop vectorizer where the trailing loop was only executed if the SIMD loop had run for at least one iteration
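The effect of the vectorizer bug can be simulated in plain Python. This is only a sketch of one way such a bug could manifest; the MR does not describe how the generated code looked, and the function names and the vector width of 4 are assumptions. A vectorized loop is modeled as a SIMD loop over full chunks followed by a scalar trailing loop for the remainder.

```python
def visited_buggy(n, width=4):
    """Trailing loop runs only if the SIMD loop ran at least once."""
    visited = []
    i = 0
    ran_simd = False
    while i + width <= n:
        visited.extend(range(i, i + width))  # one SIMD iteration
        i += width
        ran_simd = True
    if ran_simd:  # the bug: remainder is skipped when n < width
        visited.extend(range(i, n))
    return visited


def visited_fixed(n, width=4):
    """Trailing loop always handles the remainder."""
    visited = []
    i = 0
    while i + width <= n:
        visited.extend(range(i, i + width))  # one SIMD iteration
        i += width
    visited.extend(range(i, n))  # scalar remainder, unconditionally
    return visited
```

With fewer iterations than the vector width, the buggy variant visits nothing, while the fixed variant still processes every index.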

Test Suite

  • Add pytest fixtures for available codegen targets, to simplify writing tests that should succeed on all hardware
  • Add extensive tests for common and more uncommon iteration slices on all targets
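A target fixture of the kind described above could be sketched as follows. This is not the fixture added by the MR: the target names "cpu" and "cuda", the helper available_targets, and the use of an installed cupy package as the GPU availability check are all assumptions made for illustration. pytest.fixture with params is standard pytest.

```python
import importlib.util

import pytest


def available_targets():
    """Return hypothetical target names usable on this machine."""
    targets = ["cpu"]
    # Assumption: GPU tests are only meaningful if cupy can be imported.
    if importlib.util.find_spec("cupy") is not None:
        targets.append("cuda")
    return targets


@pytest.fixture(params=available_targets())
def target(request):
    """Run each test that requests `target` once per available target."""
    return request.param


def test_target_is_known(target):
    # Example test: executed once for every entry of available_targets().
    assert target in ("cpu", "cuda")
```

A test that takes target as an argument is then collected once per available target, so the same test body runs on every machine regardless of its hardware.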
