pystencils_walberla/templates/GpuPackInfo.tmpl.h · 0cdd23d8123ba12917ce5a6c63e4a81ffc866b69 · Stephan Seitz / pystencils_walberla

Improvements for GPU code generation · 0cdd23d8

Martin Bauer authored 5 years ago

- turned on restrict keyword by default (makes large difference on GPUs)
- smarter block indexing: changing block size depending on domain size
  Example: previously there were (1,1,1) blocks when the requested
  block size was (64, 1, 1) and the domain size was (1, 512, 512); now
  the block size is changed automatically to (1, 64, 1) in this case
- added __launch_bounds__ to kernels to allow better optimizations from
  the CUDA compiler
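
The block size adaptation described in the second item can be illustrated with a minimal sketch. This is not taken from the commit itself: the function name adapt_block_size and its arguments are hypothetical, and the rule shown (clamp each block dimension to the domain extent and hand the unused threads on to the next dimension) is an assumption that happens to reproduce the example above.

```python
from math import prod


def adapt_block_size(requested, domain):
    """Hypothetical helper: fit a requested block size to a domain.

    Each block dimension is clamped to the domain extent in that
    dimension. Threads that could not be used in earlier dimensions
    are carried over to the next one as a multiplicative factor.
    """
    adapted = []
    for i, (req, extent) in enumerate(zip(requested, domain)):
        # Ratio of threads requested so far to threads actually placed so far.
        factor = prod(requested[:i]) // prod(adapted)
        adapted.append(min(req * factor, extent))
    return tuple(adapted)


# The case from the commit message: the x extent is only 1, so the
# 64 threads requested in x move to y.
print(adapt_block_size((64, 1, 1), (1, 512, 512)))
```

With a domain that is large in every dimension, e.g. (512, 512, 512), the sketch leaves the requested (64, 1, 1) unchanged, so the adaptation only takes effect when a dimension is too small for the requested block.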

Forked from pycodegen / pystencils_walberla