Commit f504b40f authored by Martin Bauer

Improvements for GPU code generation

- turned on restrict keyword by default (makes a large difference on GPUs)
- smarter block indexing: the block size is adapted to the domain size.
  Example: previously there were (1, 1, 1) blocks when the requested
  block size was (64, 1, 1) and the domain size (1, 512, 512); now the
  block size is changed automatically to (1, 64, 1) in this case
  (see the sketch below)
- added __launch_bounds__ to kernels to allow better optimizations by
  the CUDA compiler
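
The block-size adaptation boils down to redistributing threads from dimensions that are wider than the domain into the following dimensions. Below is a minimal standalone sketch of that logic, using plain integers instead of the sympy expressions and the div_floor/prod helpers that BlockIndexing.call_parameters works with (adapt_block_size is a made-up name for illustration):

from functools import reduce
import operator


def prod(seq):
    # product of a sequence; 1 for the empty sequence
    return reduce(operator.mul, seq, 1)


def adapt_block_size(block_size, domain_size):
    # Shrink block dimensions that exceed the domain and move the freed
    # threads into the following dimensions.
    adapted = []
    for i, width in enumerate(domain_size):
        # ratio of threads requested in earlier dimensions to threads actually used there
        factor = prod(block_size[:i]) // prod(adapted)
        adapted.append(min(block_size[i] * factor, width))
    return tuple(adapted)


print(adapt_block_size((64, 1, 1), (1, 512, 512)))  # -> (1, 64, 1)
print(adapt_block_size((16, 8, 2), (3, 2, 32)))     # -> (3, 2, 32)

The first call reproduces the example above: instead of degenerate (1, 1, 1) blocks, the 64 requested threads are moved to the y dimension.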
parent d8e498fa
@@ -131,7 +131,13 @@ class CBackend:
     def _print_KernelFunction(self, node):
         function_arguments = ["%s %s" % (str(s.symbol.dtype), s.symbol.name) for s in node.get_parameters()]
-        func_declaration = "FUNC_PREFIX void %s(%s)" % (node.function_name, ", ".join(function_arguments))
+        launch_bounds = ""
+        if self._dialect == 'cuda':
+            max_threads = node.indexing.max_threads_per_block()
+            if max_threads:
+                launch_bounds = "__launch_bounds__({}) ".format(max_threads)
+        func_declaration = "FUNC_PREFIX %svoid %s(%s)" % (launch_bounds, node.function_name,
+                                                          ", ".join(function_arguments))
         if self._signatureOnly:
             return func_declaration
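
For the launch-bounds change above, a rough sketch of how the declaration string is put together (hypothetical helper name; the real logic lives in CBackend._print_KernelFunction as shown in the hunk):

def cuda_func_declaration(function_name, argument_list, max_threads=None):
    # Prepend __launch_bounds__(N) when the indexing scheme can guarantee an
    # upper bound N on the threads per block; otherwise emit the plain declaration.
    launch_bounds = "__launch_bounds__({}) ".format(max_threads) if max_threads else ""
    return "FUNC_PREFIX %svoid %s(%s)" % (launch_bounds, function_name, ", ".join(argument_list))


print(cuda_func_declaration("kernel", ["double * RESTRICT _data_f"], max_threads=128))
# FUNC_PREFIX __launch_bounds__(128) void kernel(double * RESTRICT _data_f)

Knowing an upper bound on threads per block lets the CUDA compiler be more aggressive about register allocation, which is why max_threads_per_block() is added to the indexing classes below.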
@@ -36,7 +36,7 @@ def make_python_function(kernel_function_node, argument_dict=None):
     code += "#define FUNC_PREFIX __global__\n"
     code += "#define RESTRICT __restrict__\n\n"
     code += str(generate_c(kernel_function_node, dialect='cuda'))
-    options = options = ["-w", "-std=c++11", "-Wno-deprecated-gpu-targets", "-use_fast_math"]
+    options = ["-w", "-std=c++11", "-Wno-deprecated-gpu-targets"]
+    if USE_FAST_MATH:
+        options.append("-use_fast_math")
     mod = SourceModule(code, options=options, include_dirs=[get_pystencils_include_path()])
@@ -2,11 +2,13 @@ import abc
 from typing import Tuple  # noqa
 import sympy as sp
 from pystencils.astnodes import Conditional, Block
-from pystencils.integer_functions import div_ceil
+from pystencils.integer_functions import div_ceil, div_floor
 from pystencils.slicing import normalize_slice
 from pystencils.data_types import TypedSymbol, create_type
 from functools import partial
+from pystencils.sympyextensions import prod
+
+AUTO_BLOCK_SIZE_LIMITING = False

 BLOCK_IDX = [TypedSymbol("blockIdx." + coord, create_type("int")) for coord in ('x', 'y', 'z')]
@@ -59,6 +61,11 @@ class AbstractIndexing(abc.ABC):
         ast node, which is put inside the kernel function
         """

+    @abc.abstractmethod
+    def max_threads_per_block(self):
+        """Return maximal number of threads per block for launch bounds. If this cannot be determined without
+        knowing the array shape, return None."""
+

# -------------------------------------------- Implementations ---------------------------------------------------------
@@ -73,7 +80,7 @@ class BlockIndexing(AbstractIndexing):
                                                 gets the largest amount of threads
         compile_time_block_size: compile in concrete block size, otherwise the cuda variable 'blockDim' is used
     """
-    def __init__(self, field, iteration_slice=None,
+    def __init__(self, field, iteration_slice,
                  block_size=(16, 16, 1), permute_block_size_dependent_on_layout=True, compile_time_block_size=False):
         if field.spatial_dimensions > 3:
             raise NotImplementedError("This indexing scheme supports at most 3 spatial dimensions")
@@ -108,10 +115,14 @@ class BlockIndexing(AbstractIndexing):
         extend_bs = (1,) * (3 - len(self._block_size))
         block_size = self._block_size + extend_bs
         if not self._compile_time_block_size:
-            block_size = tuple(sp.Min(bs, shape) for bs, shape in zip(block_size, widths)) + extend_bs
-        grid = tuple(div_ceil(length, block_size)
-                     for length, block_size in zip(widths, block_size))
+            assert len(block_size) == 3
+            adapted_block_size = []
+            for i in range(len(widths)):
+                factor = div_floor(prod(block_size[:i]), prod(adapted_block_size))
+                adapted_block_size.append(sp.Min(block_size[i] * factor, widths[i]))
+            block_size = tuple(adapted_block_size) + extend_bs
+        grid = tuple(div_ceil(length, block_size) for length, block_size in zip(widths, block_size))
         extend_gr = (1,) * (3 - len(grid))

         return {'block': block_size,
@@ -128,7 +139,7 @@ class BlockIndexing(AbstractIndexing):
     @staticmethod
     def limit_block_size_to_device_maximum(block_size):
-        """Changes block size according to match device limits.
+        """Changes block size to match device limits.

         * if the total amount of threads is too big for the current device, the biggest coordinate is divided by 2.
         * next, if one component is still too big, the component which is too big is divided by 2 and the smallest
@@ -229,6 +240,9 @@ class BlockIndexing(AbstractIndexing):
             result[l] = bs
         return tuple(result[:len(layout)])

+    def max_threads_per_block(self):
+        return prod(self._block_size)
+

 class LineIndexing(AbstractIndexing):
     """
@@ -238,7 +252,7 @@ class LineIndexing(AbstractIndexing):
     maximum amount of threads allowed in a CUDA block (which depends on device).
     """

-    def __init__(self, field, iteration_slice=None):
+    def __init__(self, field, iteration_slice):
         available_indices = [THREAD_IDX[0]] + BLOCK_IDX
         if field.spatial_dimensions > 4:
             raise NotImplementedError("This indexing scheme supports at most 4 spatial dimensions")
@@ -276,6 +290,9 @@ class LineIndexing(AbstractIndexing):
     def guard(self, kernel_content, arr_shape):
         return kernel_content

+    def max_threads_per_block(self):
+        return None
+

# -------------------------------------- Helper functions --------------------------------------------------------------
@@ -99,3 +99,32 @@ class div_ceil(sp.Function):
         assert dtype.is_int()
         code = "( ({0}) % ({1}) == 0 ? ({dtype})({0}) / ({dtype})({1}) : ( ({dtype})({0}) / ({dtype})({1}) ) +1 )"
         return code.format(print_func(self.args[0]), print_func(self.args[1]), dtype=dtype)
+
+
+# noinspection PyPep8Naming
+class div_floor(sp.Function):
+    """Integer division
+
+    Examples:
+        >>> div_floor(9, 4)
+        2
+        >>> div_floor(8, 4)
+        2
+        >>> from pystencils import TypedSymbol
+        >>> a, b = TypedSymbol("a", "int64"), TypedSymbol("b", "int32")
+        >>> div_floor(a, b).to_c(str)
+        '((int64_t)(a) / (int64_t)(b))'
+    """
+    nargs = 2
+
+    def __new__(cls, integer, divisor):
+        if is_integer_sequence((integer, divisor)):
+            return integer // divisor
+        else:
+            return super().__new__(cls, integer, divisor)
+
+    def to_c(self, print_func):
+        dtype = collate_types((get_type_of_expression(self.args[0]), get_type_of_expression(self.args[1])))
+        assert dtype.is_int()
+        code = "(({dtype})({0}) / ({dtype})({1}))"
+        return code.format(print_func(self.args[0]), print_func(self.args[1]), dtype=dtype)
@@ -76,7 +76,7 @@ class FieldPointerSymbol(TypedSymbol):
     def __new_stage2__(cls, field_name, field_dtype, const):
         name = "_data_{name}".format(name=field_name)
-        dtype = PointerType(get_base_type(field_dtype), const=const, restrict=False)
+        dtype = PointerType(get_base_type(field_dtype), const=const, restrict=True)
         obj = super(FieldPointerSymbol, cls).__xnew__(cls, name, dtype)
         obj.field_name = field_name
         return obj
@@ -179,6 +179,7 @@ def is_constant(expr):
     """
     return len(expr.free_symbols) == 0

+
 def subs_additive(expr: sp.Expr, replacement: sp.Expr, subexpression: sp.Expr,
                   required_match_replacement: Optional[Union[int, float]] = 0.5,
                   required_match_original: Optional[Union[int, float]] = None) -> sp.Expr:
 import numpy as np
 import sympy as sp
-from pystencils import Field, Assignment
+from pystencils import Field, Assignment, fields
 from pystencils.simp import sympy_cse_on_assignment_list
 from pystencils.gpucuda.indexing import LineIndexing
 from pystencils.slicing import remove_ghost_layers, add_ghost_layers, make_slice
@@ -150,3 +150,13 @@ def test_periodicity():
 def test_block_size_limiting():
     res = BlockIndexing.limit_block_size_to_device_maximum((4096, 4096, 4096))
     assert all(r < 4096 for r in res)
+
+
+def test_block_indexing():
+    f = fields("f: [3D]")
+    bi = BlockIndexing(f, make_slice[:, :, :], block_size=(16, 8, 2), permute_block_size_dependent_on_layout=False)
+    assert bi.call_parameters((3, 2, 32))['block'] == (3, 2, 32)
+    assert bi.call_parameters((32, 2, 32))['block'] == (16, 2, 8)
+
+    bi = BlockIndexing(f, make_slice[:, :, :], block_size=(32, 1, 1), permute_block_size_dependent_on_layout=False)
+    assert bi.call_parameters((1, 16, 16))['block'] == (1, 16, 2)