- 24 Apr, 2019 1 commit
-
-
Martin Bauer authored
- turned on restrict keyword by default (makes large difference on GPUs) - smarter block indexing: changing block size depending on domain size Example: previously there where (1,1,1) blocks when requested block size was (64, 1, 1) and domain size (1, 512, 512), now the block size is changed automatically to (1, 64, 1) in this case - added __lauch_bounds__ to kernels to allow better optimizations from the CUDA compiler
-
- 21 Mar, 2019 1 commit
-
-
Martin Bauer authored
This restructuring allows for easier separation of modules into separate repositories later. Also, now pip install with repo url can be used. The setup.py files have also been updated to correctly reference each other. Module versions are not extracted from git state
-
- 14 Nov, 2018 2 commits
-
-
Martin Bauer authored
- was not used consistently before - symbol names are expected to be valid C identifiers - for complicated field names, the latex_name of field should be used
-
Martin Bauer authored
- small (length < 5) arrays with shape and stride information had to be memcpy'd to the GPU before every kernel call - instead of passing the information as arrays, the single elements are passed - leads to more function arguments, but simplifies GPU kernel calls -> changes in all backends required
-