1. 26 Apr, 2019 1 commit
  2. 24 Apr, 2019 1 commit
      Improvements for GPU code generation · f504b40f
      Martin Bauer authored
      - turned on the restrict keyword by default (makes a large difference on GPUs)
      - smarter block indexing: changing block size depending on domain size
        Example: previously, a requested block size of (64, 1, 1) on a domain
        of size (1, 512, 512) produced (1, 1, 1) blocks; now the block size is
        changed automatically to (1, 64, 1) in this case
      - added __launch_bounds__ to kernels to allow better optimizations from
        the CUDA compiler
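
The block-size adaptation described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code from this commit: the helper name `adapt_block_size` and the "carry the freed threads to the next axis" rule are hypothetical, chosen only because they reproduce the (64, 1, 1) to (1, 64, 1) example from the message.

```c
#include <stdio.h>

/* Hypothetical helper, NOT the implementation from this commit.
 * Each block extent is clamped to the domain extent on that axis; the
 * threads freed by clamping are handed to the next axis as a factor.
 * Assumes every domain extent is at least 1. */
static void adapt_block_size(const int requested[3], const int domain[3],
                             int adapted[3])
{
    int carry = 1; /* factor of threads freed on the previous axis */
    for (int i = 0; i < 3; ++i) {
        int want = requested[i] * carry;
        carry = 1;
        if (want > domain[i]) {
            carry = want / domain[i]; /* integer division, remainder dropped */
            want = domain[i];
        }
        adapted[i] = want;
    }
}
```

With a requested block of (64, 1, 1) and a domain of (1, 512, 512), the first axis is clamped to 1 and its factor of 64 moves to the second axis, which has room for it.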
  3. 21 Mar, 2019 1 commit
      Separated modules into subfolders with own setup.py · 1e02cdc7
      Martin Bauer authored
      This restructuring allows for easier separation of modules into
      separate repositories later. Also, pip install with the repo URL can now
      be used.
      
      The setup.py files have also been updated to correctly reference each
      other. Module versions are now extracted from git state.
  4. 19 Oct, 2018 1 commit
  5. 07 Jun, 2018 1 commit
  6. 05 Jun, 2018 1 commit
  7. 13 May, 2018 1 commit
      Improved Vectorization · 501b2d7e
      Martin Bauer authored
      - support aligned load/stores
      - nontemporal stores
      - aligned memory allocation for arrays and temporary buffers
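
For readers unfamiliar with the three features listed above, the following is a minimal hand-written sketch of what such code can look like. It is not output of the code generator: the helper names are hypothetical, and it assumes an x86 target with SSE and a C11 standard library that provides `aligned_alloc`.

```c
#include <stdlib.h>
#include <xmmintrin.h> /* SSE intrinsics; assumes an x86 target */

/* Hypothetical helper: allocate n floats on a 16-byte boundary.
 * n must be a multiple of 4 so the byte size is a multiple of the alignment. */
static float *alloc_aligned_floats(size_t n)
{
    return aligned_alloc(16, n * sizeof(float));
}

/* Computes dst[i] = 2 * src[i]. Both pointers must be 16-byte aligned
 * and n must be a multiple of 4. */
static void scale_by_two_stream(float *restrict dst, const float *restrict src,
                                size_t n)
{
    const __m128 two = _mm_set1_ps(2.0f);
    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(src + i);            /* aligned load */
        _mm_stream_ps(dst + i, _mm_mul_ps(v, two)); /* nontemporal store */
    }
    _mm_sfence(); /* order the streamed stores before later memory accesses */
}
```

Nontemporal stores bypass the cache, which tends to help only when the written data is not read again soon; whether that holds depends on the kernel.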
  8. 11 May, 2018 1 commit
      Generalized vectorization · 57a3c27e
      Martin Bauer authored
      - vectorization for loops with ranges that are not a multiple of the vector width
      - vectorization for variable-sized loops if the special transformation
        replace_inner_stride_with_one is run
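
A common way to vectorize a loop whose range is not a multiple of the vector width is a vector main loop followed by a scalar remainder loop. The sketch below shows that general pattern in hand-written C. It is not the code this commit generates: the function name is hypothetical, and it assumes an x86 target with SSE (vector width 4 for `float`) and a unit stride in the inner dimension so that consecutive elements can be loaded as one vector.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; assumes an x86 target */

/* Computes out[i] = a[i] + b[i] for any n, including n not divisible by 4. */
static void add_arrays(float *restrict out, const float *restrict a,
                       const float *restrict b, size_t n)
{
    const size_t vec_end = n - (n % 4); /* last index covered by full vectors */
    size_t i = 0;

    /* Vector main loop: four elements per iteration, unaligned accesses. */
    for (; i < vec_end; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }

    /* Scalar remainder loop: at most three leftover elements. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

For n = 7, the vector loop handles indices 0 to 3 and the scalar loop handles indices 4 to 6.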