Either my laptop's GPU (Intel Iris Graphics 550) or Apple's OpenCL implementation does not support double precision. This patch checks all kernel arguments for double-precision types. There is probably an easier way to check the entire AST instead, but I couldn't figure out how.
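To illustrate the idea, here is a minimal sketch of an argument check, assuming the argument types are available as spelled-out strings such as "__global double *". The function names and the string-based representation are hypothetical and are not taken from the patch, which may inspect the compiler's own type objects instead.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the identifier [tok, tok + len) is "double" or a vector
   form such as "double4" (the word "double" followed only by digits). */
static bool is_double_token(const char *tok, size_t len)
{
    static const char base[] = "double";
    const size_t base_len = sizeof base - 1;

    if (len < base_len || strncmp(tok, base, base_len) != 0)
        return false;
    for (size_t i = base_len; i < len; i++)
        if (!isdigit((unsigned char)tok[i]))
            return false;
    return true;
}

/* True if a spelled-out type mentions a double-precision type.
   Splits the string into identifier tokens and tests each one. */
static bool type_uses_double(const char *type_name)
{
    const char *p = type_name;

    while (*p) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            if (is_double_token(start, (size_t)(p - start)))
                return true;
        } else {
            p++;
        }
    }
    return false;
}

/* Hypothetical entry point: true if any kernel argument type uses double. */
static bool kernel_args_use_double(const char *const *arg_types, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (type_uses_double(arg_types[i]))
            return true;
    return false;
}
```

Like the patch, this only looks at kernel arguments, so a double used purely inside a kernel body would go unnoticed.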
get_local_id et al. return size_t per the OpenCL specification, while CUDA's threadIdx et al. are 32-bit integers, so a cast is needed to silence a conversion warning.
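The cast could look roughly like the sketch below. It is written as plain C so that it compiles outside an OpenCL compiler: the get_local_id defined here is only a stand-in for the OpenCL built-in, and local_id_as_int is a hypothetical helper name, not something from the patch.

```c
#include <stddef.h>

/* Stand-in for the OpenCL built-in (assumption, for illustration only).
   In a real kernel the OpenCL C compiler provides get_local_id, which
   returns size_t and yields 0 for an out-of-range dimension. */
static size_t get_local_id(unsigned int dim)
{
    static const size_t fake_ids[3] = {5, 2, 0};
    return dim < 3 ? fake_ids[dim] : 0;
}

/* Hypothetical helper: narrow the size_t work-item ID to int with an
   explicit cast, so that storing it in an int does not produce an
   implicit-conversion warning. */
static int local_id_as_int(unsigned int dim)
{
    return (int)get_local_id(dim);
}
```

The explicit cast assumes that local IDs always fit in an int, which holds for typical work-group sizes.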