CUDA indexing: clip to maximum CUDA block size

- the previous method did not work with kernels generated for waLBerla,
  where block size changes are made at runtime
- a device query does not always work, since the compile system may have
  no GPU, or not the same GPU as the target system
-> the maximum block size is now passed as a parameter and only optionally
   determined by a device query (see the sketch below)
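
A minimal sketch of the idea, assuming pycuda for the optional query; the
function names are illustrative only and not the actual pystencils API:

def limit_block_size(block_size, maximum_block_size=(1024, 1024, 64)):
    # Clip each component of the requested block size to a maximum that is
    # passed in as a parameter, so no GPU is needed on the compile system.
    return tuple(min(b, m) for b, m in zip(block_size, maximum_block_size))

def query_maximum_block_size(device_index=0):
    # Optional device query; pycuda is only imported when a query is requested.
    import pycuda.driver as cuda
    cuda.init()
    device = cuda.Device(device_index)
    attr = cuda.device_attribute
    return (device.get_attribute(attr.MAX_BLOCK_DIM_X),
            device.get_attribute(attr.MAX_BLOCK_DIM_Y),
            device.get_attribute(attr.MAX_BLOCK_DIM_Z))

# Example: a block size chosen at runtime (e.g. by waLBerla) is clipped
# before launch: limit_block_size((32, 32, 128)) -> (32, 32, 64)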
6 jobs for release/0.2.3 in 3 minutes and 47 seconds (queued for 2 seconds)

Status  Job ID   Tags             Name                 Duration
passed  #310963  cuda docker      build-documentation  00:00:43
passed  #310962  cuda docker      flake8-lint          00:00:24
passed  #310961  docker           minimal-conda        00:01:09
passed  #310960  docker           minimal-ubuntu       00:00:41
passed  #310959  win              minimal-windows      00:01:22
failed  #310958  AVX cuda docker  tests-and-coverage   00:03:47

Failure log for job tests-and-coverage (stage: Test):
3.43s call     pystencils_tests/test_buffer_gpu.py::test_full_scalar_field
3.37s call pystencils_tests/test_buffer_gpu.py::test_subset_cell_values
3.20s call pystencils_tests/test_datahandling_parallel.py::test_kernel
3.08s call pystencils_tests/test_loop_cutting.py::test_staggered_iteration
============= 11 failed, 164 passed, 339 warnings in 81.34 seconds =============
ERROR: Job failed: exit code 1