Change block_size -> block_and_thread_numbers

8 jobs for cuda-autotune in 7 minutes and 36 seconds (queued for 3 seconds)