Bugfix: Re-add __launch_bounds__ for dialect 'cuda'

6 jobs for bugfix-readd-launch-bounds in 6 minutes and 25 seconds (queued for 5 minutes and 1 second)