Bugfix: Re-add __launch_bounds__ for dialect 'cuda'

Pipeline: 6 jobs for bugfix-readd-launch-bounds, finished in 6 minutes and 25 seconds after being queued for 5 minutes and 1 second