Fix output of warp-level kernel in reduction user guide
- Fixes a critical TypeError in the function now called `_block_local_thread_index_per_dim`, which occurred for optimized GPU reductions with dimensionality > 1
- Improves variable names and formatting in the `reduction.md` user guide. The optimized example now actually uses warp-level reductions.
Edited by Richard Angersbach