Fix output of warp-level kernel in reduction user guide
- Fixes a critical TypeError in the function now called `_block_local_thread_index_per_dim`, which occurred for optimized GPU reductions with dimensionality > 1
- Improves variable names and formatting in the `reduction.md` user guide. The optimized example now actually uses warp-level reductions
Edited by Richard Angersbach