Do not reorder accesses in `move_constants_before_loop` (quickly)
Reimplementation of !342 (closed).
While playing around with the old MR, I realized that the changes proposed there have a significant impact on the execution time of `move_constants_before_loop` for some kernels.
Before the MR, we would not descend into blocks, loops, or conditionals to check whether dependencies are modified in their bodies.
The MR changed that for the sake of correctness, but the implementation was quite inefficient.
Note that for each assignment we must find a block to move that assignment to.
Essentially, the old MR would move up the AST, at each level determining a set of "critical symbols" by descending the tree again from the current element.
This meant the AST was traversed many times, and set objects were created and updated over and over.
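The repeated-traversal pattern described above can be sketched roughly like this (a hypothetical minimal AST; the `Node` class and helper names are illustrative, not the project's actual API):

```python
class Node:
    """Hypothetical AST node: tracks written symbols and child nodes."""

    def __init__(self, written=(), children=()):
        self.written = set(written)     # symbols this node writes
        self.children = list(children)  # nested blocks, loops, etc.


def critical_symbols(node):
    """Old approach: build the full set of symbols written anywhere
    below `node` by descending the entire subtree."""
    result = set(node.written)
    for child in node.children:
        result |= critical_symbols(child)  # fresh set work at every call
    return result


def is_blocked_old(assignment_deps, ancestors):
    """At each level on the way up, recompute the critical-symbol set
    from scratch and intersect it with the assignment's dependencies.
    Subtrees shared between levels are traversed repeatedly."""
    for level in ancestors:
        if assignment_deps & critical_symbols(level):
            return True
    return False
```

Because `critical_symbols` is called once per ancestor level, inner subtrees are revisited at every level, which is the source of the slowdown.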
This MR changes this behavior.
Now the AST is traversed only once, from the current assignment up to the block we can move the assignment to.
If we encounter blocks, loops, etc. on the way, we still descend into them, but we do so only once.
Moreover, the new implementation does not create a huge set of critical symbols; instead, it exits early once it finds a dependency.
Overall, my not-sophisticated-at-all tests suggest that the new implementation is even slightly faster than the version from master.
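The early-exit dependency check can be sketched as follows (again a hypothetical minimal AST, not the actual implementation; `Node` and `writes_any` are illustrative names):

```python
class Node:
    """Hypothetical AST node: tracks written symbols and child nodes."""

    def __init__(self, written=(), children=()):
        self.written = set(written)     # symbols this node writes
        self.children = list(children)  # nested blocks, loops, etc.


def writes_any(node, deps):
    """New approach: descend each subtree at most once and return as
    soon as any dependency is found, instead of materializing a full
    set of critical symbols first."""
    if node.written & deps:
        return True  # early exit: no need to look further
    return any(writes_any(child, deps) for child in node.children)
```

Each subtree is visited at most once per assignment, and no intermediate sets are allocated, which is where the speedup over the old MR comes from.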
The new implementation also does not change the `ast.Node` interface, which I like quite a lot.