Skip to content

Improve freezing of additions

Daniel Bauer requested to merge hyteg/pystencils:bauerd/freeze-add into v2.0-dev

Improves freezing of additions in two ways:

  • Resolves a performance issue.
  • Transforms x+y to x - (-y) only when there is a clear benefit.

Profiling our code generator revealed that freezing of additions takes a considerable amount of time. This is because it heavily relies on sympy's is_negative which is really slow. (We had just fixed a similar performance issue in our own code which also relied on the assumptions system. It is just slow.)

However, it is not only slow, it is also overeager to use it in the freezer. Before this MR, the freezer would behave strangely in presence of negative symbols:

>>> import sympy as sp
>>> from sympy.abc import *
>>> import pystencils as ps
>>> ctx = ps.backend.kernelcreation.KernelCreationContext()
>>> freeze = ps.backend.kernelcreation.FreezeExpressions(ctx)
>>> freeze(x+sp.Symbol("y",negative=True))
PsSub(Symbol(PsSymbol(x, None)), PsMul(PsConstantExpr(-1: <untyped>), Symbol(PsSymbol(y, None))))

Moreover, it would freeze x - 2*y (which in sympy's representation is Add(Symbol('x'), Mul(Integer(-1), Integer(2), Symbol('y')))) to PsSub(x, PsMul(2, y)). I do not see a benefit of this form over PsAdd(x, PsMul(-2, y)).

Therefore, this MR changes the freezer such that is_negative is not used. Additions are only frozen to PsSub if the second argument is a multiplication with -1. In this case, the resulting code benefits from emitting PsSub: x - y instead of x + -1 * y.

As mentioned before, the performance benefit is dramatic. Freezing one particular expression tree used to take 0.4s and that is now reduced to mere milliseconds. Removing the check for -1, too, improves the performance even more but it is hard to measure such short times. (While 0.4s might seem insignificant to begin with, it is not in our scenario, as we must handle a multitude of such expressions and it really adds up. Overall, the generation time dropped from 12.6s to 7.5s. Most of the remainder is due to CSE.)

Merge request reports