Improve freezing of additions
Improves freezing of additions in two ways:
- Resolves a performance issue.
- Transforms
x+y
tox - (-y)
only when there is a clear benefit.
Profiling our code generator revealed that freezing of additions takes a considerable amount of time.
This is because it heavily relies on sympy's is_negative
which is really slow.
(We had just fixed a similar performance issue in our own code which also relied on the assumptions system. It is just slow.)
However, it is not only slow, it is also overeager to use it in the freezer. Before this MR, the freezer would behave strangely in presence of negative symbols:
>>> import sympy as sp
>>> from sympy.abc import *
>>> import pystencils as ps
>>> ctx = ps.backend.kernelcreation.KernelCreationContext()
>>> freeze = ps.backend.kernelcreation.FreezeExpressions(ctx)
>>> freeze(x+sp.Symbol("y",negative=True))
PsSub(Symbol(PsSymbol(x, None)), PsMul(PsConstantExpr(-1: <untyped>), Symbol(PsSymbol(y, None))))
Moreover, it would freeze x - 2*y
(which in sympy's representation is Add(Symbol('x'), Mul(Integer(-1), Integer(2), Symbol('y')))
) to PsSub(x, PsMul(2, y))
.
I do not see a benefit of this form over PsAdd(x, PsMul(-2, y))
.
Therefore, this MR changes the freezer such that is_negative
is not used.
Additions are only frozen to PsSub
if the second argument is a multiplication with -1
.
In this case, the resulting code benefits from emitting PsSub
: x - y
instead of x + -1 * y
.
As mentioned before, the performance benefit is dramatic.
Freezing one particular expression tree used to take 0.4s and that is now reduced to mere milliseconds.
Removing the check for -1
, too, improves the performance even more but it is hard to measure such short times.
(While 0.4s might seem insignificant to begin with, it is not in our scenario, as we must handle a multitude of such expressions and it really adds up. Overall, the generation time dropped from 12.6s to 7.5s. Most of the remainder is due to CSE.)