Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found

Target

Select target project
No results found
Show changes
Showing
with 1970 additions and 320 deletions
%% Cell type:code id: tags:
``` python
from pystencils.session import *
```
%% Cell type:markdown id: tags:
# Demo: Assignment collections and simplification
## Assignment collections
The assignment collection class helps to formulate and simplify assignments for numerical kernels.
An ``AssignmentCollection`` is an ordered collection of assignments, together with an optional ordered collection of subexpressions, that are required to evaluate the main assignments. There are various simplification rules available that operate on ``AssignmentCollection``s.
%% Cell type:markdown id: tags:
We start by defining some stencil update rule. Here we also use the *pystencils* ``Field``, note however that the assignment collection module works purely on the *sympy* level.
%% Cell type:code id: tags:
``` python
a,b,c = sp.symbols("a b c")
f = ps.fields("f(2) : [2D]")
g = ps.fields("g(2) : [2D]")
a1 = ps.Assignment(g[0,0](1), (a**2 +b) * f[0,1] + \
(a**2 - c) * f[1,0] + \
(a**2 - 2*c) * f[-1,0] + \
(a**2) * f[0, -1])
a2 = ps.Assignment(g[0,0](0), (c**2 +b) * f[0,1] + \
(c**2 - c) * f[1,0] + \
(c**2 - 2*c) * f[-1,0] + \
(c**2 - a**2) * f[0, -1])
ac = ps.AssignmentCollection([a1, a2], subexpressions=[])
ac
```
%% Output
AssignmentCollection: g_C^0, g_C^1 <- f(f_N^0, b, f_S^0, f_E^0, a, f_W^0, c)
%% Cell type:markdown id: tags:
*sympy* operations can be applied on an assignment collection: In this example we first expand the collection, then look for common subexpressions.
%% Cell type:code id: tags:
``` python
expand_all = ps.simp.apply_to_all_assignments(sp.expand)
expandedEc = expand_all(ac)
```
%% Cell type:code id: tags:
``` python
ac_cse = ps.simp.sympy_cse(expandedEc)
ac_cse
```
%% Output
AssignmentCollection: g_C^0, g_C^1 <- f(f_N^0, b, f_S^0, f_E^0, a, f_W^0, c)
%% Cell type:markdown id: tags:
Symbols occuring in assignment collections are classified into 3 categories:
- ``free_symbols``: symbols that occur in right-hand-sides but never on left-hand-sides
- ``bound_symbols``: symbols that occur on left-hand-sides
- ``defined_symbols``: symbols that occur on left-hand-sides of a main assignment
%% Cell type:code id: tags:
``` python
ac_cse.free_symbols
```
%% Output
$\displaystyle \left\{{f}_{(1,0)}^{0}, {f}_{(0,1)}^{0}, {f}_{(0,-1)}^{0}, {f}_{(-1,0)}^{0}, a, b, c\right\}$
{f_E__0, f_N__0, f_S__0, f_W__0, a, b, c}
%% Cell type:code id: tags:
``` python
ac_cse.bound_symbols
```
%% Output
$\displaystyle \left\{{g}_{(0,0)}^{0}, {g}_{(0,0)}^{1}, \xi_{0}, \xi_{1}, \xi_{2}, \xi_{3}\right\}$
{g_C__0, g_C__1, ξ₀, ξ₁, ξ₂, ξ₃}
%% Cell type:code id: tags:
``` python
ac_cse.defined_symbols
```
%% Output
$\displaystyle \left\{{g}_{(0,0)}^{0}, {g}_{(0,0)}^{1}\right\}$
{g_C__0, g_C__1}
%% Cell type:markdown id: tags:
Assignment collections can be splitted up, and merged together. For splitting, a list of symbols that occur on the left-hand-side in the main assignments has to be passed. The returned assignment collection only contains these main assignments together with all necessary subexpressions.
%% Cell type:code id: tags:
``` python
ac_f0 = ac_cse.new_filtered([g(0)])
ac_f1 = ac_cse.new_filtered([g(1)])
ac_f1
```
%% Output
AssignmentCollection: g_C^1, <- f(f_N^0, b, f_S^0, f_E^0, a, f_W^0, c)
%% Cell type:markdown id: tags:
Note here that $\xi_4$ is no longer part of the subexpressions, since it is not used in the main assignment of $f_C^1$.
If we merge both collections together, we end up with the original collection.
%% Cell type:code id: tags:
``` python
ac_f0.new_merged(ac_f1)
```
%% Output
AssignmentCollection: g_C^0, g_C^1 <- f(f_N^0, b, f_S^0, f_E^0, a, f_W^0, c)
%% Cell type:markdown id: tags:
There is also a method that inserts all subexpressions into the main assignments. This is the inverse operation of common subexpression elimination.
%% Cell type:code id: tags:
``` python
assert sp.simplify(ac_f0.new_without_subexpressions().main_assignments[0].rhs - a2.rhs) == 0
ac_f0.new_without_subexpressions()
```
%% Output
AssignmentCollection: g_C^0, <- f(f_N^0, b, f_S^0, f_E^0, a, f_W^0, c)
%% Cell type:markdown id: tags:
To evaluate an assignment collection, use the ``lambdify`` method. It is very similar to *sympy*s ``lambdify`` function.
%% Cell type:code id: tags:
``` python
evalFct = ac_cse.lambdify([f[0,1], f[1,0]], # new parameters of returned function
fixed_symbols={a:1, b:2, c:3, f[0,-1]: 4, f[-1,0]: 5}) # fix values of other symbols
evalFct(2,1)
```
%% Output
$\displaystyle \left\{ {g}_{(0,0)}^{0} : 75, \ {g}_{(0,0)}^{1} : -17\right\}$
{g_C__0: 75, g_C__1: -17}
%% Cell type:markdown id: tags:
lambdify is rather slow for evaluation. The intended way to evaluate an assignment collection is *pystencils* i.e. create a fast kernel, that applies the update at every site of a structured grid. The collection can be directly passed to the `create_kernel` function.
%% Cell type:code id: tags:
``` python
func = ps.create_kernel(ac_cse).compile()
```
%% Cell type:markdown id: tags:
## Simplification Strategies
In above examples, we already applied simplification rules to assignment collections. Simplification rules are functions that take, as a single argument, an assignment collection and return an modified/simplified copy of it. The ``SimplificationStrategy`` class holds a list of simplification rules and can apply all of them in the specified order. Additionally it provides useful printing and reporting functions.
We start by creating a simplification strategy, consisting of the expand and CSE simplifications we have already applied above:
%% Cell type:code id: tags:
``` python
strategy = ps.simp.SimplificationStrategy()
strategy.add(ps.simp.apply_to_all_assignments(sp.expand))
strategy.add(ps.simp.sympy_cse)
```
%% Cell type:markdown id: tags:
This strategy can be applied to any assignment collection:
%% Cell type:code id: tags:
``` python
strategy(ac)
```
%% Output
AssignmentCollection: g_C^0, g_C^1 <- f(f_N^0, b, f_S^0, f_E^0, a, f_W^0, c)
%% Cell type:markdown id: tags:
The strategy can also print the simplification results at each stage.
The report contains information about the number of operations after each simplification as well as the runtime of each simplification routine.
%% Cell type:code id: tags:
``` python
strategy.create_simplification_report(ac)
```
%% Output
<pystencils.simp.simplificationstrategy.SimplificationStrategy.create_simplification_report.<locals>.Report at 0x147de3e90>
%% Cell type:markdown id: tags:
The strategy can also print the full collection after each simplification...
%% Cell type:code id: tags:
``` python
strategy.show_intermediate_results(ac)
```
%% Output
<pystencils.simp.simplificationstrategy.SimplificationStrategy.show_intermediate_results.<locals>.IntermediateResults at 0x147e09c90>
%% Cell type:markdown id: tags:
... or only specific assignments for better readability
%% Cell type:code id: tags:
``` python
strategy.show_intermediate_results(ac, symbols=[g(1)])
```
%% Output
<pystencils.simp.simplificationstrategy.SimplificationStrategy.show_intermediate_results.<locals>.IntermediateResults at 0x1265a1b90>
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
from pystencils.session import *
import timeit
%load_ext Cython
```
%% Cell type:markdown id: tags:
# Demo: Benchmark numpy, Cython, pystencils
In this notebook we compare and benchmark different ways of implementing a simple stencil kernel in Python.
Our simple example computes the average of the four neighbors in 2D and stores it in a second array. To prevent out-of-bounds accesses, we skip the cells at the border and compute values only in the range `[1:-1, 1:-1]`
%% Cell type:markdown id: tags:
## Implementations
The first implementation is a pure Python implementation:
%% Cell type:code id: tags:
``` python
def avg_pure_python(src, dst):
for x in range(1, src.shape[0] - 1):
for y in range(1, src.shape[1] - 1):
dst[x, y] = (src[x + 1, y] + src[x - 1, y] +
src[x, y + 1] + src[x, y - 1]) / 4
```
%% Cell type:markdown id: tags:
Obviously, this will be a rather slow version, since the loops are written directly in Python.
Next, we use *numpy* functions to delegate the looping to numpy. The first version uses the `roll` function to shift the array by one element in each direction. This version has to allocate a new array for each accessed neighbor.
%% Cell type:code id: tags:
``` python
def avg_numpy_roll(src, dst):
neighbors = [np.roll(src, axis=a, shift=s) for a in (0, 1) for s in (-1, 1)]
np.divide(sum(neighbors), 4, out=dst)
```
%% Cell type:markdown id: tags:
Using views, we can get rid of the additional copies:
%% Cell type:code id: tags:
``` python
def avg_numpy_slice(src, dst):
dst[1:-1, 1:-1] = src[2:, 1:-1] + src[:-2, 1:-1] + \
src[1:-1, 2:] + src[1:-1, :-2]
```
%% Cell type:markdown id: tags:
To further optimize the kernel we switch to Cython, to get a compiled C version.
%% Cell type:code id: tags:
``` python
%%cython
import cython
@cython.boundscheck(False)
@cython.wraparound(False)
def avg_cython(object[double, ndim=2] src, object[double, ndim=2] dst):
cdef int xs, ys, x, y
xs, ys = src.shape
for x in range(1, xs - 1):
for y in range(1, ys - 1):
dst[x, y] = (src[x + 1, y] + src[x - 1, y] +
src[x, y + 1] + src[x, y - 1]) / 4
```
%% Cell type:markdown id: tags:
If available, we also try the numba just-in-time compiler
%% Cell type:code id: tags:
``` python
try:
from numba import jit
@jit(nopython=True)
def avg_numba(src, dst):
dst[1:-1, 1:-1] = src[2:, 1:-1] + src[:-2, 1:-1] + \
src[1:-1, 2:] + src[1:-1, :-2]
except ImportError:
pass
```
%% Cell type:markdown id: tags:
And finally we also create a *pystencils* version of the same stencil code:
%% Cell type:code id: tags:
``` python
src, dst = ps.fields("src, dst: [2D]")
update = ps.Assignment(dst[0,0],
(src[1, 0] + src[-1, 0] + src[0, 1] + src[0, -1]) / 4)
avg_pystencils = ps.create_kernel(update).compile()
```
%% Cell type:code id: tags:
``` python
all_implementations = {
'pure Python': avg_pure_python,
'numpy roll': avg_numpy_roll,
'numpy slice': avg_numpy_slice,
'pystencils': avg_pystencils,
}
if 'avg_cython' in globals():
all_implementations['Cython'] = avg_cython
if 'avg_numba' in globals():
all_implementations['numba'] = avg_numba
```
%% Cell type:markdown id: tags:
## Benchmark functions
We implement a short function to get in- and output arrays of a given shape and to measure the runtime.
%% Cell type:code id: tags:
``` python
def get_arrays(shape):
in_arr = np.random.rand(*shape)
out_arr = np.empty_like(in_arr)
return in_arr, out_arr
def do_benchmark(func, shape):
in_arr, out_arr = get_arrays(shape)
func(src=in_arr, dst=out_arr) # warmup
timer = timeit.Timer('f(src=src, dst=dst)', globals={'f': func, 'src': in_arr, 'dst': out_arr})
calls, time_taken = timer.autorange()
return time_taken / calls
```
%% Cell type:markdown id: tags:
## Comparison
%% Cell type:code id: tags:
``` python
plot_order = ['pystencils', 'Cython', 'numba', 'numpy slice', 'numpy roll', 'pure Python']
plot_order = [p for p in plot_order if p in all_implementations]
def bar_plot(*shape):
names = plot_order
runtimes = tuple(do_benchmark(all_implementations[name], shape) for name in names)
speedups = tuple(runtime / min(runtimes) for runtime in runtimes)
y_pos = np.arange(len(names))
labels = tuple(f"{name} ({round(speedup, 1)} x)" for name, speedup in zip(names, speedups))
plt.text(0.5, 0.5, f"Size {shape}", horizontalalignment='center', fontsize=16,
verticalalignment='center', transform=plt.gca().transAxes)
plt.barh(y_pos, runtimes, log=True)
plt.yticks(y_pos, labels);
plt.xlabel('Runtime of single iteration')
plt.figure(figsize=(8, 8))
plt.subplot(3, 1, 1)
bar_plot(32, 32)
plt.subplot(3, 1, 2)
bar_plot(128, 128)
plt.subplot(3, 1, 3)
bar_plot(2048, 2048)
```
%% Output
%% Cell type:markdown id: tags:
All runtimes are plotted logarithmically. Numbers next to the labels show how much slower the version is than the fastest one.
%% Cell type:code id: tags:
``` python
from pystencils.session import *
```
%% Cell type:markdown id: tags:
# Demo: Working with derivatives
## Overview
This notebook demonstrates how to formulate continuous differential operators in *pystencils* and automatically derive finite difference stencils from them.
Instead of using the built-in derivatives in *sympy*, *pystencils* comes with its own derivative objects. They represent spatial derivatives of pystencils fields.
%% Cell type:code id: tags:
``` python
f = ps.fields("f: [2D]")
first_derivative_x = ps.fd.diff(f, 0)
first_derivative_x
```
%% Output
$\displaystyle {\partial_{0} {f}_{(0,0)}}$
D(f[0,0])
%% Cell type:markdown id: tags:
This object is the derivative of the field $f$ with respect to the first spatial coordinate $x$. To get a finite difference approximation a discretization strategy is required:
%% Cell type:code id: tags:
``` python
discretize_2nd_order = ps.fd.Discretization2ndOrder(dx=sp.symbols("h"))
discretize_2nd_order(first_derivative_x)
```
%% Output
$\displaystyle \frac{{f}_{(1,0)} - {f}_{(-1,0)}}{2 h}$
f_E - f_W
─────────
2⋅h
%% Cell type:markdown id: tags:
Strictly speaking, derivative objects act on *field accesses*, not *fields*, that why there is a $(0,0)$ index at the field:
%% Cell type:code id: tags:
``` python
first_derivative_x
```
%% Output
$\displaystyle {\partial_{0} {f}_{(0,0)}}$
D(f[0,0])
%% Cell type:markdown id: tags:
Sometimes it might be useful to specify derivatives at an offset e.g.
%% Cell type:code id: tags:
``` python
derivative_offset = ps.fd.diff(f[0, 1], 0)
derivative_offset, discretize_2nd_order(derivative_offset)
```
%% Output
$\displaystyle \left( {\partial_{0} {f}_{(0,1)}}, \ \frac{{f}_{(1,1)} - {f}_{(-1,1)}}{2 h}\right)$
⎛ f_NE - f_NW⎞
⎜D(f[0,1]), ───────────⎟
⎝ 2⋅h ⎠
%% Cell type:markdown id: tags:
Another example with second order derivatives:
%% Cell type:code id: tags:
``` python
laplacian = ps.fd.diff(f, 0, 0) + ps.fd.diff(f, 1, 1)
laplacian
```
%% Output
$\displaystyle {\partial_{0} {\partial_{0} {f}_{(0,0)}}} + {\partial_{1} {\partial_{1} {f}_{(0,0)}}}$
D(D(f[0,0])) + D(D(f[0,0]))
%% Cell type:code id: tags:
``` python
discretize_2nd_order(laplacian)
```
%% Output
$\displaystyle \frac{- 2 {f}_{(0,0)} + {f}_{(1,0)} + {f}_{(-1,0)}}{h^{2}} + \frac{- 2 {f}_{(0,0)} + {f}_{(0,1)} + {f}_{(0,-1)}}{h^{2}}$
-2⋅f_C + f_E + f_W -2⋅f_C + f_N + f_S
────────────────── + ──────────────────
2 2
h h
%% Cell type:markdown id: tags:
## Working with derivative terms
No automatic simplifications are done on derivative terms i.e. linearity relations or product rule are not applied automatically.
%% Cell type:code id: tags:
``` python
f, g = ps.fields("f, g :[2D]")
c = sp.symbols("c")
δ = ps.fd.diff
expr = δ( δ(f, 0) + δ(g, 0) + c + 5 , 0)
expr
```
%% Output
$\displaystyle {\partial_{0} (c + {\partial_{0} {f}_{(0,0)}} + {\partial_{0} {g}_{(0,0)}} + 5) }$
D(c + Diff(f_C, 0, -1) + Diff(g_C, 0, -1) + 5)
%% Cell type:markdown id: tags:
This nested term can not be discretized automatically.
%% Cell type:code id: tags:
``` python
try:
discretize_2nd_order(expr)
except ValueError as e:
print(e)
```
%% Output
Only derivatives with field or field accesses as arguments can be discretized
%% Cell type:markdown id: tags:
### Linearity
The following function expands all derivatives exploiting linearity:
%% Cell type:code id: tags:
``` python
ps.fd.expand_diff_linear(expr)
```
%% Output
$\displaystyle {\partial_{0} c} + {\partial_{0} {\partial_{0} {f}_{(0,0)}}} + {\partial_{0} {\partial_{0} {g}_{(0,0)}}}$
D(c) + D(D(f[0,0])) + D(D(g[0,0]))
%% Cell type:markdown id: tags:
The symbol $c$ that was included is interpreted as a function by default.
We can control the simplification behaviour by specifying all functions or all constants:
%% Cell type:code id: tags:
``` python
ps.fd.expand_diff_linear(expr, functions=(f[0,0], g[0, 0]))
```
%% Output
$\displaystyle {\partial_{0} {\partial_{0} {f}_{(0,0)}}} + {\partial_{0} {\partial_{0} {g}_{(0,0)}}}$
D(D(f[0,0])) + D(D(g[0,0]))
%% Cell type:code id: tags:
``` python
ps.fd.expand_diff_linear(expr, constants=[c])
```
%% Output
$\displaystyle {\partial_{0} {\partial_{0} {f}_{(0,0)}}} + {\partial_{0} {\partial_{0} {g}_{(0,0)}}}$
D(D(f[0,0])) + D(D(g[0,0]))
%% Cell type:markdown id: tags:
The expanded term can then be discretized:
%% Cell type:code id: tags:
``` python
discretize_2nd_order(ps.fd.expand_diff_linear(expr, constants=[c]))
```
%% Output
$\displaystyle \frac{- 2 {f}_{(0,0)} + {f}_{(1,0)} + {f}_{(-1,0)}}{h^{2}} + \frac{- 2 {g}_{(0,0)} + {g}_{(1,0)} + {g}_{(-1,0)}}{h^{2}}$
-2⋅f_C + f_E + f_W -2⋅g_C + g_E + g_W
────────────────── + ──────────────────
2 2
h h
%% Cell type:markdown id: tags:
### Product rule
The next cells show how to apply product rule and its reverse:
%% Cell type:code id: tags:
``` python
expr = δ(f[0, 0] * g[0, 0], 0 )
expr
```
%% Output
$\displaystyle {\partial_{0} ({f}_{(0,0)} {g}_{(0,0)}) }$
D(f_C*g_C)
%% Cell type:code id: tags:
``` python
expanded_expr = ps.fd.expand_diff_products(expr)
expanded_expr
```
%% Output
$\displaystyle {f}_{(0,0)} {\partial_{0} {g}_{(0,0)}} + {g}_{(0,0)} {\partial_{0} {f}_{(0,0)}}$
f_C⋅D(g[0,0]) + g_C⋅D(f[0,0])
%% Cell type:code id: tags:
``` python
recombined_expr = ps.fd.combine_diff_products(expanded_expr)
recombined_expr
```
%% Output
$\displaystyle {\partial_{0} ({f}_{(0,0)} {g}_{(0,0)}) }$
D(f_C*g_C)
%% Cell type:code id: tags:
``` python
assert recombined_expr == expr
```
%% Cell type:markdown id: tags:
### Evaluate derivatives
Arguments of derivative need not be to be fields, only when trying to discretize them.
The next cells show how to transform them to *sympy* derivatives and evaluate them.
%% Cell type:code id: tags:
``` python
k = sp.symbols("k")
expr = δ(k**3 + 2 * k, 0 )
expr
```
%% Output
$\displaystyle {\partial_{0} (k^{3} + 2 k) }$
D(k**3 + 2*k)
%% Cell type:code id: tags:
``` python
ps.fd.evaluate_diffs(expr, var=k)
```
%% Output
$\displaystyle 3 k^{2} + 2$
2
3⋅k + 2
This source diff could not be displayed because it is too large. You can view the blob instead.
%% Cell type:code id: tags:
``` python
import psutil
from pystencils.session import *
import shutil
```
%% Cell type:markdown id: tags:
# Demo: Finite differences - 2D wave equation
In this tutorial we show how to use the finite difference module of *pystencils* to solve a 2D wave equations. The time derivative is discretized by a simple forward Euler method.
%% Cell type:markdown id: tags:
$$ \frac{\partial^2 u}{\partial t^2} = \mbox{div} \left( q(x,y) \nabla u \right)$$
%% Cell type:markdown id: tags:
We begin by creating three *numpy* arrays for the current, the previous and the next timestep. Actually we will see later that two fields are enough, but let's keep it simple. From these *numpy* arrays we create *pystencils* fields to formulate our update rule.
%% Cell type:code id: tags:
``` python
size = (60, 70) # domain size
u_arrays = [np.zeros(size), np.zeros(size), np.zeros(size)]
u_fields = [ps.Field.create_from_numpy_array("u%s" % (name,), arr)
for name, arr in zip(["0", "1", "2"], u_arrays)]
# Nicer display for fields
for i, field in enumerate(u_fields):
field.latex_name = "u^{(%d)}" % (i,)
```
%% Cell type:markdown id: tags:
*pystencils* contains already simple rules to discretize the a diffusion term. The time discretization is done manually.
%% Cell type:code id: tags:
``` python
discretize = ps.fd.Discretization2ndOrder()
def central2nd_time_derivative(fields):
f_next, f_current, f_last = fields
return (f_next[0, 0] - 2 * f_current[0, 0] + f_last[0, 0]) / discretize.dt**2
rhs = ps.fd.diffusion(u_fields[1], 1)
wave_eq = sp.Eq(central2nd_time_derivative(u_fields), discretize(rhs))
wave_eq = sp.simplify(wave_eq)
wave_eq
```
%% Output
$\displaystyle \frac{{u^{(0)}}_{(0,0)} - 2 {u^{(1)}}_{(0,0)} + {u^{(2)}}_{(0,0)}}{dt^{2}} = \frac{- 4 {u^{(1)}}_{(0,0)} + {u^{(1)}}_{(1,0)} + {u^{(1)}}_{(0,1)} + {u^{(1)}}_{(0,-1)} + {u^{(1)}}_{(-1,0)}}{dx^{2}}$
u_0_C - 2⋅u_1_C + u_2_C -4⋅u_1_C + u_1_E + u_1_N + u_1_S + u_1_W
─────────────────────── = ────────────────────────────────────────
2 2
dt dx
%% Cell type:markdown id: tags:
The explicit Euler scheme is now obtained by solving above equation with respect to $u_C^{next}$.
%% Cell type:code id: tags:
``` python
u_next_C = u_fields[-1][0, 0]
update_rule = ps.Assignment(u_next_C, sp.solve(wave_eq, u_next_C)[0])
update_rule
```
%% Output
$\displaystyle {u^{(2)}}_{(0,0)} \leftarrow \frac{- 4 {u^{(1)}}_{(0,0)} dt^{2} + {u^{(1)}}_{(1,0)} dt^{2} + {u^{(1)}}_{(0,1)} dt^{2} + {u^{(1)}}_{(0,-1)} dt^{2} + {u^{(1)}}_{(-1,0)} dt^{2} + dx^{2} \left(- {u^{(0)}}_{(0,0)} + 2 {u^{(1)}}_{(0,0)}\right)}{dx^{2}}$
2 2 2 2 2 2
- 4⋅u_1_C⋅dt + u_1_E⋅dt + u_1_N⋅dt + u_1_S⋅dt + u_1_W⋅dt + dx ⋅(-u_0_C + 2⋅u_1_C)
u_2_C := ──────────────────────────────────────────────────────────────────────────────────────
2
dx
%% Cell type:markdown id: tags:
Before creating the kernel, we substitute numeric values for $dx$ and $dt$.
Then a kernel is created just like in the last tutorial.
%% Cell type:code id: tags:
``` python
update_rule = update_rule.subs({discretize.dx: 0.1, discretize.dt: 0.05})
ast = ps.create_kernel(update_rule)
kernel = ast.compile()
ps.show_code(ast)
```
%% Output
%% Cell type:code id: tags:
``` python
ast.get_parameters()
```
%% Output
[_data_u0, _data_u1, _data_u2]
%% Cell type:code id: tags:
``` python
ast.fields_accessed
```
%% Output
{u0: double[60,70], u1: double[60,70], u2: double[60,70]}
%% Cell type:markdown id: tags:
To run simulation a suitable initial condition and boundary treatment is required. We chose an initial condition which is zero at the borders of the domain. The outermost layer is not changed by the update kernel, so we have an implicit homogenous Dirichlet boundary condition.
%% Cell type:code id: tags:
``` python
X,Y = np.meshgrid( np.linspace(0, 1, size[1]), np.linspace(0,1, size[0]))
Z = np.sin(2*X*np.pi) * np.sin(2*Y*np.pi)
# Initialize the previous and current values with the initial function
np.copyto(u_arrays[0], Z)
np.copyto(u_arrays[1], Z)
# The values for the next timesteps do not matter, since they are overwritten
u_arrays[2][:, :] = 0
```
%% Cell type:markdown id: tags:
One timestep now consists of applying the kernel once, then shifting the arrays.
%% Cell type:code id: tags:
``` python
def run(timesteps=1):
for t in range(timesteps):
kernel(u0=u_arrays[0], u1=u_arrays[1], u2=u_arrays[2])
u_arrays[0], u_arrays[1], u_arrays[2] = u_arrays[1], u_arrays[2], u_arrays[0]
return u_arrays[2]
```
%% Cell type:markdown id: tags:
Lets create an animation of the solution:
%% Cell type:code id: tags:
``` python
if shutil.which("ffmpeg") is not None:
ani = plt.surface_plot_animation(run, zlim=(-1, 1))
ps.jupyter.display_as_html_video(ani)
else:
print("No ffmpeg installed")
```
%% Output
No ffmpeg installed
%% Cell type:code id: tags:
``` python
assert np.isfinite(np.max(u_arrays[2]))
```
%% Cell type:markdown id: tags:
__Runing on GPU__
We can also run the same kernel on the GPU, by using the ``cupy`` package.
%% Cell type:code id: tags:
``` python
try:
import cupy
except ImportError:
cupy=None
print('No cupy installed')
res = None
if cupy:
gpu_ast = ps.create_kernel(update_rule, target=ps.Target.GPU)
gpu_kernel = gpu_ast.compile()
res = ps.show_code(gpu_ast)
res
```
%% Output
%% Cell type:markdown id: tags:
The run function has to be changed now slightly, since the data has to be transfered to the GPU first, then the kernel can be executed, and in the end the data has to be transfered back
%% Cell type:code id: tags:
``` python
if cupy:
def run_on_gpu(timesteps=1):
# Transfer arrays to GPU
gpuArrs = [cupy.asarray(cpu_array) for cpu_array in u_arrays]
for t in range(timesteps):
gpu_kernel(u0=gpuArrs[0], u1=gpuArrs[1], u2=gpuArrs[2])
gpuArrs[0], gpuArrs[1], gpuArrs[2] = gpuArrs[1], gpuArrs[2], gpuArrs[0]
# Transfer arrays to CPU
for gpuArr, cpuArr in zip(gpuArrs, u_arrays):
cpuArr[:] = gpuArr.get()
assert np.isfinite(np.max(u_arrays[2]))
```
%% Cell type:code id: tags:
``` python
if cupy:
run_on_gpu(400)
```
API Reference
=============
.. toctree::
:maxdepth: 3
kernel_compile_and_call.rst
enums.rst
simplifications.rst
datahandling.rst
configuration.rst
field.rst
stencil.rst
finite_differences.rst
plot.rst
ast.rst
*********************************************
For developers: AST Nodes and Transformations
*********************************************
AST Nodes
=========
.. automodule:: pystencils.astnodes
:members:
Transformations
===============
.. automodule:: pystencils.transformations
:members:
*************
Configuration
*************
.. automodule:: pystencils.cpu.cpujit
\ No newline at end of file
************
DataHandling
************
.. autoclass:: pystencils.datahandling.DataHandling
:members:
\ No newline at end of file
************
Enumerations
************
.. automodule:: pystencils.enums
:members:
*****
Field
*****
.. automodule:: pystencils.field
:members:
\ No newline at end of file
******************
Finite Differences
******************
.. automodule:: pystencils.fd
:members:
*****************************************
Creating and calling kernels from Python
*****************************************
Creating kernels
----------------
.. autofunction:: pystencils.create_kernel
.. autoclass:: pystencils.CreateKernelConfig
:members:
.. autofunction:: pystencils.kernelcreation.create_domain_kernel
.. autofunction:: pystencils.kernelcreation.create_indexed_kernel
.. autofunction:: pystencils.kernelcreation.create_staggered_kernel
Code printing
-------------
.. autofunction:: pystencils.show_code
GPU Indexing
-------------
.. autoclass:: pystencils.gpu.AbstractIndexing
:members:
.. autoclass:: pystencils.gpu.BlockIndexing
:members:
.. autoclass:: pystencils.gpu.LineIndexing
:members:
**********************
Plotting and Animation
**********************
.. automodule:: pystencils.plot
:members:
***************************************
Assignment Collection & Simplifications
***************************************
AssignmentCollection
====================
.. autoclass:: pystencils.AssignmentCollection
:members:
SimplificationStrategy
======================
.. autoclass:: pystencils.simp.SimplificationStrategy
:members:
Simplifications
===============
.. automodule:: pystencils.simp.simplifications
:members:
Subexpression insertion
=======================
The subexpression insertions have the goal to insert subexpressions which will not reduce the number of FLOPs.
For example a constant value kept as subexpression will lead to a new variable in the code which will occupy
a register slot. On the other side a single variable could just be inserted in all assignments.
.. automodule:: pystencils.simp.subexpression_insertion
:members:
*******
Stencil
*******
.. automodule:: pystencils.stencil
:members:
Tutorials
=========
These tutorials are a good place to start if you are new to pystencils.
All tutorials and demos listed here are based on Jupyter notebooks that you can find in the pystencils repository.
It is a good idea to download them and run them directly to be able to play around with the code.
.. toctree::
:maxdepth: 1
/notebooks/01_tutorial_getting_started.ipynb
/notebooks/02_tutorial_basic_kernels.ipynb
/notebooks/03_tutorial_datahandling.ipynb
/notebooks/04_tutorial_advection_diffusion.ipynb
/notebooks/05_tutorial_phasefield_spinodal_decomposition.ipynb
/notebooks/06_tutorial_phasefield_dentritic_growth.ipynb
/notebooks/demo_assignment_collection.ipynb
/notebooks/demo_plotting_and_animation.ipynb
/notebooks/demo_derivatives.ipynb
/notebooks/demo_benchmark.ipynb
/notebooks/demo_wave_equation.ipynb
import abc
from typing import Tuple # noqa
import sympy as sp
from pystencils.astnodes import Conditional, Block
from pystencils.integer_functions import div_ceil
from pystencils.slicing import normalize_slice
from pystencils.data_types import TypedSymbol, create_type
from functools import partial
AUTO_BLOCK_SIZE_LIMITING = True
BLOCK_IDX = [TypedSymbol("blockIdx." + coord, create_type("int")) for coord in ('x', 'y', 'z')]
THREAD_IDX = [TypedSymbol("threadIdx." + coord, create_type("int")) for coord in ('x', 'y', 'z')]
BLOCK_DIM = [TypedSymbol("blockDim." + coord, create_type("int")) for coord in ('x', 'y', 'z')]
GRID_DIM = [TypedSymbol("gridDim." + coord, create_type("int")) for coord in ('x', 'y', 'z')]
class AbstractIndexing(abc.ABC):
"""
Abstract base class for all Indexing classes. An Indexing class defines how a multidimensional
field is mapped to CUDA's block and grid system. It calculates indices based on CUDA's thread and block indices
and computes the number of blocks and threads a kernel is started with. The Indexing class is created with
a pystencils field, a slice to iterate over, and further optional parameters that must have default values.
"""
@property
@abc.abstractmethod
def coordinates(self):
"""Returns a sequence of coordinate expressions for (x,y,z) depending on symbolic CUDA block and thread indices.
These symbolic indices can be obtained with the method `index_variables` """
@property
def index_variables(self):
"""Sympy symbols for CUDA's block and thread indices, and block and grid dimensions. """
return BLOCK_IDX + THREAD_IDX + BLOCK_DIM + GRID_DIM
@abc.abstractmethod
def call_parameters(self, arr_shape):
"""Determine grid and block size for kernel call.
Args:
arr_shape: the numeric (not symbolic) shape of the array
Returns:
dict with keys 'blocks' and 'threads' with tuple values for number of (x,y,z) threads and blocks
the kernel should be started with
"""
@abc.abstractmethod
def guard(self, kernel_content, arr_shape):
"""In some indexing schemes not all threads of a block execute the kernel content.
This function can return a Conditional ast node, defining this execution guard.
Args:
kernel_content: the actual kernel contents which can e.g. be put into the Conditional node as true block
arr_shape: the numeric or symbolic shape of the field
Returns:
ast node, which is put inside the kernel function
"""
# -------------------------------------------- Implementations ---------------------------------------------------------
class BlockIndexing(AbstractIndexing):
"""Generic indexing scheme that maps sub-blocks of an array to CUDA blocks.
Args:
field: pystencils field (common to all Indexing classes)
iteration_slice: slice that defines rectangular subarea which is iterated over
permute_block_size_dependent_on_layout: if True the block_size is permuted such that the fastest coordinate
gets the largest amount of threads
compile_time_block_size: compile in concrete block size, otherwise the cuda variable 'blockDim' is used
"""
def __init__(self, field, iteration_slice=None,
block_size=(16, 16, 1), permute_block_size_dependent_on_layout=True, compile_time_block_size=False):
if field.spatial_dimensions > 3:
raise NotImplementedError("This indexing scheme supports at most 3 spatial dimensions")
if permute_block_size_dependent_on_layout:
block_size = self.permute_block_size_according_to_layout(block_size, field.layout)
if AUTO_BLOCK_SIZE_LIMITING:
block_size = self.limit_block_size_to_device_maximum(block_size)
self._block_size = block_size
self._iterationSlice = normalize_slice(iteration_slice, field.spatial_shape)
self._dim = field.spatial_dimensions
self._symbolic_shape = [e if isinstance(e, sp.Basic) else None for e in field.spatial_shape]
self._compile_time_block_size = compile_time_block_size
@property
def coordinates(self):
offsets = _get_start_from_slice(self._iterationSlice)
block_size = self._block_size if self._compile_time_block_size else BLOCK_DIM
coordinates = [block_index * bs + thread_idx + off
for block_index, bs, thread_idx, off in zip(BLOCK_IDX, block_size, THREAD_IDX, offsets)]
return coordinates[:self._dim]
def call_parameters(self, arr_shape):
substitution_dict = {sym: value for sym, value in zip(self._symbolic_shape, arr_shape) if sym is not None}
widths = [end - start for start, end in zip(_get_start_from_slice(self._iterationSlice),
_get_end_from_slice(self._iterationSlice, arr_shape))]
widths = sp.Matrix(widths).subs(substitution_dict)
extend_bs = (1,) * (3 - len(self._block_size))
block_size = self._block_size + extend_bs
if not self._compile_time_block_size:
block_size = tuple(sp.Min(bs, shape) for bs, shape in zip(block_size, widths)) + extend_bs
grid = tuple(div_ceil(length, block_size)
for length, block_size in zip(widths, block_size))
extend_gr = (1,) * (3 - len(grid))
return {'block': block_size,
'grid': grid + extend_gr}
def guard(self, kernel_content, arr_shape):
arr_shape = arr_shape[:self._dim]
conditions = [c < end
for c, end in zip(self.coordinates, _get_end_from_slice(self._iterationSlice, arr_shape))]
condition = conditions[0]
for c in conditions[1:]:
condition = sp.And(condition, c)
return Block([Conditional(condition, kernel_content)])
@staticmethod
def limit_block_size_to_device_maximum(block_size):
"""Changes block size according to match device limits.
* if the total amount of threads is too big for the current device, the biggest coordinate is divided by 2.
* next, if one component is still too big, the component which is too big is divided by 2 and the smallest
component is multiplied by 2, such that the total amount of threads stays the same
Returns:
the altered block_size
"""
# Get device limits
import pycuda.driver as cuda
# noinspection PyUnresolvedReferences
import pycuda.autoinit # NOQA
da = cuda.device_attribute
device = cuda.Context.get_device()
block_size = list(block_size)
max_threads = device.get_attribute(da.MAX_THREADS_PER_BLOCK)
max_block_size = [device.get_attribute(a)
for a in (da.MAX_BLOCK_DIM_X, da.MAX_BLOCK_DIM_Y, da.MAX_BLOCK_DIM_Z)]
def prod(seq):
result = 1
for e in seq:
result *= e
return result
def get_index_of_too_big_element():
for i, bs in enumerate(block_size):
if bs > max_block_size[i]:
return i
return None
def get_index_of_too_small_element():
for i, bs in enumerate(block_size):
if bs // 2 <= max_block_size[i]:
return i
return None
# Reduce the total number of threads if necessary
while prod(block_size) > max_threads:
item_to_reduce = block_size.index(max(block_size))
for j, block_size_entry in enumerate(block_size):
if block_size_entry > max_block_size[j]:
item_to_reduce = j
block_size[item_to_reduce] //= 2
# Cap individual elements
too_big_element_index = get_index_of_too_big_element()
while too_big_element_index is not None:
too_small_element_index = get_index_of_too_small_element()
block_size[too_small_element_index] *= 2
block_size[too_big_element_index] //= 2
too_big_element_index = get_index_of_too_big_element()
return tuple(block_size)
@staticmethod
def limit_block_size_by_register_restriction(block_size, required_registers_per_thread, device=None):
"""Shrinks the block_size if there are too many registers used per multiprocessor.
This is not done automatically, since the required_registers_per_thread are not known before compilation.
They can be obtained by ``func.num_regs`` from a pycuda function.
:returns smaller block_size if too many registers are used.
"""
import pycuda.driver as cuda
# noinspection PyUnresolvedReferences
import pycuda.autoinit # NOQA
da = cuda.device_attribute
if device is None:
device = cuda.Context.get_device()
available_registers_per_mp = device.get_attribute(da.MAX_REGISTERS_PER_MULTIPROCESSOR)
block = block_size
while True:
num_threads = 1
for t in block:
num_threads *= t
required_registers_per_mt = num_threads * required_registers_per_thread
if required_registers_per_mt <= available_registers_per_mp:
return block
else:
largest_grid_entry_idx = max(range(len(block)), key=lambda e: block[e])
assert block[largest_grid_entry_idx] >= 2
block[largest_grid_entry_idx] //= 2
@staticmethod
def permute_block_size_according_to_layout(block_size, layout):
"""Returns modified block_size such that the fastest coordinate gets the biggest block dimension"""
sorted_block_size = list(sorted(block_size, reverse=True))
while len(sorted_block_size) > len(layout):
sorted_block_size[0] *= sorted_block_size[-1]
sorted_block_size = sorted_block_size[:-1]
result = list(block_size)
for l, bs in zip(reversed(layout), sorted_block_size):
result[l] = bs
return tuple(result[:len(layout)])
class LineIndexing(AbstractIndexing):
"""
Indexing scheme that assigns the innermost 'line' i.e. the elements which are adjacent in memory to a 1D CUDA block.
The fastest coordinate is indexed with thread_idx.x, the remaining coordinates are mapped to block_idx.{x,y,z}
This indexing scheme supports up to 4 spatial dimensions, where the innermost dimensions is not larger than the
maximum amount of threads allowed in a CUDA block (which depends on device).
"""
def __init__(self, field, iteration_slice=None):
available_indices = [THREAD_IDX[0]] + BLOCK_IDX
if field.spatial_dimensions > 4:
raise NotImplementedError("This indexing scheme supports at most 4 spatial dimensions")
coordinates = available_indices[:field.spatial_dimensions]
fastest_coordinate = field.layout[-1]
coordinates[0], coordinates[fastest_coordinate] = coordinates[fastest_coordinate], coordinates[0]
self._coordinates = coordinates
self._iterationSlice = normalize_slice(iteration_slice, field.spatial_shape)
self._symbolicShape = [e if isinstance(e, sp.Basic) else None for e in field.spatial_shape]
@property
def coordinates(self):
return [i + offset for i, offset in zip(self._coordinates, _get_start_from_slice(self._iterationSlice))]
def call_parameters(self, arr_shape):
substitution_dict = {sym: value for sym, value in zip(self._symbolicShape, arr_shape) if sym is not None}
widths = [end - start for start, end in zip(_get_start_from_slice(self._iterationSlice),
_get_end_from_slice(self._iterationSlice, arr_shape))]
widths = sp.Matrix(widths).subs(substitution_dict)
def get_shape_of_cuda_idx(cuda_idx):
if cuda_idx not in self._coordinates:
return 1
else:
idx = self._coordinates.index(cuda_idx)
return widths[idx]
return {'block': tuple([get_shape_of_cuda_idx(idx) for idx in THREAD_IDX]),
'grid': tuple([get_shape_of_cuda_idx(idx) for idx in BLOCK_IDX])}
def guard(self, kernel_content, arr_shape):
return kernel_content
# -------------------------------------- Helper functions --------------------------------------------------------------
def _get_start_from_slice(iteration_slice):
res = []
for slice_component in iteration_slice:
if type(slice_component) is slice:
res.append(slice_component.start if slice_component.start is not None else 0)
else:
assert isinstance(slice_component, int)
res.append(slice_component)
return res
def _get_end_from_slice(iteration_slice, arr_shape):
iter_slice = normalize_slice(iteration_slice, arr_shape)
res = []
for slice_component in iter_slice:
if type(slice_component) is slice:
res.append(slice_component.stop)
else:
assert isinstance(slice_component, int)
res.append(slice_component + 1)
return res
def indexing_creator_from_params(gpu_indexing, gpu_indexing_params):
if isinstance(gpu_indexing, str):
if gpu_indexing == 'block':
indexing_creator = BlockIndexing
elif gpu_indexing == 'line':
indexing_creator = LineIndexing
else:
raise ValueError("Unknown GPU indexing %s. Valid values are 'block' and 'line'" % (gpu_indexing,))
if gpu_indexing_params:
indexing_creator = partial(indexing_creator, **gpu_indexing_params)
return indexing_creator
else:
return gpu_indexing
from .kerncraft_interface import PyStencilsKerncraftKernel, KerncraftParameters
__all__ = ['PyStencilsKerncraftKernel', 'KerncraftParameters']