First, we introduce the variables contained in the PDE and its discretization as symbols. For the two-grid algorithm, we require one source field `u` and one destination field `u_tmp`. Both are set as generic 2-dimensional fields. We explicitly set their memory layout to `fzyx`. Both waLBerla and pystencils support two kinds of memory layouts. The short `fzyx` lists the four field dimensions (three spatial ones and one for the values per cell) in their order of arrangement in memory. `fzyx` describes a Struct of Arrays (SOA) layout where the domain is split along `f` and then linearized. When iterating, the outermost loop runs over `f`, and the innermost loop runs over `x`. The alternative is an %Array of Structs (AOS) layout, designated `zyxf`, which iterates over `f` in the innermost loop. In our case, with only one value per cell, it does not matter which layout is selected. In contrast, for simulating an advection-diffusion process with multiple independent particle distributions, `fzyx` performs better in most cases as it improves data locality and enables vectorization (SIMD, SIMT). For more information on SOA and AOS, consider <a target="_blank" href="https://software.intel.com/content/www/us/en/develop/articles/memory-layout-transformations.html">this</a> article.
\code{.py}
u, u_tmp = ps.fields("u, u_tmp: [2D]", layout='fzyx')
kappa = sp.Symbol("kappa")
dx = sp.Symbol("dx")
...
...
\endcode
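The effect of the two layouts can be illustrated with plain NumPy arrays. The following is only an illustrative sketch and not part of the tutorial's script; the array shapes and names are made up for the example:
\code{.py}
import numpy as np

nx, ny, nf = 4, 3, 2            # hypothetical domain with two values per cell

soa = np.zeros((nf, ny, nx))    # fzyx / SoA: f is the slowest index, x the fastest
aos = np.zeros((ny, nx, nf))    # zyxf / AoS: f is the fastest (innermost) index

# In the SoA layout, x-neighbours of the same component are contiguous in memory,
# which favours vectorization; in the AoS layout, the values of one cell are
# contiguous instead.
print(soa.strides)              # (96, 32, 8) for 8-byte doubles
print(aos.strides)              # (64, 16, 8)
\endcode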
With the pystencils building blocks, we can directly define the temporal and spatial derivatives of the PDE.
\code{.py}
heat_pde = ps.fd.transient(u) - kappa * ( ps.fd.diff( u, 0, 0 ) + ps.fd.diff( u, 1, 1 ) )
\endcode
...
...
Printing `heat_pde` inside a Jupyter notebook renders the PDE defined above.
Next, the PDE will be discretized. We use the `Discretization2ndOrder` class to apply a finite difference discretization to the spatial components and an explicit Euler discretization to the time step.
This equation can be simplified by combining the two fractions on the right-hand side. Furthermore, we would like to pre-calculate the division outside the loop of the compute kernel. To achieve this, we will first apply the simplification functionality of sympy, and then replace the division by introducing a subexpression.
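A sketch of these steps, following the pystencils API as it is used in the accompanying notebook (the intermediate variable names are ours; only the final collection `ac` is referenced again below):
\code{.py}
from pystencils.simp import add_subexpressions_for_divisions

# apply 2nd-order finite differences in space and an explicit Euler step in time
discretize = ps.fd.Discretization2ndOrder(dx=dx, dt=dt)
heat_pde_discretized = discretize(heat_pde)

# simplify each summand separately so that sympy does not introduce a common denominator
heat_pde_discretized = sp.Add(*[sp.simplify(arg) for arg in heat_pde_discretized.args])

# wrap the update in an assignment collection and pull the 1/dx**2 division out of the loop
ac = ps.AssignmentCollection([ps.Assignment(u_tmp[0, 0], heat_pde_discretized)])
ac = add_subexpressions_for_divisions(ac)
\endcode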
We will now use the waLBerla build system to generate a sweep from this symbolic representation.
We create a Python file called *HeatEquationKernel.py* in our application folder. This file contains the Python code we have developed above. In addition to `sympy` and `pystencils`, we add the import directive `from pystencils_walberla import CodeGeneration, generate_sweep`. At the end of the file, we add these two lines:
\code{.py}
with CodeGeneration() as ctx:
    generate_sweep(ctx, 'HeatEquationKernel', ac)
\endcode
...
...
The code generation script will later be called by the build system while compiling the application. The complete script looks like this:
\code{.py}
import sympy as sp
import pystencils as ps
from pystencils_walberla import CodeGeneration, generate_sweep
...
...
with CodeGeneration() as ctx:
    generate_sweep(ctx, 'HeatEquationKernel', ac)
\endcode
Next, we register the script with the CMake build system. Outside of our application folder, open *CMakeLists.txt* and add these lines (replace `codegen` with the name of your folder):
\code{.unparsed}
if( WALBERLA_BUILD_WITH_CODEGEN )
add_subdirectory(codegen)
endif()
...
...
\endcode
The `if` block ensures our application is only built if the CMake flag `WALBERLA_BUILD_WITH_CODEGEN` is set. In the application folder, create another *CMakeLists.txt* file. For registering a code generation target, the build system provides the `walberla_generate_target_from_python` macro. Apart from the target name, we need to pass it the name of our Python script and the names of the generated C++ header and source files. Their names need to match the class name passed to `generate_sweep` in the script. Add the following lines to your *CMakeLists.txt*:
\code{.unparsed}
if( WALBERLA_BUILD_WITH_CODEGEN )
walberla_generate_target_from_python( NAME CodegenHeatEquationKernel
FILE HeatEquationKernel.py
...
...
\endcode
Finally, we can use the generated sweep in an actual waLBerla application. In the application folder, create the source file *01_CodegenHeatEquation.cpp*. Open *CMakeLists.txt* and register the source file as an executable using the macro `walberla_add_executable`. Add all required waLBerla modules as dependencies, as well as the generated target.
\code{.unparsed}
walberla_add_executable ( NAME 01_CodegenHeatEquation
FILES 01_CodegenHeatEquation.cpp
DEPENDS blockforest core field stencil timeloop vtk pde CodegenHeatEquationKernel )
\endcode

In the code generation Python script, we first require a few imports from lbmpy.
From `lbmpy.creationfunctions`, we require the functions to create collision and update rules. For the actual code generation, `generate_lattice_model` from `lbmpy_walberla` is required. Since we will define symbols, `SymPy` is also needed.
\code{.py}
import sympy as sp
from lbmpy.creationfunctions import create_lb_collision_rule, create_lb_update_rule
...
...
from lbmpy_walberla import generate_lattice_model
\endcode
First, we define a few general parameters. These include the stencil (D2Q9) and the memory layout (`fzyx`, see \ref tutorial_codegen01 ). We define a SymPy symbol for the relaxation rate \f$ \omega \f$. This means we can later set it to a specific value from the waLBerla code. A dictionary with optimization parameters is also set up. Here, we enable global common subexpression elimination (`cse_global`) and set the PDF field's memory layout.
\code{.py}
stencil = 'D2Q9'
omega = sp.Symbol('omega')
layout = 'fzyx'
...
...
Next, we set the parameters for the SRT method in a dictionary and create both the collision rule and the update rule.
The update rule is still needed in the code generation process, namely for the pack info generation. The collision step only acts within one cell, so the collision rule's equations contain no neighbour accesses. Calling `create_lb_update_rule` inserts the same two-field pull streaming scheme that `generate_lattice_model` uses, and the resulting update rule contains exactly those neighbour accesses which `generate_pack_info_from_kernel` needs to build the optimized pack info.
Finally, we create the code generation context and call the respective functions for generating the lattice model and the pack info. Both require the context and a class name as parameters. To `generate_lattice_model`, we also pass the collision rule and the field layout; `generate_pack_info_from_kernel` receives the update rule.
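Putting these pieces together, the generation part of the script could look roughly as follows. This is a hedged sketch: the parameter dictionary, the class names, and the `pystencils_walberla` imports (`CodeGeneration`, `generate_pack_info_from_kernel`) are assumptions on our part and should be taken from the full tutorial script:
\code{.py}
srt_params = {'stencil': stencil,
              'method': 'srt',
              'relaxation_rate': omega}
optimization = {'cse_global': True, 'field_layout': layout}

collision_rule = create_lb_collision_rule(optimization=optimization, **srt_params)
update_rule = create_lb_update_rule(collision_rule=collision_rule, optimization=optimization)

with CodeGeneration() as ctx:
    # class names are placeholders; they must match the files listed in the CMake target
    generate_lattice_model(ctx, 'SRTLatticeModel', collision_rule, field_layout=layout)
    generate_pack_info_from_kernel(ctx, 'SRTPackInfo', update_rule)
\endcode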
As a final touch, we still need to set up the CMake build target for the code generation script. This time, two distinct classes (the lattice model and the pack information) will be generated. Therefore, we need to list the header and source file names for both classes separately.
\code{.unparsed}
walberla_generate_target_from_python( NAME 02_LBMLatticeModelGenerationPython
...
\endcode
This tutorial demonstrates how to use [pystencils](https://pycodegen.pages.i10git.cs.fau.de/pystencils) and [lbmpy](https://pycodegen.pages.i10git.cs.fau.de/lbmpy) to generate highly optimised and hardware-specific Lattice Boltzmann simulation code within the waLBerla framework. In contrast to \ref tutorial_codegen02, we will be generating a full LBM sweep instead of a lattice model class. Furthermore, we will generate a communication pack info class and a sweep to initialise the PDF field. A hardware-specific implementation of a NoSlip boundary handler will also be generated. Those components will then be combined in a waLBerla application for simulating the same shear flow scenario as in the previous tutorial.
...
...
For the stream-pull-collide type kernel, we need two PDF fields which we set up as source and destination fields.
For VTK output and the initial velocity setup, we define a velocity vector field as an output field for the LB method.
We set up the cumulant-based MRT method with relaxation rates as described above. We use `create_lb_update_rule` from lbmpy to derive the set of equations describing the collision operator together with the *pull* streaming pattern. These equations define the entire LBM sweep.
In \ref tutorial_codegen02, we were able to use the framework built around the waLBerla lattice model template API for setting up the shear flow's initial velocity profile. Since we are not using a lattice model class this time, this API is not available to us. With lbmpy, though, we can generate a kernel which takes in scalar values or fields for the initial density and velocity and sets the initial PDF values to the corresponding equilibrium. The function `macroscopic_values_setter` from `lbmpy.macroscopic_value_kernels` returns a set of assignments for this initialization procedure. It takes the LB method definition as an argument, as well as either symbols or pystencils field accesses for the initial density `rho` and the initial velocity. Lastly, it takes the PDF field's centre vector as the destination for the PDF values. We define a separate symbol for the density and use the velocity field defined above.
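A sketch of this initialisation kernel definition (assuming the LB method object `lb_method`, the PDF field `pdfs` and the velocity field `velocity` are defined earlier in the script; the symbol name `rho` is ours):
\code{.py}
from lbmpy.macroscopic_value_kernels import macroscopic_values_setter

rho = sp.Symbol('rho')
pdfs_setter = macroscopic_values_setter(lb_method,
                                        density=rho,
                                        velocity=velocity.center_vector,
                                        pdfs=pdfs.center_vector)
\endcode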
Several functions from `pystencils_walberla` and `lbmpy_walberla` are called to generate the required classes:
- The PDF initialization kernel is generated from the `pdfs_setter` assignment collection using `generate_sweep`.
- Using `generate_boundary`, we generate an optimised implementation of a NoSlip boundary handler for the domain's walls.
\code{.py}
with CodeGeneration() as ctx:
    if ctx.cuda:
        target = 'gpu'
...
...
We will now integrate the generated classes into a waLBerla application. After adding the usual waLBerla includes, we also include the headers of the generated classes:
\code{.cpp}
#include "CumulantMRTNoSlip.h"
#include "CumulantMRTPackInfo.h"
#include "CumulantMRTSweep.h"
#include "DensityAndVelocityFieldSetter.h"
#include "InitialPDFsSetter.h"
\endcode
We set up typedef aliases for the generated pack info and the D2Q9 stencil. For the PDF and velocity fields, we use instances of the field::GhostLayerField template. The number of entries of the PDF field is specified by the `Stencil_T::Size` parameter. As our domain is two-dimensional, the velocity at each lattice node is a two-dimensional vector. Thus, we set up the velocity field with two values per cell by passing the stencil's dimension as a template parameter. Finally, we also define a typedef alias for our generated NoSlip boundary.
"Our numeric solver's symbolic representation is now complete! Next, we use pystencils to generate and compile a C implementation of our kernel. The code is generated as shown below, compiled into a shared libary and then bound to `kernel_func`. All unbound sympy symbols (`dx`, `dt` and `kappa`) as well as the fields `u` and `u_tmp` are arguments to the generated kernel function. "
"Our numeric solver's symbolic representation is now complete! Next, we use pystencils to generate and compile a C implementation of our kernel. The code is generated as shown below, compiled into a shared library and then bound to `kernel_func`. All unbound sympy symbols (`dx`, `dt` and `kappa`) as well as the fields `u` and `u_tmp` are arguments to the generated kernel function. "
]
},
{
...
...
@@ -355,7 +355,7 @@
"source": [
"### Prototype Simulation\n",
"\n",
"We can set up and run a simple simulation wich the generated kernel right here. The first step is to set up the fields and simulation parameters."
"We can set up and run a simple simulation with the generated kernel right here. The first step is to set up the fields and simulation parameters."
]
},
{
...
...
%% Cell type:code id: tags:
``` python
from pystencils.session import *
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
sp.init_printing()
```
%% Cell type:markdown id: tags:
### Code Generation for the Heat Equation
The <a target="_blank" href="https://en.wikipedia.org/wiki/Heat_equation">heat equation</a> is a simple partial differential equation describing the flow of heat through a homogeneous medium. We can write it as
$$
\frac{\partial u}{\partial t} =
\kappa \left(
\frac{\partial^2 u}{\partial x^2} +
\frac{\partial^2 u}{\partial y^2}
\right)
$$
where $\kappa$ is the medium's diffusion coefficient and $u(x, y, t)$ is the unknown temperature distribution at the coordinate $(x,y)$ at time $t$.
To discretize this equation using pystencils, we first need to define all the fields and other symbols involved.
%% Cell type:code id: tags:
``` python
u, u_tmp = ps.fields("u, u_tmp: [2D]", layout='fzyx')
kappa = sp.Symbol("kappa")
dx = sp.Symbol("dx")
dt = sp.Symbol("dt")
```
%% Cell type:markdown id: tags:
We define the PDE using the pystencils building blocks for transient and spatial derivatives. The definition is implicitly equalled to zero. We use `ps.fd.transient` for a first derivative by time and `ps.fd.diff` to express the second derivatives. `ps.fd.diff` takes a field and a list of spatial dimensions in which the field should be differentiated.
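The PDE definition from the elided code cell is identical to the one used in the code generation script above:

``` python
heat_pde = ps.fd.transient(u) - kappa * (ps.fd.diff(u, 0, 0) + ps.fd.diff(u, 1, 1))
heat_pde
```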
Next, the PDE will be discretized. We use the `Discretization2ndOrder` class to apply a finite difference discretization to the spatial components, and explicit Euler discretization to the transient components.
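A sketch of the corresponding discretization cell, assuming `Discretization2ndOrder` takes the spacing and time step symbols defined above:

``` python
discretize = ps.fd.Discretization2ndOrder(dx=dx, dt=dt)
heat_pde_discretized = discretize(heat_pde)
heat_pde_discretized
```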
Applying sympy's `simplify` to the discretized equation combines the two fractions on the right as desired, but it also puts everything over a common denominator. If we generated the kernel from this, we'd be redundantly multiplying $u_{(0,0)}$ by $dx^2$. Let's try something else. Instead of applying `simplify` to the entire equation, we could apply it only to the second summand.
The outermost operation of `heat_pde_discretized` is a $+$, so `heat_pde_discretized` is an instance of `sp.Add`. We take it apart by accessing its arguments, simplify the right-hand summand, and put it back together again.
That looks a lot better! There is nothing left to simplify. The right-hand summand still contains a division by $dx^2$, though. Due to their inefficiency, floating-point divisions should be replaced by multiplication with their reciprocals. Before we can eliminate the division, we need to wrap the equation inside an `AssignmentCollection`. On this collection we can apply `add_subexpressions_for_divisions` to replace the division by a factor $\xi_1 = \frac{1}{dx^2}$ which in the kernel will be computed ahead of the loop.
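A sketch of these two clean-up steps; as an assumption on our part, `add_subexpressions_for_divisions` is imported from `pystencils.simp`, each summand is simplified separately (which has the same effect as simplifying only the right-hand one), and the resulting collection is named `update` to match the kernel-creation cell below:

``` python
from pystencils.simp import add_subexpressions_for_divisions

# simplify the summands individually so no common denominator is introduced
heat_pde_discretized = sp.Add(*[sp.simplify(arg) for arg in heat_pde_discretized.args])

# wrap the update in an assignment collection and factor out the 1/dx**2 division
update = ps.AssignmentCollection([ps.Assignment(u_tmp[0, 0], heat_pde_discretized)])
update = add_subexpressions_for_divisions(update)
update
```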
Our numeric solver's symbolic representation is now complete! Next, we use pystencils to generate and compile a C implementation of our kernel. The code is generated as shown below, compiled into a shared library and then bound to `kernel_func`. All unbound sympy symbols (`dx`, `dt` and `kappa`) as well as the fields `u` and `u_tmp` are arguments to the generated kernel function.
%% Cell type:code id: tags:
``` python
kernel_ast = ps.create_kernel(update, cpu_openmp=4)
kernel_func = kernel_ast.compile()
ps.show_code(kernel_ast)
```
%% Output
%% Cell type:markdown id: tags:
### Prototype Simulation
We can set up and run a simple simulation with the generated kernel right here. The first step is to set up the fields and simulation parameters.
%% Cell type:code id: tags:
``` python
domain_size = 1.0
cells = 25
delta_x = domain_size / cells
delta_t = 0.0001
kappa_v = 1.0
u = np.zeros((cells, cells))
u_tmp = np.zeros_like(u)
```
%% Cell type:markdown id: tags:
We also need to apply the Dirichlet and Neumann boundary conditions.
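The boundary-handling cell is not reproduced here. As an illustration only (the values and edges used in the notebook may differ), such conditions can be applied directly to the prototype arrays, e.g. a fixed temperature on one edge and zero-gradient conditions on the remaining edges:

``` python
def apply_boundaries(field):
    # Neumann (zero gradient): copy the neighbouring row/column
    field[0, :] = field[1, :]
    field[:, 0] = field[:, 1]
    field[:, -1] = field[:, -2]
    # Dirichlet: fixed temperature on the top edge
    field[-1, :] = 1.0
```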
Not only can we run the kernel, we can even view the results as a video! This is a useful tool for debugging and testing the solver before introducing it into a larger application. We can view the simulation animated both as a colorful 2D plot or as a 3D surface plot.
The total time interval being simulated is two seconds (100 frames of 200 steps each, with a time step of 0.1 milliseconds).
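The time loop itself is a plain repeated kernel call with a pointer swap. A minimal sketch, using the hypothetical `apply_boundaries` helper from above and the kernel arguments listed earlier (`u`, `u_tmp`, `kappa`, `dx`, `dt`):

``` python
def run_frame(u, u_tmp, steps=200):
    # advance the solution by one animation frame (200 time steps)
    for _ in range(steps):
        apply_boundaries(u)
        kernel_func(u=u, u_tmp=u_tmp, kappa=kappa_v, dx=delta_x, dt=delta_t)
        u, u_tmp = u_tmp, u   # swap source and destination
    return u, u_tmp

u, u_tmp = run_frame(u, u_tmp)
```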