Merge remote-tracking branch 'refs/remotes/origin/lbmpy_codegen' into lbmpy_codegen

0dc3bba1 · Markus Holzer · 250bedb8 · fb69febf · 0dc3bba1 · 0dc3bba1
Commit 0dc3bba1 authored 4 years ago by Markus Holzer
--- a/apps/tutorials/codegen/03_AdvancedLBMCodegen.cpp
+++ b/apps/tutorials/codegen/03_AdvancedLBMCodegen.cpp
@@ -131,7 +131,7 @@ int main(int argc, char** argv)
   const real_t omega     = parameters.getParameter< real_t >("omega", real_c(1.8));
   const double remainingTimeLoggerFrequency =
      parameters.getParameter< double >("remainingTimeLoggerFrequency", 3.0); // in seconds
-   const int32_t VTKwriteFrequency = parameters.getParameter< int32_t >("VTKwriteFrequency", 1000);
+   const uint_t VTKwriteFrequency = parameters.getParameter< uint_t >("VTKwriteFrequency", 1000);

   ////////////////////////////////////
   /// PDF Field and Velocity Setup ///

--- a/apps/tutorials/codegen/03_AdvancedLBMCodegen.dox
+++ b/apps/tutorials/codegen/03_AdvancedLBMCodegen.dox
@@ -9,13 +9,13 @@ This tutorial demonstrates how to use [pystencils](https://pycodegen.pages.i10gi

 For large-scale LB simulations, the highly parallel design of a general-purpose graphics processing unit (GPGPU) can yield significant improvements in performance. The waLBerla framework relies on CUDA to run simulations on NVIDIA GPUs. In this tutorial, we will also show how code generation can be used to generate native CUDA code for different kinds of kernels.

-In this tutorial, we will be using the more advanced cumulant-based multiple-relaxation-time (MRT) collision operator. Instead of relaxing the entire distribution functions toward their equilibrium values, their [cumulants](https://en.wikipedia.org/wiki/Cumulant) are relaxed with individual relaxation rates. We will also use the D2Q9 velocity set. For this velocity set, the zeroth- and first-order cumulants correspond to density and momentum which are conserved during collisions, so their relaxation rates can be set to zero. We will specify one common relaxation rate $\omega$ for the three second-order cumulants to ensure the correct viscosity of the fluid; the higher-order cumulants will be set to their equilibrium values which correspond to a relaxation rate of 1.
+In this tutorial, we will be using the more advanced cumulant-based multiple-relaxation-time (MRT) collision operator. Instead of relaxing the entire distribution functions toward their equilibrium values, their [cumulants](https://en.wikipedia.org/wiki/Cumulant) are relaxed with individual relaxation rates. We will also use the D2Q9 velocity set. For this velocity set, the zeroth- and first-order cumulants correspond to density and momentum which are conserved during collisions, so their relaxation rates can be set to zero. We will specify one common relaxation rate \f$ \omega \f$ for the three second-order cumulants to ensure the correct viscosity of the fluid; the higher-order cumulants will be set to their equilibrium values which correspond to a relaxation rate of 1.

 \section advancedlbmcodegen_python Code Generation in Python

-We start by defining some general parameters as well as [SymPy](https://www.sympy.org/) symbols and pystencils fields for later usage. Those include the stencil (`D2Q9`), the memory layout (`fzyx`, see \ref tutorial_codegen01) and the symbol $\omega$ for the relaxation rate of the second-order cumulants.
+We start by defining some general parameters as well as [SymPy](https://www.sympy.org/) symbols and pystencils fields for later usage. Those include the stencil (`D2Q9`), the memory layout (`fzyx`, see \ref tutorial_codegen01) and the symbol \f$ \omega \f$ for the relaxation rate of the second-order cumulants.

-For the stream-pull-collide type kernel, we need two pdf fields which we set up using pystencils. We will use the built-in field swapping functionality of `generate_sweep`. It will generate the pointer swapping for the temporary destination field with the source field after each time step. This requires us to obey a specific naming convention: The temporary field *must* have the same *field name* (passed as a string to `ps.fields`) as the original field but with a suffix `_tmp`. Otherwise, an error will occur during compilation.
+For the stream-pull-collide type kernel, we need two PDF fields which we set up using pystencils. We will use the built-in field swapping functionality of `generate_sweep`. It will generate the pointer swapping for the temporary destination field with the source field after each time step. This requires us to obey a specific naming convention: The temporary field *must* have the same *field name* (passed as a string to `ps.fields`) as the original field but with a suffix `_tmp`. Otherwise, an error will occur during compilation.

 For VTK output and the initial velocity setup, we define a velocity vector field as an output field for the LB method.

@@ -54,7 +54,7 @@ lbm_update_rule = create_lb_update_rule(optimization=optimization,
 lbm_method = lbm_update_rule.method
 \endcode

-In \ref tutorial_codegen02, we were able to use the framework built around the waLBerla lattice model template API for setting up the shear flow's initial velocity profile. Since we are not using a lattice model class this time, this API is not available to us. With lbmpy, though, we can generate a kernel which takes in scalar values or fields for the initial density and velocity and sets the initial PDF values to the corresponding equilibrium. The function `macroscopic_values_setter` from `lbmpy.macroscopic_value_kernels` returns a set of assignments for this initialization procedure. It takes the LB method definition as an argument, as well as either symbols or pystencils field accesses for the initial density `rho` and the initial velocity. Lastly, it takes the pdf field's centre vector as the destination for the pdf values. We define a separate symbol for the density and use the velocity field defined above.
+In \ref tutorial_codegen02, we were able to use the framework built around the waLBerla lattice model template API for setting up the shear flow's initial velocity profile. Since we are not using a lattice model class this time, this API is not available to us. With lbmpy, though, we can generate a kernel which takes in scalar values or fields for the initial density and velocity and sets the initial PDF values to the corresponding equilibrium. The function `macroscopic_values_setter` from `lbmpy.macroscopic_value_kernels` returns a set of assignments for this initialization procedure. It takes the LB method definition as an argument, as well as either symbols or pystencils field accesses for the initial density `rho` and the initial velocity. Lastly, it takes the PDF field's centre vector as the destination for the PDF values. We define a separate symbol for the density and use the velocity field defined above.

 \code
 initial_rho = sp.Symbol('rho_0')
@@ -70,11 +70,11 @@ Everything is now prepared to generate the actual C++ code. We create the code g
 Several functions from `pystencils_walberla` and `lbmpy_walberla` are called to generate the classes:

 - The LBM sweep is generated by the `lbm_update_rule` equations using `generate_sweep`. This function takes an additional parameter `field_swaps` which takes a list of pairs of fields to generate the pointer swapping.
- The communication pack info is generated using `generate_pack_info_from_kernel` which infers from the update rule's write accesses the pdf indices that need to be communicated. Without further specification, it assumes a pull-type kernel.
- The pdf initialization kernel is generated from the `pdfs_setter` assignment collection using `generate_sweep`.
+- The communication pack info is generated using `generate_pack_info_from_kernel` which infers from the update rule's write accesses the PDF indices that need to be communicated. Without further specification, it assumes a pull-type kernel.
+- The PDF initialization kernel is generated from the `pdfs_setter` assignment collection using `generate_sweep`.
 - Using `generate_boundary`, we generate an optimised implementation of a NoSlip boundary handler for the domain's walls.

-\code 
+\code
 with CodeGeneration() as ctx:
    if ctx.cuda:
        target = 'gpu'
@@ -107,7 +107,7 @@ We will now integrate the generated classes into a waLBerla application. After a
 #include "DensityAndVelocityFieldSetter.h"
 \endcode

-We set up typedef aliases for the generated pack info and the D2Q9 stencil. For the PDF and velocity fields, we use instances of the field::GhostLayerField template. The pdf field has nine entries per spatial position, as specified by the `Stencil_T::Size` parameter. As our domain is two-dimensional, the velocity at each lattice node is a two-dimensional vector. Thus, we set up the velocity field to have two index dimensions passing the stencil's dimension as a template parameter. Finally, we also define a typedef alias for our generated NoSlip boundary.
+We set up typedef aliases for the generated pack info and the D2Q9 stencil. For the PDF and velocity fields, we use instances of the field::GhostLayerField template. The number of entries of the PDF field is specified by the `Stencil_T::Size` parameter. As our domain is two-dimensional, the velocity at each lattice node is a two-dimensional vector. Thus, we set up the velocity field to have two index dimensions passing the stencil's dimension as a template parameter. Finally, we also define a typedef alias for our generated NoSlip boundary.

 \code
 // Communication Pack Info
@@ -128,7 +128,7 @@ typedef FlagField< flag_t > FlagField_T;
 typedef lbm::CumulantMRTNoSlip NoSlip_T;
 \endcode

-The pdf field, the velocity field and the flag field which stores information about the cell types for boundary handling are set up using field::addToStorage. The memory layout `fzyx` is explicitly set for the pdf and velocity fields. We set initial values of `0.0` for both of them since they will be explicitly initialised later on.
+The PDF field, the velocity field and the flag field which stores information about the cell types for boundary handling are set up using field::addToStorage. The memory layout `fzyx` is explicitly set for the PDF and velocity fields. We set initial values of `0.0` for both of them since they will be explicitly initialised later on.

 \subsection advancedlbmpy_app_sweep_and_packinfo Using the generated sweep and pack info

@@ -139,7 +139,7 @@ The generated pack info class can be used in exactly the same way as a waLBerla-
   communication.addPackInfo(make_shared< PackInfo_T >(pdfFieldId));
 \endcode

-Upon construction, the generated sweep class requires the pdf field, the velocity field and the relaxation rate `omega`. The parameter `omega` is read from a parameter file. After creation, the sweep functor can be passed directly to the timeloop.
+Upon construction, the generated sweep class requires the PDF field, the velocity field and the relaxation rate `omega`. The parameter `omega` is read from a parameter file. After creation, the sweep functor can be passed directly to the timeloop.

 \code
   timeloop.add() << Sweep(pystencils::CumulantMRTSweep(pdfFieldId, velocityFieldId, omega));
@@ -147,7 +147,7 @@ Upon construction, the generated sweep class requires the pdf field, the velocit

 \subsection advancedlbmpy_app_boundary Adding the NoSlip Boundary

-The generated NoSlip boundary works independently of the native waLBerla LBM boundary handling classes, which are templated for the use with lattice models. Still, we use the functions exposed by the geometry module to initialise the flag field from the parameter file. The NoSlip boundary handler is constructed passing the block grid and the pdf field's identifier. After the flag field has been set up, the `fillFromFlagField` method is called to register the boundary cells with the boundary handler.
+The generated NoSlip boundary works independently of the native waLBerla LBM boundary handling classes, which are templated for the use with lattice models. Still, we use the functions exposed by the geometry module to initialise the flag field from the parameter file. The NoSlip boundary handler is constructed passing the block grid and the PDF field's identifier. After the flag field has been set up, the `fillFromFlagField` method is called to register the boundary cells with the boundary handler.

 \code
   const FlagUID fluidFlagUID("Fluid");
@@ -172,12 +172,12 @@ The NoSlip sweep can now be added to the timeloop before the LBM sweep. We also

 \subsection advancedlbmpy_velinit Initialisation of the Velocity Profile

-Setting up the shear flow's initial velocity profile consists of two parts. First, the velocity field is initialised to the shear flow. Second, the prepared velocity field is passed to the generated pdf field setter, which then initialises the equilibrium pdf values from the velocities. The density is uniformly set to unity.
+Setting up the shear flow's initial velocity profile consists of two parts. First, the velocity field is initialised to the shear flow. Second, the prepared velocity field is passed to the generated PDF field setter, which then initialises the equilibrium PDF values from the velocities. The density is uniformly set to unity.

 The `initShearFlowVelocityField` function takes a pointer to the block storage, the velocity field's identifier and a handle to a parameter file block. It then reads the magnitudes of the velocity's $x$-component and the perturbation in the $y$-direction. For the random noise, a random number generator is created with a seed specified in the parameter file. Then, the function iterates over all blocks (distributed to MPI processes) and cells. A cell's velocity is then set according to its global position.

 \code
-void initShearFlowVelocityField(const shared_ptr< StructuredBlockForest >& blocks, 
+void initShearFlowVelocityField(const shared_ptr< StructuredBlockForest >& blocks,
                                const BlockDataID& velocityFieldId,
                                const Config::BlockHandle& config)
 {
@@ -207,7 +207,7 @@ void initShearFlowVelocityField(const shared_ptr< StructuredBlockForest >& block
 }
 \endcode

-After the velocity field has been initialised, the generated `InitialPDFsSetter` sweep is created and run once on each block. After this step, the pdf field will be initialised with the correct equilibrium distributions for the first iteration.
+After the velocity field has been initialised, the generated `InitialPDFsSetter` sweep is created and run once on each block. After this step, the PDF field will be initialised with the correct equilibrium distributions for the first iteration.

 \code
   // Velocity field setup
@@ -231,7 +231,7 @@ The simulation is now ready to run.

 If CUDA is enabled, some implementation details need to be different from a CPU-only version. This mainly concerns the creation and management of fields, MPI communication and VTK output. Since the initialisation, LBM and NoSlip sweeps run entirely on the GPU, the PDF field has to be set up only in graphics memory. In contrast to that is the velocity field required by CPU and GPU. The shear flow velocity profile is constructed by CPU code before the initialisation kernel maps it onto the PDF field on the GPU. Also, the VTK output routines which run on the CPU need to read the velocity field. It thus needs to be created twice: Once in the main memory, and once in GPU memory. It is then copied on-demand from the GPU to the CPU. Furthermore, we create a flag field, which is only needed on the CPU. After the initialisation, we use it to create the index-vectors for the boundary-handling. The index vectors are then transferred to the GPU and not the entire flag field.

-For the largest part, though, the C++ code is identical. The code snippets presented above represent only the CPU variant of the code. The GPU implementation can be found in the source file 03_AdvancedLBMCodegen.cpp. There, code blocks which are different from the CPU to the GPU implementation are toggled via preprocessor conditionals. 
+For the largest part, though, the C++ code is identical. The code snippets presented above represent only the CPU variant of the code. The GPU implementation can be found in the source file 03_AdvancedLBMCodegen.cpp. There, code blocks which are different from the CPU to the GPU implementation are toggled via preprocessor conditionals.

 \section advancedlbmpy_conclusion Conclusion and Outlook


--- a/apps/tutorials/codegen/LatticeModelGeneration.ipynb
+++ b/apps/tutorials/codegen/LatticeModelGeneration.ipynb
--- a/doc/Mainpage.dox
+++ b/doc/Mainpage.dox
@@ -46,10 +46,14 @@ all the basic data strcutures and concepts of the framework.
 \subsection codegen Code Generation

 - \ref tutorial_codegen01 \n
-  This tutorial shows how to use pystencils to generate waLBerla sweeps from symbolic descriptions. 
+  This tutorial shows how to use pystencils to generate waLBerla sweeps from symbolic descriptions.
 - \ref tutorial_codegen02 \n
-   
-   
+  This tutorial demonstrates how to use lbmpy to generate an efficient lattice model for a waLBerla sweep.
+- \ref tutorial_codegen03 \n
+  This tutorial describes how to use lbmpy to generate a full LBM sweep for waLBerla.
+
+
+
 \section further_info Further information

 Not all features of the framework are covered in the tutorials.