diff --git a/apps/tutorials/codegen/02_CodegenLbmpy.dox b/apps/tutorials/codegen/02_CodegenLbmpy.dox
deleted file mode 100644
index fd0c62c6e01a2c4c998dcf55a780ab5484903da2..0000000000000000000000000000000000000000
--- a/apps/tutorials/codegen/02_CodegenLbmpy.dox
+++ /dev/null
@@ -1,74 +0,0 @@
-namespace walberla{
-
-/**
-\page tutorial_codegen02 Tutorial - Code Generation 2: Lattice Model Generation with lbmpy
-
-\section overview Overview
-
-This tutorial demonstrates how to use <a href="https://pycodegen.pages.i10git.cs.fau.de/lbmpy/notebooks/00_tutorial_lbmpy_walberla_overview.html" target="_blank">lbmpy</a> in conjunction with waLBerla to generate efficient implementations of various Lattice Boltzmann Methods to be included in large-scale distributed memory fluid flow simulations. While waLBerla provides an advanced framework for setting up and running complex fluid simulations, it only implements a small set of Lattice Boltzmann collision operators. Writing an efficient C++ implementation of an advanced Lattice-Boltzmann method can be very cumbersome. For this reason, lbmpy has been developed. It is a Python framework which allows to define a set of LB equations at different levels of abstraction, and then generate a highly optimized C implementation of these equations. An introduction to lbmpy can be found <a href="https://pycodegen.pages.i10git.cs.fau.de/lbmpy/notebooks/00_tutorial_lbmpy_walberla_overview.html" target="_blank">here</a>. 
-
-
-As in the previous tutorial (\ref tutorial_codegen01), we will first define the numeric methods and set up code generation in a Python script. We will then include the generated code in a waLBerla application to simulate a two-dimensional shear flow in a periodic domain. Additionally, it will be shown how a waLBerla simulation with complex initial conditions can be initialized using parameter files.
-
-\section lbmpy_codegen_script The Code Generation Script
-
-In this section, we will define multiple collision operators in Python using lbmpy and generate lattice model implementations from them. These lattice models consists of several parts, including:
-- The **velocity set** together with its weights. For our two-dimensional domain, we will be using the D2Q9 velocity set.
-- The **collision operator**. We will implement three different collision models: a simple SRT model, a cumulant-based MRT model and an orthogonal MRT model with an entropy condition.
-- The **force model**. Since no forces will be present in our simulation, we will not use a force model in this tutorial. For details about force models, see <a href="https://pycodegen.pages.i10git.cs.fau.de/lbmpy/sphinx/forcemodels.html" target="_blank">lbmpy's documentation on force models</a>.
-
-In addition to the lattice models, it will be shown how to generate lattice- and hardware-specific implementations of boundary conditions. For CPU simulations, this is not necessary as waLBerla already provides efficient implementations for common LBM boundaries. If you wish to use hardware acceleration with NVIDIA GPU's, however, this feature can be used to generate efficient CUDA implementations.
-
-\subsection lbmpy_codegen_common General Setup
-
-First, we define a few Python objects which all three collision operators will have in common. These are the D2Q9 stencil, the memory layout, one relaxation rate $\omega$, and a few optimization options. 
-
-\code
-STENCIL = 'D2Q9'
-OMEGA = sp.Symbol('omega')
-LAYOUT = 'zyxf'
-
-#   Optimization
-OPT = {'target': 'cpu', 'cse_global': True, 'field_layout': LAYOUT}
-\endcode
-
-The relaxation rate is represented by a <a href="https://www.sympy.org/" target="_blank">sympy</a> symbol and will be assigned a value from C++ code later on. As a memory layout, we choose `zyxf` which is an Array-Of-Structs layout. This favors memory locality for the collision step, but as a tradeoff is disadvantageous for vectorization of the streaming step. 
-
-We also define a dictionary with optimization parameters. The code generation target is set to `cpu` (change this to `gpu` for a CUDA-accelerated lattice model. This requires CUDA to be installed). We enable global commmon subexpression elimination with the `cse_global` flag, and set the PDF field's memory layout.
-
-\subsection lbmpy_latmod_def Definition of Lattice Models and Boundaries
-
-The major advantage of code generation is the ability to obtain highly efficient implementations of specially tailored simulation schemes directly from an abstract definition. Also, code generation enables us to easily compare different lattice models for a single simulation. 
-
-We define three functions `build_srt_method`, `build_cumulant_method` and `build_entropic_method` to generate the three lattice models listed above. In each of these functions, we first define the respective collision rule by a list of parameters. The function `create_lb_collision_rule` from lbmpy is used to derive and simplify the set of equations that define the collision rule. These assignments are then passed to waLBerla's `generate_lattice_model` function which generates the C++ header and source files. Last, `generate_boundary` is used to generate a specific implementation of the NoSlip boundary condition. Both generation functions use the build variables available through the code generation context `ctx` to determine several details of implementation, like floating-point precision and whether to use CUDA or OpenMP.
-
-TODO: Pack Info
-
-The three functions only differ in the chosen method parameters and the class names passed to the code generation functions. For SRT, we only need to define the stencil and the BGK collision operator's single relaxation rate:
-
-\code
-def build_srt_method(ctx):
-    srt_params = {'stencil': STENCIL,
-                  'method': 'srt',
-                  'relaxation_rate': OMEGA}
-
-    srt_collision_rule = create_lb_collision_rule(optimization=OPT, output=OUTPUT, **srt_params)
-    generate_lattice_model(ctx, "SRTLatticeModel", srt_collision_rule, field_layout=LAYOUT)
-
-    srt_update_rule = create_lb_update_rule(collision_rule=srt_collision_rule, optimization=OPT)
-    generate_pack_info_from_kernel(ctx, "SRTPackInfo", srt_update_rule)
-
-    generate_boundary(ctx, "SRTNoSlip", NoSlip(), srt_collision_rule.method)
-\endcode
-
-\section lbmpy_simulation_app The Simulation application
-
-\subsection lbmpy_codegen_inclusion Inclusion of the generated code
-
-\subsection lbmpy_shear_flow_init Initialization of the Shear Flow
-
-\tableofcontents
-
-*/
-
-}
\ No newline at end of file
diff --git a/apps/tutorials/codegen/02_LBMLatticeModelGeneration.cpp b/apps/tutorials/codegen/02_LBMLatticeModelGeneration.cpp
index 1cf4ea0f4142a16e601ef78239b08a3d34772572..af5346046a5622801f4fbc4100e2b6da1ddfa23f 100644
--- a/apps/tutorials/codegen/02_LBMLatticeModelGeneration.cpp
+++ b/apps/tutorials/codegen/02_LBMLatticeModelGeneration.cpp
@@ -159,7 +159,7 @@ int main(int argc, char** argv)
    // Communication
    blockforest::communication::UniformBufferedScheme< CommunicationStencil_T > communication(blocks);
    communication.addPackInfo(make_shared< PackInfo_T >(pdfFieldId));
-   
+
    // Timeloop
    timeloop.add() << BeforeFunction(communication, "communication")
                   << Sweep(BHFactory::BoundaryHandling::getBlockSweep(boundaryHandlingId), "Boundary Handling");
diff --git a/apps/tutorials/codegen/02_LBMLatticeModelGeneration.dox b/apps/tutorials/codegen/02_LBMLatticeModelGeneration.dox
new file mode 100644
index 0000000000000000000000000000000000000000..d67b6e8f04e75e1cc392588b051427bb291d0a2d
--- /dev/null
+++ b/apps/tutorials/codegen/02_LBMLatticeModelGeneration.dox
@@ -0,0 +1,220 @@
+namespace walberla{
+
+/**
+\page tutorial_codegen02 Tutorial - Code Generation 2: Lattice Model Generation with lbmpy
+
+\section overview Overview
+
+This tutorial demonstrates how to use <a href="https://pycodegen.pages.i10git.cs.fau.de/lbmpy/notebooks/00_tutorial_lbmpy_walberla_overview.html" target="_blank">lbmpy</a> in conjunction with waLBerla to generate efficient implementations of various Lattice Boltzmann Methods to be included in large-scale distributed memory fluid flow simulations. While waLBerla provides an advanced framework for setting up and running complex fluid simulations, it only implements a small set of Lattice Boltzmann collision operators. Writing an efficient C++ implementation of an advanced Lattice-Boltzmann method can be very cumbersome. For this reason, lbmpy was developed. It is a Python framework which allows to define a set of LB equations at different levels of abstraction, and then generate a highly optimized C implementation of these equations. An introduction to lbmpy can be found <a href="https://pycodegen.pages.i10git.cs.fau.de/lbmpy/notebooks/00_tutorial_lbmpy_walberla_overview.html" target="_blank">here</a>. 
+
+
+As in the previous tutorial (\ref tutorial_codegen01), we will first define the numeric methods and set up code generation in a Python script. We will then include the generated code in a waLBerla application to simulate a two-dimensional shear flow in a periodic domain. Additionally, it will be shown how a waLBerla simulation with complex initial conditions can be initialized using parameter files.
+
+\section lbmpy_codegen_script Setting up code generation in Python
+
+In this section, we will define an SRT collision operator in Python using lbmpy and generate a waLBerla lattice model implementation from its equations. A lattice model defines several things, including:
+- The **velocity set** together with its weights. For our two-dimensional domain, we will be using the D2Q9 velocity set.
+- The **LBM sweep** which consists of the collision operator and the streaming pattern. We will be using a standard pull-scheme with a temporary field.
+- The **force model**. Since no forces will be present in our simulation, we will not use a force model in this tutorial. For details about force models, see <a href="https://pycodegen.pages.i10git.cs.fau.de/lbmpy/sphinx/forcemodels.html" target="_blank">lbmpy's documentation on force models</a>.
+
+In addition to the lattice model, it will be shown how to generate a method-specific pack info class for MPI communication to reduce communication load to the necessary minimum. 
+
+In the code generation python script, we first require a few imports from lbmpy itself and from the waLBerla code generation libraries. lbmpy code generation is based on pystencils; the basic procedure is thus the same as in the previous tutorial. We need the `CodeGeneration` context from the `pystencils_walberla` module for the connection to the build system. For generating the communication pack info, we will use `generate_pack_info_from_kernel` from the same module. This method of pack info generation is not limited to LBM implementations, but can be used with any kind of sweep kernel. The function `generate_pack_info_from_kernel` simply takes a pystencils `AssignmentCollection` and extracts all field accesses to determine which cell entries need to be communicated.
+
+From the `lbmpy.creationfunctions` module we require the functions to create collision and update rules. For the actual code generation, `generate_lattice_model` from `lbmpy_walberla` is required. Since we will define symbols, `sympy` is also needed.
+
+\code
+import sympy as sp
+
+from lbmpy.creationfunctions import create_lb_collision_rule, create_lb_update_rule
+
+from pystencils_walberla import CodeGeneration, generate_pack_info_from_kernel
+from lbmpy_walberla import generate_lattice_model
+\endcode
+
+First, we define a few general parameters. These include the stencil (D2Q9) and the memory layout (`fzyx`, see \ref tutorial_codegen01 ). We define a sympy symbol for the relaxation rate $\omega$ since it will later be set from C++ code. A dictionary with optimization parameters is also set up. Here, we define the compilation target, enable global common subexpression elimination (`cse_global`) and set the PDF field's memory layout. In general, the target could be set to `gpu` to create a CUDA implementation, but this is currently not possible for lattice models.
+
+\code
+STENCIL = 'D2Q9'
+OMEGA = sp.Symbol('omega')
+LAYOUT = 'fzyx'
+
+#   Optimization
+OPT = {'target': 'cpu', 'cse_global': True, 'field_layout': LAYOUT}
+\endcode
+
+Next, we set the parameters for the SRT method in a dictionary and create both the collision and update rules by calling the respective lbmpy functions. They both return an `AssignmentCollection` containing all necessary equations. The only parameters needed for SRT are the stencil and the relaxation rate. For generating the lattice model, we only require the collision rule's equations since `generate_lattice_model` adds the two-fields pull scheme for the streaming step internally. In fact, lattice model generation is limited to this standard streaming pattern. 
+
+The update rule is still needed in the code generation process; namely for the pack info generation. The collision step only acts within one cell. Thus, the collision rule's equations contain no neighbour accesses. Calling `create_lb_update_rule` extends this by the same two-fields pull scheme as `generate_lattice_model`, and resulting update rule contains exactly those neighbour accesses which are required for `generate_pack_info_from_kernel` to build the optimized pack info.
+
+\code
+srt_params = {'stencil': STENCIL,
+              'method': 'srt',
+              'relaxation_rate': OMEGA}
+
+srt_collision_rule = create_lb_collision_rule(optimization=OPT, **srt_params)
+srt_update_rule = create_lb_update_rule(collision_rule=srt_collision_rule, optimization=OPT)
+\endcode
+
+Finally, we retrieve the code generation context and call the respective functions for generating the lattice model and the pack info. Both require the context and a class name as parameters. To  `generate_lattice_model`, we also pass the collision rule and the field layout; `generate_pack_info_from_kernel` in turn gets the update rule. 
+
+\code
+with CodeGeneration() as ctx:
+    generate_lattice_model(ctx, "SRTLatticeModel", srt_collision_rule, field_layout=LAYOUT)
+    generate_pack_info_from_kernel(ctx, "SRTPackInfo", srt_update_rule)
+\endcode
+
+Notice that, other than in \ref tutorial_codegen01, we did not need to define any fields. Both the source and destination PDF fields are created internally by lbmpy and `generate_lattice_model`. 
+
+As a final touch, we still need to set up the CMake build target for the code generation script. This time, two distinct classes will be generated. Therefore we need to list the header and source file names for both classes separately.
+
+\code
+walberla_generate_target_from_python( NAME 02_LBMLatticeModelGenerationPython
+        FILE 02_LBMLatticeModelGeneration.py
+        OUT_FILES   SRTLatticeModel.cpp SRTLatticeModel.h 
+                    SRTPackInfo.cpp SRTPackInfo.h )
+\endcode
+
+This completes the code generation part.
+
+\section lbmpy_simulation_app The Simulation application
+
+This section is concerned with the implementation of a waLBerla C++ application for the simulation of a periodic shear flow using the generated SRT lattice model and pack info. After adding the code generation target defined above as a dependency to the application's CMake target, we can straightforwardly include both generated classes:
+
+\code
+#include "SRTLatticeModel.h"
+#include "SRTPackInfo.h"
+\endcode
+
+For convenience, we define typedef aliases for the several types and templates that will later be used. Since our generated LBM implementation is consistent with the waLBerla LatticeModel API, we can use the lbm::PdfField class template for our PDF fields.
+
+\code
+// Typedef alias for the lattice model
+typedef lbm::SRTLatticeModel LatticeModel_T;
+
+// Communication Pack Info
+typedef pystencils::SRTPackInfo PackInfo_T;
+
+// Also set aliases for the stencils involved...
+typedef LatticeModel_T::Stencil Stencil_T;
+typedef LatticeModel_T::CommunicationStencil CommunicationStencil_T;
+
+// ... and for the required field types.
+typedef lbm::PdfField< LatticeModel_T > PdfField_T;
+
+// Also, for boundary handling, a flag data type and flag field are required.
+typedef walberla::uint8_t flag_t;
+typedef FlagField< flag_t > FlagField_T;
+typedef lbm::DefaultBoundaryHandlingFactory< LatticeModel_T, FlagField_T > BHFactory;
+\endcode
+
+The application will be mostly similar to \ref tutorial_lbm01, so we will mainly focus on the differences here. It will also be configured by a parameter file as described in the other tutorial.
+
+\subsection lbmpy_add_lattice_model_sweep Adding the generated sweep
+
+The only major difference caused by the used of a generated lattice model is when the LBM sweep is added to the timeloop. The sweep is exposed as a static member of the generated lattice model class and can be added like this:
+
+\code
+timeloop.add() << Sweep(LatticeModel_T::Sweep(pdfFieldId), "LB stream & collide");
+\endcode
+
+The remaining extensions concern only the setup of boundaries and the initial velocities. 
+
+\subsection lbmpy_shear_flow_init Setup of the Shear Flow Scenario
+
+We will set up a shear flow scenario in a rectangular, two-dimensional domain which is periodic in the x-direction and limited by NoSlip-boundaries (i.e. walls) to the north and south. The fluid will be moving to the right in the northern and southern parts of the domain, and to the right in the middle with the same speed. Its velocity in the y-direction will not be initially zero, but slightly perturbed by random noise. This will cause the development of vortices between the shear layers during the simulation.
+
+For simplicity, the boundaries are set up using the lbm::DefaultBoundaryHandlingFactory as described in \ref tutorial_lbm01. Other than there, no pressure or velocity boundary conditions exist; therefore the declaration simplifies to:
+
+\code
+BlockDataID boundaryHandlingId =
+      BHFactory::addBoundaryHandlingToStorage(blocks, "boundary handling", flagFieldId, pdfFieldId, fluidFlagUID,
+                                              Vector3< real_t >(), Vector3< real_t >(), real_c(0.0), real_c(0.0));
+\endcode
+
+In the parameter file, the boundary block thus only defines the NoSlip boundaries. Also, in the `DomainSetup` block, the domain needs to be periodic in the x-direction.
+
+\code
+DomainSetup
+{
+   blocks        <  1,    1, 1 >;
+   cellsPerBlock <  300, 80, 1 >;
+   periodic      <  1,    0, 1 >;  
+}
+
+(...)
+
+Boundaries 
+{   
+   Border { direction S,N; walldistance -1; NoSlip {} }		
+}
+\endcode
+
+The most substantial extension we are making is the velocity initialization. We will be using the lbm::initializer::PdfFieldInitializer class in conjunction with lbm::initializer::ExprSystemInitFunction which is capable of parsing mathematical expressions from the parameter file to set up complex initial flows. For this purpose, waLBerla uses the [exprtk](http://www.partow.net/programming/exprtk/index.html) C++ library. We will need to extend this functionality a little to introduce the random noise.
+
+The PdfFieldInitializer's initDensityAndVelocity function expects a function of type `std::vector< real_t > (const Cell&)`. This function will receive a `Cell` with global coordinates and should return a `std::vector` with four entries: One density and three cartesian velocity components for the given cell. For this purpose, we create a functor struct with these members:
+
+- An instance `exprInitFunc_` of lbm::initializer::ExprSystemInitFunction for initializing the x-velocities;
+- A random number generator `rng_` for the y-velocities, which is an instance of walberla::math::RealRandom;
+- The magnitude of the random noise.
+
+All members will be set using parameters specified in a parameter file block. The functor `exprInitFunc_` is initialized by calling it's `parse` method inside the constructor.
+
+We also define the `operator()` with the above signature. It first calls `exprInitFunc_` to set density and velocity, and then adds the random perturbation to the y component. 
+
+\code
+struct ShearFlowInit
+{
+ private:
+   lbm::initializer::ExprSystemInitFunction exprInitFunc_;
+   const real_t noiseMagnitude_;
+   math::RealRandom< real_t > rng_;
+
+ public:
+   ShearFlowInit(const shared_ptr< StructuredBlockForest >& blocks, const Config::BlockHandle& setup)
+      : exprInitFunc_(blocks), noiseMagnitude_(setup.getParameter< real_t >("noise_magnitude")),
+        rng_(setup.getParameter< std::mt19937::result_type >("noise_seed", 42))
+   {
+      if (!exprInitFunc_.parse(setup)) { WALBERLA_ABORT("Shear Flow Setup was incomplete."); }
+   }
+
+   std::vector< real_t > operator()(const Cell& globalCell)
+   {
+      std::vector< real_t > densityAndVelocity = exprInitFunc_(globalCell);
+      real_t yPerturbation                     = noiseMagnitude_ * rng_();
+      densityAndVelocity[2] += yPerturbation;
+
+      return densityAndVelocity;
+   }
+};
+\endcode
+
+All the required parameters and the expressions for initializing the density and velocity are defined in the parameter file, inside the block `ShearFlowSetup`. For `rho`, `u_x`, `u_y` and `u_z`, mathematical expressions can be specified which may include the variables `x`, `y`, `z` for a cell's global position and `n_x`, `n_y`, `n_z` which hold the number of cells in each direction. These expressions will then be evaluated for each domain cell.
+
+A seed for the random number generator is also specified, controlling the random noise and making it reproducible.
+
+\code
+ShearFlowSetup
+{
+   rho   1;
+   
+   u_x   if( ( y / n_y < 0.3 ) or (y / n_y > 0.7) , 0.08, -0.08);   
+   u_y   0;
+   u_z   0;
+
+   noise_magnitude     0.005;
+   noise_seed          42;
+}
+\endcode
+
+This completes the C++ implementation. It will produce VTK files which can then be viewed using ParaView, as described in previous tutorials.
+
+\section lbmpy_lattice_model_outlook Outlook
+
+Although generated lattice models are easy to define and easy to integrate with the waLBerla framework, they are somewhat limited in features. For example, the waLBerla code generation methods for lattice models do not allow advanced streaming patterns, or the generation of CUDA code. This is remedied by generating sweeps using `generate_sweep` directly instead of `generate_lattice_model`. The result is a generic waLBerla sweep which is not consistent with the lattice model API, thus making implementations a little more complicated since many class templates created for the use with lattice models can not be used. For this problem too, code generation offers solutions which will be further explained in the next tutorial. 
+
+\tableofcontents
+
+*/
+
+}
\ No newline at end of file
diff --git a/apps/tutorials/codegen/02_LBMLatticeModelGeneration.prm b/apps/tutorials/codegen/02_LBMLatticeModelGeneration.prm
index bdea0186e41c238634b6d387e539a07aec08260a..5df6b7e311345d64a394ab3bd87331cebdb6ce91 100644
--- a/apps/tutorials/codegen/02_LBMLatticeModelGeneration.prm
+++ b/apps/tutorials/codegen/02_LBMLatticeModelGeneration.prm
@@ -1,7 +1,7 @@
 
 Parameters 
 {
-	omega           1.8;
+   omega           1.8;
 	timesteps       10000;
 
 	remainingTimeLoggerFrequency 3; // in seconds
@@ -16,7 +16,7 @@ ShearFlowSetup
    u_z   0;
 
    noise_magnitude     0.005;
-   noise_seed              42;
+   noise_seed          42;
 }
 
 DomainSetup
@@ -42,10 +42,12 @@ Boundaries
 VTK 
 {
    // for parameter documentation see src/vtk/Initialization.cpp
-   fluid_field
+   srt_velocity_field
    {
       writeFrequency 100;
       ghostLayers    1;
+
+      baseFolder     vtk_out/srt_lattice_model;
       
       before_functions {
          PDFGhostLayerSync;
@@ -57,23 +59,6 @@ VTK
       
       writers {
          Velocity;
-         Density;
-      }
-   }
-   
-   flag_field
-   {
-      writeFrequency 10000000; // write only once
-   // ghostLayers    1;
-      
-      writers {
-         FlagField;
       }
    }
-   
-   domain_decomposition
-   {
-      writeFrequency 10000000; // write only once
-      outputDomainDecomposition true;
-   }
 }
diff --git a/apps/tutorials/codegen/03_AdvancedLBMCodegen.cpp b/apps/tutorials/codegen/03_AdvancedLBMCodegen.cpp
index 164e9afd711e8ae9d166ffee991e72e4fab4ebb3..2f07ed67071e88d213dad6d2e5abb748982ede56 100644
--- a/apps/tutorials/codegen/03_AdvancedLBMCodegen.cpp
+++ b/apps/tutorials/codegen/03_AdvancedLBMCodegen.cpp
@@ -216,7 +216,6 @@ int main(int argc, char** argv)
 
    // Timeloop
    timeloop.add() << BeforeFunction(communication, "communication") << Sweep(noSlip);
-
    timeloop.add() << Sweep(pystencils::CumulantMRTSweep(pdfFieldId, velocityFieldId, omega));
 
    // Stability Checker