@@ -98,9 +98,7 @@ As in \ref tutorial_codegen02, the classes generated by above code need to be re
\section advancedlbmcodegen_application The waLBerla application
We will now integrate the generated classes into a waLBerla application. If CUDA is enabled and the application is meant to utilise the GPU kernels, some implementation details will be different from a CPU-only version. This mainly concerns the creation and management of fields, MPI communication and VTK output. For the largest part, though, the C++ code is identical. The remainder of the tutorial will focus only on CPU code. In the source file 03_AdvancedLBMCodegen.cpp, code blocks which are different in a GPU implementation are toggled via preprocessor conditionals.
After adding the code generation target as a CMake dependency, we can include their header files:
We will now integrate the generated classes into a waLBerla application. After adding the code generation target as a CMake dependency, we can include their header files:
\code
#include "CumulantMRTNoSlip.h"
...
...
@@ -229,6 +227,12 @@ After the velocity field has been initialized, the generated `InitialPDFsSetter`
The simulation is now ready to be run.
\subsection advancedlbmpy_cuda Differences in the GPU application
If CUDA is enabled and the application is meant to utilise the GPU kernels, some implementation details differ from the CPU-only version. This mainly concerns the creation and management of fields, MPI communication and VTK output. Since the initialization, LBM and NoSlip sweeps run entirely on the GPU, the PDF field only has to be set up in graphics memory. The velocity field, in turn, is required by both CPU and GPU code: the shear flow velocity profile is constructed by CPU code before the initialization kernel maps it onto the PDF field on the GPU, and the VTK output routines, which run on the CPU, need to read the velocity field. It thus needs to be created twice, once in main memory and once in GPU memory. Its contents are then copied on demand.
For the most part, though, the C++ code is identical. The code snippets presented above show only the CPU variant of the code. The GPU implementation can be found in the source file 03_AdvancedLBMCodegen.cpp, where code blocks that differ between the CPU and the GPU implementation are toggled via preprocessor conditionals.
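The idea of keeping the velocity field in both memories and copying it on demand can be illustrated with a minimal, framework-independent sketch. It does not use the waLBerla or CUDA APIs: the macro `SKETCH_USE_GPU`, the type `VelocityField` and the function `sweepStandIn` are hypothetical names introduced only for this illustration, and the "device" buffer is an ordinary host vector standing in for GPU memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical switch used only in this sketch. The tutorial source has its
// own preprocessor conditionals, which are not reproduced here.
#ifndef SKETCH_USE_GPU
#define SKETCH_USE_GPU 1
#endif

// Stand-in for a field buffer. In a real GPU build the device copy would
// live in graphics memory; here both copies are plain host vectors.
using Buffer = std::vector<double>;

struct VelocityField
{
   Buffer host;   // filled by the CPU set-up code, read by the output routines
#if SKETCH_USE_GPU
   Buffer device; // read and written by the (stand-in) GPU kernels
#endif

   explicit VelocityField(std::size_t n) : host(n, 0.0)
   {
#if SKETCH_USE_GPU
      device.assign(n, 0.0);
#endif
   }

   // Copy on demand: call after the CPU has written the velocity profile.
   void hostToDevice()
   {
#if SKETCH_USE_GPU
      assert(device.size() == host.size());
      device = host;
#endif
   }

   // Copy on demand: call before CPU code (e.g. output) reads the field.
   void deviceToHost()
   {
#if SKETCH_USE_GPU
      assert(device.size() == host.size());
      host = device;
#endif
   }
};

// Stand-in for a sweep: it only touches the copy that the compute kernels
// would see, so the host copy is stale until deviceToHost() is called.
inline void sweepStandIn(VelocityField& u)
{
#if SKETCH_USE_GPU
   Buffer& target = u.device;
#else
   Buffer& target = u.host;
#endif
   for (double& v : target) v += 1.0;
}
```

In a CPU-only build of this sketch (`SKETCH_USE_GPU` set to 0), both copy functions become no-ops and the sweep works on the host buffer directly, which mirrors why most of the application code can stay identical between the two variants.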
\section advancedlbmpy_conclusion Conclusion and Outlook
We have now successfully implemented a waLBerla LBM simulation application with an advanced collision operator, which can be specialized for both CPU and GPU hardware targets. This is still just a glimpse of the capabilities of code generation. One possible extension would be the use of advanced streaming patterns like the AA-pattern or EsoTwist to reduce the simulation's memory footprint. Also, lbmpy gives us the tools to develop advanced lattice Boltzmann methods for many kinds of applications. The basic principles demonstrated in these tutorials can thus be used for creating much more complicated simulations with specially tailored, optimized lattice Boltzmann code.