Commit 67a6aa80 authored by Frederik Hennig

Added tutorial 03 section on CUDA code

parent dfe0b668
@@ -98,9 +98,7 @@ As in \ref tutorial_codegen02, the classes generated by above code need to be re
\section advancedlbmcodegen_application The waLBerla application
-We will now integrate the generated classes into a waLBerla application. If CUDA is enabled and the application is meant to utilise the GPU kernels, some implementation details will be different from a CPU-only version. This mainly concerns the creation and management of fields, MPI communication and VTK output. For the largest part, though, the C++ code is identical. The remainder of the tutorial will focus only on CPU code. In the source file 03_AdvancedLBMCodegen.cpp, code blocks which are different in a GPU implementation are toggled via preprocessor conditionals.
-After adding the code generation target as a CMake dependency, we can include their header files:
+We will now integrate the generated classes into a waLBerla application. After adding the code generation target as a CMake dependency, we can include their header files:
\code
#include "CumulantMRTNoSlip.h"
@@ -229,6 +227,12 @@ After the velocity field has been initialized, the generated `InitialPDFsSetter`
The simulation is now ready to be run.
+\subsection advancedlbmpy_cuda Differences in the GPU application
+If CUDA is enabled and the application is meant to utilise the GPU kernels, some implementation details differ from a CPU-only version. This mainly concerns the creation and management of fields, MPI communication and VTK output. Since the initialization, LBM and NoSlip sweeps run entirely on the GPU, the PDF field has to be set up only in graphics memory. The velocity field, in turn, is required by both CPU and GPU code: the shear flow velocity profile is constructed by CPU code before the initialization kernel maps it onto the PDF field on the GPU. The VTK output routines, which run on the CPU, also need to read the velocity field. It thus needs to be created twice: once in main memory and once in GPU memory. Its contents are then copied on demand.
+For the most part, though, the C++ code is identical. The code snippets presented above represent only the CPU variant of the code. The GPU implementation can be found in the source file 03_AdvancedLBMCodegen.cpp. There, code blocks which differ between the CPU and the GPU implementation are toggled via preprocessor conditionals.
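The pattern described in this subsection (a PDF field that exists only on the GPU, a velocity field mirrored in both memories, and code paths toggled by the preprocessor) can be illustrated with a small standalone sketch. It does not use the waLBerla API: `HostField`, `DeviceField`, `copyToDevice`, `copyToHost`, `runSetup` and the macro `SKETCH_USE_GPU` are hypothetical names chosen for illustration, and a `std::vector` stands in for GPU memory so that the sketch compiles without CUDA.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical build switch for this sketch; it is NOT the macro used by waLBerla.
// Compile with -DSKETCH_USE_GPU to select the "GPU" code path.

// Stand-in for a field that lives in main memory.
struct HostField {
    std::vector<double> data;
    explicit HostField(std::size_t n) : data(n, 0.0) {}
};

// Stand-in for a field that lives in GPU memory. A real implementation would
// own a device allocation; a std::vector keeps the sketch runnable anywhere.
struct DeviceField {
    std::vector<double> data;
    explicit DeviceField(std::size_t n) : data(n, 0.0) {}
};

// On-demand copies between the two mirrors (placeholders for host/device transfers).
inline void copyToDevice(const HostField& src, DeviceField& dst) { dst.data = src.data; }
inline void copyToHost(const DeviceField& src, HostField& dst) { dst.data = src.data; }

// Performs the setup steps in the order described above and returns the
// velocity values that CPU-side output code would read afterwards.
inline std::vector<double> runSetup(std::size_t cells) {
    HostField velocityCpu(cells);
    // CPU code fills the velocity field. This is a placeholder profile,
    // not the tutorial's actual shear flow setup.
    for (std::size_t i = 0; i < cells; ++i)
        velocityCpu.data[i] = (i < cells / 2) ? 1.0 : -1.0;

#ifdef SKETCH_USE_GPU
    DeviceField velocityGpu(cells);          // second copy of the velocity field
    DeviceField pdfsGpu(cells);              // PDF field exists only on the "GPU"
    copyToDevice(velocityCpu, velocityGpu);  // upload before initialization
    pdfsGpu.data = velocityGpu.data;         // placeholder for the initialization kernel
    copyToHost(velocityGpu, velocityCpu);    // download before CPU-side output
#else
    HostField pdfsCpu(cells);                // CPU-only build: a single memory space
    pdfsCpu.data = velocityCpu.data;         // placeholder for the initialization sweep
#endif
    return velocityCpu.data;
}
```

Calling `runSetup(4)` yields the same four velocity values in both build variants. Only the branch inside the `#ifdef` differs, which mirrors how the tutorial source keeps the bulk of the application code shared between the CPU and GPU versions.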
\section advancedlbmpy_conclusion Conclusion and Outlook
We have now successfully implemented a waLBerla LBM simulation application with an advanced collision operator, which can be specialized for both CPU and GPU hardware targets. This is still just a glimpse of the capabilities of code generation. One possible extension would be the use of advanced streaming patterns like the AA-pattern or EsoTwist to reduce the simulation's memory footprint. Also, lbmpy gives us the tools to develop advanced lattice Boltzmann methods for many kinds of applications. The basic principles demonstrated in these tutorials can thus be used for creating much more complicated simulations with specially tailored, optimized lattice Boltzmann code.