# waLBerla issues
https://i10git.cs.fau.de/walberla/walberla/-/issues

## Issue #200: GCC 12: EquationSystem warning with `DebugOptimized` and `OPTIMIZE_FOR_LOCALHOST`
https://i10git.cs.fau.de/walberla/walberla/-/issues/200 (Dominik Thoennes, updated 2022-12-24)

`EquationSystem.cpp` emits a `maybe-uninitialized` warning when built with `DebugOptimized` and `WALBERLA_OPTIMIZE_FOR_LOCALHOST` on GCC 12:

```
Building CXX object src/core/CMakeFiles/core.dir/math/equation_system/EquationSystem.cpp.o
cd /build/src/core && /usr/local/bin/ccache g++ -DBOOST_ALL_NO_LIB -I/build/src -I/walberla/src -isystem /opt/boost/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -isystem /opt/openmesh/include -Wall -Wconversion -Wshadow -march=native -Wfloat-equal -Wextra -pedantic -D_GLIBCXX_USE_CXX11_ABI=1 -pthread -g -O3 -std=c++17 -o CMakeFiles/core.dir/math/equation_system/EquationSystem.cpp.o -c /walberla/src/core/math/equation_system/EquationSystem.cpp
In file included from /opt/boost/include/boost/graph/depth_first_search.hpp:21,
from /opt/boost/include/boost/graph/max_cardinality_matching.hpp:21,
from /walberla/src/core/math/equation_system/EquationSystem.cpp:39:
In constructor 'boost::bgl_named_params<T, Tag, Base>::bgl_named_params(T, const Base&) [with T = boost::vec_adj_list_vertex_id_map<boost::no_property, long unsigned int>; Tag = boost::vertex_index_t; Base = boost::bgl_named_params<boost::detail::odd_components_counter<long unsigned int>, boost::graph_visitor_t, boost::no_property>]',
inlined from 'boost::bgl_named_params<PType, boost::vertex_index_t, boost::bgl_named_params<T, Tag, Base> > boost::bgl_named_params<T, Tag, Base>::vertex_index_map(const PType&) const [with PType = boost::vec_adj_list_vertex_id_map<boost::no_property, long unsigned int>; T = boost::detail::odd_components_counter<long unsigned int>; Tag = boost::graph_visitor_t; Base = boost::no_property]' at /opt/boost/include/boost/graph/named_function_params.hpp:217:5,
inlined from 'static bool boost::maximum_cardinality_matching_verifier<Graph, MateMap, VertexIndexMap>::verify_matching(const Graph&, MateMap, VertexIndexMap) [with Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS>; MateMap = long unsigned int*; VertexIndexMap = boost::vec_adj_list_vertex_id_map<boost::no_property, long unsigned int>]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:779:61,
inlined from 'bool boost::matching(const Graph&, MateMap, VertexIndexMap) [with Graph = adjacency_list<vecS, vecS, undirectedS>; MateMap = long unsigned int*; VertexIndexMap = vec_adj_list_vertex_id_map<no_property, long unsigned int>; AugmentingPathFinder = edmonds_augmenting_path_finder; InitialMatchingFinder = extra_greedy_matching; MatchingVerifier = maximum_cardinality_matching_verifier]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:807:79,
inlined from 'bool boost::checked_edmonds_maximum_cardinality_matching(const Graph&, MateMap, VertexIndexMap) [with Graph = adjacency_list<vecS, vecS, undirectedS>; MateMap = long unsigned int*; VertexIndexMap = vec_adj_list_vertex_id_map<no_property, long unsigned int>]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:817:48,
inlined from 'bool boost::checked_edmonds_maximum_cardinality_matching(const Graph&, MateMap) [with Graph = adjacency_list<vecS, vecS, undirectedS>; MateMap = long unsigned int*]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:824:56,
inlined from 'void walberla::math::EquationSystem::match()' at /walberla/src/core/math/equation_system/EquationSystem.cpp:97:4:
/opt/boost/include/boost/graph/named_function_params.hpp:192:56: warning: '*(unsigned char*)((char*)&occ + offsetof(boost::detail::odd_components_counter<long unsigned int>,boost::detail::odd_components_counter<long unsigned int>::m_parity))' may be used uninitialized [-Wmaybe-uninitialized]
192 | bgl_named_params(T v, const Base& b) : m_value(v), m_base(b) {}
| ^~~~~~~~~
/opt/boost/include/boost/graph/max_cardinality_matching.hpp: In member function 'void walberla::math::EquationSystem::match()':
/opt/boost/include/boost/graph/max_cardinality_matching.hpp:778:52: note: '*(unsigned char*)((char*)&occ + offsetof(boost::detail::odd_components_counter<long unsigned int>,boost::detail::odd_components_counter<long unsigned int>::m_parity))' was declared here
778 | detail::odd_components_counter< v_size_t > occ(num_odd_components);
|
```

## Issue #199: Make CMake and CodeGen simpler
https://i10git.cs.fau.de/walberla/walberla/-/issues/199 (Markus Holzer, updated 2022-11-29)

At the moment, all files generated by a codegen script must be listed manually under `OUT_FILES` in `waLBerla_generate_target_from_python`. Example:
https://i10git.cs.fau.de/walberla/walberla/-/blob/master/apps/benchmarks/FlowAroundSphereCodeGen/CMakeLists.txt
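A Python sketch (hypothetical helper, not part of waLBerla) of one possible remedy: the generation script itself reports the files it writes, including the CPU/GPU file ending, so CMake would not need a hand-maintained `OUT_FILES` list:

```python
# Hypothetical sketch: derive the generated file names in one place instead of
# repeating them in CMakeLists.txt. Names and signature are illustrative only.
def generated_files(kernel_name, gpu=False):
    source_ext = ".cu" if gpu else ".cpp"   # the GPU file-ending pitfall
    return [kernel_name + source_ext, kernel_name + ".h"]

cpu_files = generated_files("FlowAroundSphereSweep")
gpu_files = generated_files("FlowAroundSphereSweep", gpu=True)
```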
This has some disadvantages:
1. It is always a bit clunky: as a user you first provide the names of the `OUT_FILES` as strings to the generation function, and then you need to copy all these names into the CMake file again.
2. It is error-prone in several ways. The biggest issue is choosing the correct file ending for GPU support. This was partially solved in !518, but one still needs to know whether a file is generated only for CPU or varies with the CMake configuration. Second, some generation functions, like `generate_info_header`, produce only a header file; the user must know this to write a correct CMake file on the first try.
3. Every generation function can only produce a single file; otherwise, it would be impossible for a user to know which files will come out in the end. Thus this system lacks flexibility as well.

## Issue #196: Unification of GPU/Device Usage
https://i10git.cs.fau.de/walberla/walberla/-/issues/196 (Markus Holzer, updated 2023-04-04)

The GPU/CUDA backend of waLBerla is very thin at the moment. The following suggestions/observations should serve as a guideline for integrating device usage:
1. Like `MPI`, device functions should get a wrapper with an "environment", as done with `WALBERLA_MPI_SECTION`. Users would then no longer need to work with `#if defined`, as is for example done here: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/apps/benchmarks/FlowAroundSphereCodeGen/FlowAroundSphereCodeGen.cpp
2. The GPUField (https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/GPUField.h) works differently in the sense that its interface differs (f-size is not a template parameter), and it is very cumbersome to have explicit GPU data with its explicit `addToStorage` functions. It would be simpler if the `Field` itself could synchronise its data and return a device or host pointer depending on the situation. This goes along with #109, which suggests using `mdspan` for the Field data structure; `mdspan` provides this functionality right away.
3. Some device-specific implementations do not need to be specific. A good example is the communication scheme: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/UniformGPUScheme.h In the end the same `MPI` function is called, just with a device pointer instead of a host pointer (or, if GPUDirect is not available, again with a host pointer). Thus there is no need for this parallel world to exist here.
Another good example is https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/CustomMemoryBuffer.h, which works essentially the same as the normal buffers, just with a slightly different API.

## Issue #194: Sum of cell-local overlap fractions can exceed 1 (which it should not) in the pe coupling
https://i10git.cs.fau.de/walberla/walberla/-/issues/194 (Samuel Kemmler, updated 2023-03-23)

Since pe is deprecated, this issue serves mainly as documentation and can be removed as soon as there is a documentation page for known bugs.

## Issue #192: Communication fails when using multiple blocks per process (GPU)
https://i10git.cs.fau.de/walberla/walberla/-/issues/192 (Samuel Kemmler, updated 2023-05-22)

The communication fails (i.e., communicates to the wrong position) when using multiple blocks per process on the GPU. This only occurs between process-local block 0 and process-local block 1.

## Issue #183: MPI_ERR_TRUNCATE in WcTimingPool
https://i10git.cs.fau.de/walberla/walberla/-/issues/183 (Daniel Bauer, updated 2022-03-24)

I use mesh refinement with a refinement time step.
Additionally, I create two [WcTimingPools](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/timing/TimingPool.h) and pass them to the refinement time step.
After running the simulation, I use [`logResultOnRoot()`](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/timing/TimingPool.cpp#L411) to print the time measurements.
```
auto timingRef = std::make_shared<WcTimingPool>();
auto timingRefLvl = std::make_shared<WcTimingPool>();
⋮
// setup timeloop
refinementTimeStep->enableTiming(timingRef, timingRefLvl);
timeloop->addFuncBeforeTimeStep(makeSharedFunctor(refinementTimeStep), str::refinementTimeStep);
⋮
// run timeloop
auto timing = std::make_shared<WcTimingPool>();
timeloop->run(*timing);
// print timings
timing ->logResultOnRoot();
timingRef ->logResultOnRoot();
timingRefLvl->logResultOnRoot();
```
I run my simulation with 288 processes on 8 nodes and get the following behavior:
The first two timings (`timing` and `timingRef`) print the results as expected.
The level-wise timing (`timingRefLvl`) fails to print, with the error:
```
[node098:207482] * An error occurred in MPI_Reduce
[node098:207482] * reported by process [3786932225,192]
[node098:207482] * on communicator MPI_COMM_WORLD
[node098:207482] * MPI_ERR_TRUNCATE: message truncated
[node098:207482] * MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node098:207482] * and potentially your MPI job)
[node089.cluster:17591] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[node089.cluster:17591] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[node089.cluster:17591] 12 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[node089.cluster:17591] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
```
The error must come from [TimingPool.cpp:172-185](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/timing/TimingPool.cpp#L172-L185).
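A plausible cause (my reading, not verified): the reduction assumes every rank holds the same set of timers, while a level-wise pool on a refined domain may register different timers on different ranks, so the `MPI_Reduce` buffers disagree in size. A minimal Python illustration of the mismatch:

```python
# Hypothetical timer sets on two ranks of a refined run: rank 5 owns blocks on
# an extra refinement level and therefore registers one more timer.
timers_rank0 = ["collide", "stream"]
timers_rank5 = ["collide", "stream", "collide_lvl2"]

# A reduction over the pools would exchange len(timers) values per rank;
# unequal message lengths are exactly what MPI reports as MPI_ERR_TRUNCATE.
buffer_sizes = {0: len(timers_rank0), 5: len(timers_rank5)}
mismatch = len(set(buffer_sizes.values())) > 1
```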
I use GCC 8.3.0 and OpenMPI 3.1.5.

## Issue #176: Check and validate implementation of Guo force model with TRT/MRT
https://i10git.cs.fau.de/walberla/walberla/-/issues/176 (Christoph Schwarzmeier, updated 2023-01-27)

I tested waLBerla's TRT collision model with the compressible D3Q19 velocity set in waLBerla's free-surface LBM module (not yet open source).
When using the `GuoConstant` force model, I noticed two things that are not present when using the `SimpleConstant` force model:
1. the physical results significantly differ from SRT (same even order relaxation rate, arbitrary "Magic" parameter)
2. the physical results change significantly with small changes in TRT's (even order) relaxation rate
These observations should not be related to the free surface LBM implementation and we should therefore
- [x] create a simple test case to validate the `GuoConstant` force model when using the TRT/MRT collision operators (without free surface LBM)
- [x] try to reproduce this issue (without free surface LBM)
- [ ] check whether the `GuoConstant` and `GuoField` force models are implemented correctly for the TRT/MRT collision operators

Assignee: Jonas Plewinski.

## Issue #173: Poor user experience with generated DynamicUBB (UBB + additional_data_handler)
https://i10git.cs.fau.de/walberla/walberla/-/issues/173 (Nigel Overmars, updated 2023-05-02)

For my purposes, I need to be able to set a certain inflow profile, for example a variant of a Poiseuille flow where the velocity changes at every time step.
Currently, this is possible when using generated sweeps, but only with a few nasty hacks, for both making it temporally varying and spatially varying. To be able to use a DynamicUBB, which was generated via
```
ubb_dynamic = UBB(lambda *args: None, dim=stencil.D)
ubb_data_handler = UBBAdditionalDataHandler(stencil, ubb_dynamic)
# UBB with user-defined velocity profile
generate_boundary(ctx, "DynamicUBB", ubb_dynamic, lbm_method,
additional_data_handler=ubb_data_handler, target=target)
```
one needs to pass a `std::function< Vector3< real_t >(const Cell&, const shared_ptr< StructuredBlockForest >&, IBlock&) >` to the constructor of the DynamicUBB type. The easiest way to create one is via a functor; a template looks like this:
```
class InflowProfile
{
public:
Vector3< real_t > operator()( const Cell& pos, const shared_ptr< StructuredBlockForest >& SbF, IBlock& block ) {
// return velocity vector depending on the cell location in the SbF
return getVelocityVector(pos, SbF, block);
}
};
```
**Temporally varying inflow profile**
The problem is that, in the current version of waLBerla/lbmpy, the additional data handler is only called once, before the simulation runs, when the generated member function
```
template<typename FlagField_T>
void DynamicUBB::fillFromFlagField( const shared_ptr<StructuredBlockForest> & blocks, ConstBlockDataID flagFieldID,
FlagUID boundaryFlagUID, FlagUID domainFlagUID)
```
is called. After these values have been set, there is currently no easy way to update them at every time step. As a workaround, one can register the aforementioned member function via `addFuncAfterTimeStep` on the SweepTimeloop and add a `TimeTracker` object to the functor, which allows updating the values depending on the current time step. This approach appears to incur a performance penalty, as `DynamicUBB::fillFromFlagField` does work that should be unnecessary after its initial run. I think, but have not tested it, that this also makes running on the GPU impossible or significantly slower.
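The TimeTracker idea can be sketched in a few lines (Python stand-ins; the real waLBerla API is C++ and all names here are illustrative):

```python
# Stand-ins: a time tracker advanced once per step, and an inflow functor that
# reads it, so boundary velocities can vary in time.
class TimeTracker:
    def __init__(self):
        self.step = 0
    def advance(self):
        self.step += 1

class InflowProfile:
    def __init__(self, tracker, u_max=0.05):
        self.tracker = tracker
        self.u_max = u_max
    def __call__(self, cell):
        # e.g. ramp the inflow up over the first 100 steps
        ramp = min(self.tracker.step / 100.0, 1.0)
        return (self.u_max * ramp, 0.0, 0.0)

tracker = TimeTracker()
profile = InflowProfile(tracker)
for _ in range(50):          # would be driven by the time loop
    tracker.advance()
vel = profile((0, 0, 0))     # velocity now depends on the current step
```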
**Spatially varying inflow profile**
Currently, the cells passed to the functor's `operator()` are different from the ones I expected. The problem is in the following generated code:
```
// for every cell in the block
if ( isFlagSet( it.neighbor(1, 0, 0 , 0 ), boundaryFlag ) )
{
auto element = IndexInfo(it.x(), it.y(), it.z(), 0 );
Vector3<real_t> InitialisatonAdditionalData = elementInitaliser(Cell(it.x(), it.y(), it.z()), blocks, *block);
element.vel_0 = InitialisatonAdditionalData[0];
element.vel_1 = InitialisatonAdditionalData[1];
element.vel_2 = InitialisatonAdditionalData[2];
// snip
}
// Similar code for all directions, e.g. 27 in total for a D3Q27 stencil
```
Here the function `elementInitialiser` is our functor's `operator()`.
In essence, the code checks whether the current cell has a neighbor for which the inflow flag is set. If so, the functor is applied to this cell, i.e. the fluid cell whose neighbor carries the inflow flag, not the flagged boundary cell itself, and the result (the computed velocity vector) is stored in the appropriate location.
Currently, I am working around this problem by manually adding the direction to the cell on which the functor is applied. Unfortunately, due to the generated nature of the code, this is not a real solution...
In my use case, noise values are generated for every cell of the inflow boundary at every time step by a Python program and read from a .json file. These values are put into a `std::unordered_map<GlobalCellCoordinatesTuple, Vector3>` and read during the boundary treatment. Without the aforementioned fix there is a mismatch between the cell coordinates I put into the `std::unordered_map<..,..>` (the ones for which the inflow flag is set) and the cell coordinates passed to the functor, resulting in wrong behavior.
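The coordinate mismatch amounts to a direction offset (sketch; `direction` is the stencil offset pointing from the fluid cell to the flagged boundary cell):

```python
def boundary_cell(fluid_cell, direction):
    # the cell the user expects: the flagged inflow cell, not its fluid neighbor
    return tuple(c + d for c, d in zip(fluid_cell, direction))

# the generated code hands the functor e.g. (10, 4, 7); the flagged cell is one
# step in +x, so lookups keyed by flagged-cell coordinates miss without the shift
flagged = boundary_cell((10, 4, 7), (1, 0, 0))
```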
**Suggestions**
- Allow the user to specify if they want the boundary values to be updated at every time step. Then the user would just have to add a `TimeTracker` to the functor and then can update the values in an easy manner.
- Allow the user to select whether the cells passed to the functor's `operator()` are 1) the neighboring fluid cells, as is done right now, or 2) the actual cells for which the checked flag is set.

Assignee: Markus Holzer.

## Issue #170: Usability of CodeGen boundaries
https://i10git.cs.fau.de/walberla/walberla/-/issues/170 (Markus Holzer, updated 2023-04-05)

Especially when a boundary with additional data is generated, it is rather hard to understand how this should be done in the Python script.
Something like a wrapper around the boundaries would maybe improve the situation. For example, the UBB can be generated in this way:
```python
ubb_dynamic = UBB(lambda *args: None, dim=stencil.D)
ubb_data_handler = UBBAdditionalDataHandler(stencil, ubb_dynamic)
```
However, the empty lambda callback and the `UBBAdditionalDataHandler` are cryptic, and there is no alternative to the form shown above.
Thus a thin wrapper like `DynamicUBB(stencil=stencil)` could improve this situation quite a lot.
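Such a wrapper could be as small as this (sketch; the stub classes stand in for lbmpy's real `UBB` and `UBBAdditionalDataHandler`, and `DynamicUBB` itself is the hypothetical addition):

```python
# Minimal stand-ins so the sketch is self-contained; in practice these come
# from lbmpy / lbmpy_walberla.
class UBB:
    def __init__(self, callback, dim):
        self.callback, self.dim = callback, dim

class UBBAdditionalDataHandler:
    def __init__(self, stencil, boundary):
        self.stencil, self.boundary = stencil, boundary

def DynamicUBB(stencil):
    """Hide the empty-lambda idiom behind one readable call."""
    ubb = UBB(lambda *args: None, dim=stencil.D)
    return ubb, UBBAdditionalDataHandler(stencil, ubb)

class Stencil:  # dummy stencil exposing the one attribute the wrapper needs
    D = 3

ubb_dynamic, ubb_data_handler = DynamicUBB(Stencil())
```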
The reason this has to be written in such a cryptic way in the first place is that lbmpy must also function as a standalone framework besides waLBerla. Certain design decisions are therefore motivated by the Python world and conflict with the code generation for waLBerla.

## Issue #163: Usage of NCCL
https://i10git.cs.fau.de/walberla/walberla/-/issues/163 (Markus Holzer, updated 2021-10-08)

NVIDIA NCCL provides a collective communication library. This could give a performance boost for multi-GPU computations and be better than CUDA-aware MPI.
https://developer.nvidia.com/nccl
https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html

## Issue #161: Documentation of Python setup workflow
https://i10git.cs.fau.de/walberla/walberla/-/issues/161 (Helen Schottenhamml, updated 2021-09-30)

To help the standard user set up their Python environment properly for use with pystencils/lbmpy, documentation in the README would be helpful.

## Issue #147: Don't call resetForceAndTorque inside DEM
https://i10git.cs.fau.de/walberla/walberla/-/issues/147 (Michael Kuron, updated 2021-05-20)

The DEM solver contains
```
// Resetting the acting forces
bodyIt->resetForceAndTorque();
[...]
// Resetting the acting forces
bodyIt->resetForceAndTorque();
```
which I don't think should be there, and it definitely shouldn't reset the force twice. HCSITS does not reset the force itself; that's what `ForceTorqueOnBodiesResetter` is for.

Assignee: Christoph Rettinger.

## Issue #145: reduceOverParticles
https://i10git.cs.fau.de/walberla/walberla/-/issues/145 (Sebastian Eibl, updated 2021-03-30)

```
auto kinEnergy = ps.reduceOverParticles(.., SUM, [&](auto p_idx)
{return 0.5_r * ac.getMass(p_idx) * ac.getLinearVelocity(p_idx) * ac.getLinearVelocity(p_idx);});
```

## Issue #130: Improve pe Union
https://i10git.cs.fau.de/walberla/walberla/-/issues/130 (Michael Kuron, updated 2023-05-02)

Current restrictions that I noticed:
- [ ] A union of two overlapping bodies has an incorrect volume, mass and inertia tensor. Overlap should either be checked for and warned about, or the overlap volume should be estimated numerically and mass and inertia be corrected accordingly.
- [ ] Cryptic errors happen when a union is created from bodies spread across a block boundary.
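The numerical estimate suggested in the first point could start from a plain Monte-Carlo sketch (hypothetical example with two overlapping unit spheres; the analytic lens volume 5π/12 ≈ 1.309 serves as a check):

```python
import random

def in_sphere(p, center, radius=1.0):
    return sum((pi - ci) ** 2 for pi, ci in zip(p, center)) <= radius ** 2

random.seed(42)
samples, hits = 200_000, 0
for _ in range(samples):
    # sample a box enclosing both spheres (centres (0,0,0) and (1,0,0))
    p = (random.uniform(-1, 2), random.uniform(-1, 1), random.uniform(-1, 1))
    if in_sphere(p, (0, 0, 0)) and in_sphere(p, (1, 0, 0)):
        hits += 1

box_volume = 3 * 2 * 2
overlap_volume = hits / samples * box_volume   # analytically 5*pi/12 ≈ 1.309
```

The union's mass and inertia could then be corrected by subtracting the estimated overlap contribution instead of silently double-counting it.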
Further restrictions that @eibl mentioned to me:
- [ ] Dynamic creation and splitting of unions in parallel is problematic.
- [ ] Unclear collision handling because concave objects can collide in multiple points simultaneously.
Regarding the last point: I guess you can even have simultaneous collisions without unions, e.g. cube-cube and cube-plane (face to face), cube-cylinder, plane-cylinder and cylinder-cylinder (face to side or face to face), or sphere-torus (moving the sphere through the center of the torus).

## Issue #126: Investigate usage of CUDA graphs to reduce kernel call overhead
https://i10git.cs.fau.de/walberla/walberla/-/issues/126 (Stephan Seitz, updated 2020-07-26)

Marco had the idea to use this in the CUDA backend for Petalisp, but maybe it's also useful in waLBerla timeloops.
https://developer.nvidia.com/blog/cuda-graphs/
It effectively reduces the call overhead when the same group of kernels is launched repeatedly. Maybe not that relevant here, since the bottleneck should be communication rather than managing the CUDA kernels, and our kernels are not exactly "short-running".

## Issue #109: Use mdspan instead of Field for portability of codegen beyond waLBerla
https://i10git.cs.fau.de/walberla/walberla/-/issues/109 (Michael Kuron, updated 2022-09-14)

@bauer pointed out that C++23 will have `std::mdspan`, a non-owning multidimensional array view. A reference implementation that works with C++11 and higher is already available: https://github.com/kokkos/mdspan.
This could be used at the interface between waLBerla and pystencils' generated code and would allow pystencils_walberla to be used with other (non-waLBerla) software. Right now, that interface uses `walberla::field::Field`.

## Issue #108: Make Codegen Safer
https://i10git.cs.fau.de/walberla/walberla/-/issues/108 (Christoph Rettinger, updated 2020-03-18)

By introducing more and more codegen functionality (lbmpy/pystencils, but also mesa_pd) into waLBerla, the problem arises that changes made in lbmpy or pystencils may ultimately also affect the outcome of the codegen procedure, i.e. the generated kernel/lattice model.
The problem is not only that these changes are made in other repositories, but also that changes to some files there cannot easily be linked to the functionality used in the codegen scripts, i.e. it is not intuitively clear from looking at commit changes that they might affect my generated kernel.
One solution could be: if I want to make sure that a certain kernel does exactly what I have in mind, I define a certain input (e.g. the 27 PDFs of a single cell) and compute the output with a kernel I know to be correct.
This pair of input and output is then put inside the codegen script file and every time I generate a kernel, it is checked that the output of the newly generated kernel matches the formerly given one.
In some cases, this could also be done symbolically, but different optimizations might lead to a different symbolic representation even though the outcome of the actual computation is still the same.
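The pinned input-output idea could be sketched as follows (hypothetical `check_kernel` helper; the stand-in kernel is purely illustrative):

```python
def check_kernel(kernel, pinned_input, expected_output, tol=1e-12):
    """Raise if a (re)generated kernel no longer reproduces a known-good result."""
    result = kernel(pinned_input)
    if any(abs(r - e) > tol for r, e in zip(result, expected_output)):
        raise AssertionError("generated kernel output changed")

demo_kernel = lambda pdfs: [0.5 * f for f in pdfs]  # stand-in for a generated sweep
pinned_in = [float(i) for i in range(27)]           # e.g. 27 PDFs of one D3Q27 cell
pinned_out = [0.5 * f for f in pinned_in]           # recorded from a known-good version
check_kernel(demo_kernel, pinned_in, pinned_out)    # passes silently
```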
So it would be great to introduce a `checkKernel(kernel, input, output)` function in, e.g., lbmpy_walberla. Initial (and correct) input-output pairs would then be defined for the codegen targets that currently exist in waLBerla, and the check function would be used in all codegen scripts to assert that the behavior is still the same. The input data set should be complex enough to cover most cases (e.g. not all zeros), maybe even random.

## Issue #25: walberla::geometry::getClosestLineBoxPoints produces wrong results
https://i10git.cs.fau.de/walberla/walberla/-/issues/25 (Tobias Leemann, updated 2021-03-29)

For some combinations of capsules and boxes, the analytical collision detection of pe seems to yield wrong results.
This is probably caused by a bug in the function `walberla::geometry::getClosestLineBoxPoints`.
Code Example to reproduce:
The Capsule has radius 3, length 6 and is not rotated and translated, so its center line runs from (-3,0,0) to (3,0,0).
The Box has size (3, 1.5, 1.5) and is somehow rotated and translated.
```
Vec3 box_pos(real_t(-2.46404), real_t(2.90053), real_t(-2.177));
Quat box_rot(real_t(0.712691),real_t(-0.509519),real_t(0.0466145), real_t(0.479884));
//Corresponding Bodies
Box box(0, 0, box_pos, Vec3(0,0,0), box_rot, Vec3(3, real_t(1.5), real_t(1.5)), iron, false, true, false);
Capsule cap(1, 1, Vec3(0,0,0), Vec3(0,0,0), Quat(), real_t(3.0), real_t(6.0), iron, false, true, false);
//This is how getClosestLineBoxPoints is called in the collide-function:
Vec3 line_point, box_point;
walberla::geometry::getClosestLineBoxPoints( Vec3(3,0,0), Vec3(-3,0,0), box_pos, box_rot.toRotationMatrix(),
Vec3(real_t(3), real_t(1.5), real_t(1.5)), line_point, box_point);
std::cerr << "Result: Line-Point " << line_point << ", Box-Point " << box_point << std::endl;
```
This results in the following output:
`Result: Line-Point <3,0,0>, Box-Point <-1.66899,2.22324,-1.94998>`
This result is wrong: if you rotate and translate the corner (-1.5, -0.75, -0.75) (in box-local coordinates), e.g. via
`std::cerr << "Corner Position of Box: " << box_pos+box_rot.rotate(Vec3(real_t(-1.5), real_t(-0.75), real_t(-0.75))) << std::endl;`
you obtain the point (-2.40108, 1.35235, -1.18999), which is the nearest point of the box to the line. The nearest point on the line would then be (-2.40108, 0, 0).
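The hand check above can be reproduced in a few lines (Python sketch; I assume the quaternion is stored as (real, i, j, k), matching the `Quat` constructor call in the example):

```python
# Rotate v by the unit quaternion (r, u):  v' = v + 2r (u x v) + 2 u x (u x v)
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def quat_rotate(r, u, v):
    t = cross(u, v)
    uxt = cross(u, t)
    return tuple(v[i] + 2.0 * r * t[i] + 2.0 * uxt[i] for i in range(3))

box_pos = (-2.46404, 2.90053, -2.177)
r, u = 0.712691, (-0.509519, 0.0466145, 0.479884)
corner_local = (-1.5, -0.75, -0.75)
corner_world = tuple(p + c for p, c in zip(box_pos, quat_rotate(r, u, corner_local)))
# corner_world comes out near (-2.40108, 1.35235, -1.18999), as stated above
```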
Returning these incorrect points causes the collision between the box and the capsule (which has a large penetration depth of about 1.1986) to go undetected.

## Issue #14: field::refinement::PackInfo should permit specifying the number of ghost layers to communicate
https://i10git.cs.fau.de/walberla/walberla/-/issues/14 (Michael Kuron, updated 2019-11-07)

For uniform grids, `field::communication::PackInfo`'s constructor has an optional argument `numberOfGhostLayers` that specifies how many ghost layers should be exchanged. If it is not given, all ghost layers are exchanged.
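One consistent convention is to interpret the ghost-layer count from the coarse side and scale it by the refinement factor for the fine field; as a tiny sketch (hypothetical helper):

```python
def fine_ghost_layers(coarse_ghost_layers, refinement_factor=2):
    # ghost layers are specified from the coarse-field perspective and scaled
    # up for the fine field (illustrative only, not the waLBerla API)
    return coarse_ghost_layers * refinement_factor
```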
For refined grids, `field::refinement::PackInfo`, which otherwise has identical functionality to its non-refined counterpart, does not permit specifying the number of ghost layers to exchange. In fact, it does not even exchange all ghost layers, but only communicates one layer of the coarse field to two layers of the fine field. Functionality to specify the number of ghost layers should be added. For consistency with non-refined simulations, the number should be specified from the perspective of the coarse field: if `2` is given, 2 ghost layers of the coarse grid should exchange data with 4 ghost layers of the fine grid.

## Issue #13: Make the timeloop more powerful; lbm::refinement::TimeStep duplicates much of the functionality of timeloop::SweepTimeloop
https://i10git.cs.fau.de/walberla/walberla/-/issues/13 (Michael Kuron, updated 2019-11-07)

For refined simulations, one needs to use `lbm::refinement::TimeStep`, which duplicates much of the functionality of the regular `timeloop::SweepTimeloop`. The `lbm::refinement::TimeStep` is then attached to a `timeloop::SweepTimeloop`. This is not very elegant; the two should be merged so that one does not end up with multiple `timing::TimingPool` objects, gets the same amount of debug logging, etc. A further problem is that `lbm::refinement::TimeStep` does not provide callbacks in all desirable places (I would, for example, need one after each of `(startCommunication|wait)(CoarseToFine|FineToCoarse|EqualLevel)`).
Perhaps `lbm::refinement::TimeStep` can be replaced with a set of helper functions that add the appropriate functors to an (enhanced) `timeloop::SweepTimeloop`.
Additionally, `timeloop::SweepTimeloop` should itself conform to the concept of `timeloop::Timeloop::SelectableFunc` so that timeloops can be nested. This would be useful for things that have internal iterations, like the PDE solvers.

Assignee: Christoph Rettinger.