# waLBerla issues
https://i10git.cs.fau.de/walberla/walberla/-/issues

## Issue #239: Dynamic load balancing: Refresh function seems to not communicate the flag field correctly
2024-01-29 | Philipp Suffa
https://i10git.cs.fau.de/walberla/walberla/-/issues/239

When using the blockforest refresh function for dynamic load balancing, it seems not to communicate the "uidToFlag" map of the flag field. So the flags in the flag field are still set correctly, but the connection to the FlagUIDs seems to be lost.

Assignee: Philipp Suffa

## Issue #238: import waLBerla hangs after installation
2024-02-12 | Pedro Santos Neves
https://i10git.cs.fau.de/walberla/walberla/-/issues/238

Hi waLBerla developers and contributors!
With other colleagues in the EESSI and MultiXscale projects, we are trying to build and deploy optimized waLBerla v6.1 installations, and we ran into an issue when building it with two specific toolchains that we'd like to report and hopefully get your input on.
A summary of the issue:
We are building waLBerla through EasyBuild using the [`foss2022b`](https://github.com/easybuilders/easybuild-easyconfigs/pull/19324) and [`foss2023a`](https://github.com/easybuilders/easybuild-easyconfigs/pull/19252) toolchains with two identical easyconfig files. With either toolchain, the installation proceeds until the sanity-check step, which simply runs `python -c "import waLBerla"`, upon which the system hangs. We see this happen [on the EasyBuild test clusters](https://github.com/easybuilders/easybuild-easyconfigs/pull/19252#issuecomment-1820653972) but not on our personal laptops or on the HPC cluster at the University of Groningen.
We tried changing the sanity check to `mpirun -np 1 python -c "import waLBerla"` on the chance that the issue was with the test cluster's environment, but the same hang occurs.
One successful workaround is to set `UCX_LOG_LEVEL=info` in the sanity check so that it reads `UCX_LOG_LEVEL=info python -c "import waLBerla"`. We don't know why changing the log level of `UCX` resolves this problem, and my colleague who discovered this has also opened a ticket about it in the `UCX` repo [here](https://github.com/openucx/ucx/issues/9532).
Another workaround seems to be importing `mpi4py` before waLBerla. This is surprising, because `mpi4py` is not a dependency of waLBerla. We would rather not add `mpi4py` as a dependency just to work around this issue, especially without knowing the consequences of doing so.
Given that we were only seeing this problem on the EasyBuild test clusters and not on other systems, and given that the `UCX` workaround also works for the [EESSI test clusters](https://github.com/EESSI/software-layer/pull/421), we assumed `import waLBerla` was likely hanging due to some quirk of the EasyBuild test clusters. However, we received a [report](https://github.com/easybuilders/easybuild-easyconfigs/pull/19324/#issuecomment-1857832565) from another EasyBuild maintainer of this problem occurring on yet another system. Because of this, we are no longer convinced that the cause has anything to do with the EasyBuild clusters and their environment.
We have a [summary](https://gitlab.com/eessi/support/-/issues/20) of our attempts in our support portal, where you can find more details.
Would you have any idea of what could be causing this, or have you perhaps encountered something similar in the past? We'd love your input, as we're quite confused about this problem. Thanks in advance!

## Issue #237: clang-tidy used to ignore .h files
2023-11-09 | Dominik Thoennes (dominik.thoennes@fau.de)
https://i10git.cs.fau.de/walberla/walberla/-/issues/237

I think that the clang-tidy script ignored `.h` files in the past.
This means that these files were not checked and there are lots of warnings after the update.
I disabled the job in the pipeline for now.
https://i10git.cs.fau.de/walberla/walberla/-/jobs/1135823

## Issue #212: Define FlagUIDs in the BoundaryCollection for reuse in App
2023-06-21 | Philipp Suffa
https://i10git.cs.fau.de/walberla/walberla/-/issues/212

It could be useful to define the FlagUIDs, which are first set in the generation file, in the BoundaryCollection, so that they can be further used in the application file.
So if one defines a UBB, a NoSlip, and a FixedDensity boundary in the generation file, the BoundaryCollection could look like this:
```cpp
namespace walberla {
namespace lbm {
const FlagUID noSlipFlagUID("NoSlip");
const FlagUID UBBFlagUID("UBB");
const FlagUID FixedDensityFlagUID("FixedDensity");
class PSMBoundaryCollection
{
....
```

Assignee: Markus Holzer
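If the UIDs were exported like this, the application could reference them directly instead of re-creating them from string literals. A minimal sketch of the intended app-side usage (illustrative only: it assumes a flag field of type `FlagField_T` was already added to the blocks under `flagFieldID`):

```cpp
// Application file: reuse the FlagUIDs exported by the BoundaryCollection.
for (auto& block : *blocks)
{
   auto* flagField = block.getData< FlagField_T >(flagFieldID);
   // A typo is now a compile error instead of a silently different flag.
   const auto noSlipFlag = flagField->getFlag(lbm::noSlipFlagUID);
   // ... use noSlipFlag when setting up or inspecting the boundary handling ...
}
```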
## Issue #209: Code Quality Days 2.5. + 3.5.
2023-05-02 | Dominik Thoennes (dominik.thoennes@fau.de)
https://i10git.cs.fau.de/walberla/walberla/-/issues/209

Hi,
this issue is intended to provide an overview of the open issues we could work on during the code quality days.
The current plan is to have this for two days.
Please feel free to add more information.
Poll for the date:
https://terminplaner6.dfn.de/en/p/6683ec971fb22b9928e1ff6d3ae6b412-196072
| topic | comment/explanation | related issues |
| --- | --- | --- |
| Cleanup | Check very old issues (> 2 years) to see if these are still relevant | |
| Remove Boost | | #190 |
| fix metis/parmetis integration | | #195 |
| improve logging | | #178 |
| Unify Communication | CPU and GPU communication schemes are largely similar | #196 |
| Better GPU integration | GPU usage should be integrated like MPI for example | !565 |
| Use SoA by default | Although SoA is usually better, it is not the default in waLBerla | #182 |
| Boundaries | Different topics on boundaries | #203 #173 #170 #3 |
https://docs.google.com/spreadsheets/d/1chiE5PCNcuokjp7Q3MyClaKQlDO2GqmaljPfJlIhH20/edit#gid=0

## Issue #205: Juwels-Booster: selectDeviceBasedOnMpiRank() presumably assigns the wrong host memory to the device
2023-04-05 | Philipp Suffa
https://i10git.cs.fau.de/walberla/walberla/-/issues/205

The function selectDeviceBasedOnMpiRank() seems to assign all devices (GPUs) on a node to the same host memory.
This could cause performance issues for CPU-to-GPU communication (cudaMemcpy), because GPUs are not communicating with their closest host memory.
So if you allocate 4 GPUs on a node and launch the program with 4 MPI processes, all 4 GPUs are assigned to the memory of the same MPI process (process 1) by cudaSetDevice().
This is the case because the function gpuGetDeviceCount() returns 1 instead of 4 devices (GPUs).
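For comparison, a minimal sketch of the usual round-robin pattern for rank-based device selection (an illustration, not waLBerla's actual implementation): if the runtime only exposes a single device per process, the modulo maps every rank to that one device, which matches the observation above.

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// Pick a GPU based on the node-local MPI rank. If the scheduler restricts
// each process to one visible device, cudaGetDeviceCount() returns 1 and
// every rank ends up on "device 0" of its own restricted view.
void selectDeviceFromLocalRank()
{
   MPI_Comm nodeComm;
   MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &nodeComm);
   int localRank = 0;
   MPI_Comm_rank(nodeComm, &localRank);

   int deviceCount = 0;
   cudaGetDeviceCount(&deviceCount); // reportedly 1 instead of 4 on Juwels-Booster
   cudaSetDevice(localRank % deviceCount);

   MPI_Comm_free(&nodeComm);
}
```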
This behavior has only been observed on Juwels-Booster so far; further investigation is needed...

## Issue #204: Is there a LBM grid refinement example on the GPU?
2023-09-27 | ahmed
https://i10git.cs.fau.de/walberla/walberla/-/issues/204

Thanks a lot for making this great library open-source with such high-quality code!
I was wondering if there's a 3D LBM grid refinement example that runs on the GPU. I saw an LBM grid refinement example under `apps/benchmarks/AdaptiveMeshRefinementFluidParticleCoupling`, but I think it only runs in parallel on the CPU (correct me if I'm wrong). What I'm trying to find is running LBM on a non-uniform static mesh that doesn't change over time, i.e., not AMR.

## Issue #200: GCC12: EquationSystem warning when using `DebugOptimized` and `OPTIMIZE_FOR_LOCALHOST`
2022-12-24 | Dominik Thoennes (dominik.thoennes@fau.de)
https://i10git.cs.fau.de/walberla/walberla/-/issues/200

EquationSystem.cpp emits a `maybe-uninitialized` warning when built with `DebugOptimized` and `WALBERLA_OPTIMIZE_FOR_LOCALHOST` on GCC 12:
```
Building CXX object src/core/CMakeFiles/core.dir/math/equation_system/EquationSystem.cpp.o
cd /build/src/core && /usr/local/bin/ccache g++ -DBOOST_ALL_NO_LIB -I/build/src -I/walberla/src -isystem /opt/boost/include -isystem /usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/x86_64-linux-gnu/openmpi/include -isystem /opt/openmesh/include -Wall -Wconversion -Wshadow -march=native -Wfloat-equal -Wextra -pedantic -D_GLIBCXX_USE_CXX11_ABI=1 -pthread -g -O3 -std=c++17 -o CMakeFiles/core.dir/math/equation_system/EquationSystem.cpp.o -c /walberla/src/core/math/equation_system/EquationSystem.cpp
In file included from /opt/boost/include/boost/graph/depth_first_search.hpp:21,
from /opt/boost/include/boost/graph/max_cardinality_matching.hpp:21,
from /walberla/src/core/math/equation_system/EquationSystem.cpp:39:
In constructor 'boost::bgl_named_params<T, Tag, Base>::bgl_named_params(T, const Base&) [with T = boost::vec_adj_list_vertex_id_map<boost::no_property, long unsigned int>; Tag = boost::vertex_index_t; Base = boost::bgl_named_params<boost::detail::odd_components_counter<long unsigned int>, boost::graph_visitor_t, boost::no_property>]',
inlined from 'boost::bgl_named_params<PType, boost::vertex_index_t, boost::bgl_named_params<T, Tag, Base> > boost::bgl_named_params<T, Tag, Base>::vertex_index_map(const PType&) const [with PType = boost::vec_adj_list_vertex_id_map<boost::no_property, long unsigned int>; T = boost::detail::odd_components_counter<long unsigned int>; Tag = boost::graph_visitor_t; Base = boost::no_property]' at /opt/boost/include/boost/graph/named_function_params.hpp:217:5,
inlined from 'static bool boost::maximum_cardinality_matching_verifier<Graph, MateMap, VertexIndexMap>::verify_matching(const Graph&, MateMap, VertexIndexMap) [with Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS>; MateMap = long unsigned int*; VertexIndexMap = boost::vec_adj_list_vertex_id_map<boost::no_property, long unsigned int>]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:779:61,
inlined from 'bool boost::matching(const Graph&, MateMap, VertexIndexMap) [with Graph = adjacency_list<vecS, vecS, undirectedS>; MateMap = long unsigned int*; VertexIndexMap = vec_adj_list_vertex_id_map<no_property, long unsigned int>; AugmentingPathFinder = edmonds_augmenting_path_finder; InitialMatchingFinder = extra_greedy_matching; MatchingVerifier = maximum_cardinality_matching_verifier]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:807:79,
inlined from 'bool boost::checked_edmonds_maximum_cardinality_matching(const Graph&, MateMap, VertexIndexMap) [with Graph = adjacency_list<vecS, vecS, undirectedS>; MateMap = long unsigned int*; VertexIndexMap = vec_adj_list_vertex_id_map<no_property, long unsigned int>]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:817:48,
inlined from 'bool boost::checked_edmonds_maximum_cardinality_matching(const Graph&, MateMap) [with Graph = adjacency_list<vecS, vecS, undirectedS>; MateMap = long unsigned int*]' at /opt/boost/include/boost/graph/max_cardinality_matching.hpp:824:56,
inlined from 'void walberla::math::EquationSystem::match()' at /walberla/src/core/math/equation_system/EquationSystem.cpp:97:4:
/opt/boost/include/boost/graph/named_function_params.hpp:192:56: warning: '*(unsigned char*)((char*)&occ + offsetof(boost::detail::odd_components_counter<long unsigned int>,boost::detail::odd_components_counter<long unsigned int>::m_parity))' may be used uninitialized [-Wmaybe-uninitialized]
192 | bgl_named_params(T v, const Base& b) : m_value(v), m_base(b) {}
| ^~~~~~~~~
/opt/boost/include/boost/graph/max_cardinality_matching.hpp: In member function 'void walberla::math::EquationSystem::match()':
/opt/boost/include/boost/graph/max_cardinality_matching.hpp:778:52: note: '*(unsigned char*)((char*)&occ + offsetof(boost::detail::odd_components_counter<long unsigned int>,boost::detail::odd_components_counter<long unsigned int>::m_parity))' was declared here
778 | detail::odd_components_counter< v_size_t > occ(num_odd_components);
|
```

## Issue #199: Make CMake and CodeGen simpler
2022-11-29 | Markus Holzer
https://i10git.cs.fau.de/walberla/walberla/-/issues/199

At the moment, all generated files that come out of the CodeGen script need to be manually listed under `OUT_FILES` in `waLBerla_generate_target_from_python`. Example:
https://i10git.cs.fau.de/walberla/walberla/-/blob/master/apps/benchmarks/FlowAroundSphereCodeGen/CMakeLists.txt
This has some disadvantages:
1. It is always a bit clunky: as a user, you first provide the names of the `OUT_FILES` as strings to the generation function and then need to copy all these names into the CMake file again.
2. It is error-prone in several ways. The biggest issue is the correct file ending for GPU support. This was partially solved in !518; however, one still needs to think about whether a file is only generated for the CPU or can vary depending on the CMake configuration. Second, some generation functions like `generate_info_header` only produce a header file. The user must know this to write a correct CMake file on the first try...
3. A third big problem is that every generation function can only produce a single file; otherwise, it would be impossible for a user to know which files will come out in the end. Thus this system lacks flexibility as well.

Assignee: Markus Holzer

## Issue #196: Unification GPU/Device Usage
2023-04-04 | Markus Holzer
https://i10git.cs.fau.de/walberla/walberla/-/issues/196

The GPU/CUDA backend of waLBerla is very thin at the moment. The following list of suggestions/observations should serve as a guideline for integrating the device side better.
1. Like `MPI`, device functions should get a wrapper with an "environment", as is done with `WALBERLA_MPI_SECTION`. That way, users would no longer need to work with `#if defined`, as is done, for example, here: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/apps/benchmarks/FlowAroundSphereCodeGen/FlowAroundSphereCodeGen.cpp
2. The GPUField (https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/GPUField.h) works differently in the sense that the interface is different (f-size is not a template parameter), but it is also very cumbersome to have explicit GPU data with its explicit `addToStorage` functions. It would be simpler if the `Field` itself could synchronise its data and return a device or host pointer depending on the situation. This goes along with #109, which suggests using `mdspan` for the Field data structure; `mdspan` has that functionality right away.
3. Some device-specific implementations do not need to be specific. A good example is the communication scheme: https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/UniformGPUScheme.h In the end, the same `MPI` function is called, just with a device pointer instead of a host pointer (or with a host staging buffer if GPUDirect is not available). Thus there is no need for this parallel world to exist here.
Another good example is https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/cuda/communication/CustomMemoryBuffer.h, which works essentially like the normal buffers, just with a slightly different API.

## Issue #194: Sum of cell local overlap fractions can exceed 1 (which it should not) in the pe coupling
2023-03-23 | Samuel Kemmler
https://i10git.cs.fau.de/walberla/walberla/-/issues/194

Since pe is deprecated, this issue is mostly for documentation and can be removed as soon as there is a documentation page for known bugs.

## Issue #192: Communication fails when using multiple blocks per process (GPU)
2023-05-22 | Samuel Kemmler
https://i10git.cs.fau.de/walberla/walberla/-/issues/192

The communication fails (i.e., communicates to the wrong position) when using multiple blocks per process (GPU). This only occurs between the process-local block 0 and process-local block 1.

## Issue #183: MPI_ERR_TRUNCATE in WcTimingPool
2022-03-24 | Daniel Bauer
https://i10git.cs.fau.de/walberla/walberla/-/issues/183

I use mesh refinement with a refinement time step.
Additionally, I create two [WcTimingPools](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/timing/TimingPool.h) and pass them to the refinement time step.
After running the simulation, I use [`logResultOnRoot()`](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/timing/TimingPool.cpp#L411) to print the time measurements.
```cpp
auto timingRef = std::make_shared<WcTimingPool>();
auto timingRefLvl = std::make_shared<WcTimingPool>();
⋮
// setup timeloop
refinementTimeStep->enableTiming(timingRef, timingRefLvl);
timeloop->addFuncBeforeTimeStep(makeSharedFunctor(refinementTimeStep), str::refinementTimeStep);
⋮
// run timeloop
auto timing = std::make_shared<WcTimingPool>();
timeloop->run(*timing);
// print timings
timing ->logResultOnRoot();
timingRef ->logResultOnRoot();
timingRefLvl->logResultOnRoot();
```
I run my simulation with 288 processes on 8 nodes and get the following behavior:
The first two timings (`timing` and `timingRef`) print the results as expected.
The level-wise timing fails to print with the error:
```
[node098:207482] *** An error occurred in MPI_Reduce
[node098:207482] *** reported by process [3786932225,192]
[node098:207482] *** on communicator MPI_COMM_WORLD
[node098:207482] *** MPI_ERR_TRUNCATE: message truncated
[node098:207482] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node098:207482] ***    and potentially your MPI job)
[node089.cluster:17591] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[node089.cluster:17591] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[node089.cluster:17591] 12 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[node089.cluster:17591] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
```
The error must come from [TimingPool.cpp:172-185](https://i10git.cs.fau.de/walberla/walberla/-/blob/master/src/core/timing/TimingPool.cpp#L172-L185).
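The usual cause of MPI_ERR_TRUNCATE in a reduction is that the participating ranks disagree on the message size. A hedged guess at the failure mode here (not verified against the waLBerla sources): the level-wise pool does not contain the same set of timers on every process, so the reduction runs with mismatched counts. A minimal standalone reproducer of that MPI error:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
   MPI_Init(&argc, &argv);
   int rank = 0;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   // Rank 1 behaves as if it had registered one timer more than the others.
   const int count = (rank == 1) ? 3 : 2;
   std::vector<double> send(static_cast<std::size_t>(count), 1.0);
   std::vector<double> recv(static_cast<std::size_t>(count), 0.0);

   // Counts must match on all ranks; here they do not, which OpenMPI
   // typically reports as MPI_ERR_TRUNCATE on the root.
   MPI_Reduce(send.data(), recv.data(), count, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

   MPI_Finalize();
   return 0;
}
```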
I use GCC 8.3.0 and OpenMPI 3.1.5.

## Issue #176: Check and validate implementation of Guo force model with TRT/MRT
2023-01-27 | Christoph Schwarzmeier
https://i10git.cs.fau.de/walberla/walberla/-/issues/176

I tested waLBerla's TRT collision model with the compressible D3Q19 velocity set in waLBerla's free surface LBM module (not yet open source).
When using the `GuoConstant` force model, I noticed two things that are not present when using the `SimpleConstant` force model:
1. the physical results differ significantly from SRT (same even-order relaxation rate, arbitrary "magic" parameter)
2. the physical results change significantly with small changes in TRT's (even-order) relaxation rate
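For context, the forcing term in question in its standard single-relaxation-time form (stated here from the literature, after Guo et al. 2002, not from the waLBerla sources); for TRT/MRT, the scalar prefactor $1 - 1/(2\tau)$ must be replaced by factors containing the relaxation rates of the respective moments, which is exactly the part whose implementation should be checked:

```math
F_i = w_i \left(1 - \frac{1}{2\tau}\right)
      \left( \frac{\mathbf{c}_i - \mathbf{u}}{c_s^2}
           + \frac{(\mathbf{c}_i \cdot \mathbf{u})}{c_s^4}\,\mathbf{c}_i \right) \cdot \mathbf{F},
\qquad
\mathbf{u} = \frac{1}{\rho}\sum_i f_i\,\mathbf{c}_i + \frac{\Delta t}{2\rho}\,\mathbf{F}
```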
These observations should not be related to the free surface LBM implementation, and we should therefore:
- [x] create a simple test case to validate the `GuoConstant` force model when using the TRT/MRT collision operators (without free surface LBM)
- [x] try to reproduce this issue (without free surface LBM)
- [ ] check if the `GuoConstant` and `GuoField` force models are implemented correctly for the TRT/MRT collision operators

Assignee: Jonas Plewinski

## Issue #173: Poor user experience with generated DynamicUBB (UBB + additional_data_handler)
2023-05-02 | Nigel Overmars
https://i10git.cs.fau.de/walberla/walberla/-/issues/173

For my purposes, I need to be able to set a certain inflow profile, for example a variant of a Poiseuille flow where the velocity changes at every time step.
Currently, this is possible when using generated sweeps, but only with a few nasty hacks, both for making it temporally varying and for making it spatially varying. To be able to use a DynamicUBB, which was generated via
```python
ubb_dynamic = UBB(lambda *args: None, dim=stencil.D)
ubb_data_handler = UBBAdditionalDataHandler(stencil, ubb_dynamic)
# UBB with user-defined velocity profile
generate_boundary(ctx, "DynamicUBB", ubb_dynamic, lbm_method,
additional_data_handler=ubb_data_handler, target=target)
```
one needs to pass a `std::function< Vector3< real_t >(const Cell&, const shared_ptr< StructuredBlockForest >&, IBlock&) >` to the constructor of the DynamicUBB type. The easiest way to make one is via a functor, for which a template looks like this:
```cpp
class InflowProfile
{
public:
Vector3< real_t > operator()( const Cell& pos, const shared_ptr< StructuredBlockForest >& SbF, IBlock& block ) {
// return velocity vector depending on the cell location in the SbF
return getVelocityVector(pos, SbF, block);
}
};
```
**Temporally varying inflow profile**
The problem is that, in the current version of waLBerla/lbmpy, the additional data handler is only called once before the simulation runs, namely when the generated member function
```cpp
template<typename FlagField_T>
void DynamicUBB::fillFromFlagField( const shared_ptr<StructuredBlockForest> & blocks, ConstBlockDataID flagFieldID,
FlagUID boundaryFlagUID, FlagUID domainFlagUID)
```
is called. After these values have been set, there is currently no easy way to update them at every time step. As a workaround, one can add the aforementioned member function via `addFuncAfterTimeStep` to the SweepTimeloop and add a `TimeTracker` object to the functor, which allows updating the values depending on the current time step; see the sketch below. This approach does appear to incur a performance penalty, as the member function `DynamicUBB::fillFromFlagField` appears to do some work that should be unnecessary after its initial run. I think, but haven't tested it, that this also makes running on the GPU impossible or significantly slower.
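A sketch of that temporal workaround, under stated assumptions: a `TimeTracker` type with a `getTime()` accessor (the issue text mentions such a class; the exact API here is from memory) and the `getVelocityVector()` helper from the template above.

```cpp
// Functor that captures a TimeTracker and returns a time-dependent velocity.
class InflowProfile
{
 public:
   explicit InflowProfile(const shared_ptr< lbm::TimeTracker >& timeTracker) : timeTracker_(timeTracker) {}

   Vector3< real_t > operator()(const Cell& pos, const shared_ptr< StructuredBlockForest >& SbF, IBlock& block)
   {
      const real_t t = timeTracker_->getTime();
      // e.g. scale a steady profile with a pulsation; replace with the real profile
      return getVelocityVector(pos, SbF, block) * (real_c(1) + real_c(0.1) * std::sin(t));
   }

 private:
   shared_ptr< lbm::TimeTracker > timeTracker_;
};

// Re-run the fill step every time step so the boundary picks up the new values
// (this is the performance-penalty workaround described in the text):
timeloop.addFuncAfterTimeStep([&]() {
   dynamicUBB->fillFromFlagField< FlagField_T >(blocks, flagFieldID, boundaryFlagUID, domainFlagUID);
}, "update DynamicUBB values");
```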
**Spatially varying inflow profile**
Currently, the cells being passed to the functor's `operator()` are different from the ones I expected to get. The problem is in the following generated code:
```cpp
// for every cell in the block
if ( isFlagSet( it.neighbor(1, 0, 0 , 0 ), boundaryFlag ) )
{
auto element = IndexInfo(it.x(), it.y(), it.z(), 0 );
Vector3<real_t> InitialisatonAdditionalData = elementInitaliser(Cell(it.x(), it.y(), it.z()), blocks, *block);
element.vel_0 = InitialisatonAdditionalData[0];
element.vel_1 = InitialisatonAdditionalData[1];
element.vel_2 = InitialisatonAdditionalData[2];
// snip
}
// Similar code for all directions, e.g. 27 in total for a D3Q27 stencil
```
where the function `elementInitialiser` is our functor's `operator()`.
In essence, what is being checked is whether the current cell has a neighbor for which the inflowFlag is set (in the case of an inflow boundary). If that is the case, the functor is applied to this cell, i.e., the cell whose neighbor has the inflowFlag set, not the cell with the inflowFlag itself, and the results (the computed velocity vector) are stored in the appropriate location.
Currently, I am working around this problem by manually adding the direction to the cell on which the functor is applied. Unfortunately, due to the generated nature of the code, this is not a real solution...
In my use case, certain noise values are generated for every cell in the inflow boundary for every time step via a Python program and are then read from a .json file. These values are put into a `std::unordered_map<GlobalCellCoordinatesTuple, Vector3>` and read during the boundary treatment. Without the aforementioned fix, there is a mismatch between the cell coordinates I put into the `std::unordered_map<..,..>` (the ones for which the inflowFlag is set) and the cell coordinates that get passed to the functor, resulting in wrong behavior. A sketch of the direction-shift workaround follows below.
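A sketch of that manual workaround (illustrative only: the generated interface does not actually pass the direction, which is the core of the problem, and `noiseMap` and `globalCellOf` are hypothetical stand-ins for my own helpers):

```cpp
// 'pos' is the fluid cell next to the boundary; shift it by the stencil
// direction that points at the inflow cell before looking up the noise value.
Vector3< real_t > velocityAt(const Cell& pos, const shared_ptr< StructuredBlockForest >& SbF,
                             IBlock& block, const stencil::Direction dir)
{
   const Cell inflowCell(pos.x() + cell_idx_c(stencil::cx[dir]),
                         pos.y() + cell_idx_c(stencil::cy[dir]),
                         pos.z() + cell_idx_c(stencil::cz[dir]));
   // globalCellOf() and noiseMap are placeholders for the block-local-to-global
   // transform and the values read from the .json file.
   return noiseMap.at(globalCellOf(SbF, block, inflowCell));
}
```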
**Suggestions**
- Allow the user to specify whether they want the boundary values to be updated at every time step. The user would then just have to add a `TimeTracker` to the functor and could update the values in an easy manner.
- Allow the user to select whether the cells passed to the functor's `operator()` are 1) the neighboring cells, as is done right now, or 2) the actual cells for which the checked flag is set.

Assignee: Markus Holzer

## Issue #170: Usability CodeGen boundaries
2023-04-05 | Markus Holzer
https://i10git.cs.fau.de/walberla/walberla/-/issues/170

Especially when a boundary with additional data is generated, it is rather hard to understand how this should be done in the Python script.
Something like a wrapper around the boundaries would maybe improve the situation. For example, the UBB can be generated in this way:
```python
ubb_dynamic = UBB(lambda *args: None, dim=stencil.D)
ubb_data_handler = UBBAdditionalDataHandler(stencil, ubb_dynamic)
```
However, the empty callback with the lambda and the `UBBAdditionalDataHandler` are cryptic, and there is no alternative to the form shown above.
Thus a thin wrapper like `DynamicUBB(stencil=stencil)` could improve this situation quite a lot.
The reason this has to be written in such a cryptic way in the first place is that lbmpy has to function as a standalone framework besides waLBerla. Thus certain design decisions are motivated by the Python world and conflict with the code generation for waLBerla.

Assignee: Markus Holzer

## Issue #163: Usage of NCCL
2021-10-08 | Markus Holzer
https://i10git.cs.fau.de/walberla/walberla/-/issues/163

NVIDIA NCCL provides a collective communication library. This could give a performance boost for multi-GPU computations and be better than CUDA-aware MPI.
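For illustration, a minimal, hedged sketch (not waLBerla code) of how NCCL is typically bootstrapped from MPI and used to reduce device-resident data directly, without staging through the host. Error handling is omitted for brevity.

```cpp
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char** argv)
{
   MPI_Init(&argc, &argv);
   int rank = 0, size = 0;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   cudaSetDevice(rank); // assumes one GPU per rank on a single node

   // Broadcast a unique NCCL id via MPI, then create the NCCL communicator.
   ncclUniqueId id;
   if (rank == 0) ncclGetUniqueId(&id);
   MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);
   ncclComm_t comm;
   ncclCommInitRank(&comm, size, id, rank);

   // Reduce device buffers directly, no cudaMemcpy to host needed.
   const size_t n = 1 << 20;
   double *sendBuf = nullptr, *recvBuf = nullptr;
   cudaMalloc(&sendBuf, n * sizeof(double));
   cudaMalloc(&recvBuf, n * sizeof(double));
   ncclAllReduce(sendBuf, recvBuf, n, ncclDouble, ncclSum, comm, /*stream=*/0);
   cudaDeviceSynchronize();

   cudaFree(sendBuf); cudaFree(recvBuf);
   ncclCommDestroy(comm);
   MPI_Finalize();
   return 0;
}
```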
https://developer.nvidia.com/nccl
https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html

Assignee: Markus Holzer

## Issue #161: Documentation of Python setup workflow
2021-09-30 | Helen Schottenhamml
https://i10git.cs.fau.de/walberla/walberla/-/issues/161

To help the standard user set up their Python environment properly for use with pystencils/lbmpy, documentation in the README would be helpful.

## Issue #147: Don't call resetForceAndTorque inside DEM
2021-05-20 | Michael Kuron (mkuron@icp.uni-stuttgart.de)
https://i10git.cs.fau.de/walberla/walberla/-/issues/147

The DEM solver contains
```cpp
// Resetting the acting forces
bodyIt->resetForceAndTorque();
[...]
// Resetting the acting forces
bodyIt->resetForceAndTorque();
```
which I don't think should be there, and it definitely shouldn't reset the force twice. HCSITS does not reset the force itself; that's what ForceTorqueOnBodiesResetter is for.

Assignee: Christoph Rettinger
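For reference, a hedged sketch of the alternative named above; the exact `pe_coupling::ForceTorqueOnBodiesResetter` constructor signature is from memory and should be checked against the sources:

```cpp
// Reset forces/torques once per time step, explicitly and in one place,
// instead of relying on the DEM solver to do it implicitly (twice).
pe_coupling::ForceTorqueOnBodiesResetter resetter(blocks, bodyStorageID);
timeloop.addFuncAfterTimeStep(resetter, "force/torque reset");
```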
## Issue #145: reduceOverParticles
2021-03-30 | Sebastian Eibl
https://i10git.cs.fau.de/walberla/walberla/-/issues/145

```cpp
auto kinEnergy = ps.reduceOverParticles(.., SUM, [&](auto p_idx)
{return 0.5_r * ac.getMass(p_idx) * ac.getLinearVelocity(p_idx) * ac.getLinearVelocity(p_idx);});
```
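Until such a `reduceOverParticles` exists, the same result can presumably be assembled from existing pieces; a hedged sketch, assuming mesa_pd's `forEachParticle` and core's `mpi::allReduceInplace` (signatures from memory):

```cpp
// Sum the local kinetic energy over all locally owned particles, then
// reduce the scalar across all processes.
real_t kinEnergy = real_c(0);
ps->forEachParticle(false, mesa_pd::kernel::SelectLocal(), ac,
                    [&](const size_t p_idx) {
                       const auto v = ac.getLinearVelocity(p_idx);
                       kinEnergy += real_c(0.5) * ac.getMass(p_idx) * (v * v); // Vector3 dot product
                    });
mpi::allReduceInplace(kinEnergy, mpi::SUM);
```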