hyteg issues
https://i10git.cs.fau.de/hyteg/hyteg/-/issues (last updated 2020-02-24)

Issue #114: Extend HyTeG to work on level 1 and 0 (Nils Kohl, updated 2020-02-24)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/114

Apparently, for some P1 GMG tests (even P1-P1 Stokes) we can plug in level 1 as the minimum level and it works.
Level 0 could really be helpful to accelerate our coarse grid solvers.
* [x] indexing
* [x] functions
* [x] operator assembly
* [x] single level solvers (CG, PETSc suite, ...)
* [x] grid transfer, multigrid solvers
* [x] fix 2D
* [x] function iterator, PETSc field split indexing
* [x] agglomeration
* [ ] ~~VTK~~ -> #116

Assignee: Dominik Thoennes

Issue #72: Number of global MPI reduce ops in P2Function::dot()? (Marcus Mohr, updated 2018-07-26)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/72

Hi,
taking a look at the `P2Function::dot()` I see that there are two invocations of `walberla::mpi::allReduceInplace()`, one each by
- `VertexDoFFunction< ValueType >::dot()`
- `EdgeDoFFunction< ValueType >::dot()`
Having no insight into walberla I cannot be sure, of course, but if this leads to two global MPI reduce operations being performed, we should avoid that with 3D and HPC runs in mind.
My suggestion would be to add another optional parameter to `*DoFFunction::dot()` that turns the call to `walberla::mpi::allReduceInplace()` on or off. Then we could perform the reduce once inside `P2Function::dot()`, the same as with
`P2Function::getMaxMagnitude()`, see [d7a2cc47].
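To illustrate the suggestion, here is a minimal sketch with stand-in functions (none of these names are the actual HyTeG/walberla API): the component dot products are summed locally first, and a single reduction stub is invoked on the combined value.

```cpp
#include <numeric>
#include <vector>

// Local (per-process) dot product contribution of one DoF component.
double localDot( const std::vector< double >& a, const std::vector< double >& b )
{
   return std::inner_product( a.begin(), a.end(), b.begin(), 0.0 );
}

// Stand-in for a single walberla::mpi::allReduceInplace() call:
// sums the per-process partial results. The point is that it runs once.
double allReduceSumStub( const std::vector< double >& perProcessPartials )
{
   return std::accumulate( perProcessPartials.begin(), perProcessPartials.end(), 0.0 );
}

double p2DotSingleReduce( const std::vector< double >& vertexA, const std::vector< double >& vertexB,
                          const std::vector< double >& edgeA, const std::vector< double >& edgeB )
{
   // Combine vertex and edge contributions locally first ...
   const double localSum = localDot( vertexA, vertexB ) + localDot( edgeA, edgeB );
   // ... then reduce once (here: a one-element "communicator" as a stub).
   return allReduceSumStub( { localSum } );
}
```

With MPI this would mean one reduction over the combined scalar instead of one per component function.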
Opinion?
Cheers
Marcus

Assignee: Dominik Thoennes

Issue #61: Implement vector valued unknowns by extending the indexing functions (Nils Kohl, updated 2019-11-20)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/61

For higher-order finite elements or, e.g., DG1 discretizations, **we need vector-valued unknowns** on the discretized domain.
Currently this is implemented (but not tested and most likely not functional!) by template parameters (`ValueType`) that could theoretically be set to `std::array` or similar.
As an alternative, **we could extend the indexing functions by another dimension** that refers to the nth element of such a vector valued unknown.
Example:
To get the three DG1 unknowns at position (x, y) on a macro-face, an indexing function call could look like this:
```cpp
real_t dg1_a = data[ index< Level >( x, y, DG_CENTER, 0 ) ];
real_t dg1_b = data[ index< Level >( x, y, DG_CENTER, 1 ) ];
real_t dg1_c = data[ index< Level >( x, y, DG_CENTER, 2 ) ];
```
(Such mappings must of course be documented; this would equally apply to the templated approach.)
The function memory is simply extended to the required size.
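To make the memory-layout point concrete, here is a hedged sketch of such an indexing function in both layouts (the row-length formula and the constants are assumptions for a macro-face vertex grid, not HyTeG's actual indexing):

```cpp
#include <cstdint>

using uint_t = std::uint64_t;

// Hypothetical: number of scalar components per vector-valued unknown.
constexpr uint_t numComponents = 3; // e.g. three DG1 unknowns

constexpr uint_t rowLength( uint_t level ) { return ( uint_t( 1 ) << level ) + 1; }
constexpr uint_t numPositions( uint_t level ) { return rowLength( level ) * rowLength( level ); }

// Array-of-structs layout: the components of one unknown are stored contiguously.
constexpr uint_t indexAoS( uint_t level, uint_t x, uint_t y, uint_t component )
{
   return ( y * rowLength( level ) + x ) * numComponents + component;
}

// Struct-of-arrays layout: all 0th components first, then all 1st components, ...
constexpr uint_t indexSoA( uint_t level, uint_t x, uint_t y, uint_t component )
{
   return component * numPositions( level ) + y * rowLength( level ) + x;
}
```

Since the layout is encoded only in the indexing function, switching between AoS and SoA does not touch any kernel code.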
Further advantages:
* the `ValueType` template parameter ~~can be removed~~ would be restricted to either `float` or `double`
* the memory layout (AoS vs SoA) can be easily switched by exchanging the indexing function(!) - this would not be possible with the `ValueType`-template approach
Disadvantages:
* we would restrict the function classes to unknowns of type ~~`real_t`^N~~ `float`^N or `double`^N (for `uint_t`-typed functions we would then need extra classes, but maybe that's not that bad anyway?)
* ~~https://i10git.cs.fau.de/terraneo/tinyhhg/issues/34 would be really hard to solve (but uncertain if we ever need that)~~

Assignee: Nils Kohl

Issue #50: Benchmark (level-)templated vs non-templated index functions (Nils Kohl, updated 2018-10-01)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/50

Currently, the indexing functions are templated with the corresponding refinement level.
But it is not clear whether that optimization is necessary (it is not even clear whether the templates are better than non-templated indexing functions). We should therefore measure the performance impact of a non-templated indexing function.
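A benchmark could compare the two variants side by side; the following is a minimal sketch (function names and the row-length formula are assumptions, not the actual HyTeG code):

```cpp
#include <cstdint>

using uint_t = std::uint64_t;

// Level-templated variant: the row length is a compile-time constant.
template < uint_t Level >
inline uint_t indexTpl( uint_t x, uint_t y )
{
   constexpr uint_t rowLength = ( uint_t( 1 ) << Level ) + 1;
   return y * rowLength + x;
}

// Non-templated variant: the level is a runtime parameter.
inline uint_t indexRt( uint_t level, uint_t x, uint_t y )
{
   const uint_t rowLength = ( uint_t( 1 ) << level ) + 1;
   return y * rowLength + x;
}
```

If the compiler hoists the runtime row-length computation out of hot loops, both variants should generate near-identical code; only a measurement can confirm this.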
If there is none, we could get rid of the templated indexing functions, which would
1. presumably speed up compilation considerably
2. simplify modularization of the library
3. simplify the code structure (we could get rid of the SPECIALIZE macro(s) and lots of template code)

Assignee: Dominik Thoennes

Issue #48: ParMetis graph contains edges between primitives that are more than one dimension apart (Nils Kohl, updated 2019-04-18)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/48

Inter-primitive communication is currently only allowed among primitives that differ in only one dimension.
The corresponding ParMetis graph should reflect that relation by not creating edges between primitives that differ by two dimensions.

Assignee: Nils Kohl

Issue #45: Unnecessary communication during interpolate, add, assign (Nils Kohl, updated 2019-02-18)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/45

Currently (in various DoF spaces) we communicate the halos during the linear-algebra routines interpolate, add, and assign, although this is not strictly necessary, since all of them only update local DoFs and depend only on local DoFs.
It would be more consistent to let routines that need updated halos pull them when necessary (e.g. operators, before stencils are applied).
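A minimal sketch of this pull model (class and member names are hypothetical): interpolate/add/assign merely invalidate the halos, and the first routine that actually reads them triggers the communication.

```cpp
// Hypothetical "pull" halo updates via a dirty flag.
struct Function
{
   bool halosUpToDate = false;

   void assign( double /*value*/ )
   {
      // ... update local DoFs only ...
      halosUpToDate = false; // no communication here
   }

   void syncHalosIfNeeded()
   {
      if ( !halosUpToDate )
      {
         // ... communicate halos here ...
         halosUpToDate = true;
      }
   }
};

struct Operator
{
   void apply( Function& src )
   {
      // The operator pulls updated halos right before applying stencils.
      src.syncHalosIfNeeded();
      // ... apply stencils ...
   }
};
```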
Resolving this issue
1. would clear up which routines really depend on updated halos
2. could increase performance, as it might save time that is currently spent on unnecessary communication (however, shifting the communication to other routines could also decrease performance, since we might not be able to overlap it with computation in some cases)

Issue #32: Remove switch statement from index functions (Nils Kohl, updated 2017-09-18)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/32

The switch could be replaced by generated or templated functions for each stencil direction.
However, it is unclear whether the switch is optimized away by the compiler anyway.
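A minimal sketch of the idea (the directions and names are illustrative assumptions, not HyTeG's stencil directions):

```cpp
// Runtime switch over stencil directions vs. a direction-templated function.
enum class Dir
{
   C,
   W,
   E
};

// Switch-based variant: the branch happens at runtime
// (unless the compiler constant-folds it at the call site).
inline int stencilOffsetSwitch( Dir d )
{
   switch ( d )
   {
   case Dir::W:
      return -1;
   case Dir::E:
      return +1;
   default:
      return 0;
   }
}

// Templated variant: one instantiation per direction, resolved at compile time.
template < Dir d >
inline int stencilOffsetTpl()
{
   if constexpr ( d == Dir::W )
      return -1;
   else if constexpr ( d == Dir::E )
      return +1;
   else
      return 0;
}
```

Each instantiation of the templated variant compiles to a branch-free constant; whether the switch version ends up equally cheap depends on the optimizer, as noted above.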
Issue #30: Graph weights for load balancing (Nils Kohl, updated 2019-11-20)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/30

Currently all primitives are equally weighted, but they should be weighted by their number of DoFs (vertex < edge < face < cell).

Assignee: Nils Kohl

Issue #29: BufferedCommunicator fails when using parallel sends / recvs via OpenMPBufferSystem (Nils Kohl, updated 2019-11-07)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/29

There are likely race conditions when invoking the pack / unpack (send / recv) callbacks. Maybe there is more to it.
Switched to serial sends / recvs now, seems to work fine.
Unsure whether this causes a big performance drop -> profile first before fixing!
Also: this is only an issue when building with OpenMP.

Assignee: Nils Kohl

Issue #25: Potential Optimization: replace std::pow(2, ...) with bit shifts or lookup tables (Nils Kohl, updated 2017-09-15)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/25

Assignee: Dominik Thoennes

Issue #19: Potential Optimization: cache communication dependencies (Nils Kohl, updated 2017-08-01)
https://i10git.cs.fau.de/hyteg/hyteg/-/issues/19

The BufferedCommunicator could cache neighbor dependencies instead of arranging them before every communication.

Assignee: Nils Kohl