This MR realizes a two-step process. First, in a serial (offline) run, the SetupPrimitiveStorage is built and passed to a load balancer, followed by serialization to a file. During the actual, parallel run, the file is read via MPIIO such that only the portion that is necessary has to be viewed by the respective process. From that binary string, the entire PrimitiveStorage is created in parallel.
There are numerous possible optimizations - most notably, it is not clear to me right now if the current approach of simply writing all neighbor primitives to file is preferable, or if alternatively, such data should be communicated after only reading the locally allocated primitives. Maybe it does not even make a difference at all. Also, this implementation has not been tested with large setups yet.