Cell hash collisions
To help with the design of a dynamic UBB class with lbmpy, I wrote a `Cell` hashmap to keep track of boundary nodes. While investigating an unrelated performance bottleneck, I found out that the hashing of a `Cell` object was prone to collisions for indices larger than 64, reaching a rate of 1% for indices larger than 512. This stood out in my unit tests because my hashmap stores global cell indices, and as such it commonly handles large indices.
The current `Cell` hash function was introduced by dec54ed4. From what I understand, it originates from the MurmurHash2 library included in Boost, and was probably extracted from the `<boost/functional/hash.hpp>` header. However, the extracted code comes from the `hash_combine(std::size_t& seed, std::size_t const& value)` function, and since `value` has to be a `std::size_t`, an extra call to `std::size_t std::hash<int>(cell[i])` was added, instead of extracting the `hash_combine(std::size_t& seed, int const& value)` overload directly. The latter was originally used by waLBerla on 64-bit architectures (that overload is quite slow, because it doesn't rely on `std::hash<int>()` to hash the `int`).
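For reference, the combine step in question looks roughly like this. This is a paraphrase of the Boost pattern to illustrate the point, not the exact waLBerla code (`hash_coord` is a name I made up here):

```cpp
#include <cstddef>
#include <functional>

// Boost-style hash_combine (paraphrased for illustration): `value` must
// already be a std::size_t, which is why the current Cell hash runs each
// coordinate through std::hash<int> first.
inline void hash_combine(std::size_t& seed, std::size_t const& value)
{
   seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Effect of the extra std::hash<int> call for a single coordinate:
inline std::size_t hash_coord(std::size_t seed, int coord)
{
   hash_combine(seed, std::hash<int>{}(coord));
   return seed;
}
```

Note that with `seed == 0` the two shift terms contribute nothing, so the first combined coordinate reduces to `value + 0x9e3779b9`.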
I could be wrong, but it looks like injecting a `std::hash<int>()` into the former function has weakened the hashing algorithm, at least for the range of values that is typical of `Cell` objects. Interestingly, by removing the golden ratio constant (`0x9e3779b9`) and tweaking the bit shifts, it's possible to remove collisions for cell indices up to 512 on both 64-bit and 32-bit architectures, while also reducing runtime. Here is the new implementation:
```cpp
inline std::size_t hash_value( const Cell & cell )
{
   std::size_t seed = 0;
   std::hash<cell_idx_t> hasher;
   // limit bitwise left shifts to 32 positions to avoid overflow on 32-bit archs
   // (if seed is zero, the first bitwise left and right shifts do nothing)
   seed ^= hasher(cell.x()) + (seed << 11) + (seed >> 6);
   seed ^= hasher(cell.y()) + (seed << 11) + (seed >> 6);
   seed ^= hasher(cell.z()) + (seed << 11) + (seed >> 6);
   return seed;
}
```
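For completeness, using `hash_value` with the standard containers still needs a small adapter functor. Here is a self-contained sketch; the `Cell` stand-in and the `CellHash` functor below are hypothetical minimal versions for illustration, not waLBerla's actual classes:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

using cell_idx_t = int; // stand-in for waLBerla's cell_idx_t typedef

// Minimal stand-in for waLBerla's Cell, just enough to exercise the hash
struct Cell
{
   cell_idx_t coords[3];
   cell_idx_t x() const { return coords[0]; }
   cell_idx_t y() const { return coords[1]; }
   cell_idx_t z() const { return coords[2]; }
   bool operator==(const Cell& o) const
   {
      return coords[0] == o.coords[0] && coords[1] == o.coords[1]
          && coords[2] == o.coords[2];
   }
};

// The proposed hash, unchanged
inline std::size_t hash_value(const Cell& cell)
{
   std::size_t seed = 0;
   std::hash<cell_idx_t> hasher;
   seed ^= hasher(cell.x()) + (seed << 11) + (seed >> 6);
   seed ^= hasher(cell.y()) + (seed << 11) + (seed >> 6);
   seed ^= hasher(cell.z()) + (seed << 11) + (seed >> 6);
   return seed;
}

// Hasher functor so std::unordered_map can pick up hash_value()
struct CellHash
{
   std::size_t operator()(const Cell& c) const { return hash_value(c); }
};

using CellMap = std::unordered_map<Cell, int, CellHash>;
```

A boundary-node map then becomes e.g. `CellMap boundary_nodes; boundary_nodes[Cell{{128, 256, 512}}] = someIndex;`.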
A benchmark is attached (`benchmark.cpp`). Here are the results:
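For readers without the attachment, the collision counts below can be reproduced along these lines. This is only a sketch of the methodology with names I made up (`hash_xyz`, `count_collisions`); the attached `benchmark.cpp` is the authoritative version and may count over a different index range, so exact figures can differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

using cell_idx_t = int;

// The proposed hash, applied directly to coordinates
inline std::size_t hash_xyz(cell_idx_t x, cell_idx_t y, cell_idx_t z)
{
   std::hash<cell_idx_t> hasher;
   std::size_t seed = 0;
   seed ^= hasher(x) + (seed << 11) + (seed >> 6);
   seed ^= hasher(y) + (seed << 11) + (seed >> 6);
   seed ^= hasher(z) + (seed << 11) + (seed >> 6);
   return seed;
}

// Count how many cells of a box_length^3 box hash to an already-taken value
inline std::uint64_t count_collisions(cell_idx_t box_length)
{
   std::unordered_set<std::size_t> seen;
   std::uint64_t collisions = 0;
   for (cell_idx_t x = 0; x < box_length; ++x)
      for (cell_idx_t y = 0; y < box_length; ++y)
         for (cell_idx_t z = 0; z < box_length; ++z)
            if (!seen.insert(hash_xyz(x, y, z)).second)
               ++collisions;
   return collisions;
}
```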
Benchmark results on a 64-bit computer

```
Count collisions with current_hash_algorithm
box_length 16: 0 collisions (0.0%)
box_length 32: 786 collisions (0.0%)
box_length 64: 48'378 collisions (0.2%)
box_length 128: 1'771'416 collisions (0.8%)
box_length 256: 16'510'817 collisions (0.9%)
box_length 512: 135'273'859 collisions (1.0%)

Timing hash algorithm with current_hash_algorithm
box_length 16: 0 milliseconds
box_length 32: 0 milliseconds
box_length 64: 0 milliseconds
box_length 128: 1 milliseconds
box_length 256: 7 milliseconds
box_length 512: 53 milliseconds

Timing insertions in unordered map with current_hash_algorithm
box_length 16: 0.00 seconds
box_length 32: 0.00 seconds
box_length 64: 0.02 seconds
box_length 128: 0.13 seconds
box_length 256: 1.80 seconds
box_length 512: 53.06 seconds

Count collisions with new_hash_algorithm
box_length 16: 0 collisions (0.0%)
box_length 32: 0 collisions (0.0%)
box_length 64: 0 collisions (0.0%)
box_length 128: 0 collisions (0.0%)
box_length 256: 0 collisions (0.0%)
box_length 512: 0 collisions (0.0%)

Timing hash algorithm with new_hash_algorithm
box_length 16: 0 milliseconds
box_length 32: 0 milliseconds
box_length 64: 0 milliseconds
box_length 128: 1 milliseconds
box_length 256: 10 milliseconds
box_length 512: 78 milliseconds

Timing insertions in unordered map with new_hash_algorithm
box_length 16: 0.00 seconds
box_length 32: 0.00 seconds
box_length 64: 0.02 seconds
box_length 128: 0.18 seconds
box_length 256: 1.54 seconds
box_length 512: 13.25 seconds

Count collisions with boost_hash_algorithm
box_length 16: 0 collisions (0.0%)
box_length 32: 0 collisions (0.0%)
box_length 64: 0 collisions (0.0%)
box_length 128: 0 collisions (0.0%)
box_length 256: 0 collisions (0.0%)
box_length 512: 0 collisions (0.0%)

Timing hash algorithm with boost_hash_algorithm
box_length 16: 0 milliseconds
box_length 32: 0 milliseconds
box_length 64: 0 milliseconds
box_length 128: 2 milliseconds
box_length 256: 13 milliseconds
box_length 512: 101 milliseconds

Timing insertions in unordered map with boost_hash_algorithm
box_length 16: 0.00 seconds
box_length 32: 0.00 seconds
box_length 64: 0.04 seconds
box_length 128: 0.73 seconds
box_length 256: 7.47 seconds
box_length 512: 83.00 seconds
```
Benchmark results on a 32-bit computer

Note: collisions were tested with box_length=480 instead of 512 to reduce the memory footprint and avoid a `std::bad_alloc`.

```
Count collisions with current_hash_algorithm
box_length 16: 0 collisions (0.0%)
box_length 32: 786 collisions (0.0%)
box_length 64: 48'378 collisions (0.2%)
box_length 128: 1'771'416 collisions (0.8%)
box_length 256: 16'510'817 collisions (0.9%)
box_length 480: 111'387'188 collisions (1.0%)

Timing hash algorithm with current_hash_algorithm
box_length 16: 0 milliseconds
box_length 32: 0 milliseconds
box_length 64: 0 milliseconds
box_length 128: 3 milliseconds
box_length 256: 24 milliseconds

Timing insertions in unordered map with current_hash_algorithm
box_length 16: 0.00 seconds
box_length 32: 0.01 seconds
box_length 64: 0.05 seconds
box_length 128: 0.46 seconds
box_length 256: 5.35 seconds

Count collisions with new_hash_algorithm
box_length 16: 0 collisions (0.0%)
box_length 32: 0 collisions (0.0%)
box_length 64: 0 collisions (0.0%)
box_length 128: 0 collisions (0.0%)
box_length 256: 0 collisions (0.0%)
box_length 480: 0 collisions (0.0%)

Timing hash algorithm with new_hash_algorithm
box_length 16: 0 milliseconds
box_length 32: 0 milliseconds
box_length 64: 0 milliseconds
box_length 128: 3 milliseconds
box_length 256: 24 milliseconds

Timing insertions in unordered map with new_hash_algorithm
box_length 16: 0.00 seconds
box_length 32: 0.01 seconds
box_length 64: 0.07 seconds
box_length 128: 0.51 seconds
box_length 256: 3.86 seconds

Count collisions with boost_hash_algorithm
box_length 16: 0 collisions (0.0%)
box_length 32: 0 collisions (0.0%)
box_length 64: 14 collisions (0.0%)
box_length 128: 646 collisions (0.0%)
box_length 256: 35'821 collisions (0.0%)
box_length 480: 1'482'657 collisions (0.0%)

Timing hash algorithm with boost_hash_algorithm
box_length 16: 0 milliseconds
box_length 32: 0 milliseconds
box_length 64: 1 milliseconds
box_length 128: 4 milliseconds
box_length 256: 33 milliseconds

Timing insertions in unordered map with boost_hash_algorithm
box_length 16: 0.00 seconds
box_length 32: 0.01 seconds
box_length 64: 0.15 seconds
box_length 128: 1.57 seconds
box_length 256: 14.79 seconds
```
While the performance gain of the new hash implementation over the current one is noticeable for cell indices >= 64, it's unclear to me how useful that is for end users. Maintaining a hashmap of global cell indices that contains the indices of more than 10% of the blockforest cells is quite a specific use case.
Due to the memory-intensive nature of the benchmark, it's unclear what happens beyond 512 (maybe my implementation wraps around eventually); for such a use case I would prefer the Boost version. Even though it is slower and already shows a few collisions on 32-bit architectures, it is definitely better tested.
I'm sharing this in case it could prove useful to anyone. If you don't think there are other use cases where this collision rate could be a problem, feel free to close this ticket.