Step 01 — Hash chain (FNV-1a64 → SplitMix64 → double hashing)
Before any filter logic, get the hash chain identical across all three languages. If fnv1a64("foobar") doesn't return 0x85944171f73967e8 everywhere, nothing else will work.
1. FNV-1a64
Algorithm:
hash = 0xcbf29ce484222325
for byte in input:
hash ^= byte
hash = hash * 0x100000001b3 // wrapping 64-bit multiplication
return hash
Known test vectors:
| Input | Result |
|---|---|
"" (empty) | 0xcbf29ce484222325 (the initial value) |
"a" | 0xaf63dc4c8601ec8c |
"foobar" | 0x85944171f73967e8 |
Side-by-side:
#![allow(unused)] fn main() { pub fn fnv1a64(bytes: &[u8]) -> u64 { let mut h: u64 = 0xcbf29ce484222325; for &b in bytes { h ^= b as u64; h = h.wrapping_mul(0x100000001b3); } h } }
func FNV1a64(b []byte) uint64 {
var h uint64 = 0xcbf29ce484222325
for _, x := range b {
h ^= uint64(x)
h *= 0x100000001b3
}
return h
}
std::uint64_t Fnv1a64(const std::uint8_t* p, std::size_t n) {
std::uint64_t h = 0xcbf29ce484222325ULL;
for (std::size_t i = 0; i < n; ++i) {
h ^= p[i];
h *= 0x100000001b3ULL;
}
return h;
}
⚠️ Two-byte traps: don't use FNV-1 (not 1a — different order of XOR vs multiply); don't use the 32-bit prime or basis (different constants).
2. SplitMix64 finalizer
FNV-1a64 has decent low bits but biased high bits. SplitMix64 (Vigna & Steele) is a single-step bit mixer that produces near-perfect avalanche on a 64-bit input. We apply it to FNV's output so that both the upper and lower 32-bit halves are usable as independent hashes.
splitmix64(x):
x = x + 0x9e3779b97f4a7c15
x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
x = (x ^ (x >> 27)) * 0x94d049bb133111eb
return x ^ (x >> 31)
Known vectors:
| Input | Output |
|---|---|
0 | 0xe220a8397b1dcdaf |
0xdeadbeef | 0x4adfb90f68c9eb9b |
splitmix64(fnv1a64("foobar")) | use the test to lock this in |
The combined hash we actually use:
mix64(bytes) := splitmix64(fnv1a64(bytes))
h1 = mix64(bytes) & 0xffffffff // low 32 bits
h2 = mix64(bytes) >> 32 // high 32 bits
3. Synthesizing k indices (Kirsch–Mitzenmacher)
For a bit array of size m and k hashes:
for i in 0..k:
g = h1 + i * h2 // wrapping 64-bit add
idx = g mod m // reduce to [0, m)
use bits[idx]
The reduction g mod m is hot. Naive % works but a modulo on a 64-bit integer is ~20 cycles. RocksDB and others use Lemire's fast reduction:
fastmod(g, m) = ((g as u128) * (m as u128)) >> 64
(Equivalent to floor(g * m / 2^64), a near-uniform map of [0, 2^64) to [0, m).) Either approach is fine for correctness — but pick one and use it in all three languages, otherwise the bit positions diverge and cross-language tests break.
This lab uses plain % because it's identical across languages with no language-specific u128 syntax to worry about. Performance difference is irrelevant at filter-construction scale.
Test gate
Before moving on, all three bloombench hash foobar invocations must print:
fnv1a64=85944171f73967e8
splitmix=... (same value across languages)
h1=... h2=... (same values across languages)
If those don't match, the rest of the lab cannot succeed.