Step 01 — Hash chain (FNV-1a64 → SplitMix64 → double hashing)

Before any filter logic, get the hash chain identical across all three languages. If fnv1a64("foobar") doesn't return 0x85944171f73967e8 everywhere, nothing else will work.

1. FNV-1a64

Algorithm:

hash = 0xcbf29ce484222325
for byte in input:
    hash ^= byte
    hash = hash * 0x100000001b3        // wrapping 64-bit multiplication
return hash

Known test vectors:

Input	Result
`""` (empty)	`0xcbf29ce484222325` (the initial value)
`"a"`	`0xaf63dc4c8601ec8c`
`"foobar"`	`0x85944171f73967e8`

Side-by-side:

#![allow(unused)]
fn main() {
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}
}

func FNV1a64(b []byte) uint64 {
    var h uint64 = 0xcbf29ce484222325
    for _, x := range b {
        h ^= uint64(x)
        h *= 0x100000001b3
    }
    return h
}

std::uint64_t Fnv1a64(const std::uint8_t* p, std::size_t n) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

⚠️ Two-byte traps: don't use FNV-1 (not 1a — different order of XOR vs multiply); don't use the 32-bit prime or basis (different constants).

FNV-1a64 has decent low bits but biased high bits. SplitMix64 (Vigna & Steele) is a single-step bit mixer that produces near-perfect avalanche on a 64-bit input. We apply it to FNV's output so that both the upper and lower 32-bit halves are usable as independent hashes.

splitmix64(x):
    x = x + 0x9e3779b97f4a7c15
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9
    x = (x ^ (x >> 27)) * 0x94d049bb133111eb
    return  x ^ (x >> 31)

Known vectors:

Input	Output
`0`	`0xe220a8397b1dcdaf`
`0xdeadbeef`	`0x4adfb90f68c9eb9b`
`splitmix64(fnv1a64("foobar"))`	use the test to lock this in

The combined hash we actually use:

mix64(bytes) := splitmix64(fnv1a64(bytes))
h1 = mix64(bytes) & 0xffffffff               // low 32 bits
h2 = mix64(bytes) >> 32                       // high 32 bits

3. Synthesizing `k` indices (Kirsch–Mitzenmacher)

For a bit array of size m and k hashes:

for i in 0..k:
    g = h1 + i * h2          // wrapping 64-bit add
    idx = g mod m            // reduce to [0, m)
    use bits[idx]

The reduction g mod m is hot. Naive % works but a modulo on a 64-bit integer is ~20 cycles. RocksDB and others use Lemire's fast reduction:

fastmod(g, m) = ((g as u128) * (m as u128)) >> 64

(Equivalent to floor(g * m / 2^64), a near-uniform map of [0, 2^64) to [0, m).) Either approach is fine for correctness — but pick one and use it in all three languages, otherwise the bit positions diverge and cross-language tests break.

This lab uses plain % because it's identical across languages with no language-specific u128 syntax to worry about. Performance difference is irrelevant at filter-construction scale.

Test gate

Before moving on, all three bloombench hash foobar invocations must print:

fnv1a64=85944171f73967e8
splitmix=...        (same value across languages)
h1=...  h2=...      (same values across languages)

If those don't match, the rest of the lab cannot succeed.

Distributed Systems Engineer — Build Databases & Consensus From Scratch

Step 01 — Hash chain (FNV-1a64 → SplitMix64 → double hashing)

1. FNV-1a64

2. SplitMix64 finalizer

3. Synthesizing `k` indices (Kirsch–Mitzenmacher)

Test gate

Distributed Systems Engineer — Build Databases & Consensus From Scratch

Step 01 — Hash chain (FNV-1a64 → SplitMix64 → double hashing)

1. FNV-1a64

2. SplitMix64 finalizer

3. Synthesizing k indices (Kirsch–Mitzenmacher)

Test gate

3. Synthesizing `k` indices (Kirsch–Mitzenmacher)