Base hash = FNV-1a64 then SplitMix64 mixing. FNV-1a64 is trivial to implement identically across languages; SplitMix64 finalizing fixes its weak avalanche so the upper and lower 32 bits are both well-distributed. The two 32-bit halves become h1 and h2 for double hashing.
Kirsch–Mitzenmacher: g_i = h1 + i*h2, all u64 arithmetic, with the final mod m using a single u128-multiplication trick (Daniel Lemire, Fast Random Integer Generation in an Interval, 2018) so we don't pay for a div.
Bit array is little-endian byte-packed, bit i lives in bytes[i/8] >> (i%8) & 1. Same convention LevelDB and RocksDB use.
Header = k(u32 LE) || m(u64 LE). 12 bytes total. We deliberately put k first so a partial-read of just the header reveals the hash count without needing m.
No checksum on the filter itself. Bloom filters can tolerate a flipped bit (it adds at most a few keys' worth of false positives); pages-level checksumming belongs to db-06 (SSTable).
new_with_fpr(n, fpr) constructor. Picks m = ceil(-n * ln(fpr) / (ln 2)^2) and k = round((m/n) * ln 2). Caps k at 30 to avoid degenerate sizing for absurdly small FPRs.
vs MurmurHash3 / xxhash: faster and arguably higher quality, but each is hundreds of lines to re-implement identically in three languages. FNV+SplitMix is 12 lines per language and indistinguishable in our use case.
vs k independent hashes: 2× CPU for no measurable FPR change (Kirsch & Mitzenmacher 2006).
vs Cuckoo / xor filters: more space-efficient at low FPR but much more code. Worth a separate lab — db-21.
vs in-language hashers (std::hash, hash/fnv, std::hash<string>): per-language differences — Go's maphash is randomized per process; C++ std::hash<string> is implementation-defined. None of them survive cross-language interop.
Concurrent inserts. Single writer model. Concurrent add corrupts overlapping byte writes. Lock externally or use atomic OR per byte — covered in db-08 / db-21.
Adversarial keys. FNV-1a64 is not cryptographic — an attacker can craft collisions to inflate FPR. Use SipHash / xxh3 (with secret seed) if filter inputs are attacker-controlled.
Deletion. Use a counting Bloom or a cuckoo filter. See db-21.