Verification — What to Test and How
Per-language property tests
| # | Test | Pass if |
|---|---|---|
| V1 | fnv1a64_known_vectors | "" → 0xcbf29ce484222325; "a" → 0xaf63dc4c8601ec8c; "foobar" → 0x85944171f73967e8 |
| V2 | splitmix64_known_vectors | splitmix64(0) = 0xe220a8397b1dcdaf; splitmix64(0xdeadbeef) = 0x4adfb90f68c9eb9b |
| V3 | no_false_negatives | Insert N=10 000 random keys (seeded); contains returns true for every one |
| V4 | fpr_within_2x | Build for n=10 000 at fpr=0.01; query 100 000 random absent keys; observed FPR ≤ 2× theoretical |
| V5 | optimal_k_formula | with_fpr(1000, 0.01) returns k=7 and 9 580 ≤ m ≤ 9 620 (allow ±0.5%) |
| V6 | encode_decode_roundtrip | encode → decode → query the same keys: identical results |
| V7 | header_layout | First 4 bytes = k LE; next 8 = m LE; payload length = ⌈m/8⌉ |
| V8 | empty_filter_rejects_all | New filter with m=64, k=3; contains returns false for 1 000 random keys |
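
The known-vector rows (V1, V2) and the sizing row (V5) can be sketched in Rust as below. This is a minimal sketch, not the project's implementation. The names `fnv1a64` and `splitmix64` come from the test names; the constants are the standard published ones. `optimal_params` is a hypothetical stand-in for `with_fpr`, and its ceil/round choices are assumptions, since how `with_fpr` rounds m is not specified here.

```rust
// Sketch only: `optimal_params` is a hypothetical stand-in for `with_fpr`.

/// FNV-1a (64-bit): XOR each byte into the state, then multiply by the prime.
fn fnv1a64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut h = OFFSET_BASIS;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(PRIME);
    }
    h
}

/// SplitMix64 output for input `x`: add the increment, then apply the finalizer.
fn splitmix64(x: u64) -> u64 {
    let mut z = x.wrapping_add(0x9e37_79b9_7f4a_7c15);
    z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    z ^ (z >> 31)
}

/// Textbook Bloom sizing: m = ceil(-n ln p / (ln 2)^2), k = round((m / n) ln 2).
/// Assumption: the real `with_fpr` may round m differently (e.g. to a word size).
fn optimal_params(n: u64, fpr: f64) -> (u64, u32) {
    let ln2 = std::f64::consts::LN_2;
    let m = (-(n as f64) * fpr.ln() / (ln2 * ln2)).ceil();
    let k = (m / (n as f64) * ln2).round();
    (m as u64, k as u32)
}

fn main() {
    // V1: FNV-1a 64 known vectors from the table.
    assert_eq!(fnv1a64(b""), 0xcbf2_9ce4_8422_2325);
    assert_eq!(fnv1a64(b"a"), 0xaf63_dc4c_8601_ec8c);
    assert_eq!(fnv1a64(b"foobar"), 0x8594_4171_f739_67e8);

    // V2: the first vector is asserted; the second is printed so it can be
    // compared against the table row.
    assert_eq!(splitmix64(0), 0xe220_a839_7b1d_cdaf);
    println!("splitmix64(0xdeadbeef) = {:#018x}", splitmix64(0xdead_beef));

    // V5: k and the accepted range for m.
    let (m, k) = optimal_params(1000, 0.01);
    assert_eq!(k, 7);
    assert!((9_580..=9_620).contains(&m));
    println!("m = {}, k = {}", m, k);
}
```

Wrapping arithmetic matters here: both hashes rely on multiplication modulo 2^64, so a port that uses checked or arbitrary-precision integers needs explicit masking to reproduce the same vectors.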
Cross-language test
scripts/cross_test.sh performs the writer × reader matrix for {go, rust, cpp}²:
- Each writer builds a filter for the same fixed-seed key set (1 000 keys).
- Filters must be byte-identical (`md5sum` over the filter file).
- Each reader opens each writer's filter and runs:
  - 1 000 known-present queries → must all return `present`
  - 1 000 known-absent queries (different seed) → results must match across readers
This catches:
- Endian or bit-order bugs in the header / bit array.
- Hash mismatch (`fnv1a64` or `splitmix64` differs).
- `mod m` reduction differs (Lemire's u128 trick vs `%` should yield identical indices).
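
The header and bit-order failure modes above are easier to reason about with a concrete layout in hand. The following is a minimal sketch under stated assumptions: the header follows row V7 (k as 4 bytes LE, m as 8 bytes LE, then ⌈m/8⌉ payload bytes). The type and function names (`Filter`, `encode`, `decode`, `set_bit`, `payload_len`) are hypothetical, and the LSB-first bit order within each byte is an assumption, not something this document specifies.

```rust
// Sketch only: all names here are hypothetical.
// Assumed layout (from V7): k as u32 LE, m as u64 LE, then ceil(m/8) payload bytes.
// Assumed bit order: bit i lives in byte i/8 at mask 1 << (i % 8) (LSB first).

const HEADER_LEN: usize = 12;

#[derive(Debug, PartialEq)]
struct Filter {
    k: u32,
    m: u64,
    bits: Vec<u8>,
}

/// Number of payload bytes needed for m bits, i.e. ceil(m / 8).
fn payload_len(m: u64) -> u64 {
    m / 8 + u64::from(m % 8 != 0)
}

/// Set bit `i` using the assumed LSB-first order within each byte.
fn set_bit(bits: &mut [u8], i: u64) {
    bits[(i / 8) as usize] |= 1u8 << (i % 8);
}

/// Serialize as header (k LE, m LE) followed by the raw bit array.
fn encode(f: &Filter) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + f.bits.len());
    out.extend_from_slice(&f.k.to_le_bytes());
    out.extend_from_slice(&f.m.to_le_bytes());
    out.extend_from_slice(&f.bits);
    out
}

/// Parse the header and reject buffers whose payload length is not ceil(m/8).
fn decode(buf: &[u8]) -> Option<Filter> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let mut kb = [0u8; 4];
    let mut mb = [0u8; 8];
    kb.copy_from_slice(&buf[0..4]);
    mb.copy_from_slice(&buf[4..12]);
    let (k, m) = (u32::from_le_bytes(kb), u64::from_le_bytes(mb));
    if (buf.len() - HEADER_LEN) as u64 != payload_len(m) {
        return None;
    }
    Some(Filter { k, m, bits: buf[HEADER_LEN..].to_vec() })
}

fn main() {
    let mut f = Filter { k: 3, m: 64, bits: vec![0u8; 8] };
    set_bit(&mut f.bits, 9);
    let bytes = encode(&f);
    assert_eq!(bytes[..4], [3u8, 0, 0, 0]); // k, little-endian
    assert_eq!(bytes[4..12], [64u8, 0, 0, 0, 0, 0, 0, 0]); // m, little-endian
    assert_eq!(bytes.len(), HEADER_LEN + 8); // ceil(64 / 8) payload bytes
    assert_eq!(bytes[HEADER_LEN + 1], 0b0000_0010); // bit 9 = byte 1, mask 1 << 1
    assert_eq!(decode(&bytes), Some(f));
}
```

A writer that used big-endian header fields or MSB-first bit numbering would produce different bytes for the same filter, which is exactly what the byte-identical comparison is meant to expose.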
What "passing" means
- All 8 property tests green in all three languages.
- `cross_test.sh` exits 0 with 9 byte-identical filter writers and 9 passing reader runs.
- Manual smoke: hexdump of a 4-key filter matches the structure described in docs/observation.md.