Verification — db-21

1. What "Verified" Means Here

Two distinct claims:

  1. Per-language correctness: unit tests in each language pass.
  2. Cross-language byte equivalence: the three independent implementations (Rust, Go, C++) produce identical canonical wire dumps for three fixed workloads, checked by comparing sha256 digests.

Both must hold. With (1) alone, each port can drift independently into a "self-consistent but wrong" state, because its tests only compare the port against itself.

2. Per-Language Unit Tests

Ten tests, mirrored across all three ports:

 #   Name                            Asserts
 1   bloom_hit_miss                  Bloom positive case + a definite negative
 2   bounds_short_circuit            Get skips SST when key outside [smallest, largest]
 3   range_tomb_hides_older_put      Newer range tomb shadows older Put
 4   range_tomb_respects_newer_put   Older range tomb does not shadow newer Put
 5   tiered_picks_prefix             compact_size_tiered picks ≥2 prefix
 6   universal_picks_run             compact_universal picks ≥3 contiguous run
 7   noop_compaction                 Returns false when no eligible group
 8   dump_determinism                Two dumps of the same state are equal; magic is DSEADV21
 9   workload_all_scenarios          All four scenarios produce non-empty dumps with correct magic
10   dedup_keeps_last                build_sst keeps the last Put per key

./scripts/verify.sh
# == Rust ==
# 10 passed; 0 failed
# == Go ==
# ok      github.com/10xdev/dse/db21
# == C++ ==
# 1/1 Test #1: test_adv .........................   Passed
# === OK ===
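
To make tests 3 and 4 concrete, here is a minimal sketch of the visibility rule they pin down. It is not taken from the db-21 sources: the RangeTomb type, the put_is_hidden helper, the use of sequence numbers, and the half-open [start, end) interval are all assumptions made for illustration.

```rust
/// Hypothetical range tombstone: deletes keys in [start, end) as of sequence number `seq`.
/// The half-open interval is an assumption, not something the test table states.
struct RangeTomb {
    start: Vec<u8>,
    end: Vec<u8>,
    seq: u64,
}

/// Returns true when a Put written at `put_seq` is shadowed by some tombstone.
/// A tombstone shadows the Put only if it covers the key AND is strictly newer.
fn put_is_hidden(key: &[u8], put_seq: u64, tombs: &[RangeTomb]) -> bool {
    tombs.iter().any(|t| {
        t.start.as_slice() <= key && key < t.end.as_slice() && t.seq > put_seq
    })
}
```

With a tombstone over [b, d) at sequence 5, a Put of key c at sequence 3 is hidden (test 3), while a Put of the same key at sequence 7 stays visible (test 4).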

3. Cross-Language Byte Equivalence

./scripts/cross_test.sh
# == build Rust ==
# == build Go ==
# == build C++ ==
# ok   fixture=A impl=rust fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
# ok   fixture=A impl=go   fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
# ok   fixture=A impl=cpp  fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
# ok   fixture=B impl=rust 05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
# ok   fixture=B impl=go   05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
# ok   fixture=B impl=cpp  05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
# ok   fixture=C impl=rust 4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
# ok   fixture=C impl=go   4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
# ok   fixture=C impl=cpp  4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
# === ALL OK ===
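
The agreement check behind this output can be sketched as follows. This is an illustration only, not the contents of scripts/cross_test.sh: parse_line and mismatched_fixtures are hypothetical helpers, and the line format is assumed to be exactly the one shown above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One parsed result line: (fixture, implementation, hex digest).
type Record = (String, String, String);

/// Parses a line such as `ok fixture=A impl=rust <64 hex chars>`.
/// Returns None for lines that lack any of the three fields (e.g. build banners).
fn parse_line(line: &str) -> Option<Record> {
    let mut fixture = None;
    let mut imp = None;
    let mut digest = None;
    for tok in line.split_whitespace() {
        if let Some(v) = tok.strip_prefix("fixture=") {
            fixture = Some(v.to_string());
        } else if let Some(v) = tok.strip_prefix("impl=") {
            imp = Some(v.to_string());
        } else if tok.len() == 64 && tok.chars().all(|c| c.is_ascii_hexdigit()) {
            digest = Some(tok.to_string());
        }
    }
    Some((fixture?, imp?, digest?))
}

/// Returns the fixtures whose implementations do not all report the same digest.
/// An empty result means every fixture is byte-equivalent across implementations.
fn mismatched_fixtures(records: &[Record]) -> Vec<String> {
    let mut by_fixture: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (fixture, _imp, digest) in records {
        by_fixture
            .entry(fixture.as_str())
            .or_default()
            .insert(digest.as_str());
    }
    by_fixture
        .into_iter()
        .filter(|(_, digests)| digests.len() != 1)
        .map(|(fixture, _)| fixture.to_string())
        .collect()
}
```

Grouping by fixture and counting distinct digests means a single divergent port is enough to flag the fixture, regardless of which port is the odd one out.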

4. What Would Falsify The Claim

A non-exhaustive list of bugs the cross test would catch but a per-language test wouldn't:

  • Forgetting to encode the bloom bitmap as little-endian on a big-endian port.
  • Using host integer width for length prefixes instead of u32.
  • Iterating a hash map at any point in merge_run (non-deterministic order across languages and across runs).
  • Encoding the ratio as "0.5" instead of the IEEE bit pattern.
  • Compacting via "longest run found so far that satisfies threshold at the time of finding", instead of evaluating all runs and picking the global longest.
  • Off-by-one in b = a + 1 + (r3 mod (keys-a)) for the range tombstone end key.
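
Several of the bugs above come down to small encoding and ordering choices. The sketch below shows one portable way to make each choice explicit. It is illustrative rather than the db-21 code: every function name is hypothetical, the ratio is assumed to be an f64 written little-endian, and a, r3, and keys are assumed to be a start index, a pseudo-random draw, and the key count.

```rust
use std::collections::BTreeMap;

/// Appends a length-prefixed byte string: a fixed-width u32 little-endian
/// length, then the bytes. The width never depends on the host's usize.
fn put_bytes(out: &mut Vec<u8>, data: &[u8]) {
    assert!(data.len() <= u32::MAX as usize, "length must fit in u32");
    let len = data.len() as u32;
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(data);
}

/// Appends an f64 as its IEEE 754 bit pattern (8 bytes, little-endian),
/// never as decimal text such as "0.5". The f64 width is an assumption.
fn put_f64(out: &mut Vec<u8>, x: f64) {
    out.extend_from_slice(&x.to_bits().to_le_bytes());
}

/// Keeps the last value written for each key and returns entries in sorted key
/// order. BTreeMap iterates in key order, so the result is identical on every
/// run, unlike iteration over a hash map.
fn merge_last_wins(puts: &[(Vec<u8>, Vec<u8>)]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut latest: BTreeMap<&[u8], &[u8]> = BTreeMap::new();
    for (k, v) in puts {
        latest.insert(k.as_slice(), v.as_slice()); // later entries overwrite earlier ones
    }
    latest
        .into_iter()
        .map(|(k, v)| (k.to_vec(), v.to_vec()))
        .collect()
}

/// End key for a range tombstone, b = a + 1 + (r3 mod (keys - a)).
/// For a < keys the result always lies in a+1 ..= keys, so b > a.
fn tomb_end(a: u64, r3: u64, keys: u64) -> u64 {
    assert!(a < keys, "start index must be below the key count");
    a + 1 + (r3 % (keys - a))
}
```

Under these assumptions, put_f64 writes 0.5 as the bytes 00 00 00 00 00 00 E0 3F, and tomb_end(3, 7, 10) yields 4 while tomb_end(3, 6, 10) yields 10. Dropping the + 1 or using keys - a - 1 as the modulus would shift those results and change the dump digest.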

5. Reproducibility Bar

  • macOS arm64, AppleClang 16, Go 1.22, Rust stable (rustc 1.7x).
  • No external dependencies (no sha2 crate, no golang.org/x/..., no OpenSSL): every implementation is self-contained, so the verification step is reproducible offline.
  • All three hashes are pinned in scripts/cross_test.sh and reproduced in this document for paper-trail purposes.