Verification — db-21
1. What "Verified" Means Here
Two distinct claims:
- Per-language correctness: unit tests in each language pass.
- Cross-language byte equivalence: three independent implementations produce identical canonical wire dumps for three fixed workloads, checked by comparing SHA-256 digests of the dumps.
Both must hold. Per-language correctness without byte equivalence lets each port drift independently into a "self-consistent but wrong" state.
2. Per-Language Unit Tests
Ten tests, mirrored across all three ports:
| # | Name | Asserts |
|---|---|---|
| 1 | bloom_hit_miss | Bloom positive case + a definite negative |
| 2 | bounds_short_circuit | Get skips SST when key outside [smallest, largest] |
| 3 | range_tomb_hides_older_put | Newer range tomb shadows older Put |
| 4 | range_tomb_respects_newer_put | Older range tomb does not shadow newer Put |
| 5 | tiered_picks_prefix | compact_size_tiered picks a prefix of length ≥2 |
| 6 | universal_picks_run | compact_universal picks a contiguous run of length ≥3 |
| 7 | noop_compaction | Returns false when no eligible group |
| 8 | dump_determinism | Two dumps of the same state are equal; magic is DSEADV21 |
| 9 | workload_all_scenarios | All four scenarios produce non-empty dumps with correct magic |
| 10 | dedup_keeps_last | build_sst keeps the last Put per key |
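Tests 3 and 4 pin down one visibility rule from two sides. A minimal sketch of that rule follows, under stated assumptions: "newer" is modelled as a larger sequence number, the tombstone range is half-open `[start, end)`, and the names `Put`, `RangeTomb`, `shadows`, and `visible` are hypothetical rather than taken from the db-21 sources.

```rust
/// Hypothetical record types; field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Put {
    pub key: Vec<u8>,
    pub seq: u64, // assumed: a larger sequence number means a newer write
    pub value: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct RangeTomb {
    pub start: Vec<u8>, // inclusive (assumed)
    pub end: Vec<u8>,   // exclusive (assumed)
    pub seq: u64,
}

/// True when `tomb` hides `put`: the key lies in [start, end) and the
/// tombstone is strictly newer than the put.
pub fn shadows(tomb: &RangeTomb, put: &Put) -> bool {
    let key = put.key.as_slice();
    let in_range = key >= tomb.start.as_slice() && key < tomb.end.as_slice();
    in_range && tomb.seq > put.seq
}

/// Returns the value visible for `key`: the newest put for that key,
/// unless some range tombstone shadows it.
pub fn visible<'a>(key: &[u8], puts: &'a [Put], tombs: &[RangeTomb]) -> Option<&'a [u8]> {
    let newest = puts
        .iter()
        .filter(|p| p.key.as_slice() == key)
        .max_by_key(|p| p.seq)?;
    if tombs.iter().any(|t| shadows(t, newest)) {
        None
    } else {
        Some(newest.value.as_slice())
    }
}
```

With a put at sequence 1, a tombstone at sequence 2 covering the key makes `visible` return `None` (test 3), while a tombstone at sequence 0 leaves the value readable (test 4).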
```sh
./scripts/verify.sh
# == Rust ==
# 10 passed; 0 failed
# == Go ==
# ok github.com/10xdev/dse/db21
# == C++ ==
# 1/1 Test #1: test_adv ......................... Passed
# === OK ===
```
3. Cross-Language Byte Equivalence
```sh
./scripts/cross_test.sh
# == build Rust ==
# == build Go ==
# == build C++ ==
# ok fixture=A impl=rust fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
# ok fixture=A impl=go fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
# ok fixture=A impl=cpp fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
# ok fixture=B impl=rust 05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
# ok fixture=B impl=go 05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
# ok fixture=B impl=cpp 05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
# ok fixture=C impl=rust 4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
# ok fixture=C impl=go 4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
# ok fixture=C impl=cpp 4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
# === ALL OK ===
```
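The pass condition implied by this transcript is that, for each fixture, every implementation reports the pinned digest. The sketch below expresses that check over report lines of the shape shown above. It is not the contents of `scripts/cross_test.sh`; `Report`, `parse_line`, and `check` are hypothetical names, and the line format is inferred from the transcript only.

```rust
use std::collections::BTreeMap;

/// One parsed `ok fixture=<name> impl=<name> <hex digest>` line.
#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub fixture: String,
    pub imp: String,
    pub digest: String,
}

/// Parses a report line; a leading `#` (as in the transcript) is ignored.
/// Returns `None` for lines that are not digest reports.
pub fn parse_line(line: &str) -> Option<Report> {
    let mut parts = line.trim_start_matches('#').split_whitespace();
    if parts.next()? != "ok" {
        return None;
    }
    let fixture = parts.next()?.strip_prefix("fixture=")?.to_string();
    let imp = parts.next()?.strip_prefix("impl=")?.to_string();
    let digest = parts.next()?.to_string();
    // A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
    if digest.len() != 64 || !digest.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(Report { fixture, imp, digest })
}

/// Requires every listed implementation to report the pinned digest for
/// every pinned fixture. A missing report is an error, not a silent pass.
pub fn check(
    lines: &[&str],
    pinned: &BTreeMap<&str, &str>,
    impls: &[&str],
) -> Result<(), String> {
    let reports: Vec<Report> = lines.iter().filter_map(|l| parse_line(l)).collect();
    for (fixture, want) in pinned {
        for imp in impls {
            let got = reports
                .iter()
                .find(|r| r.fixture == *fixture && r.imp == *imp)
                .ok_or_else(|| format!("fixture={fixture} impl={imp}: no digest reported"))?;
            if got.digest != *want {
                return Err(format!(
                    "fixture={fixture} impl={imp}: got {}, want {want}",
                    got.digest
                ));
            }
        }
    }
    Ok(())
}
```

Comparing against a pinned value, rather than only checking that the three ports agree with each other, also catches the case where all ports change in the same way.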
4. What Would Falsify The Claim
A non-exhaustive list of bugs the cross test would catch but a per-language test wouldn't:
- Forgetting to encode the bloom bitmap as little-endian on a big-endian port.
- Using host integer width for length prefixes instead of `u32`.
- Iterating a hash map at any point in `merge_run` (non-deterministic order across languages and across runs).
- Encoding the ratio as `"0.5"` instead of the IEEE bit pattern.
- Compacting via "longest run found so far that satisfies threshold at the time of finding", instead of evaluating all runs and picking the global longest.
- Off-by-one in `b = a + 1 + (r3 mod (keys-a))` for the range tombstone end key.
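Three of the items above (length-prefix width, hash-map iteration order, and the ratio encoding) come down to keeping host-dependent choices out of the byte stream. A minimal sketch of what that looks like follows. The function names are hypothetical, and two details are assumptions rather than facts from the db-21 format: integers are taken to be little-endian throughout, and the ratio is taken to be a 64-bit float.

```rust
use std::collections::BTreeMap;

/// Writes `bytes` with a fixed-width little-endian u32 length prefix,
/// independent of the host's `usize` width.
pub fn put_len_prefixed(out: &mut Vec<u8>, bytes: &[u8]) {
    let len = u32::try_from(bytes.len()).expect("field longer than u32::MAX");
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(bytes);
}

/// Writes the ratio as its IEEE 754 binary64 bit pattern (assumed width),
/// little-endian, never as a decimal string such as "0.5".
pub fn put_ratio(out: &mut Vec<u8>, ratio: f64) {
    out.extend_from_slice(&ratio.to_bits().to_le_bytes());
}

/// Dumps key/value pairs in ascending key order. `BTreeMap` iterates in
/// sorted order, so the output does not depend on hash seeds or on the
/// order in which entries were inserted.
pub fn dump_entries(entries: &BTreeMap<Vec<u8>, Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in entries {
        put_len_prefixed(&mut out, key);
        put_len_prefixed(&mut out, value);
    }
    out
}
```

For example, `put_ratio` with `0.5` appends the eight bytes `00 00 00 00 00 00 e0 3f`, which any port can reproduce exactly, whereas float-to-string formatting differs between languages.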
5. Reproducibility Bar
- macOS arm64, AppleClang 16, Go 1.22, Rust stable (`rustc 1.7x`).
- No external dependencies (no `sha2` crate, no `golang.org/x/...`, no OpenSSL): every implementation is self-contained, so the verification step is reproducible offline.
- All three hashes are pinned in `scripts/cross_test.sh` and reproduced in this document for paper-trail purposes.