Step 03 — Cross-Language Byte Equivalence
Goal
Prove that the Rust, Go, and C++ implementations produce byte-identical canonical wire dumps for three fixed workloads.
Why This Is The Whole Point
API-level test parity is cheap and weak. "Same input → same hash of a canonical binary dump" is strong: any per-language drift (endianness, integer width, map-iteration order, float formatting) surfaces as a hash mismatch on the next run.
The Format (one canonical source)
See docs/execution.md Section 4. Two-line summary:
- Magic "DSEADV21" ‖ f64 LE ratio ‖ u32 LE sst_count.
- Per SST (newest first): bounds (lenpref) ‖ entries (u8 kind + lenpref key + maybe lenpref value) ‖ range tombs ‖ u64 LE bloom bitmap.
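The header and a single length-prefixed field can be sketched as follows. This is a minimal illustration, not the project's encoder: the function names are hypothetical, and the width of the "lenpref" length (taken here as a u32 LE) is an assumption; docs/execution.md Section 4 remains the authority.

```rust
// Sketch of the fixed header and one length-prefixed field.
// ASSUMPTION: "lenpref" means a u32 little-endian length followed by the raw
// bytes. The real prefix width is defined in docs/execution.md Section 4.

const MAGIC: &[u8; 8] = b"DSEADV21";

/// Append magic ‖ f64 LE ratio ‖ u32 LE sst_count (20 bytes in total).
fn write_header(out: &mut Vec<u8>, ratio: f64, sst_count: u32) {
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&ratio.to_le_bytes());
    out.extend_from_slice(&sst_count.to_le_bytes());
}

/// Append a length-prefixed byte string (hypothetical u32 LE prefix).
fn write_lenpref(out: &mut Vec<u8>, bytes: &[u8]) {
    assert!(bytes.len() <= u32::MAX as usize, "field too long for a u32 prefix");
    let len = bytes.len() as u32; // never write usize directly: its width varies
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(bytes);
}

fn main() {
    let mut dump = Vec::new();
    write_header(&mut dump, 0.5, 2);
    write_lenpref(&mut dump, b"key");
    // 8 (magic) + 8 (f64) + 4 (u32) + 4 (length prefix) + 3 (payload) bytes
    println!("{} bytes: {:02x?}", dump.len(), dump);
}
```

Using the explicit `to_le_bytes` calls on fixed-width types is what keeps the output independent of the host's endianness and pointer width.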
The Workload (one canonical source)
See docs/execution.md Sections 1-3. Two-line summary:
- SplitMix64 PRNG, 3 draws per op, (r1 >> 62) & 3 chooses Put / Put / Delete / RangeTomb. Flush every 8 ops, compact every 16. No residue flush at end.
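The op-selection loop might look like the sketch below. The SplitMix64 step is the standard published algorithm; everything else is illustrative. In particular, it assumes the fixture seed is used directly as the generator state, that flushes fire after ops 8, 16, 24, ... with compaction following the flush on every 16th op, and it leaves the second and third draws unused because their meaning is defined only in docs/execution.md Sections 1-3. The names `Tally`, `choose_op`, and `run` are hypothetical.

```rust
/// Standard SplitMix64 step function.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed } // ASSUMPTION: seed is the initial state
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Op {
    Put,
    Delete,
    RangeTomb,
}

/// The top two bits of the first draw pick the op: 0 and 1 are Put,
/// 2 is Delete, 3 is RangeTomb.
fn choose_op(r1: u64) -> Op {
    match (r1 >> 62) & 3 {
        0 | 1 => Op::Put,
        2 => Op::Delete,
        _ => Op::RangeTomb,
    }
}

#[derive(Debug, Default)]
struct Tally {
    puts: u32,
    deletes: u32,
    range_tombs: u32,
    flushes: u32,
    compactions: u32,
}

/// Drive `ops` operations and count what would happen; no store is touched.
fn run(seed: u64, ops: u32) -> Tally {
    let mut rng = SplitMix64::new(seed);
    let mut t = Tally::default();
    for i in 1..=ops {
        // Three draws per op; r2 and r3 are consumed but not interpreted here.
        let (r1, _r2, _r3) = (rng.next_u64(), rng.next_u64(), rng.next_u64());
        match choose_op(r1) {
            Op::Put => t.puts += 1,
            Op::Delete => t.deletes += 1,
            Op::RangeTomb => t.range_tombs += 1,
        }
        if i % 8 == 0 {
            t.flushes += 1; // ASSUMPTION: flush after every 8th op
        }
        if i % 16 == 0 {
            t.compactions += 1; // ASSUMPTION: compact after every 16th op
        }
    }
    // No residue flush: ops past the last multiple of 8 stay unflushed.
    t
}

fn main() {
    let t = run(42, 200);
    println!(
        "puts={} deletes={} range_tombs={} flushes={} compactions={}",
        t.puts, t.deletes, t.range_tombs, t.flushes, t.compactions
    );
}
```

All arithmetic uses `wrapping_add` and `wrapping_mul` on `u64`, which is the behaviour the Go and C++ versions get from unsigned 64-bit overflow; any signed or narrower type would diverge.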
The Three Fixtures
| Fixture | seed | ops | keys | scenario |
|---|---|---|---|---|
| A | 42 | 200 | 32 | tieredcompact |
| B | 7 | 500 | 64 | universalcompact |
| C | 99 | 300 | 16 | withrange |
Hashes are pinned in scripts/cross_test.sh and reproduced in
docs/execution.md Section 5 and docs/verification.md Section 3.
Done When
./scripts/cross_test.sh
# ... ends with ...
=== ALL OK ===
If it doesn't, the diff between two implementations' dumps is the debugging artefact. Decode the first 16 bytes to confirm the magic (8 bytes) and the ratio (8-byte f64 LE), read the u32 LE sst_count that follows, then walk the SSTs one at a time; each SST is self-delimiting.
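A small header decoder is often enough for that first check. The sketch below assumes only the layout stated in the format summary (magic, f64 LE ratio, u32 LE sst_count); `Header` and `decode_header` are hypothetical names, not part of any of the three implementations.

```rust
/// Fixed-size prefix of a dump: magic ‖ f64 LE ratio ‖ u32 LE sst_count.
#[derive(Debug, PartialEq)]
struct Header {
    ratio: f64,
    sst_count: u32,
}

/// Parse the first 20 bytes of a dump, rejecting short input or a bad magic.
fn decode_header(dump: &[u8]) -> Result<Header, String> {
    if dump.len() < 20 {
        return Err(format!("dump too short: {} bytes", dump.len()));
    }
    if &dump[..8] != b"DSEADV21" {
        return Err(format!("bad magic: {:02x?}", &dump[..8]));
    }
    let mut ratio = [0u8; 8];
    ratio.copy_from_slice(&dump[8..16]);
    let mut count = [0u8; 4];
    count.copy_from_slice(&dump[16..20]);
    Ok(Header {
        ratio: f64::from_le_bytes(ratio),
        sst_count: u32::from_le_bytes(count),
    })
}

fn main() {
    // Hand-built sample: magic, ratio 0.5, two SSTs (bodies omitted).
    let mut dump = b"DSEADV21".to_vec();
    dump.extend_from_slice(&0.5f64.to_le_bytes());
    dump.extend_from_slice(&2u32.to_le_bytes());
    println!("{:?}", decode_header(&dump));
}
```

If two implementations already disagree on these 20 bytes, the SST bodies are not worth reading yet.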
What To Do When A Hash Drifts
- Recapture from Rust. If you intentionally changed semantics, the Rust reference dictates the new canonical hashes; update both scripts/cross_test.sh and docs/execution.md Section 5.
- Hunt the drift. If you didn't intend to change anything, diff the raw dump_state bytes between the failing pair. The first differing byte tells you where in the format the bug lives. Common culprits: forgot LE, used usize instead of u32, iterated a hash map.
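Finding the first differing byte needs no tooling beyond a few lines. The helper below is a hypothetical sketch (`first_diff` and `region` are illustrative names); the region labels cover only the fixed 20-byte header from the format summary, since the SST layout depends on lengths stored in the dump itself.

```rust
/// Offset of the first byte where the two dumps disagree, or None if they
/// are identical. If one dump is a strict prefix of the other, the offset
/// is the length of the shorter dump.
fn first_diff(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// Rough label for an offset, based on the fixed header layout only.
fn region(offset: usize) -> &'static str {
    match offset {
        0..=7 => "magic",
        8..=15 => "ratio (f64 LE)",
        16..=19 => "sst_count (u32 LE)",
        _ => "SST body",
    }
}

fn main() {
    // Two headers that differ only in how sst_count was written.
    let mut good = b"DSEADV21".to_vec();
    good.extend_from_slice(&0.5f64.to_le_bytes());
    let mut bad = good.clone();
    good.extend_from_slice(&2u32.to_le_bytes()); // correct: little-endian
    bad.extend_from_slice(&2u32.to_be_bytes()); // bug: big-endian

    match first_diff(&good, &bad) {
        Some(off) => println!("first difference at byte {} in {}", off, region(off)),
        None => println!("dumps are identical"),
    }
}
```

A "forgot LE" bug shows up exactly like the sample: the first mismatch lands on the first byte of a multi-byte integer field.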