db-10 — Observation
What the cross-language verification actually proves, and what the serialized stream looks like by hand.
Output of scripts/cross_test.sh
=== compare Scenario A (inserts seed=42 ops=500) ===
A rust=4b587ccce2627561c03d5db0c2c172642c9f3ed188c97fc53a215a3d0f316088 ( ???? B)
A go =4b587ccce2627561c03d5db0c2c172642c9f3ed188c97fc53a215a3d0f316088 ( ???? B)
A cpp =4b587ccce2627561c03d5db0c2c172642c9f3ed188c97fc53a215a3d0f316088 ( ???? B)
match(A): 4b587ccce2627561c03d5db0c2c172642c9f3ed188c97fc53a215a3d0f316088
=== compare Scenario B (mixed seed=7 ops=500) ===
B rust=9edbeec6436ee549c8a52b97f286831ed340c4bb588c6371542cdf0421e37718 ( 2515 B)
B go =9edbeec6436ee549c8a52b97f286831ed340c4bb588c6371542cdf0421e37718 ( 2515 B)
B cpp =9edbeec6436ee549c8a52b97f286831ed340c4bb588c6371542cdf0421e37718 ( 2515 B)
match(B): 9edbeec6436ee549c8a52b97f286831ed340c4bb588c6371542cdf0421e37718
=== spot-check stream contents ===
spot-checks ok
=== ALL OK ===
Reading the stream by hand
The empty tree is exactly five bytes:
01 is_leaf = 1
00 00 00 00 nkeys = 0
After one insert (key="a", val="1"):
01 is_leaf = 1
01 00 00 00 nkeys = 1
01 00 00 00 klen = 1
61 key = "a"
01 00 00 00 vlen = 1
31 val = "1"
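The two leaf dumps above can be reproduced with a few lines of Rust. This is a minimal sketch that assumes only the layout shown here (one is_leaf byte, a little-endian u32 key count, then length-prefixed key and value bytes); encode_leaf is a hypothetical name, not the project's serializer.

```rust
/// Encode a single leaf node: is_leaf byte, little-endian u32 nkeys,
/// then (klen, key, vlen, val) for each entry.
/// Hypothetical helper; the layout is taken from the hex dumps above.
fn encode_leaf(entries: &[(&[u8], &[u8])]) -> Vec<u8> {
    let mut out = vec![0x01u8]; // is_leaf = 1
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes()); // nkeys
    for (key, val) in entries {
        out.extend_from_slice(&(key.len() as u32).to_le_bytes()); // klen
        out.extend_from_slice(key); // key bytes
        out.extend_from_slice(&(val.len() as u32).to_le_bytes()); // vlen
        out.extend_from_slice(val); // value bytes
    }
    out
}
```

Calling encode_leaf with no entries yields the five-byte empty tree; calling it with the single entry ("a", "1") yields the fifteen bytes of the second dump.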
After the fourth distinct key, the root must split:
00 is_leaf = 0 ← became internal
01 00 00 00 nkeys = 1
04 00 00 00 klen = 4 ← promoted middle key
… key bytes …
04 00 00 00 vlen = 4
… val bytes …
01 00 00 00 … left child (preorder)
01 00 00 00 … right child (preorder)
The is_leaf byte changes from 01 to 00 precisely at the moment
the root grows upwards. There is no other operation that flips this
byte for the root.
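Reading a longer stream by hand gets tedious, so a small preorder walker helps. The sketch below assumes that an internal node with nkeys keys is followed directly by nkeys + 1 children and that no explicit child count is written; the dumps above are consistent with that but do not confirm it. Summary, walk and summarize are hypothetical names.

```rust
/// What a walk over the stream found. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct Summary {
    nodes: usize,
    keys: usize,
    root_is_leaf: bool,
}

/// Read a little-endian u32 at `pos` and advance past it.
fn read_u32(buf: &[u8], pos: &mut usize) -> Option<u32> {
    let b = buf.get(*pos..)?.get(..4)?;
    *pos += 4;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Walk one node and, recursively, its children in preorder.
/// Returns the node's is_leaf flag, or None on a malformed stream.
fn walk(buf: &[u8], pos: &mut usize, nodes: &mut usize, keys: &mut usize) -> Option<bool> {
    let is_leaf = match *buf.get(*pos)? {
        0 => false,
        1 => true,
        _ => return None, // not a valid is_leaf byte
    };
    *pos += 1;
    let nkeys = read_u32(buf, pos)? as usize;
    for _ in 0..nkeys {
        // Skip the key, then the value; each is length-prefixed.
        for _ in 0..2 {
            let len = read_u32(buf, pos)? as usize;
            let end = (*pos).checked_add(len)?;
            if end > buf.len() {
                return None; // truncated payload
            }
            *pos = end;
        }
    }
    *nodes += 1;
    *keys += nkeys;
    if !is_leaf {
        // Assumption: nkeys + 1 children follow, with no explicit count.
        for _ in 0..nkeys + 1 {
            walk(buf, pos, nodes, keys)?;
        }
    }
    Some(is_leaf)
}

/// Summarize a whole stream; trailing bytes count as an error.
fn summarize(buf: &[u8]) -> Option<Summary> {
    let (mut pos, mut nodes, mut keys) = (0, 0, 0);
    let root_is_leaf = walk(buf, &mut pos, &mut nodes, &mut keys)?;
    if pos != buf.len() {
        return None;
    }
    Some(Summary { nodes, keys, root_is_leaf })
}
```

For the five-byte empty tree, summarize reports one node, zero keys and a leaf root. Once the root has split, root_is_leaf turns false, which is the 01 to 00 change described above.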
What the matching sha256 proves
A single match(...) line proves that all three implementations
agree, at the byte level, on:
- The PRNG. Any drift in SplitMix64 would shuffle the key stream, so the key bytes in the serialized tree would change.
- The lexicographic byte compare. Different ordering would re-route the descent at every internal node from key 4 onward.
- The proactive-split rule. Different split rules would produce different children counts and nkeys fields at every level above the leaves.
- The proactive-rebalance rule (Scenario B). The mixed scenario hits both insert and delete paths; the matching hash proves the borrow/merge logic agrees across all three.
- The preorder serializer with little-endian length prefixes. A different endianness would reverse every multi-byte field in the stream, and a different node order would rearrange the nodes themselves.
Any one of these going wrong, in any one of the three languages, makes the hashes diverge.
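To make the PRNG item concrete, here is SplitMix64 with the standard published constants. This is a sketch under the assumption that the three implementations use the textbook algorithm unchanged; how the project turns each output into a key or an operation is not shown here and is not guessed at.

```rust
/// SplitMix64 with the standard published constants.
/// Assumption: the project uses this textbook form without changes.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Advance the state and return the next 64-bit output.
    /// All arithmetic wraps modulo 2^64, which is what every language
    /// must reproduce for the key streams to line up.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}
```

The wrapping operations are the part most likely to drift between languages: a signed type, a checked multiply, or an arithmetic right shift would each change the outputs.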
Sizes
Scenario B settles at exactly 2515 B for seed=7, ops=500, scenario=mixed. The Scenario A size varies but is also identical
across all three languages (see the script output).
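The size of a stream follows directly from the layout in "Reading the stream by hand". As a rough sketch, assuming each node contributes only its is_leaf byte and nkeys field and each entry only its two length prefixes plus payload, the arithmetic looks like this (serialized_size is a hypothetical helper):

```rust
/// Expected byte size of a serialized tree.
/// `nodes` is the total node count; `entries` holds (klen, vlen) per key.
/// Assumption: no per-node fields beyond is_leaf (1 byte) and nkeys (4 bytes).
fn serialized_size(nodes: usize, entries: &[(usize, usize)]) -> usize {
    let headers = nodes * 5; // 1 + 4 bytes per node
    let payload: usize = entries
        .iter()
        .map(|&(klen, vlen)| 4 + klen + 4 + vlen) // two length prefixes plus bytes
        .sum();
    headers + payload
}
```

With one node and no entries this gives 5, the empty tree; with one node and the entry ("a", "1") it gives 15, matching the single-insert dump.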
Spot-check rationale
The script greps the Rust scenario-A output for a known key prefix
that the first few outputs of SplitMix64(42) must generate.
This guards against a silent-success regression in which every
language "successfully" produces the same five-byte empty-tree
header and nothing else.
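The same guard can be sketched in Rust. The script itself uses grep; the function below only illustrates the idea, and both spot_check and the key prefix passed to it are hypothetical.

```rust
/// The five-byte stream every implementation emits for an empty tree.
const EMPTY_TREE: [u8; 5] = [0x01, 0x00, 0x00, 0x00, 0x00];

/// Return true only if the stream is not the bare empty-tree header and
/// contains the expected key prefix somewhere in its bytes.
/// Hypothetical helper; the real check is a grep in the shell script.
fn spot_check(stream: &[u8], expected_key_prefix: &[u8]) -> bool {
    if stream == &EMPTY_TREE[..] {
        return false; // identical hashes of an empty tree prove nothing
    }
    // windows(0) would panic, so reject an empty prefix first.
    !expected_key_prefix.is_empty()
        && stream
            .windows(expected_key_prefix.len())
            .any(|w| w == expected_key_prefix)
}
```

Matching hashes alone would pass even if all three binaries wrote EMPTY_TREE; requiring a known key in the bytes rules that case out.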