db-08 — Verification

Cross-language byte identity (gating)

scripts/cross_test.sh is the gate. It builds canonical SSTable inputs and runs each language's merge_iter binary, comparing sha256 of the serialized merge stream.
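The comparison the gate performs can be sketched in Go. This is an illustration only, not the contents of scripts/cross_test.sh: the function names digest and allMatch are hypothetical, and the bytes in main are placeholders rather than real merge_iter output.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns the lowercase hex sha256 of a serialized merge stream.
func digest(stream []byte) string {
	sum := sha256.Sum256(stream)
	return hex.EncodeToString(sum[:])
}

// allMatch reports whether every stream hashes to the same digest. It also
// returns the digest of the first stream as the reference value.
func allMatch(streams [][]byte) (string, bool) {
	if len(streams) == 0 {
		return "", false
	}
	want := digest(streams[0])
	for _, s := range streams[1:] {
		if digest(s) != want {
			return want, false
		}
	}
	return want, true
}

func main() {
	// Placeholder bytes standing in for the rust, go, and cpp outputs.
	out := []byte{0x04, 0x00, 0x00, 0x00, 'k', 'e', 'y', '5', 0x01}
	streams := [][]byte{out, append([]byte(nil), out...), append([]byte(nil), out...)}
	ref, ok := allMatch(streams)
	fmt.Printf("match=%v sha256=%s (%d bytes)\n", ok, ref, len(out))
}
```

Comparing digests rather than raw bytes keeps the report short; printing the byte count next to each digest, as the results below do, makes a length mismatch visible at a glance.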

Final results from the current run:

drop=false:
  rust: f693c483ef39dfef8e6285e29f9051a57e60bf2c4ba7b45bbf552c7932687fd1 (1874 bytes)
  go  : f693c483ef39dfef8e6285e29f9051a57e60bf2c4ba7b45bbf552c7932687fd1 (1874 bytes)
  cpp : f693c483ef39dfef8e6285e29f9051a57e60bf2c4ba7b45bbf552c7932687fd1 (1874 bytes)
  match: f693c483ef39dfef8e6285e29f9051a57e60bf2c4ba7b45bbf552c7932687fd1

drop=true:
  rust: ec71c56c89f451d33e58697af2d7bce985069078e1c599cc42062dfbba6e250e (1865 bytes)
  go  : ec71c56c89f451d33e58697af2d7bce985069078e1c599cc42062dfbba6e250e (1865 bytes)
  cpp : ec71c56c89f451d33e58697af2d7bce985069078e1c599cc42062dfbba6e250e (1865 bytes)
  match: ec71c56c89f451d33e58697af2d7bce985069078e1c599cc42062dfbba6e250e

The 9-byte size delta between modes (1874 - 1865 bytes) equals exactly one tombstone frame (u32_le(4) + "key5" + u8(1), i.e. 4 + 4 + 1 bytes), which is consistent with the key5 tombstone being the only entry dropped.

Stream-content spot-checks

The cross-test runs xxd -p | grep to confirm that:

  • NEW-10 (hex 4e45572d3130) appears — the merged-write semantics worked.
  • OLD-50 (hex 4f4c442d3530) appears — keys present only in the older source survive.
  • val99 (hex 76616c3939) appears, so the largest bulk key from the older source shows up.
  • 040000006b65793501 (key5 tombstone framing) appears with drop=false and is absent with drop=true.

These are not redundant with the sha256 check: a sha256 mismatch tells you something is wrong but not what; the framed-hex grep tells you which invariant broke.

Unit-test coverage matrix

Behavior | Rust | Go | C++
LRU basic hit/miss + counters
LRU evicts LRU on capacity
LRU re-insert overwrites + promotes
LRU MRU-first key order after Get
Merger: empty inputs → empty output
Merger: single source passthrough
Merger: two-source interleave (no duplicates)
Merger: newest-wins on tie
Merger: tombstone kept when drop=false
Merger: tombstone dropped when drop=true
SerializeStream deterministic & expected size
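The merger behaviors listed above can be illustrated with a small k-way merge in Go. This is a sketch, not the lab's implementation: the Entry type, the Merge signature, and the convention that source index 0 is the newest source are all assumptions made for the example, as is the sample data in main.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Entry is a hypothetical record shape; a tombstone carries no value.
type Entry struct {
	Key       string
	Value     string
	Tombstone bool
}

// cursor points at the next unread entry of one source.
type cursor struct {
	src, pos int
	key      string
}

// minHeap orders cursors by (key, src) ascending, so for equal keys the
// lowest source index (assumed here to be the newest source) pops first.
type minHeap []cursor

func (h minHeap) Len() int      { return len(h) }
func (h minHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h minHeap) Less(i, j int) bool {
	if h[i].key != h[j].key {
		return h[i].key < h[j].key
	}
	return h[i].src < h[j].src
}
func (h *minHeap) Push(x any) { *h = append(*h, x.(cursor)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// Merge performs a k-way merge over key-sorted sources. Only the first entry
// popped for each key is considered; older duplicates are skipped.
func Merge(sources [][]Entry, dropTombstones bool) []Entry {
	h := &minHeap{}
	for i, s := range sources {
		if len(s) > 0 {
			heap.Push(h, cursor{src: i, pos: 0, key: s[0].Key})
		}
	}
	var out []Entry
	seen, last := false, ""
	for h.Len() > 0 {
		c := heap.Pop(h).(cursor)
		e := sources[c.src][c.pos]
		if next := c.pos + 1; next < len(sources[c.src]) {
			heap.Push(h, cursor{src: c.src, pos: next, key: sources[c.src][next].Key})
		}
		if seen && e.Key == last {
			continue // shadowed by an entry from a newer source
		}
		seen, last = true, e.Key
		if e.Tombstone && dropTombstones {
			continue // the tombstone still shadows older entries for this key
		}
		out = append(out, e)
	}
	return out
}

func main() {
	// Hypothetical inputs, each sorted by key (byte-wise string order).
	newer := []Entry{{Key: "key10", Value: "NEW-10"}, {Key: "key5", Tombstone: true}}
	older := []Entry{{Key: "key10", Value: "OLD-10"}, {Key: "key5", Value: "OLD-5"}, {Key: "key50", Value: "OLD-50"}}
	fmt.Println(Merge([][]Entry{newer, older}, false))
	fmt.Println(Merge([][]Entry{newer, older}, true))
}
```

Recording the key as seen before testing the tombstone flag is the design point: with drop=true the tombstone is not emitted, yet the older value for the same key must still be suppressed.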

How to re-verify locally

cd db-08-block-cache-and-iterators
bash scripts/verify.sh         # unit tests for all three languages
bash scripts/cross_test.sh     # cross-language byte-identity test

What would invalidate this proof

  • Changing SerializeStream's framing (lengths, endianness, type-byte encoding) — sha256 would diverge immediately.
  • Changing the (key, src) heap comparator to break ties on src descending would make the newest-wins test fail before the cross-test runs.
  • Changing the cache capacity unit from entries to bytes — the LRU tests would need recalibration but no other lab depends on the unit choice.
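To make the capacity-unit point concrete, here is a minimal Go sketch of an LRU cache whose capacity is counted in entries, covering the LRU behaviors named in the coverage matrix. The type, field, and method names are illustrative and not taken from the lab's code.

```go
package main

import (
	"container/list"
	"fmt"
)

type item struct {
	key, value string
}

// LRU is an entry-counted cache: capacity is a number of entries, not bytes.
type LRU struct {
	capacity     int
	order        *list.List // front = most recently used
	index        map[string]*list.Element
	Hits, Misses int
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), index: make(map[string]*list.Element)}
}

// Get returns the value and promotes the key to most recently used.
func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.index[key]
	if !ok {
		c.Misses++
		return "", false
	}
	c.Hits++
	c.order.MoveToFront(el)
	return el.Value.(item).value, true
}

// Put inserts or overwrites a key, promotes it, and evicts the least recently
// used entry once the entry count exceeds capacity.
func (c *LRU) Put(key, value string) {
	if el, ok := c.index[key]; ok {
		el.Value = item{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.index[key] = c.order.PushFront(item{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.index, oldest.Value.(item).key)
	}
}

// Keys lists keys from most to least recently used.
func (c *LRU) Keys() []string {
	keys := make([]string, 0, c.order.Len())
	for el := c.order.Front(); el != nil; el = el.Next() {
		keys = append(keys, el.Value.(item).key)
	}
	return keys
}

func main() {
	c := NewLRU(2)
	c.Put("a", "1")
	c.Put("b", "2")
	c.Get("a")      // promotes a
	c.Put("c", "3") // evicts b, the least recently used entry
	fmt.Println(c.Keys(), c.Hits, c.Misses)
}
```

A byte-based capacity would replace the c.order.Len() > c.capacity check with a running size total, which is why tests written against entry counts would need recalibrating.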