Step 03 — CLI and Cross-Language Test
Goal
Wrap the MergingIterator in a CLI binary (merge_iter) that reads N
SSTables (built by db-06), runs a merge, and writes the canonical serialized
stream to stdout. Then prove the three language implementations are
byte-identical with scripts/cross_test.sh.
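The merge the CLI performs can be pictured with a minimal sketch. It assumes sources are passed newest first and that the newest entry for a key wins; the `Entry` enum and `merge_newest_wins` function are hypothetical names for illustration, and a `BTreeMap` stands in for the heap-based k-way merge a real MergingIterator would likely use.

```rust
use std::collections::BTreeMap;

// Hypothetical entry type; the project's real Entry may differ.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Put(Vec<u8>),
    Tombstone,
}

// Merge sources ordered newest first. The first source to mention a key wins.
// Tombstones are filtered only after shadowing, so a dropped tombstone still
// hides the older values beneath it.
fn merge_newest_wins(
    sources: &[Vec<(Vec<u8>, Entry)>],
    drop_tombstones: bool,
) -> Vec<(Vec<u8>, Entry)> {
    let mut merged: BTreeMap<Vec<u8>, Entry> = BTreeMap::new();
    for source in sources {
        for (key, entry) in source {
            // or_insert_with keeps the entry from the newer source if present.
            merged.entry(key.clone()).or_insert_with(|| entry.clone());
        }
    }
    merged
        .into_iter()
        .filter(|(_, e)| !(drop_tombstones && *e == Entry::Tombstone))
        .collect()
}

fn main() {
    let newer = vec![(b"key5".to_vec(), Entry::Tombstone)];
    let older = vec![
        (b"key5".to_vec(), Entry::Put(b"old".to_vec())),
        (b"key6".to_vec(), Entry::Put(b"x".to_vec())),
    ];
    let kept = merge_newest_wins(&[newer.clone(), older.clone()], false);
    let dropped = merge_newest_wins(&[newer, older], true);
    println!("drop=false: {} entries", kept.len());
    println!("drop=true:  {} entries", dropped.len());
}
```

The output is in ascending key order because `BTreeMap` iterates its keys in sorted order.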
CLI spec
merge_iter [--drop-tombstones] IN1.sst IN2.sst ...
- IN1 is the newest source; INk is the oldest.
- Reads each input via db-06's SstReader, converts to Vec<(Vec<u8>, Entry)>, feeds all into a MergingIterator, calls SerializeStream, and writes the bytes verbatim to stdout.
- Exit code: 0 success, 1 input error, 2 usage error.
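The argument handling and exit codes above might be sketched as follows. `Config` and `parse_args` are hypothetical names, not the project's code. The sketch also makes three assumptions the spec does not state: an unknown flag is a usage error, an empty input list is a usage error, and the flag may appear anywhere among the arguments.

```rust
// Exit codes from the CLI spec.
const EXIT_OK: i32 = 0;
const EXIT_INPUT: i32 = 1;
const EXIT_USAGE: i32 = 2;

// Hypothetical parsed command line.
#[derive(Debug, PartialEq)]
struct Config {
    drop_tombstones: bool,
    // inputs[0] is the newest source; the last element is the oldest.
    inputs: Vec<String>,
}

// Parse the arguments that follow the program name.
// Returns the exit code to use when the arguments are unusable.
fn parse_args(args: &[String]) -> Result<Config, i32> {
    let mut drop_tombstones = false;
    let mut inputs = Vec::new();
    for arg in args {
        if arg == "--drop-tombstones" {
            drop_tombstones = true;
        } else if arg.starts_with("--") {
            return Err(EXIT_USAGE); // assumption: unknown flags are usage errors
        } else {
            inputs.push(arg.clone());
        }
    }
    if inputs.is_empty() {
        return Err(EXIT_USAGE); // assumption: at least one input is required
    }
    Ok(Config { drop_tombstones, inputs })
}

fn main() {
    let sample: Vec<String> = ["--drop-tombstones", "new.sst", "old.sst"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    match parse_args(&sample) {
        Ok(cfg) => println!("drop={} inputs={:?}", cfg.drop_tombstones, cfg.inputs),
        Err(code) => println!("would exit with {}", code),
    }
    println!("exit codes: ok={} input={} usage={}", EXIT_OK, EXIT_INPUT, EXIT_USAGE);
}
```

A real binary would pass `std::env::args().skip(1)` to the parser and call `std::process::exit` with the resulting code, returning 1 if any input file fails to open or parse.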
Implementations:
- Rust: src/rust/src/bin/merge_iter.rs
- Go: src/go/cmd/merge_iter/main.go
- C++: src/cpp/src/merge_iter_bin.cc
Acceptance
Run the cross-test:
bash scripts/cross_test.sh
It must:
- Print match: lines with sha256s that are the same for all three languages (in both drop=false and drop=true modes).
- Confirm via hex spot-check that NEW-10, OLD-50, and val99 are present in the stream.
- Confirm the key5 tombstone framing (040000006b65793501) appears with drop=false and is absent with drop=true.
- End with CROSS-TEST OK.
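The tombstone hex string in the list above can be rebuilt by hand. The sketch below assumes the framing implied by this document: a little-endian u32 key length, the key bytes, then a type byte of 1 for a tombstone. The helper names (`tombstone_frame`, `to_hex`, `contains`) are illustrative only.

```rust
// Build a tombstone frame: u32_le(key_len) + key + type byte 1 (assumed layout).
fn tombstone_frame(key: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(4 + key.len() + 1);
    frame.extend_from_slice(&(key.len() as u32).to_le_bytes());
    frame.extend_from_slice(key);
    frame.push(1);
    frame
}

// Lowercase hex encoding, two digits per byte.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

// Naive substring search over bytes, enough for a spot-check.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    let frame = tombstone_frame(b"key5");
    println!("{}", to_hex(&frame)); // prints 040000006b65793501

    // A made-up stream fragment that embeds the frame between other bytes.
    let mut stream = vec![0xaa, 0xbb];
    stream.extend_from_slice(&frame);
    stream.push(0xcc);
    println!("frame present: {}", contains(&stream, &frame));
}
```

A raw byte search like this can match by coincidence inside a value payload, so it is a spot-check and not a parser.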
Captured truth (current run):
drop=false → f693c483ef39dfef8e6285e29f9051a57e60bf2c4ba7b45bbf552c7932687fd1 (1874 bytes)
drop=true → ec71c56c89f451d33e58697af2d7bce985069078e1c599cc42062dfbba6e250e (1865 bytes)
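The two captured sizes can be sanity-checked against the frame layout. This sketch assumes the stream contains exactly one tombstone (for key5) and that a tombstone frame is a 4-byte length prefix, the key, and one type byte; `tombstone_frame_len` is a hypothetical helper.

```rust
// Size of one tombstone frame under the assumed layout:
// 4 bytes of u32_le key length + key bytes + 1 type byte.
fn tombstone_frame_len(key: &[u8]) -> usize {
    4 + key.len() + 1
}

fn main() {
    let with_tombstones: usize = 1874; // drop=false size from the captured run
    let without_tombstones: usize = 1865; // drop=true size from the captured run
    let delta = with_tombstones - without_tombstones;
    println!("delta = {} bytes", delta);
    println!("key5 tombstone frame = {} bytes", tombstone_frame_len(b"key5"));
    assert_eq!(delta, tombstone_frame_len(b"key5"));
}
```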
Discussion prompts
- Why pipe the binary stream into sha256sum rather than diff the entry list? (A bytewise hash catches all serialization differences with a single number; it is the strongest possible equivalence test.)
- The drop=true output is exactly 9 bytes shorter than drop=false. Where do those 9 bytes go? (u32_le(4) + "key5" + u8(1) = 4+4+1 = 9, which is exactly one tombstone frame.)
- If you wanted to add a new entry kind (say, a "merge-add" delta), what would you change in the serialization? (Pick a new type byte (e.g. 2), decide its payload framing, document it, and update all three languages' SerializeStream and CLI in lockstep.)
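On the first prompt: a hash says only whether two streams differ, not where. When the sha256s disagree, a helper along these lines can locate the first divergent byte. It is a debugging aid sketched for illustration, not part of scripts/cross_test.sh, and `first_mismatch` is a hypothetical name.

```rust
// Return the offset of the first byte at which two streams differ,
// or None when they are byte-identical. If one stream is a strict prefix
// of the other, the mismatch offset is the shorter stream's length.
fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    let common = a.len().min(b.len());
    a.iter()
        .zip(b.iter())
        .position(|(x, y)| x != y)
        .or(if a.len() != b.len() { Some(common) } else { None })
}

fn main() {
    // Made-up sample streams standing in for two implementations' outputs.
    let rust_out = [0x04, 0x00, 0x00, 0x00, 0x6b, 0x65, 0x79, 0x35, 0x01];
    let go_out = [0x04, 0x00, 0x00, 0x00, 0x6b, 0x65, 0x79, 0x35, 0x00];
    match first_mismatch(&rust_out, &go_out) {
        Some(offset) => println!("streams diverge at byte {}", offset),
        None => println!("streams are byte-identical"),
    }
}
```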