db-07 Step 3 — CLI and cross-language byte-identity
CLI shape (all three languages emit and accept the same)
compact [--drop-tombstones] OUT.sst IN1.sst IN2.sst ...
Arguments:
- --drop-tombstones: optional first flag. If present, tombstones are dropped (use when this is the bottom-level compaction).
- OUT.sst: output file path.
- IN1.sst ...: one or more input SSTable paths. IN1 is the newest.
Exit codes:
- 0: success.
- 1: any error (open failure, malformed SSTable, write failure).
- 2: usage error.
The CLI is intentionally minimal. There is no JSON, no stats, no progress. Stats live in db-22 (performance + benchmarking).
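The argument shape and exit codes above can be sketched as a small parser. This is a minimal sketch, not the project's actual code: the names CompactArgs, parse_args, and the EXIT_* constants are hypothetical, and rejecting any later argument that starts with "--" is an assumption drawn from the flag being described as "optional first flag".

```rust
use std::path::PathBuf;

/// Parsed form of `compact [--drop-tombstones] OUT.sst IN1.sst IN2.sst ...`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct CompactArgs {
    pub drop_tombstones: bool,
    pub out: PathBuf,
    /// Input paths in command-line order; index 0 is the newest.
    pub inputs: Vec<PathBuf>,
}

// Exit codes as listed in this section.
pub const EXIT_OK: i32 = 0;
pub const EXIT_ERROR: i32 = 1;
pub const EXIT_USAGE: i32 = 2;

/// Parses the arguments that follow the program name.
/// Returns Err(EXIT_USAGE) when the shape is wrong.
pub fn parse_args(args: &[String]) -> Result<CompactArgs, i32> {
    // The flag is only recognized in first position.
    let (drop_tombstones, rest) = match args.first().map(String::as_str) {
        Some("--drop-tombstones") => (true, &args[1..]),
        _ => (false, args),
    };
    // OUT.sst plus at least one input is required.
    if rest.len() < 2 {
        return Err(EXIT_USAGE);
    }
    // Assumption: a flag-like argument anywhere else is a usage error.
    if rest.iter().any(|a| a.starts_with("--")) {
        return Err(EXIT_USAGE);
    }
    Ok(CompactArgs {
        drop_tombstones,
        out: PathBuf::from(&rest[0]),
        inputs: rest[1..].iter().map(PathBuf::from).collect(),
    })
}
```

A main function would collect std::env::args().skip(1), call parse_args, exit with EXIT_USAGE on Err, and map any later I/O or format failure to EXIT_ERROR.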
The cross-test scenario
The script in scripts/cross_test.sh:
1. Builds feed_newer.mt (memtable scenario from observation.md, 50 keys with key10 replaced and key5 deleted).
2. Builds feed_older.mt (100 keys with key50 = "OLD-50").
3. Promotes both to SSTables using the db-06 Rust binary (sstable build feed_newer.mt newer.sst).
4. For each language, runs compact OUT.sst newer.sst older.sst.
5. Asserts sha256(rust.OUT) == sha256(go.OUT) == sha256(cpp.OUT).
6. Runs the 3×3 read matrix using db-06's sstable iter over each OUT.
7. Spot-checks sstable get OUT.sst <key> for key5, key10, key50, key99, nope.
The spot-checks use db-06's sstable CLI (not db-07's compact), which is
why steps 5–7 don't need a separate db-07 reader: the output is a db-06
SSTable.
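The byte-identity assertion can also be expressed without a hash. The sketch below compares the raw bytes of the output files directly, which is a deliberate substitute for the sha256 comparison the script performs; equal bytes imply equal digests. The function name outputs_identical is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) when every file has exactly the same bytes as the first.
/// Direct byte comparison stands in for the script's sha256 check.
pub fn outputs_identical<P: AsRef<Path>>(paths: &[P]) -> io::Result<bool> {
    let mut files = paths.iter();
    let first = match files.next() {
        Some(p) => fs::read(p)?,
        None => return Ok(true), // nothing to compare
    };
    for p in files {
        if fs::read(p)? != first {
            return Ok(false);
        }
    }
    Ok(true)
}
```

Reading whole files into memory is fine for test-sized SSTables; a hash or a streaming compare would be the better choice for large outputs.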
Why this proves the merge
The scenario merges two SSTables with overlapping keys: some keys appear in
both and must resolve to the newer version, and one (key50) is unique to the
older. If your merge logic gets the recency tiebreaker wrong, you read val10
instead of NEW-10. If you forget to drain duplicates, you write the same key
twice and SstWriter::add throws. If you drop tombstones by mistake, key5
disappears.
If all three languages get the same sha256, the algorithm and its translation to three runtimes are pinned down.
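The three failure modes above (recency tiebreaker, duplicate draining, tombstone handling) can be illustrated with an in-memory merge. This is a minimal sketch under stated assumptions, not the db-07 implementation: each input is modeled as a key-sorted Vec rather than an SSTable reader, a tombstone is modeled as None, and the names Entry and merge_runs are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// One entry: a key and either a value or a tombstone (`None`).
/// (Hypothetical in-memory model, not the on-disk format.)
pub type Entry = (String, Option<String>);

/// Merges key-sorted runs into one key-sorted run. `inputs[0]` is the newest.
pub fn merge_runs(inputs: &[Vec<Entry>], drop_tombstones: bool) -> Vec<Entry> {
    // Heap items are (key, input index, position in that input).
    // Reverse turns the max-heap into a min-heap, so the smallest key pops
    // first and, for equal keys, the smallest input index (the newest).
    let mut heap = BinaryHeap::new();
    for (src, run) in inputs.iter().enumerate() {
        if let Some((key, _)) = run.first() {
            heap.push(Reverse((key.as_str(), src, 0usize)));
        }
    }

    let mut out: Vec<Entry> = Vec::new();
    let mut last_key: Option<&str> = None;
    while let Some(Reverse((key, src, pos))) = heap.pop() {
        // Refill the heap from the run we just consumed from.
        if let Some((next_key, _)) = inputs[src].get(pos + 1) {
            heap.push(Reverse((next_key.as_str(), src, pos + 1)));
        }
        // Drain duplicates: this is an older copy of a key already handled.
        if last_key == Some(key) {
            continue;
        }
        last_key = Some(key);
        let value = &inputs[src][pos].1;
        // A dropped tombstone still shadows older copies, because last_key
        // was recorded before this check.
        if value.is_none() && drop_tombstones {
            continue;
        }
        out.push((key.to_string(), value.clone()));
    }
    out
}
```

Putting the input index second in the heap tuple is what encodes the recency rule: ties on the key are broken in favor of the lower index, and IN1 (index 0) is the newest.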