db-09 step 03 — CLI and cross-language byte-identity
Goal
Build a dbctl --dir DIR CLI in all three languages that reads commands
from stdin, then assert via sha256 that all three produce byte-identical
output for the same canonical script — including after a crash/recover
cycle.
CLI contract
Each line of stdin is one of:
# comment (ignored)
PUT <key> <value> # whitespace-delimited (no spaces inside)
DEL <key>
FLUSH
DUMP # write serialize_view(drop_tombstones=true) to stdout
DUMP_WITH_TOMBS # write serialize_view(drop_tombstones=false) to stdout
Blank lines and lines starting with # are ignored.
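The grammar above is small enough to parse with one match over the whitespace-split tokens. Below is a minimal Rust sketch. It assumes that keys and values are the raw bytes of their tokens and that a line with the wrong token count or an unknown verb is an error; the lab text does not say how malformed lines are handled. Cmd and parse_line are hypothetical names, not part of the lab's code.

```rust
// Sketch of a parser for the dbctl stdin grammar.
// Assumptions (not from the lab text): keys/values are the raw bytes of the
// whitespace-split tokens, and malformed lines are reported as errors.

#[derive(Debug, PartialEq, Eq)]
pub enum Cmd {
    Put(Vec<u8>, Vec<u8>),
    Del(Vec<u8>),
    Flush,
    Dump,
    DumpWithTombs,
}

/// Returns Ok(None) for blank lines and for lines whose first non-blank
/// character is '#', Ok(Some(cmd)) for a well-formed command, and Err(..)
/// for anything else.
pub fn parse_line(line: &str) -> Result<Option<Cmd>, String> {
    let trimmed = line.trim();
    if trimmed.is_empty() || trimmed.starts_with('#') {
        return Ok(None);
    }
    let toks: Vec<&str> = trimmed.split_whitespace().collect();
    let cmd = match toks.as_slice() {
        // Exactly three tokens: no spaces are allowed inside key or value.
        ["PUT", k, v] => Cmd::Put(k.as_bytes().to_vec(), v.as_bytes().to_vec()),
        ["DEL", k] => Cmd::Del(k.as_bytes().to_vec()),
        ["FLUSH"] => Cmd::Flush,
        ["DUMP"] => Cmd::Dump,
        ["DUMP_WITH_TOMBS"] => Cmd::DumpWithTombs,
        _ => return Err(format!("bad command line: {:?}", line)),
    };
    Ok(Some(cmd))
}
```

In the binary, each line from stdin.lock().lines() would go through parse_line. None is skipped, and Some(cmd) is dispatched to the store.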
DUMP and DUMP_WITH_TOMBS write raw bytes (no trailing newline) so that
sha256 over stdout is a pure function of the database state.
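On the output side, the important detail is the write path. The dump is emitted with write_all followed by flush, never println!, so the captured stdout contains exactly the serialized bytes. The sketch below assumes a hypothetical in-memory view (View, a BTreeMap from key to Option<value> in which None marks a tombstone). It also uses a made-up length-prefixed encoding in place of serialize_view, because the real format comes from an earlier step and is not reproduced here.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

// Hypothetical merged view: Some(value) is a live entry, None a tombstone.
pub type View = BTreeMap<Vec<u8>, Option<Vec<u8>>>;

// Hypothetical stand-in for serialize_view (NOT the lab's format). In key
// order it writes: u32-LE key length, key, a 1-byte flag (1 = live,
// 0 = tombstone), and for live entries a u32-LE value length and the value.
pub fn serialize_view_sketch(view: &View, drop_tombstones: bool) -> Vec<u8> {
    let mut out = Vec::new();
    for (k, v) in view {
        if v.is_none() && drop_tombstones {
            continue; // DUMP omits tombstones entirely
        }
        out.extend_from_slice(&(k.len() as u32).to_le_bytes());
        out.extend_from_slice(k);
        match v {
            Some(val) => {
                out.push(1);
                out.extend_from_slice(&(val.len() as u32).to_le_bytes());
                out.extend_from_slice(val);
            }
            None => out.push(0),
        }
    }
    out
}

/// Writes the raw dump bytes with no trailing newline, then flushes, so
/// that a sha256 of the captured stream depends only on the view.
pub fn write_dump<W: Write>(out: &mut W, view: &View, with_tombs: bool) -> io::Result<()> {
    let bytes = serialize_view_sketch(view, !with_tombs);
    out.write_all(&bytes)?;
    out.flush()
}
```

In the CLI, the DUMP arm would pass io::stdout().lock() as the writer with with_tombs = false, and DUMP_WITH_TOMBS would pass true.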
Tasks
- Build dbctl in Rust (src/rust/src/bin/dbctl.rs), Go (src/go/cmd/dbctl/main.go), and C++ (src/cpp/src/dbctl.cc).
- Write scripts/cross_test.sh that:
  - Builds all three binaries.
  - Creates one canonical command script that exercises multiple flushes, overwrites that land in newer SSTables, tombstones, and a non-empty WAL tail.
  - For each language, pipes the script into dbctl --dir db-LANG (which fully writes and closes), then reopens the directory, pipes DUMP (and, in a separate run, DUMP_WITH_TOMBS) into dbctl, and saves stdout to a file.
  - Computes sha256 over each dump file and asserts that the three DUMP digests match, and likewise the three DUMP_WITH_TOMBS digests.
  - Spot-checks the hex of the Rust DUMP stream for the expected post-recovery key-value bytes, to guard against silent-empty regressions.
- Write scripts/verify.sh that runs the unit tests in all three languages.
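Two pieces of cross_test.sh carry most of the reasoning: a canonical script that covers every recovery path, and a spot-check that rules out an empty or truncated DUMP. Here is a Rust sketch of both. The script contents, the key names, and the helpers canonical_script, to_hex and spot_check are illustrative assumptions. The real script presumably does this in shell, for example with a hex dump piped into grep. The check accepts only matches at even hex offsets, so a needle cannot match across a byte boundary.

```rust
// Hypothetical canonical script and hex spot-check for the cross test.
// Keys, values, and helper names are illustrative, not the lab's script.

/// Two FLUSHes (two SSTables); an overwrite and a tombstone that land in the
/// newer SSTable; a WAL tail that is never flushed. Expected live state
/// after reopen: banana=20, date=4 (apple and cherry are deleted).
pub fn canonical_script() -> String {
    [
        "# canonical script (illustrative)",
        "PUT apple 1",
        "PUT banana 2",
        "PUT cherry 3",
        "FLUSH",
        "# second SSTable: overwrite banana, tombstone cherry",
        "PUT banana 20",
        "DEL cherry",
        "FLUSH",
        "# WAL tail: written but never flushed",
        "PUT date 4",
        "DEL apple",
    ]
    .join("\n")
        + "\n"
}

/// Lowercase hex, two characters per byte.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// True if needle occurs in hay at an even offset, i.e. on a byte boundary
/// of the original data.
fn hex_contains_aligned(hay: &str, needle: &str) -> bool {
    (0..hay.len()).step_by(2).any(|i| hay[i..].starts_with(needle))
}

/// Fails on an empty dump or if any expected byte string is missing.
pub fn spot_check(dump: &[u8], expected: &[&str]) -> Result<(), String> {
    if dump.is_empty() {
        return Err("DUMP stream is empty".to_string());
    }
    let hay = to_hex(dump);
    for needle in expected {
        let n = to_hex(needle.as_bytes());
        if !hex_contains_aligned(&hay, &n) {
            return Err(format!("expected bytes {} not found", n));
        }
    }
    Ok(())
}
```

The expected state exercises every layer. apple is shadowed by a tombstone that exists only in the WAL, cherry by a tombstone in the newer SSTable, and banana by an overwrite in the newer SSTable, while date exists only in the WAL. A run that loses the WAL tail or reads SSTables in the wrong order therefore changes the DUMP bytes.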
Acceptance
$ scripts/verify.sh
=== rust === ... ok
=== go === ... ok
=== cpp === ... ok
=== OK ===
$ scripts/cross_test.sh
...
match(DUMP): 7d1568c7...
match(DUMP_TOMBS): 27e3d256...
spot-checks ok
=== ALL OK ===
A byte-identical DUMP after reopen is a near-proof of correctness for the entire encode → flush → MANIFEST → recover → merge → serialize pipeline across three independent implementations, at least for the paths the canonical script exercises.
Discussion prompts
- Why force a close+reopen between the writes and the DUMP, instead of dumping from the same process?
- Why is DUMP (without tombstones), on its own, not a sound proof? What does DUMP_WITH_TOMBS add?
- If the three sha256s ever diverge, which lab's format is the most probable culprit, and why?