db-09 step 03 — CLI and cross-language byte-identity

Goal

Build a dbctl --dir DIR CLI in all three languages (Rust, Go, C++) that reads commands from stdin, then assert via sha256 that all three produce byte-identical output for the same canonical script, including after a crash/recover cycle.

CLI contract

Each line of stdin is one of:

# comment (ignored)
PUT  <key>  <value>      # whitespace-delimited; keys and values contain no whitespace
DEL  <key>
FLUSH
DUMP                     # write serialize_view(drop_tombstones=true) to stdout
DUMP_WITH_TOMBS          # write serialize_view(drop_tombstones=false) to stdout

Blank lines and lines starting with # are ignored.

DUMP and DUMP_WITH_TOMBS write raw bytes (no trailing newline) so that sha256 over stdout is a pure function of the database state.
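
The line grammar above can be sketched as a small parser. This is a minimal illustration in Go, not the lab's reference implementation: the Command type and the ParseLine and ParseScript names are hypothetical, and two behaviours are assumptions the contract does not spell out (a line whose first non-blank character is # counts as a comment, and extra trailing fields are rejected rather than ignored).

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Command is a hypothetical parsed form of one stdin line.
type Command struct {
	Op    string // PUT, DEL, FLUSH, DUMP, or DUMP_WITH_TOMBS
	Key   string // set for PUT and DEL
	Value string // set for PUT only
}

// arity maps each command to the exact number of whitespace-delimited fields
// it takes, including the command word itself.
var arity = map[string]int{
	"PUT": 3, "DEL": 2, "FLUSH": 1, "DUMP": 1, "DUMP_WITH_TOMBS": 1,
}

// ParseLine returns (cmd, true, nil) for a command, (zero, false, nil) for a
// blank or comment line, and an error for anything else.
func ParseLine(line string) (Command, bool, error) {
	fields := strings.Fields(line)
	// Assumption: leading whitespace before '#' is tolerated.
	if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
		return Command{}, false, nil
	}
	n, known := arity[fields[0]]
	if !known {
		return Command{}, false, fmt.Errorf("unknown command %q", fields[0])
	}
	// Assumption: strict arity, so "PUT k v extra" is an error.
	if len(fields) != n {
		return Command{}, false, fmt.Errorf("%s: want %d fields, got %d", fields[0], n, len(fields))
	}
	cmd := Command{Op: fields[0]}
	if n >= 2 {
		cmd.Key = fields[1]
	}
	if n == 3 {
		cmd.Value = fields[2]
	}
	return cmd, true, nil
}

// ParseScript parses every line of r. bufio.Scanner's default buffer caps a
// line at 64 KiB; longer lines surface as an error from sc.Err().
func ParseScript(r io.Reader) ([]Command, error) {
	var cmds []Command
	sc := bufio.NewScanner(r)
	for lineNo := 1; sc.Scan(); lineNo++ {
		cmd, ok, err := ParseLine(sc.Text())
		if err != nil {
			return nil, fmt.Errorf("line %d: %w", lineNo, err)
		}
		if ok {
			cmds = append(cmds, cmd)
		}
	}
	return cmds, sc.Err()
}

func main() {
	script := "# demo\nPUT a 1\n\nDEL a\nFLUSH\nDUMP\n"
	cmds, err := ParseScript(strings.NewReader(script))
	if err != nil {
		panic(err)
	}
	for _, c := range cmds {
		fmt.Printf("%+v\n", c)
	}
}
```

Whatever rules are chosen for the unspecified cases, all three implementations need to make the same choice, otherwise the same script can leave different database states behind.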

Tasks

  1. Build dbctl in Rust (src/rust/src/bin/dbctl.rs), Go (src/go/cmd/dbctl/main.go), and C++ (src/cpp/src/dbctl.cc).
  2. Write scripts/cross_test.sh that:
    1. Builds all three binaries.
    2. Creates one canonical command script that exercises multi-flush, overwrites that land in newer SSTables, tombstones, and a non-empty WAL tail.
    3. For each language: pipes the script into dbctl --dir db-LANG (which fully writes and closes), then reopens the directory in a fresh dbctl process, sends DUMP (and, in a separate run, DUMP_WITH_TOMBS) on stdin, and redirects stdout to a file.
    4. Computes sha256 over each dump file; asserts all three match.
    5. Spot-checks a hex dump of the Rust DUMP stream for the expected post-recovery key-value bytes, to guard against silent-empty regressions (three identically empty dumps would still hash the same).
  3. Write scripts/verify.sh that runs unit tests in all three languages.
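
The comparison and spot-check steps of cross_test.sh come down to the logic below. The lab's script does this in shell; this is a sketch of the same checks in Go, with hypothetical names (Digest, AssertIdentical, SpotCheck) and placeholder dump bytes standing in for the files each dbctl would write.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Digest returns the lowercase hex sha256 of a dump's raw bytes.
func Digest(dump []byte) string {
	sum := sha256.Sum256(dump)
	return hex.EncodeToString(sum[:])
}

// AssertIdentical checks that every language's dump hashes to the same value
// and returns that shared digest.
func AssertIdentical(dumps map[string][]byte) (string, error) {
	if len(dumps) == 0 {
		return "", fmt.Errorf("no dumps to compare")
	}
	langs := make([]string, 0, len(dumps))
	for lang := range dumps {
		langs = append(langs, lang)
	}
	sort.Strings(langs) // deterministic choice of reference and error text
	ref := Digest(dumps[langs[0]])
	for _, lang := range langs[1:] {
		if d := Digest(dumps[lang]); d != ref {
			return "", fmt.Errorf("sha256 mismatch: %s=%s, %s=%s", langs[0], ref, lang, d)
		}
	}
	return ref, nil
}

// SpotCheck verifies that the dump contains each expected byte sequence.
// Matching digests alone cannot catch the case where every implementation
// recovers nothing and emits the same empty stream.
func SpotCheck(dump []byte, expected ...[]byte) error {
	for _, want := range expected {
		if !bytes.Contains(dump, want) {
			return fmt.Errorf("dump is missing expected bytes %x", want)
		}
	}
	return nil
}

func main() {
	// Placeholder bytes, not the real serialize_view format.
	dumps := map[string][]byte{
		"rust": []byte("k1v1"),
		"go":   []byte("k1v1"),
		"cpp":  []byte("k1v1"),
	}
	digest, err := AssertIdentical(dumps)
	if err != nil {
		panic(err)
	}
	if err := SpotCheck(dumps["rust"], []byte("k1"), []byte("v1")); err != nil {
		panic(err)
	}
	fmt.Printf("match(DUMP): %s...\n", digest[:8])
}
```

Spot-checking a single language is enough once the digests match, since identical digests mean the other two dumps contain the same bytes.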

Acceptance

$ scripts/verify.sh
=== rust === ... ok
=== go   === ... ok
=== cpp  === ... ok
=== OK ===

$ scripts/cross_test.sh
...
  match(DUMP):       7d1568c7...
  match(DUMP_TOMBS): 27e3d256...
  spot-checks ok
=== ALL OK ===

A byte-identical DUMP after reopen is a near-proof of correctness for the entire encode → flush → MANIFEST → recover → merge → serialize pipeline across three independent implementations.

Discussion prompts

  • Why force a close+reopen between the writes and the DUMP, instead of dumping from the same process?
  • Why is DUMP (without tombstones) on its own not a sound proof? What does DUMP_WITH_TOMBS add?
  • If the three sha256s ever diverge, which lab's format is the most probable culprit, and why?