Step 03 — CLI and Cross-Language Test

CLI surface

sstable build  IN.mt OUT.sst        # MemTable dump in → SSTable out
sstable footer FILE.sst              # prints footer values + magic_ok
sstable get    FILE.sst KEY          # value: <hex> | tombstone | absent
sstable iter   FILE.sst              # V <hex-key> <hex-value> | T <hex-key>
sstable size   FILE.sst              # file_bytes=B entries=N num_blocks=K

The hex encoding and the value: <hex> / tombstone / absent strings match db-05, so the cross-test reuses the same comparison logic.
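The output strings above can be sketched as a pair of small formatters. This is an illustration only: the names format_get, format_iter and the TOMBSTONE sentinel are hypothetical and do not come from any of the three implementations. Lowercase hex is assumed, matching the examples in this step.

```python
TOMBSTONE = object()  # hypothetical sentinel: key exists but was deleted


def format_get(result) -> str:
    """Render a lookup result as `value: <hex>`, `tombstone`, or `absent`.

    `result` is bytes for a live value, TOMBSTONE for a deleted key,
    and None when the key is not in the file at all.
    """
    if result is None:
        return "absent"
    if result is TOMBSTONE:
        return "tombstone"
    return "value: " + result.hex()


def format_iter(key: bytes, result) -> str:
    """Render one iter line: `V <hex-key> <hex-value>` or `T <hex-key>`."""
    if result is TOMBSTONE:
        return "T " + key.hex()
    return "V " + key.hex() + " " + result.hex()


print(format_get(b"REPLACED"))          # value: 5245504c41434544
print(format_iter(b"key10", TOMBSTONE))  # T 6b65793130
```

Note that an empty key encodes to an empty hex string, so its iter line has nothing between the V marker and the value's separator; all three implementations need to agree on that edge case for the text diff to pass.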

Cross-test scenario

Identical input across all three languages:

memtable new        M.mt
memtable bulk       M.mt 100
memtable put        M.mt key50 REPLACED
memtable del        M.mt key10
memtable put        M.mt ""     empty-key-value
memtable del        M.mt key99
sstable  build      M.mt OUT.sst
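To reason about what the built file should contain, the scenario can be replayed in memory. The sketch below is a model, not the MemTable itself. It assumes that bulk 100 writes key0..key99 with values val0..val99 (as described in the block-boundary check), that del leaves a tombstone rather than removing the key, and that entries are ordered bytewise. The TOMBSTONE sentinel and the function name are hypothetical.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted key


def expected_entries():
    """Replay the cross-test scenario and return (key, value) pairs in sorted order."""
    entries = {}
    # Assumption: `memtable bulk M.mt 100` writes key0..key99 -> val0..val99.
    for i in range(100):
        entries[f"key{i}".encode()] = f"val{i}".encode()
    entries[b"key50"] = b"REPLACED"      # put overwrites the bulk value
    entries[b"key10"] = TOMBSTONE        # del leaves a tombstone
    entries[b""] = b"empty-key-value"    # the empty key is a valid key
    entries[b"key99"] = TOMBSTONE
    # Assumption: bytewise lexicographic key order, so b"" sorts first.
    return sorted(entries.items(), key=lambda kv: kv[0])


entries = expected_entries()
print(len(entries))                                        # 101
print(sum(1 for _, v in entries if v is TOMBSTONE))        # 2
```

Under these assumptions the SSTable holds 101 entries, two of them tombstones, with the empty key first.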

Cross-test checks:

  1. Byte identity. sha256 of OUT.sst matches across rust / go / c++. (Same input MemTable dump + same writer rules ⇒ same bytes.)
  2. 3×3 iter matrix. Every reader can iterate every writer's output, producing identical line-by-line dumps.
  3. 3×3 footer parse. sstable footer OUT.sst from every reader on every writer's output reports the same index_offset / index_size / num_blocks and magic_ok=true.
  4. Spot-check get. For each language: get key50 → value: 5245504c41434544, get key10 → tombstone, get "" → value: 656d7074792d6b65792d76616c7565, get nope → absent.
  5. Iter equivalence vs MemTable. sstable iter OUT.sst matches memtable iter M.mt byte-for-byte (the SSTable preserves the sorted entry stream, including tombstones).
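Checks 1 to 3 reduce to two comparisons: all writers produce the same digest, and all nine reader/writer cells produce the same text. A minimal sketch of both, assuming the file bytes and command outputs have already been collected by the test harness (how they are collected is not shown, and the function names are hypothetical):

```python
import hashlib


def byte_identical(files: dict) -> bool:
    """files maps writer name -> SSTable bytes; True if every sha256 digest matches."""
    digests = {hashlib.sha256(data).hexdigest() for data in files.values()}
    return len(digests) == 1


def matrix_mismatches(outputs: dict) -> list:
    """outputs maps (reader, writer) -> text dump.

    Returns the cells whose dump differs from the first cell in sorted order;
    an empty list means the whole matrix agrees.
    """
    cells = sorted(outputs)
    reference = outputs[cells[0]]
    return [cell for cell in cells if outputs[cell] != reference]


langs = ["cpp", "go", "rust"]
dumps = {(r, w): "V 6b6579 76616c\n" for r in langs for w in langs}
print(matrix_mismatches(dumps))  # []
```

Returning the mismatching cells rather than a boolean makes a failure point directly at the reader/writer pair that disagrees.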

Block-boundary check

With roughly 100 small entries (key0..key99 → val0..val99, encoded ≈ 16 bytes each), a 4096-byte block target holds about 4096 / 16 = 256 entries, so the payload alone needs 100 / 256 ≈ 0.4 of a block, i.e. a single data block. With the +9 overhead per entry the file lands at 1 or 2 blocks. The cross-test therefore asserts only that num_blocks ≥ 1 and that every reader agrees on the count.

A separate sub-test forces a small block target (64 bytes) on identical input across the three languages and asserts the resulting num_blocks value matches; this is the precise boundary-rule check.
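The effect of the block target can be illustrated with a small counter. The boundary rule itself is not restated in this step, so the sketch assumes one: a block is closed as soon as its accumulated size reaches the target after an entry is appended. The 16 + 9 bytes per entry is the rough estimate from above, not a measured size, and count_blocks is a hypothetical helper. Under a different rule or entry size the numbers change.

```python
def count_blocks(entry_sizes, target):
    """Count data blocks under the assumed rule: close a block once size >= target."""
    blocks = 0
    current = 0
    for size in entry_sizes:
        current += size
        if current >= target:   # assumed boundary rule
            blocks += 1
            current = 0
    if current > 0:             # trailing partial block
        blocks += 1
    return blocks


sizes = [16 + 9] * 100          # rough estimate: 16-byte payload + 9 bytes overhead
print(count_blocks(sizes, 4096))  # 1
print(count_blocks(sizes, 64))    # 34
```

With a 64-byte target every block closes after three entries, so an off-by-one in the comparison (> versus ≥) or in the size accounting shifts the count, which is why the small-target sub-test is the sensitive one.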

Output formats (exact strings)

Command         Format
footer          index_offset=<N> index_size=<N> num_blocks=<N> magic_ok=<true|false>
get             value: <hex> | tombstone | absent
iter value      V <hex-key> <hex-value>
iter tombstone  T <hex-key>
size            file_bytes=<N> entries=<N> num_blocks=<N>

The cross-test scripts diff these as plain text.
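A plain-text diff is enough for equality checks. When a script needs a single field, for example to assert num_blocks ≥ 1, the footer and size lines can be split on whitespace and "=". The helper names below are hypothetical, and the sketch assumes the field values contain no spaces, which holds for the formats listed above.

```python
def parse_kv_line(line: str) -> dict:
    """Split a `name=value name=value ...` line into a dict of strings."""
    fields = {}
    for token in line.split():
        name, _, value = token.partition("=")
        fields[name] = value
    return fields


def footers_agree(lines) -> bool:
    """True if every footer line parses to the same fields and reports magic_ok=true."""
    parsed = [parse_kv_line(line) for line in lines]
    return all(p == parsed[0] and p.get("magic_ok") == "true" for p in parsed)


footer = parse_kv_line("index_offset=2500 index_size=40 num_blocks=1 magic_ok=true")
print(int(footer["num_blocks"]) >= 1)  # True
```

The numbers in the sample line are made up for the example and are not expected values for OUT.sst.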