Step 03 — CLI and Cross-Language Test
CLI surface
```
sstable build  IN.mt OUT.sst   # MemTable dump in → SSTable out
sstable footer FILE.sst        # prints footer values + magic_ok
sstable get    FILE.sst KEY    # value: <hex> | tombstone | absent
sstable iter   FILE.sst        # V <hex-key> <hex-value> | T <hex-key>
sstable size   FILE.sst        # file_bytes=B entries=N num_blocks=K
```
The hex encoding and the `value:` / `tombstone` / `absent` strings match
db-05, so the cross-test reuses the same comparison logic.
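
As an illustration of the `get` output convention, here is a minimal Python sketch. The function name `format_get` and the `TOMBSTONE` sentinel are hypothetical and not part of any of the three implementations; only the three output strings and the lowercase hex style come from this document.

```python
from typing import Union


class _Tombstone:
    """Sentinel type marking a deleted key (hypothetical helper)."""


TOMBSTONE = _Tombstone()


def format_get(result: Union[bytes, _Tombstone, None]) -> str:
    """Render a lookup result as one of the three `get` output strings."""
    if result is None:
        return "absent"                 # key was never written
    if result is TOMBSTONE:
        return "tombstone"              # key was deleted
    return "value: " + result.hex()     # bytes.hex() emits lowercase hex


print(format_get(b"REPLACED"))   # value: 5245504c41434544
print(format_get(TOMBSTONE))     # tombstone
print(format_get(None))          # absent
```

Keeping the formatter this small is what lets a cross-language diff stay a plain string comparison.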
Cross-test scenario
Identical input across all three languages:
```
memtable new  M.mt
memtable bulk M.mt 100
memtable put  M.mt key50 REPLACED
memtable del  M.mt key10
memtable put  M.mt "" empty-key-value
memtable del  M.mt key99
sstable build M.mt OUT.sst
```
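
The state this scenario should leave behind can be modelled in a few lines of Python. This is a sketch under stated assumptions: `bulk 100` writes key0..key99 → val0..val99, `del` records a tombstone rather than erasing the key, and entries are ordered by bytewise key comparison. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[bytes, Optional[bytes]]  # None stands for a tombstone


def scenario_state() -> Dict[bytes, Optional[bytes]]:
    """Replay the scenario's memtable commands against a plain dict."""
    table: Dict[bytes, Optional[bytes]] = {}
    for i in range(100):                       # memtable bulk M.mt 100
        table[f"key{i}".encode()] = f"val{i}".encode()
    table[b"key50"] = b"REPLACED"              # memtable put M.mt key50 REPLACED
    table[b"key10"] = None                     # memtable del M.mt key10
    table[b""] = b"empty-key-value"            # memtable put M.mt "" empty-key-value
    table[b"key99"] = None                     # memtable del M.mt key99
    return table


def sorted_entries(table: Dict[bytes, Optional[bytes]]) -> List[Entry]:
    """Assumed ordering: bytewise comparison of keys, tombstones kept."""
    return sorted(table.items(), key=lambda kv: kv[0])


entries = sorted_entries(scenario_state())
print(len(entries))                                # 101
print(sum(1 for _, v in entries if v is None))     # 2
print(entries[0])                                  # (b'', b'empty-key-value')
```

Under these assumptions the empty key sorts first and `key10` follows `key1`, since the comparison is bytewise rather than numeric.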
Cross-test checks:
- Byte identity. sha256 of `OUT.sst` matches across rust / go / c++. (Same input MemTable dump + same writer rules ⇒ same bytes.)
- 3×3 iter matrix. Every reader can iterate every writer's output, producing identical line-by-line dumps.
- 3×3 footer parse. `sstable footer OUT.sst` from every reader on every writer's output reports the same `index_offset` / `index_size` / `num_blocks` and `magic_ok=true`.
- Spot-check get. For each language: `get key50` → `value: 5245504c41434544`, `get key10` → `tombstone`, `get ""` → `value: 656d7074792d6b65792d76616c7565`, `get nope` → `absent`.
- Iter equivalence vs MemTable. `sstable iter OUT.sst` matches `memtable iter M.mt` byte-for-byte (the SSTable preserves the sorted entry stream, including tombstones).
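
The byte-identity and 3×3 matrix checks reduce to two small comparisons, sketched below in Python. The helper names and the in-memory inputs are hypothetical; a real harness would fill them from the files each writer produced and the text each reader printed.

```python
import hashlib
from itertools import product
from typing import Dict, List, Tuple

LANGS = ("rust", "go", "cpp")


def byte_identical(files: Dict[str, bytes]) -> bool:
    """True when every writer's file has the same sha256 digest."""
    digests = {hashlib.sha256(data).hexdigest() for data in files.values()}
    return len(digests) == 1


def matrix_mismatches(dumps: Dict[Tuple[str, str], str]) -> List[Tuple[str, str]]:
    """dumps maps (writer, reader) to that reader's output on that writer's file.

    Returns every cell whose text differs from the first cell's text.
    """
    reference = dumps[(LANGS[0], LANGS[0])]
    return [
        (writer, reader)
        for writer, reader in product(LANGS, repeat=2)
        if dumps[(writer, reader)] != reference
    ]
```

Returning the mismatching cells instead of a bare boolean makes a failure point directly at the writer/reader pair that disagrees. The same `matrix_mismatches` helper would serve both the iter matrix and the footer matrix, since each compares plain text.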
Block-boundary check
With 100 small entries (key0..key99 → val0..val99, encoded ≈ 16
bytes each), a 4096-byte block target holds about 4096 / 16 = 256
entries per block, so the input needs ceil(100 / 256) = 1 data block;
with the +9 overhead per entry it lands at 1 or 2 blocks. The
cross-test asserts only that num_blocks ≥ 1 and that every reader
agrees on the count.
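
The estimate can be reproduced with a back-of-envelope calculation. The sketch below is only that: `estimate_blocks` is a hypothetical helper that ignores any per-block framing and the writer's exact boundary rule, both of which decide the real count.

```python
import math


def estimate_blocks(entries: int, bytes_per_entry: int, block_target: int) -> int:
    """Rough data-block count: total payload divided by the block target."""
    total = entries * bytes_per_entry
    return max(1, math.ceil(total / block_target))


print(estimate_blocks(100, 16, 4096))       # 1  (1600 bytes of payload)
print(estimate_blocks(100, 16 + 9, 4096))   # 1  (2500 bytes with the +9 overhead)
```

Because this is only an estimate, the loose assertion (num_blocks ≥ 1 plus reader agreement) is the safer check for the default block target.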
A separate sub-test forces a small block target (64 bytes) on
identical input across the three languages and asserts the resulting
num_blocks value matches; this is the precise boundary-rule check.
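
To see why a small block target exposes boundary-rule differences, consider two plausible rules, neither of which is claimed to be the one the writers use. The sketch assumes uniform 25-byte entries purely for illustration.

```python
from typing import Iterable


def blocks_flush_after(sizes: Iterable[int], target: int) -> int:
    """Hypothetical rule A: append the entry, then close the block once it reaches the target."""
    blocks, current = 0, 0
    for size in sizes:
        current += size
        if current >= target:
            blocks += 1
            current = 0
    return blocks + (1 if current > 0 else 0)   # count the trailing partial block


def blocks_flush_before(sizes: Iterable[int], target: int) -> int:
    """Hypothetical rule B: close the block first if appending would exceed the target."""
    blocks, current = 0, 0
    for size in sizes:
        if current > 0 and current + size > target:
            blocks += 1
            current = 0
        current += size
    return blocks + (1 if current > 0 else 0)   # count the trailing partial block


sizes = [25] * 100  # assumed uniform entry size
print(blocks_flush_after(sizes, 4096), blocks_flush_before(sizes, 4096))  # 1 1
print(blocks_flush_after(sizes, 64), blocks_flush_before(sizes, 64))      # 34 50
```

At the 4096-byte target both rules agree, so a disagreement between implementations would go unnoticed; at 64 bytes the counts diverge immediately.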
Output formats (exact strings)
| Command | Format |
|---|---|
| `footer` | `index_offset=<N> index_size=<N> num_blocks=<N> magic_ok=<true\|false>` |
| `get` | `value: <hex>` \| `tombstone` \| `absent` |
| `iter` value | `V <hex-key> <hex-value>` |
| `iter` tombstone | `T <hex-key>` |
| `size` | `file_bytes=<N> entries=<N> num_blocks=<N>` |
The cross-test scripts diff these as plain text.
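
Where a check needs a field's value rather than a textual diff (for example the num_blocks ≥ 1 assertion), the `key=value` lines are simple to parse. The following Python sketch uses hypothetical helper names, and the sample numbers are made up for illustration.

```python
from typing import Dict

FOOTER_FIELDS = {"index_offset", "index_size", "num_blocks", "magic_ok"}


def parse_kv_line(line: str) -> Dict[str, str]:
    """Split a space-separated `key=value` line into a dict of strings."""
    fields: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {token!r}")
        fields[key] = value
    return fields


def footer_ok(line: str) -> bool:
    """Check field names, magic_ok=true, and num_blocks >= 1."""
    fields = parse_kv_line(line)
    return (
        set(fields) == FOOTER_FIELDS
        and fields["magic_ok"] == "true"
        and int(fields["num_blocks"]) >= 1
    )


# Sample line with made-up numbers, shaped like the footer format in the table.
print(footer_ok("index_offset=2500 index_size=40 num_blocks=1 magic_ok=true"))  # True
```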