Verification — db-05 LSM MemTable

Unit tests (per language)

| ID | Test name | What it asserts |
| --- | --- | --- |
| V1 | empty_encode_decode | MemTable::new().encode() → 8 bytes MMT1\x00\x00\x00\x00; decode round-trips to an empty table. |
| V2 | put_then_get | After put("k","v"), get("k") returns Value("v"). |
| V3 | overwrite_replaces | Two puts on the same key keep only the latest value; len() stays at 1. |
| V4 | delete_writes_tombstone | After put("k","v") then del("k"), get("k") returns Tombstone (not None). |
| V5 | iter_byte_lex_order | Insert keys in random order; iteration yields them sorted byte-lex ("" first, \x00 next, etc.). |
| V6 | encode_decode_round_trip | Build a 50-entry table with a mix of values and tombstones; encode → decode → every entry matches and len() is preserved. |
| V7 | size_bytes_matches_encode | For any table, size_bytes() == encode().len(). |
| V8 | decoder_rejects_bad_magic | decode(b"XXX1...") returns Err. |
| V9 | decoder_rejects_truncation | Truncate a valid dump at every byte boundary; decode must fail cleanly (no panic). |
| V10 | decoder_rejects_unsorted_keys | Hand-craft a dump where keys go ["b","a"]; decoder rejects. |
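
The decoder checks in V1 and V8 to V10 can be sketched as follows. This is a minimal illustration, not the project's implementation: the entry layout (type byte, then little-endian u32 key and value lengths, then key and value bytes), the tag values 0 and 1, and the free functions `encode`, `decode`, and `take` are all assumptions. Only the MMT1 magic, the 8-byte header, and the 9 bytes of per-entry overhead come from this document.

```rust
use std::collections::BTreeMap;

// Assumed layout (hypothetical, only the magic and sizes come from this doc):
//   header: b"MMT1" + u32 LE entry count
//   entry:  type byte (0 = value, 1 = tombstone) + u32 LE key_len
//           + u32 LE val_len + key bytes + value bytes
const MAGIC: &[u8] = b"MMT1";

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

fn encode(table: &BTreeMap<Vec<u8>, Entry>) -> Vec<u8> {
    let empty: &[u8] = &[];
    let mut out = MAGIC.to_vec();
    out.extend_from_slice(&(table.len() as u32).to_le_bytes());
    // BTreeMap<Vec<u8>, _> iterates in byte-lexicographic key order.
    for (key, entry) in table {
        let (tag, val) = match entry {
            Entry::Value(v) => (0u8, v.as_slice()),
            Entry::Tombstone => (1u8, empty),
        };
        out.push(tag);
        out.extend_from_slice(&(key.len() as u32).to_le_bytes());
        out.extend_from_slice(&(val.len() as u32).to_le_bytes());
        out.extend_from_slice(key);
        out.extend_from_slice(val);
    }
    out
}

// Bounds-checked read: never slices past the end of the input.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], String> {
    if n > buf.len() {
        return Err(format!("truncated: need {} bytes, have {}", n, buf.len()));
    }
    let (head, tail) = buf.split_at(n);
    *buf = tail;
    Ok(head)
}

fn read_u32(buf: &mut &[u8]) -> Result<u32, String> {
    let b = take(buf, 4)?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn decode(mut buf: &[u8]) -> Result<BTreeMap<Vec<u8>, Entry>, String> {
    if take(&mut buf, 4)? != MAGIC {
        return Err("bad magic".to_string());
    }
    let count = read_u32(&mut buf)?;
    let mut table: BTreeMap<Vec<u8>, Entry> = BTreeMap::new();
    for _ in 0..count {
        let tag = take(&mut buf, 1)?[0];
        let key_len = read_u32(&mut buf)? as usize;
        let val_len = read_u32(&mut buf)? as usize;
        let key = take(&mut buf, key_len)?.to_vec();
        let val = take(&mut buf, val_len)?.to_vec();
        // Keys must be strictly increasing in byte order.
        if let Some((last, _)) = table.last_key_value() {
            if last >= &key {
                return Err("keys not strictly sorted".to_string());
            }
        }
        let entry = match tag {
            0 => Entry::Value(val),
            1 if val.is_empty() => Entry::Tombstone,
            _ => return Err(format!("bad entry (type byte {})", tag)),
        };
        table.insert(key, entry);
    }
    if !buf.is_empty() {
        return Err("trailing bytes after last entry".to_string());
    }
    Ok(table)
}

fn main() {
    let mut table = BTreeMap::new();
    table.insert(b"b".to_vec(), Entry::Value(b"v".to_vec()));
    table.insert(b"a".to_vec(), Entry::Tombstone);
    let dump = encode(&table);
    assert_eq!(decode(&dump), Ok(table)); // round trip, as in V6
    assert!(decode(b"XXX1\0\0\0\0").is_err()); // V8
    for cut in 0..dump.len() {
        assert!(decode(&dump[..cut]).is_err()); // V9: every strict prefix fails
    }
    println!("decoder checks passed");
}
```

Because every read goes through `take`, a truncated dump produces an `Err` rather than a panic, which is the property V9 exercises.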

Cross-language interop (scripts/cross_test.sh)

The same scripted scenario runs in each language:

new   → bulk 100 → put "key50" "REPLACED"
                → del "key10"
                → put "" "empty-key-value"
                → del "key99"
                → save

This produces dumps rust.bin, go.bin, cpp.bin. The script then:

  1. SHA-256s all three dumps. All must match — this is the byte-identical gate.
  2. 3×3 reader matrix. Every reader (rust/go/cpp) runs iter on every writer's dump. The lines must be identical across all 9 combinations.
  3. get spot-check. Each reader queries key50, key10, key99, "", and an absent key nonexistent; results must be value: 5245504c41434544 (REPLACED), tombstone, tombstone, value: 656d7074792d6b65792d76616c7565, absent respectively across all readers.
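
The expected spot-check results can be reproduced with an in-memory model of the scenario. This is a sketch under assumptions: it assumes bulk 100 writes keys key0..key99 with values val0..val99 (the naming is inferred from the size formula under "Manual sanity checks"), and `scenario`, `get_line`, and `hex` are hypothetical helpers, not the CLI's code.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

// Lowercase hex, matching the "value: <hex>" lines quoted above.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

// Replays the scripted scenario on a sorted map.
// Assumption: bulk N inserts key0..key(N-1) with values val0..val(N-1).
fn scenario() -> BTreeMap<Vec<u8>, Entry> {
    let mut t = BTreeMap::new();
    for n in 0..100 {
        t.insert(
            format!("key{}", n).into_bytes(),
            Entry::Value(format!("val{}", n).into_bytes()),
        );
    }
    t.insert(b"key50".to_vec(), Entry::Value(b"REPLACED".to_vec()));
    t.insert(b"key10".to_vec(), Entry::Tombstone);
    t.insert(b"".to_vec(), Entry::Value(b"empty-key-value".to_vec()));
    t.insert(b"key99".to_vec(), Entry::Tombstone);
    t
}

// Formats a lookup the way the spot-check expects it.
fn get_line(t: &BTreeMap<Vec<u8>, Entry>, key: &[u8]) -> String {
    match t.get(key) {
        Some(Entry::Value(v)) => format!("value: {}", hex(v)),
        Some(Entry::Tombstone) => "tombstone".to_string(),
        None => "absent".to_string(),
    }
}

fn main() {
    let t = scenario();
    let keys: [&[u8]; 5] = [b"key50", b"key10", b"key99", b"", b"nonexistent"];
    for key in keys {
        println!("{}", get_line(&t, key));
    }
}
```

Under these assumptions the table ends with 101 entries (the 100 bulk keys plus the empty key), and the empty key sorts first in iteration order.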

End-to-end verification (scripts/verify.sh)

bash scripts/verify.sh

Builds and tests all three languages, then runs the cross-test. Final line must be ALL GREEN.

Manual sanity checks

  • memtable new /tmp/m && wc -c /tmp/m → exactly 8 bytes.
  • memtable bulk /tmp/m 1000 && memtable size /tmp/m → matches 8 + the sum over N=0..999 of (9 + len("keyN") + len("valN")), i.e. the 8-byte header plus 9 bytes of overhead and the key and value bytes for each entry.
  • Hexdump the first 16 bytes of any dump and confirm magic + count.
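
The size formula and the header check can be worked through in a few lines. This is a sketch under assumptions: it takes the keys and values to be literally "keyN" and "valN" with unpadded decimal N, and it reads the count as a little-endian u32, which this document does not state. `expected_size` and `read_header` are hypothetical helpers.

```rust
// Expected dump size after `bulk n` into a fresh table:
// 8-byte header + per entry (9 bytes overhead + key bytes + value bytes).
// Assumption: keys are "key0".."key{n-1}", values "val0".."val{n-1}", unpadded.
fn expected_size(n: u32) -> usize {
    8 + (0..n)
        .map(|i| 9 + format!("key{}", i).len() + format!("val{}", i).len())
        .sum::<usize>()
}

// Returns the entry count if the dump starts with a valid 8-byte header.
// Assumption: the count is stored as a little-endian u32.
fn read_header(dump: &[u8]) -> Option<u32> {
    let head = dump.get(..8)?;
    if !head.starts_with(b"MMT1") {
        return None;
    }
    Some(u32::from_le_bytes([head[4], head[5], head[6], head[7]]))
}

fn main() {
    // 8 + 1000 * 9 + 2 * (3000 + 2890) = 20788 under the naming assumption.
    println!("expected size for bulk 1000: {}", expected_size(1000));
    println!("empty dump header: {:?}", read_header(b"MMT1\0\0\0\0"));
}
```

If the real tool pads N or uses a different value pattern, the total changes but the structure of the sum stays the same.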

What broken looks like

| Symptom | Diagnostic |
| --- | --- |
| decode accepts b"\x00\x00\x00\x00" (no magic check) | Add magic test V8. |
| Two readers print different iter output for the same dump | Either type-byte misplaced, or one language is comparing by string instead of bytes (UTF-8 vs raw). |
| len() differs across langs after the same script | Go's map+sort path lost a duplicate; check overwrite path. |
| Dump grows monotonically after del | Tombstone path is creating a new entry under a different key; check key equality. |
| Random crash in C++ on decode of truncated input | Missing length check before memcpy; bounds-check every read. |
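
Two ways a non-byte comparison can diverge from byte-lex order are illustrated below. This is a hypothetical illustration, not a diagnosis of any of the three implementations: `cmp_signed` mimics a hand-written loop over signed `char`, and `collapses_when_lossy` shows what happens if raw keys are pushed through a lossy UTF-8 conversion before comparing.

```rust
use std::cmp::Ordering;

// Correct: slices of u8 compare lexicographically as unsigned bytes.
fn cmp_bytes(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

// Buggy variant: each byte is reinterpreted as signed, so 0x80..=0xff
// sort before 0x00..=0x7f.
fn cmp_signed(a: &[u8], b: &[u8]) -> Ordering {
    a.iter().map(|&x| x as i8).cmp(b.iter().map(|&x| x as i8))
}

// True when two distinct raw keys become equal after lossy UTF-8 decoding
// (invalid bytes are replaced by U+FFFD).
fn collapses_when_lossy(a: &[u8], b: &[u8]) -> bool {
    a != b && String::from_utf8_lossy(a) == String::from_utf8_lossy(b)
}

fn main() {
    println!("bytes:  {:?}", cmp_bytes(b"\x7f", b"\x80"));
    println!("signed: {:?}", cmp_signed(b"\x7f", b"\x80"));
    println!("lossy collapse: {}", collapses_when_lossy(b"\xfe", b"\xff"));
}
```

A cross-language test that includes keys with bytes at or above 0x80 would surface either problem in the 3×3 reader matrix.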