db-09 step 01 — The write path

Goal

Implement Db::open(dir), put(k,v), delete(k), and Write(WriteBatch) such that every successful return has been durably persisted to the WAL.

Tasks

  1. Pick the on-disk layout (MANIFEST, wal.log, sst-NNNNNN.sst).
  2. Define the WriteBatch wire format. Use a single encoder/decoder so the in-memory batch representation and the WAL record payload are identical bytes.
  3. On open(dir):
    • mkdir -p the directory.
    • Read MANIFEST (if it exists) one line at a time; collect SST ids newest-first.
    • Open each SSTable in order; track max_id.
    • Replay wal.log with WalIter: decode each record as a batch and apply to a fresh memtable.
    • Open the WAL for writes; set next_id = max_id + 1.
  4. Implement Write(batch):
    • Reject the empty batch as a no-op (don't write an empty WAL record).
    • bytes = batch.encode(); wal.append(bytes); wal.sync();
    • Apply the batch to the memtable.
  5. put and delete are thin wrappers that build a one-op batch.

Acceptance

Inline unit tests:

  • batch_roundtrip — encode → decode round-trip preserves three representative ops (Put, Delete, Put-with-empty-key).
  • batch_rejects_trailing — decoding rejects a one-byte-suffix-corrupted payload.
  • put_get_memtableput("a","1") followed by get("a") returns Some("1"); get("missing") returns None.
  • delete_shadowsput then delete makes get return None.

All four green in Rust, Go, and C++.

Discussion prompts

  • Why must wal.sync() happen before applying to the memtable, not after?
  • What invariant would break if we let Write proceed for an empty batch?
  • How would a group-commit optimization preserve the same durability guarantee while batching multiple Write calls behind a single fsync?