db-09 step 02 — Flush and recovery

Goal

Implement Flush() and recovery such that crashes between any two file operations never produce an inconsistent live set.

Tasks

  1. Implement Flush() as the strict sequence:

    1. If memtable is empty, return.
    2. Allocate id = next_id; next_id += 1.
    3. Build an SstWriter from memtable.sorted(). For each entry, map EntryType::Value→Value (with bytes) and EntryType::Tombstone→ Tombstone (empty value).
    4. Write sst-<id>.sst.tmp durably (open + write + fsync).
    5. rename it to sst-<id>.sst.
    6. Prepend (id, SstReader) to the in-memory ssts list (newest first).
    7. Rewrite MANIFEST atomically: write MANIFEST.tmp durably (one L0 <id> line per live SST, newest first), then rename to MANIFEST.
    8. Close the WAL, remove wal.log, reopen the WAL.
    9. Replace memtable with an empty one.
  2. Verify the recovery sequence implemented in step 01 still satisfies the crash matrix:

    Crash between …Effect on next open
    step 4 and 5leftover *.tmp file, ignored on next open
    step 5 and 7leftover unlisted SST file, ignored on next open
    step 7 and 8replayed WAL re-applies writes that are also in the latest SST — idempotent because MemTable::put is overwrite
    step 8 and 9impossible — both are in-memory only after this point

Acceptance

Inline unit tests:

  • flush_creates_sst — after Flush(), memtable empty and LiveSstIds().len() == 1; reads still work.
  • flush_then_recoverFlush(), drop Db, reopen, reads still return the flushed values.
  • wal_replay — without flushing, drop Db, reopen; memtable has the pre-crash writes.
  • newest_sst_wins — two flushes with overlapping keys; the value from the newer flush is returned.
  • recovery_after_flush_plus_wal — mix: flush, then write more (tombstones + puts) without flushing, drop, reopen; reads reflect both the flushed and the WAL-only writes correctly.

All five green in Rust, Go, and C++.

Discussion prompts

  • Why prepend instead of append to the ssts list?
  • Why is it safe to truncate the WAL even when the new MANIFEST may not yet be fsync'd to its parent directory?
  • What would change if step 7 used an edit log (append a "+id" record) instead of rewriting the whole file?