db-09 step 02 — Flush and recovery
Goal
Implement Flush() and recovery such that crashes between any two file
operations never produce an inconsistent live set.
Tasks
- Implement `Flush()` as the strict sequence:
  1. If memtable is empty, return.
  2. Allocate `id = next_id; next_id += 1`.
  3. Build an `SstWriter` from `memtable.sorted()`. For each entry, map `EntryType::Value` → Value (with bytes) and `EntryType::Tombstone` → Tombstone (empty value).
  4. Write `sst-<id>.sst.tmp` durably (open + write + fsync).
  5. `rename` it to `sst-<id>.sst`.
  6. Prepend `(id, SstReader)` to the in-memory `ssts` list (newest first).
  7. Rewrite `MANIFEST` atomically: write `MANIFEST.tmp` durably (one `L0 <id>` line per live SST, newest first), then `rename` to `MANIFEST`.
  8. Close the WAL, remove `wal.log`, reopen the WAL.
  9. Replace `memtable` with an empty one.
- Verify the recovery sequence implemented in step 01 still satisfies the crash matrix:

  | Crash between … | Effect on next open |
  | --- | --- |
  | step 4 and 5 | leftover `*.tmp` file, ignored on next open |
  | step 5 and 7 | leftover unlisted SST file, ignored on next open |
  | step 7 and 8 | replayed WAL re-applies writes that are also in the latest SST; idempotent because `MemTable::put` is an overwrite |
  | step 8 and 9 | impossible; both are in-memory only after this point |
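The "write `.tmp` durably, then rename" pattern used for both the SST file and `MANIFEST` can be sketched as follows. This is a minimal illustration using only the Rust standard library, not the reference solution: the helper names `write_durable_then_rename`, `render_manifest`, and `parse_manifest` are hypothetical, and the sketch assumes SST ids are `u64`. It does not fsync the parent directory.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: write `bytes` to `<dir>/<name>.tmp`, fsync the file,
/// then rename it to `<dir>/<name>`. A crash before the rename leaves only
/// the `.tmp` file behind, which the next open can ignore.
pub fn write_durable_then_rename(dir: &Path, name: &str, bytes: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{}.tmp", name));
    let dst = dir.join(name);
    {
        // open (this truncates a stale .tmp left by an earlier crash)
        let mut f = File::create(&tmp)?;
        // write
        f.write_all(bytes)?;
        // fsync file contents and metadata before the file becomes visible
        f.sync_all()?;
    }
    // Replace the destination in one step.
    fs::rename(&tmp, &dst)?;
    Ok(())
}

/// Hypothetical helper: render the MANIFEST body, one `L0 <id>` line per
/// live SST, in the order given (newest first).
pub fn render_manifest(live_ids_newest_first: &[u64]) -> String {
    live_ids_newest_first
        .iter()
        .map(|id| format!("L0 {}\n", id))
        .collect()
}

/// Hypothetical helper: parse the MANIFEST body back into ids. Only ids
/// listed here are live, so an SST file that was renamed into place but
/// never listed is simply not returned.
pub fn parse_manifest(text: &str) -> Vec<u64> {
    text.lines()
        .filter_map(|line| line.strip_prefix("L0 "))
        .filter_map(|id| id.trim().parse().ok())
        .collect()
}
```

Under these assumptions, step 7 would be a single call such as `write_durable_then_rename(dir, "MANIFEST", render_manifest(&ids).as_bytes())`, and recovery would derive the live set from `parse_manifest` alone rather than from a directory listing.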
Acceptance
Inline unit tests:
- `flush_creates_sst`: after `Flush()`, memtable empty and `LiveSstIds().len() == 1`; reads still work.
- `flush_then_recover`: `Flush()`, drop `Db`, reopen, reads still return the flushed values.
- `wal_replay`: without flushing, drop `Db`, reopen; memtable has the pre-crash writes.
- `newest_sst_wins`: two flushes with overlapping keys; the value from the newer flush is returned.
- `recovery_after_flush_plus_wal`: a mix of both; flush, then write more (tombstones + puts) without flushing, drop, reopen; reads reflect both the flushed and the WAL-only writes correctly.
All five green in Rust, Go, and C++.
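The behaviour that `newest_sst_wins` and `recovery_after_flush_plus_wal` check can be illustrated with a small read-path sketch. It is an assumption-laden stand-in, not the exercise's actual types: `Entry`, `Table`, and `get` are hypothetical names, and each SST is modelled as an in-memory `BTreeMap` instead of an `SstReader`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the two entry kinds written to an SST.
#[derive(Clone, Debug, PartialEq)]
pub enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

/// Hypothetical stand-in for both the memtable and an SST: a sorted map.
pub type Table = BTreeMap<Vec<u8>, Entry>;

/// Look up `key` in the memtable first, then in the SSTs in list order
/// (newest first). The first table that mentions the key decides the
/// result, so a newer value or tombstone hides anything older.
pub fn get(memtable: &Table, ssts_newest_first: &[(u64, Table)], key: &[u8]) -> Option<Vec<u8>> {
    let hit = memtable.get(key).or_else(|| {
        ssts_newest_first
            .iter()
            .find_map(|(_, table)| table.get(key))
    });
    match hit {
        Some(Entry::Value(bytes)) => Some(bytes.clone()),
        // A tombstone means "deleted"; do not fall through to older tables.
        Some(Entry::Tombstone) | None => None,
    }
}
```

With the list kept newest first, a plain front-to-back scan is enough; no id comparison is needed at read time.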
Discussion prompts
- Why prepend instead of append to the `ssts` list?
- Why is it safe to truncate the WAL even when the new MANIFEST may not yet be `fsync`'d to its parent directory?
- What would change if step 7 used an edit log (append a "+id" record) instead of rewriting the whole file?