db-11 step 02 — LRU cache with write-back

Goal

Add the in-memory page cache on top of step 01. Bounded capacity, LRU eviction, write-back on eviction, dirty bit per frame. After this step the disk is touched only on cache misses, evictions, and flush().

Tasks

  1. Define Frame { pid: u32, data: Vec<u8>, dirty: bool, prev, next } where prev/next are indices into a Vec<Frame> (Rust) or *list.Element (Go) or std::list<Frame>::iterator (C++).
  2. Add to Pager:
    • capacity: usize — set at open().
    • frames: Vec<Frame> — the storage backing the LRU chain.
    • free: Vec<usize> — reusable indices after eviction.
    • map: HashMap<u32, usize> — pid → frame index.
    • head, tail: Option<usize> — MRU and LRU ends.
    • hits, misses: u64 — accounting.
  3. Helpers:
    • promote(idx) — unlink frame from current position, insert at head. No-op if already at head.
    • unlink(idx) — remove frame from the list.
    • evict_tail() — pop the LRU frame; if dirty, pwrite before reusing the slot.
    • admit(pid, data, dirty) — if at capacity, evict_tail first; allocate a frame (from free or push new); insert at head; update map.
  4. Rewrite read(pid):
    • if map[pid] exists: promote, hits += 1, clone, return.
    • else: misses += 1, pread, admit(pid, data, dirty=false), clone, return.
  5. Rewrite write(pid, bytes):
    • if map[pid] exists: overwrite data, set dirty = true, promote.
    • else: admit(pid, bytes, dirty=true)no pread.
  6. Rewrite flush():
    • collect all (pid, frame_idx) where dirty == true.
    • sort by pid ascending.
    • for each, pwrite at pid * page_size, set dirty = false.
    • rewrite page 0 with current num_pages.
    • fsync.

Acceptance

Inline unit tests:

  • cache_hit_does_not_pread — write then read twice; second read produces a cache hit (cache_hits >= 1).
  • eviction_writes_back_dirty — fill cache + 1, evict the oldest frame, drop the pager, reopen, read the evicted pid, bytes match the value written before eviction.
  • eviction_skips_clean_pages — fill cache with only-reads, evict, reopen: file size unchanged (no spurious writes).
  • flush_is_idempotent — flush twice in a row, file bytes identical, both succeed.
  • hits_misses_accounting — for a known sequence of operations, cache_hits + cache_misses equals the number of read calls (writes that hit the cache are not counted as reads).

All three green in Rust, Go, and C++.

Discussion prompts

  • Why does write on a miss not do a pread? When could this be wrong? (Answer: never, as long as the caller writes the whole page. Partial-page writes would need read-modify-write.)
  • Why sort dirty pages by pid before writing them out?
  • What is the worst-case eviction cost, and how could evict_tail amortize fsyncs across many evictions?