db-08 — Broader Ideas

What this lab teaches that goes beyond storage

The two primitives in this lab — bounded caches and tournament merges — are load-bearing in every layer of computing, not just databases.

Bounded caches

  • CPU caches (L1/L2/L3) implement set-associative LRU/PLRU in hardware with the same hash-map-plus-recency-order shape, just expressed in gates.
  • Page tables and TLBs are caches over the virtual-to-physical mapping; they share LRU's vulnerability to large scans.
  • HTTP caches (CDN edges, browser caches) cache responses keyed on URL with the same eviction problem and many of the same policies (LRU, LFU, TinyLFU, S3-FIFO).
  • Compiler caches (ccache, sccache, Bazel's CAS) cache build outputs keyed on the content hash of inputs — same data structure, different key.
  • JIT method caches in V8 and HotSpot cache compiled code; they too evict on capacity pressure.

The pattern is universal: bounded random-access store + recency or frequency order. Once you can implement and analyze LRU, you can swap in LFU, ARC, LRU-K, 2Q, CLOCK-Pro, TinyLFU, S3-FIFO, or W-TinyLFU by replacing the order without changing the index.

K-way merges

  • External sort (sort -m, MapReduce shuffle, Spark's sort-shuffle) is literally a K-way merge of sorted runs, identical in structure to ours.
  • Stream-stream joins (Flink, ksqlDB, Materialize) merge two ordered streams by key with a sliding-window predicate.
  • Time-series databases (Prometheus, InfluxDB, VictoriaMetrics) merge sorted chunks across files, then deduplicate by timestamp — newest-wins, with timestamp as the tie-breaker instead of source index.
  • Git's pack-objects merges sorted delta chains across pack files when serving a fetch.
  • Snapshot iteration in MVCC databases is a merging iterator with a per-key filter that drops versions newer than the snapshot's commit timestamp — exactly what db-13 will build on top of db-08.

"When does this break?"

  • LRU + scans. A long sequential scan pollutes the cache with entries the workload will never see again. Mitigations: scan-resistant policies (LRU-K, ARC), separate cache pools per access pattern, or O_DIRECT bypass.
  • K-way merge with very large K. When K approaches thousands (e.g., a Cassandra node with many SSTables on disk), O(log K) per-entry cost starts to bite. The fix is not a better merger but a compaction policy that keeps K bounded (db-07's job).
  • Tombstones outliving the keys they shadow. A delete-heavy workload produces tombstones faster than compaction can drop them; the merger spends increasing CPU skipping shadowed entries. Cassandra calls this "tombstone hell" and ships a tombstone_warn_threshold.
  • Cache stampede. Many threads simultaneously missing on the same key hammer the underlying storage; production systems add per-key locks ("singleflight" in Go's groupcache).

Extensions worth attempting

  1. Sharded LRU. Replace the single cache with N independent shards keyed on hash(file_id, offset) % N.
  2. TinyLFU admission filter in front of the cache (frequency sketch admits only entries seen more than once).
  3. Block-cache statistics beyond hits/misses/evicts: per-entry size, bytes resident, age histogram, top-N hot blocks.
  4. Bidirectional iterator. Add Prev() to support reverse range scans.
  5. Range-tombstone aware merger. Adding range deletes changes heap-pop semantics: a range tomb shadows a range of point keys.
  6. O(1) amortized doubly-linked list arena (Rust) that interns BlockKey to u32 indices to halve hash map memory.

Where this lab fits in the curriculum

After db-08, every later lab gets a free ride on these primitives:

  • db-09 wires BlockCache and MergingIterator into the LevelDB-complete Get/Scan paths.
  • db-13 (MVCC) layers snapshot-visibility filtering on top of a merging iterator just like this one.
  • db-14 (indexes / query optimization) builds secondary merging iterators for index-scan-then-fetch plans.
  • db-20 (distributed KV store) shards block caches across nodes and adds a network-aware admission policy.