O'Neil, O'Neil, Weikum — "The LRU-K Page Replacement Algorithm For
Database Disk Buffering" (SIGMOD 1993). The canonical "LRU is not the
whole story" paper; explains why LRU under-performs LRU-K on database
workloads.
LevelDB util/cache.cc — the reference shardless LRU used by LevelDB.
Doubly-linked list + hash table; reads update recency on hit. Worth reading
end-to-end; ~300 lines.
https://github.com/google/leveldb/blob/main/util/cache.cc
RocksDB cache/lru_cache.{h,cc} and cache/clock_cache.{h,cc} —
production-grade sharded LRU plus a clock-based variant. Demonstrates the
shard-by-key-hash technique.
https://github.com/facebook/rocksdb/tree/main/cache
CockroachDB Pebble internal/cache/ — a Go implementation with a
modern API; useful for comparing language ergonomics.
https://github.com/cockroachdb/pebble/tree/master/internal/cache
Postgres src/backend/storage/buffer/ — the canonical relational
buffer pool: clock-sweep replacement with usage counts. Different policy,
same role.
Knuth, TAOCP Vol. 3 §5.4.1 — "Multiway Merging and Replacement
Selection". The original analysis of K-way merge using a tournament tree
and a loser tree.
LevelDB table/merger.cc and table/iterator.h — the canonical
read-path merging iterator interface, plus the heap-based merger that
combines memtable + level-0 + level-N+ iterators.
https://github.com/google/leveldb/blob/main/table/merger.cc
RocksDB table/merging_iterator.{h,cc} — extended with range tombstones
and pinned iterators. Shows how the interface evolves under production
pressure.
Pebble internal/manifest/level_iter.go + merging_iter.go — a Go
flavor with explicit handling of range deletes.
Designing Data-Intensive Applications, Ch. 3 ("Storage and Retrieval"),
pp. 70-89. Kleppmann's tour of LSM read amplification, bloom filters,
and the role of the block cache.
Petrov, "Database Internals", Ch. 7 ("Log-Structured Storage"). Covers
caching, iterators, and tombstone semantics at the level we implement.
The canonical byte layout used by merge_iter is documented in
src/rust/src/lib.rs on SerializeStream. It is
deliberately minimal — its only job is to give us a byte-identical
cross-language fingerprint for the sha256 check.
The cross-test reuses the same newer.mt/older.mt feedstock as
db-07 so the two labs can
be compared side-by-side. Their sha256s will differ because db-07 emits a
full SSTable (with index, footer, padding) while db-08 emits a flat
entry-stream, but the underlying ordering is identical.