db-11 — References

Primary sources

  • SQLite Pager design notes — the cleanest public description of a production pager, including how it interacts with rollback journals and WAL. The architecture of the db-11 pager is a deliberate simplification of this design. https://www.sqlite.org/atomiccommit.html https://www.sqlite.org/walformat.html
  • LMDB / mdb design — Howard Chu, MDB: A Memory-Mapped Database and Backend for OpenLDAP. Describes a B+-tree pager whose write path is copy-on-write rather than write-back. Useful counterpoint to the LRU + dirty-bit approach we took. https://www.symas.com/symas-embedded-database-lmdb
  • Goetz Graefe, Modern B-Tree Techniques, Foundations and Trends in Databases 3(4), 2010. Chapter 2 covers buffer-pool management and the page-eviction policies real engines use.

Operating-systems background

  • Andrew S. Tanenbaum, Modern Operating Systems, 4th ed., chapter on file systems and page caches. The OS's own page cache is conceptually our cache; understanding pread/pwrite/fsync at the kernel level explains why "writing" without fsync is not durable.
  • fsync(2) man page — the canonical answer to "what does fsync actually guarantee?" Read this before assuming a write reached disk.
  • Eduardo Pinheiro et al., Failure Trends in a Large Disk Drive Population, FAST 2007. Sobering reminder that the device under the pager does fail; durability is a probabilistic claim.

Replacement policies

  • Elizabeth O'Neil, Patrick O'Neil, Gerhard Weikum, The LRU-K Page Replacement Algorithm For Database Disk Buffering, SIGMOD 1993. Why naive LRU thrashes on scan-heavy workloads, and the fix everyone borrowed.
  • Theodore Johnson, Dennis Shasha, 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm, VLDB 1994. The 2Q policy used by Postgres and several others.
  • The db-11 implementation deliberately uses textbook LRU. db-22 (performance) will measure when this hurts and what 2Q / CLOCK / ARC buy.

Production engines whose pager you can read

Cross-lab dependencies

  • Upstream: none. The pager is a from-scratch component.
  • Downstream: db-12 (SQL frontend), db-13 (MVCC), db-14 (indexes), db-15 (SQLite-complete), db-20 (distributed KV) all store state on top of a pager file in this format.