References
Papers
-
Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G. (1979). Access Path Selection in a Relational Database Management System. SIGMOD '79. The System R paper. Defines cost-based optimisation, dynamic programming over join orders, selectivity estimation. Every modern planner is a variation on this design.
-
Graefe, G. (1994). Volcano — An Extensible and Parallel Query Evaluation System. IEEE TKDE 6(1). The iterator-based "open/next/close" execution model used here. db-14's Executor is a flattened Volcano.
-
Graefe, G. (1995). The Cascades Framework for Query Optimization. Data Eng. Bulletin 18(3). Rule-based + cost-based search via memoisation on plan equivalence classes. Cascades is what SQL Server and CockroachDB derive from.
-
Graefe, G. (2011). Modern B-Tree Techniques. Foundations and Trends in Databases. The reference on B-trees — covers concurrent access, range scans, prefix compression, all relevant to "what an index is".
-
Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2015). How Good Are Query Optimizers, Really? PVLDB 9(3). Empirical study showing that cardinality estimation errors dwarf cost-model errors; motivates why even very simple planners can be competitive.
Books
-
Hellerstein, J. M. & Stonebraker, M. (eds, 2005). Readings in Database Systems (the "Red Book"), 5th edition. Chapters on query processing and optimisation. Free online.
-
Garcia-Molina, H., Ullman, J. D., & Widom, J. (2008). Database Systems: The Complete Book, 2nd ed. Chapters 15–16 (query processing) and 17 (optimisation).
-
Ramakrishnan, R. & Gehrke, J. (2003). Database Management Systems, 3rd ed. Chapters 12–15 cover indexing and external sorting.
Production system docs
-
SQLite — Query Planner Overview. https://www.sqlite.org/queryplanner.html The Next-Generation Query Planner doc (https://www.sqlite.org/queryplanner-ng.html) describes the N best-N-paths algorithm SQLite uses. A clean read on rule vs cost planning in a real engine.
-
PostgreSQL — Planner/Optimizer. https://www.postgresql.org/docs/current/planner-optimizer.html Authoritative on cost constants, statistics (
pg_statistic), and the GEQO genetic optimiser for large join graphs.
Source code
- SQLite
where.c— single-file implementation of the planner. ~10k LoC of cost-based reasoning over WHERE clauses. The reference. - LevelDB
db/version_set.cc— for a non-SQL planner-style scoring function on file-picking in compaction. - CockroachDB
pkg/sql/opt/— Cascades-style optimiser in Go.