Broader Ideas
Beyond this lab
Cost-based optimisation
The cost model here estimates a single number per predicate. A real optimiser must:
- Estimate cardinality and CPU/IO cost per operator.
- Compose costs across operators (a Filter after an IndexScan costs scan-rows × predicate-eval-cost).
- Handle join ordering (System R / Selinger DP).
- Handle correlated predicates (
x = 1 AND y = 1wherexandyare correlated — uniform-independence is the standard wrong assumption).
See Leis et al. 2015 — bad cardinality estimates dominate bad cost models.
Cascades / Volcano-style search
Move from a single-pass rule-based planner to a search over plan-space:
- Represent equivalent plan trees with a memo table.
- Apply transformation rules (push filter below scan, merge filters).
- Score each candidate; pick lowest-cost.
- This is what SQL Server, CockroachDB, Apache Calcite do.
Index variants
- Hash index — O(1) point lookup, no range.
- Bitmap index — efficient AND/OR of many low-cardinality predicates; great for analytics, bad for OLTP.
- Covering index — include extra columns to skip the heap read entirely (index-only scan).
- Partial index —
WHERE x > 100predicate baked into the index, smaller but only usable when the predicate is implied by the query. - Functional index — index on
lower(name)rather thanname. - Multi-column index — order matters; left-most prefix rule.
Statistics
A real optimiser maintains:
- Histograms (equi-depth, equi-width, or compressed) per column for range selectivity.
- NDV (number of distinct values) per column for equality selectivity.
- Correlation metrics between column pairs.
- MCVs (most common values) for skewed distributions.
These need to be refreshed periodically — ANALYZE in PostgreSQL,
sqlite_stat1 table in SQLite.
Adaptive query execution
- Spark / Snowflake re-plan at operator boundaries based on observed row counts.
- PostgreSQL has parallel-plan re-decisions; Vertica re-optimises mid-query.
Hardware angles
- Pointer chasing through an in-memory B-tree is bound by L2 cache misses. Cache-oblivious B-trees and trie-based indexes (ART, HOT) reduce that.
- SSDs make sequential scan competitive with random index reads up to surprisingly high selectivity (~10 % of the table).
- GPUs and vector instructions favour columnar storage + vectorised scans over row-at-a-time indexing.
What I'd build next
- Add a third index type — a hash index — and let the planner compare
estimates across index families (
O(1)hash beatsO(log N)Btree on pure equality, ties broken by index size). - Add a
nestedloop_joinoperator and extend the cost model so the planner picks build vs probe side. - Add a tiny
ANALYZEstep that snapshotsdistinct_keysand letsPlanner::estimateconsult cached stats instead of walking the index each call.