Broader Ideas

Beyond this lab

The cost model here estimates a single number per predicate. A real optimiser must:

Estimate cardinality and CPU/IO cost per operator.
Compose costs across operators (a Filter after an IndexScan costs scan-rows × predicate-eval-cost).
Handle join ordering (System R / Selinger DP).
Handle correlated predicates (x = 1 AND y = 1 where x and y are correlated — uniform-independence is the standard wrong assumption).

See Leis et al. 2015 — bad cardinality estimates dominate bad cost models.

Move from a single-pass rule-based planner to a search over plan-space:

Hash index — O(1) point lookup, no range.
Bitmap index — efficient AND/OR of many low-cardinality predicates; great for analytics, bad for OLTP.
Covering index — include extra columns to skip the heap read entirely (index-only scan).
Partial index — WHERE x > 100 predicate baked into the index, smaller but only usable when the predicate is implied by the query.
Functional index — index on lower(name) rather than name.
Multi-column index — order matters; left-most prefix rule.

A real optimiser maintains:

Histograms (equi-depth, equi-width, or compressed) per column for range selectivity.
NDV (number of distinct values) per column for equality selectivity.
Correlation metrics between column pairs.
MCVs (most common values) for skewed distributions.

These need to be refreshed periodically — ANALYZE in PostgreSQL, sqlite_stat1 table in SQLite.

Pointer chasing through an in-memory B-tree is bound by L2 cache misses. Cache-oblivious B-trees and trie-based indexes (ART, HOT) reduce that.
SSDs make sequential scan competitive with random index reads up to surprisingly high selectivity (~10 % of the table).
GPUs and vector instructions favour columnar storage + vectorised scans over row-at-a-time indexing.

Add a third index type — a hash index — and let the planner compare estimates across index families (O(1) hash beats O(log N) Btree on pure equality, ties broken by index size).
Add a nestedloop_join operator and extend the cost model so the planner picks build vs probe side.
Add a tiny ANALYZE step that snapshots distinct_keys and lets Planner::estimate consult cached stats instead of walking the index each call.